NVLM 1.0: Open Frontier-Class Multimodal LLMs
NVLM 1.0 is a family of frontier-class multimodal large language models (LLMs) that achieve state-of-the-art results on vision-language tasks, rivaling the leading proprietary models (e.g., GPT-4o) and open-access models (e.g., Llama 3-V 405B and InternVL 2). Remarkably, after multimodal training, NVLM 1.0 shows improved accuracy on text-only tasks over its LLM backbone. We are open-sourcing the model weights and training code in Megatron-Core for the community.
NVLM 1.0 is intended for researchers and developers building multimodal applications.