NVLM 1.0

NVLM 1.0: Open-source, frontier-class multimodal LLMs for state-of-the-art vision-language tasks.

NVLM 1.0: Open Frontier-Class Multimodal LLMs achieving state-of-the-art results on vision-language tasks, rivaling GPT-4o, Llama 3-V 405B, and InternVL 2. Powerful, open-source, and ready for your next project.



NVLM 1.0 is a family of frontier-class multimodal large language models (LLMs) that achieve state-of-the-art results on vision-language tasks, rivaling the leading proprietary models (e.g., GPT-4o) and open-access models (e.g., Llama 3-V 405B and InternVL 2). Remarkably, after multimodal training, NVLM 1.0 shows improved accuracy on text-only tasks over its LLM backbone. We are open-sourcing the model weights and training code in Megatron-Core for the community.

Product Highlights

  • State-of-the-art vision-language performance, rivaling leading proprietary models (e.g., GPT-4o) and open-access models (e.g., Llama 3-V 405B, InternVL 2).
  • Improved accuracy on text-only tasks over its LLM backbone after multimodal training.
  • Fully open source: model weights and Megatron-Core training code are released to the community.

Use Cases

  • Visual question answering: answering questions about images and their accompanying text (see the sketch after this list).
  • Image description: generating descriptive captions for images.
  • Multimodal reasoning: analyzing text and images jointly to perform logical reasoning.
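
A minimal sketch of the first use case, visual question answering. It assumes the released weights are available as a Hugging Face checkpoint (the repo ID `nvidia/NVLM-D-72B` and its InternVL-style `chat()` helper are assumptions here; verify both against the official model card), and it substitutes a simplified single-tile image preprocessing for the repo's own multi-tile helper.

```python
# VQA sketch for NVLM 1.0. The checkpoint ID and the chat() interface
# loaded via trust_remote_code are assumptions; check the official
# Hugging Face model card before use.
import torch
import torchvision.transforms as T
from PIL import Image
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "nvidia/NVLM-D-72B"  # assumption: adjust to the released checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModel.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # 72B parameters: expect multi-GPU sharding
    device_map="auto",
    trust_remote_code=True,
).eval()

# Simplified single-tile preprocessing (the repo ships a multi-tile helper).
preprocess = T.Compose([
    T.Resize((448, 448)),
    T.ToTensor(),
    T.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])
pixel_values = preprocess(Image.open("example.jpg").convert("RGB"))
pixel_values = pixel_values.unsqueeze(0).to(torch.bfloat16)

question = "<image>\nWhat is shown in this image?"
response = model.chat(
    tokenizer, pixel_values, question,
    dict(max_new_tokens=128, do_sample=False),
)
print(response)
```

The same interface covers the other use cases: swap the prompt for a captioning instruction ("Describe this image in detail.") or a reasoning question about the image and text.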

Target Audience

NVLM 1.0 is aimed at researchers and developers building multimodal applications.
