yorickvp/llava-13b
Visual instruction tuning towards large language and vision models with GPT-4 level capabilities
Pricing
Pricing for Synexa AI models works differently from other providers. Instead of being billed by compute time, you are billed per request (input and output), which makes costs more predictable.
For example, at $0.0005 per request, generating 100 text responses should cost around $0.05.
Check out our docs for more information about how per-request pricing works on Synexa.
Provider | Price per request (USD) | Saving with Synexa (%) |
---|---|---|
Synexa | 0.0005 | - |
Replicate | 0.0010 | 50.0 |
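The arithmetic behind the example and the saving column can be checked with a short sketch. It assumes flat per-request billing at the prices listed in the table above. The names `estimate_cost` and `saving_percent` are hypothetical helpers written for illustration and are not part of any Synexa SDK.

```python
from decimal import Decimal

# Per-request prices in USD, copied from the pricing table.
SYNEXA_PRICE = Decimal("0.0005")
REPLICATE_PRICE = Decimal("0.0010")


def estimate_cost(price_per_request: Decimal, num_requests: int) -> Decimal:
    """Total cost assuming every request is billed at the same flat price."""
    if num_requests < 0:
        raise ValueError("num_requests must be non-negative")
    return price_per_request * num_requests


def saving_percent(price: Decimal, reference_price: Decimal) -> Decimal:
    """Percentage saved by paying `price` instead of `reference_price`."""
    if reference_price <= 0:
        raise ValueError("reference_price must be positive")
    return (reference_price - price) / reference_price * 100


if __name__ == "__main__":
    # Cost of 100 requests on Synexa, and the saving relative to Replicate.
    print("100 requests on Synexa (USD):", estimate_cost(SYNEXA_PRICE, 100))
    print("Saving vs Replicate (%):", saving_percent(SYNEXA_PRICE, REPLICATE_PRICE))
```

`Decimal` is used instead of `float` so that small currency amounts are not affected by binary rounding. Actual bills may differ if pricing varies with input or output size.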
Readme

[Project Page] [Demo] [Data] [Model Zoo]
Improved Baselines with Visual Instruction Tuning
[Paper]
Haotian Liu, Chunyuan Li, Yuheng Li, Yong Jae Lee
Visual Instruction Tuning (NeurIPS 2023, Oral)
[Paper]
Haotian Liu*, Chunyuan Li*, Qingyang Wu, Yong Jae Lee (*Equal Contribution)
Summary
LLaVA is a novel end-to-end trained large multimodal model that connects a vision encoder with the Vicuna language model for general-purpose visual and language understanding. It achieves impressive chat capabilities in the spirit of the multimodal GPT-4 and sets a new state-of-the-art accuracy on Science QA.