yorickvp/llava-13b

Visual instruction tuning towards large language and vision models with GPT-4 level capabilities

$0.0005 / request
GPU: H100

Pricing


Pricing for Synexa AI models works differently from other providers. Instead of being billed by time, you are billed by input and output, making pricing more predictable.

Output: $0.0005 / text, or 2000 texts / $1

For example, generating 100 texts should cost around $0.05.
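The per-request arithmetic is simple enough to check by hand. The sketch below reproduces it with Python's `decimal` module so that money amounts are not distorted by floating-point rounding. The helper names are hypothetical and are not part of any Synexa SDK. The only price assumed is the $0.0005 per output text listed above.

```python
from decimal import Decimal

# Listed output price per generated text (taken from the pricing above).
PRICE_PER_TEXT = Decimal("0.0005")


def estimate_cost(num_texts: int, price: Decimal = PRICE_PER_TEXT) -> Decimal:
    """Hypothetical helper: total cost in dollars for num_texts outputs."""
    return price * num_texts


def texts_per_dollar(price: Decimal = PRICE_PER_TEXT) -> int:
    """Hypothetical helper: how many outputs one dollar buys at this price."""
    return int(Decimal(1) / price)


if __name__ == "__main__":
    print(estimate_cost(100))  # 0.0500
    print(texts_per_dollar())  # 2000
```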

Check out our docs for more information about how per-request pricing works on Synexa.

| Provider  | Price ($) | Saving (%) |
|-----------|-----------|------------|
| Synexa    | 0.0005    | -          |
| Replicate | 0.0010    | 50.0       |
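The saving figure in the comparison is the relative price difference against the other provider. The sketch below assumes that the figure is computed as (other price - Synexa price) / other price, which matches the 50.0% shown. The helper name is hypothetical.

```python
from decimal import Decimal


def saving_percent(our_price: Decimal, other_price: Decimal) -> Decimal:
    """Hypothetical helper: percentage saved relative to other_price."""
    if other_price <= 0:
        raise ValueError("other_price must be positive")
    return (other_price - our_price) / other_price * 100


if __name__ == "__main__":
    # Per-request prices from the comparison: Synexa vs. Replicate.
    print(saving_percent(Decimal("0.0005"), Decimal("0.0010")))  # 50.0
```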

Readme

yorickvp/llava-13b Synexa Example

Visual instruction tuning towards large language and vision models with GPT-4 level capabilities.


Improved Baselines with Visual Instruction Tuning

Haotian Liu, Chunyuan Li, Yuheng Li, Yong Jae Lee

Visual Instruction Tuning (NeurIPS 2023, Oral)

Haotian Liu*, Chunyuan Li*, Qingyang Wu, Yong Jae Lee (*Equal Contribution)

Summary

LLaVA is a novel, end-to-end trained large multimodal model that connects a vision encoder to Vicuna for general-purpose visual and language understanding. It achieves impressive chat capabilities in the spirit of the multimodal GPT-4 and sets a new state-of-the-art accuracy on ScienceQA.
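In the LLaVA paper, image features from a pretrained CLIP vision encoder are mapped by a trainable projection into the language model's word-embedding space. These projected features are then passed to Vicuna together with the text tokens. The toy sketch below illustrates only that data flow. It uses a single linear projection, as in the original LLaVA design; LLaVA-1.5 uses a two-layer MLP instead. The encoder and the language model are stubbed out with random vectors. The dimensions, weights, and function names are illustrative assumptions, not the real model's parameters. The real model inserts the visual tokens at an image placeholder in the prompt, whereas this sketch simply prepends them.

```python
import random

# Toy dimensions (assumptions for illustration only, far smaller than the real model).
VISION_DIM = 4    # width of each vision-encoder patch feature
TEXT_DIM = 6      # width of the language model's token embeddings
NUM_PATCHES = 3   # number of image patch features


def linear_projection(features, weight):
    """Map each feature vector through weight (len(f) x out_dim) into token space."""
    return [
        [sum(f[i] * weight[i][j] for i in range(len(f))) for j in range(len(weight[0]))]
        for f in features
    ]


def build_llm_input(image_features, text_embeddings, weight):
    """Project image features into token space and prepend them to the text tokens."""
    visual_tokens = linear_projection(image_features, weight)
    return visual_tokens + text_embeddings


if __name__ == "__main__":
    rng = random.Random(0)
    # Random stand-ins for vision-encoder output and embedded prompt tokens.
    image_features = [[rng.gauss(0, 1) for _ in range(VISION_DIM)] for _ in range(NUM_PATCHES)]
    text_embeddings = [[rng.gauss(0, 1) for _ in range(TEXT_DIM)] for _ in range(2)]
    weight = [[rng.gauss(0, 0.1) for _ in range(TEXT_DIM)] for _ in range(VISION_DIM)]

    sequence = build_llm_input(image_features, text_embeddings, weight)
    print(len(sequence), len(sequence[0]))  # 5 6
```

The language model then treats the projected image features exactly like ordinary token embeddings. This is why a lightweight projection is enough to connect the two pretrained components.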