Multimodal LLM optimized for visual recognition, image reasoning, captioning, and answering image-related questions.
Lightweight Gemma 3 model with 128K context, vision-language input, and multilingual support for on-device AI.
Lightweight Gemma 3 model (1B) with 128K context, vision-language input, and multilingual support for on-device AI.
Most lightweight Gemma 3 model, with 128K context, vision-language input, and multilingual support for on-device AI.
Open-source vision model combining advanced visual perception with instruction-tuned language understanding for visual reasoning.
Free endpoint for testing this autoregressive language model built on an optimized transformer architecture.
Vision-language model with advanced visual reasoning, video understanding, structured outputs, and agentic capabilities.
Lightweight model with vision-language input, multilingual support, visual reasoning, and top-tier performance for its size.
SOTA 109B model with 17B active parameters and a long context window, excelling at multi-document analysis, codebase reasoning, and personalized tasks.
SOTA 128-expert MoE powerhouse for multilingual image/text understanding, creative writing, and enterprise-scale applications.
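Catalogs of vision-language models like the ones above are commonly served behind an OpenAI-compatible chat-completions API, where an image and a text question travel together as parts of one user message. As a minimal sketch (the model identifier and image URL below are placeholders for illustration, not names taken from this catalog), such a multimodal request payload can be assembled like this:

```python
import json


def build_vision_request(model_id: str, image_url: str, question: str) -> dict:
    """Assemble an OpenAI-style chat-completions payload that pairs an
    image with a text question as multimodal message content parts."""
    return {
        "model": model_id,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        "max_tokens": 256,
    }


# Hypothetical identifiers, for illustration only.
payload = build_vision_request(
    "example/vision-model",
    "https://example.com/photo.jpg",
    "What is shown in this image?",
)
print(json.dumps(payload, indent=2))
```

Actually sending the request would be an HTTP POST of this JSON body to the model's endpoint with an `Authorization` header; that step is omitted here since endpoint URLs and keys are deployment-specific.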