SOTA 109B model with 17B active params & large context, excelling at multi-document analysis, codebase reasoning, and personalized tasks.
Chat
Vision
AiTradeStore AI offers day 1 support for the new Llama 4 multilingual vision models that can analyze multiple images and respond to queries about them.
Register for an AiTradeStore AI account to get an API key. New accounts come with free credits to start. Install the AiTradeStore AI library for your preferred language.
API Usage
Endpoint
meta-llama/Llama-4-Scout-17B-16E-Instruct
RUN INFERENCE
curl -X POST https://aitradestore.com/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $AITRADESTORE_API_KEY" \
  -d '{
    "model": "meta-llama/Llama-4-Scout-17B-16E-Instruct",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "image_url", "image_url": {"url": "https://huggingface.co/datasets/patrickvonplaten/random_img/resolve/main/yosemite.png"}},
        {"type": "text", "text": "Describe this image."}
      ]
    }]
  }'
RUN INFERENCE
from aitradestore import AiTradeStore

client = AiTradeStore()

response = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {"url": "https://huggingface.co/datasets/patrickvonplaten/random_img/resolve/main/yosemite.png"}
                },
                {
                    "type": "text",
                    "text": "Describe this image."
                }
            ]
        }
    ]
)
print(response.choices[0].message.content)
RUN INFERENCE
import AiTradeStore from "ai-tradestore";

const aitradestore = new AiTradeStore();

const response = await aitradestore.chat.completions.create({
  model: "meta-llama/Llama-4-Scout-17B-16E-Instruct",
  messages: [{
    "role": "user",
    "content": [
      {
        "type": "image_url",
        "image_url": {"url": "https://huggingface.co/datasets/patrickvonplaten/random_img/resolve/main/yosemite.png"}
      },
      {
        "type": "text",
        "text": "Describe this image."
      }
    ]
  }]
});
console.log(response.choices[0].message.content);
Model Provider: Meta
Type: Chat
Variant:
Parameters: 109B
Deployment: ✔️ Serverless ✔️ Dedicated
Quantization:
Context length: 1M
Pricing: Input: $0.18 / Output: $0.59
How to use Llama 4 Scout
Input
from aitradestore import AiTradeStore

# The API key is taken from the api_key parameter or the AITRADESTORE_API_KEY environment variable
client = AiTradeStore()

# Query an image with the Llama 4 Scout model
response = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What can you see in this image?"},
            {"type": "image_url", "image_url": {"url": "https://huggingface.co/datasets/patrickvonplaten/random_img/resolve/main/yosemite.png"}}
        ]
    }]
)
print(response.choices[0].message.content)
Output
The image depicts a serene landscape of Yosemite National Park, featuring a river flowing through a valley surrounded by towering cliffs and lush greenery.
* **River:**
  * The river is calm and peaceful, with clear water that reflects the surrounding scenery.
  * It flows gently from the bottom-left corner to the center-right of the image.
  * The riverbank is lined with rocks and grasses, adding to the natural beauty of the scene.
* **Cliffs:**
  * The cliffs are massive and imposing, rising steeply from the valley floor.
  * They are composed of light-colored rock, possibly granite, and feature vertical striations.
  * The cliffs are covered in trees and shrubs, which adds to their rugged charm.
* **Trees and Vegetation:**
  * The valley is densely forested, with tall trees growing along the riverbanks and on the cliffsides.
  * The trees are a mix of evergreen and deciduous species, with some displaying vibrant green foliage.
  * Grasses and shrubs grow in the foreground, adding texture and color to the scene.
* **Sky:**
  * The sky is a brilliant blue, with only a few white clouds scattered across it.
  * The sun appears to be shining from the right side of the image, casting a warm glow over the scene.
In summary, the image presents a breathtaking view of Yosemite National Park, showcasing the natural beauty of the valley and its surroundings. The calm river, towering cliffs, and lush vegetation all contribute to a sense of serenity and wonder.
Function Calling
Input
import os
import json
import openai

# Point the OpenAI client at the AiTradeStore endpoint
client = openai.OpenAI(
    base_url="https://aitradestore.com/api/v1",
    api_key=os.environ['AITRADESTORE_API_KEY'],
)

# Describe the function the model is allowed to call
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA"
                    },
                    "unit": {
                        "type": "string",
                        "enum": [
                            "celsius",
                            "fahrenheit"
                        ]
                    }
                }
            }
        }
    }
]

messages = [
    {"role": "system", "content": "You are a helpful assistant that can access external functions. The responses from these function calls will be appended to this dialogue. Please provide responses based on the information from these function calls."},
    {"role": "user", "content": "What is the current temperature of New York, San Francisco and Chicago?"}
]

response = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    messages=messages,
    tools=tools,
    tool_choice="auto",
)

# Print only the tool calls the model requested
print(json.dumps(response.choices[0].message.model_dump()['tool_calls'], indent=2))
Output
[
  {
    "id": "call_1p75qwks0etzfy1g6noxvsgs",
    "function": {
      "arguments": "{\"location\":\"New York, NY\",\"unit\":\"fahrenheit\"}",
      "name": "get_current_weather"
    },
    "type": "function"
  },
  {
    "id": "call_aqjfgn65d0c280fjd3pbzpc6",
    "function": {
      "arguments": "{\"location\":\"San Francisco, CA\",\"unit\":\"fahrenheit\"}",
      "name": "get_current_weather"
    },
    "type": "function"
  },
  {
    "id": "call_rsg8muko8hymb4brkycu3dm5",
    "function": {
      "arguments": "{\"location\":\"Chicago, IL\",\"unit\":\"fahrenheit\"}",
      "name": "get_current_weather"
    },
    "type": "function"
  }
]
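The tool calls above only name the function and its arguments; running the function is left to the caller. A minimal sketch of that step follows, assuming the tool calls are available as plain dicts shaped like the output above. The `get_current_weather` stub, the `AVAILABLE_FUNCTIONS` registry, and `run_tool_calls` are hypothetical names introduced for illustration, not part of any library.

```python
import json


def get_current_weather(location, unit="fahrenheit"):
    """Hypothetical stub; a real implementation would query a weather service."""
    return {"location": location, "unit": unit, "temperature": None}


# Hypothetical registry mapping tool names (as declared in `tools`) to local callables.
AVAILABLE_FUNCTIONS = {"get_current_weather": get_current_weather}


def run_tool_calls(tool_calls):
    """Execute each requested tool call and return OpenAI-style `tool` messages."""
    tool_messages = []
    for call in tool_calls:
        name = call["function"]["name"]
        # `arguments` is a JSON-encoded string, so it must be decoded first.
        args = json.loads(call["function"]["arguments"])
        result = AVAILABLE_FUNCTIONS[name](**args)
        tool_messages.append({
            "role": "tool",
            "tool_call_id": call["id"],  # links the result to the request
            "content": json.dumps(result),
        })
    return tool_messages


# Sample input copied from the first entry of the output above.
sample_calls = [
    {
        "id": "call_1p75qwks0etzfy1g6noxvsgs",
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "arguments": '{"location":"New York, NY","unit":"fahrenheit"}',
        },
    }
]

tool_messages = run_tool_calls(sample_calls)
print(json.dumps(tool_messages, indent=2))
```

In the usual OpenAI-style flow, the assistant message containing the tool calls and these `tool` messages are appended to `messages` before calling `client.chat.completions.create` again. Whether this endpoint handles that round trip in exactly the same way is an assumption not confirmed on this page.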
Query models with multiple images
Currently, this model supports up to 5 images as input.
Input
# Multi-modal message with multiple images
response = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Compare these two images."
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://huggingface.co/datasets/patrickvonplaten/random_img/resolve/main/yosemite.png"
                }
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://huggingface.co/datasets/patrickvonplaten/random_img/resolve/main/slack.png"
                }
            }
        ]
    }]
)
print(response.choices[0].message.content)
Output
The first image is a collage of multiple identical landscape photos showing a natural scene with rocks, trees, and a stream under a blue sky. The second image is a screenshot of a mobile app interface, specifically the navigation menu of the Canva app, which includes icons for Home, DMs (Direct Messages), Activity, Later, Canvases, and More.
### Comparison:
1. **Content**:
- The first image focuses on a natural landscape.
- The second image shows a digital interface from an app.
2. **Purpose**:
- The first image could be used for showcasing nature, design elements in graphic work, or as a background.
- The second image represents the functionality and layout of the Canva app's navigation system.
3. **Visual Style**:
- The first image has vibrant colors and realistic textures typical of outdoor photography.
- The second image uses flat design icons with a simple color palette suited for user interface design.
4. **Context**:
- The first image is likely intended for artistic or environmental contexts.
- The second image is relevant to digital design and app usability discussions.
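Since the model accepts only a limited number of images, it can help to validate the image count before sending a request. The sketch below builds the same message shape used in the multi-image example and rejects lists longer than five. `build_image_message` and `MAX_IMAGES` are hypothetical names introduced here, and how the service itself reacts to an oversized request is not documented on this page.

```python
MAX_IMAGES = 5  # limit stated for this model in the multi-image section


def build_image_message(prompt, image_urls, max_images=MAX_IMAGES):
    """Build one user message with a text part followed by image parts."""
    if not image_urls:
        raise ValueError("at least one image URL is required")
    if len(image_urls) > max_images:
        raise ValueError(
            f"got {len(image_urls)} images; this model accepts at most {max_images}"
        )
    content = [{"type": "text", "text": prompt}]
    content += [
        {"type": "image_url", "image_url": {"url": url}} for url in image_urls
    ]
    return {"role": "user", "content": content}


message = build_image_message(
    "Compare these two images.",
    [
        "https://huggingface.co/datasets/patrickvonplaten/random_img/resolve/main/yosemite.png",
        "https://huggingface.co/datasets/patrickvonplaten/random_img/resolve/main/slack.png",
    ],
)
print(len(message["content"]))
```

The returned dict can be passed as the single element of `messages` in `client.chat.completions.create`.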
Model details
- Model String: meta-llama/Llama-4-Scout-17B-16E-Instruct
- Specs:
  - 17B active parameters (109B total)
  - 16-expert MoE architecture
  - 327,680-token context length (will be increased to 10M)
  - Support for 12 languages: Arabic, English, French, German, Hindi, Indonesian, Italian, Portuguese, Spanish, Tagalog, Thai, and Vietnamese
  - Multimodal capabilities (text + images)
  - Support for function calling
- Best for: Multi-document analysis, codebase reasoning, and personalized tasks
- Knowledge Cutoff: August 2024
Prompting Llama 4 Scout
Applications & Use Cases
- Multi-document summarization for legal/financial analysis: Analyze multiple legal contracts or financial statements simultaneously, identifying key terms, inconsistencies, and patterns across documents to generate comprehensive summaries and risk assessments.
- Personalized task automation using years of user data: Create tailored automation workflows by analyzing an individual's historical data patterns, communication style, and preferences, enabling highly personalized digital assistants that adapt to specific user needs.
- Efficient image parsing for multimodal applications: Process and understand image content in conjunction with text to power applications like visual search, content moderation, and accessibility features that require understanding the relationship between visual and textual elements.
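For the multi-document use case listed above, one simple approach is to pack several documents into a single prompt with clear delimiters so the model can refer to each by name. The sketch below is one possible layout, not a recommended or official prompt format; `build_multi_document_messages`, the delimiter strings, the system prompt, and the two sample contract snippets are all made up for illustration.

```python
def build_multi_document_messages(documents, question):
    """Pack named documents and a question into chat messages."""
    sections = [
        f"=== DOCUMENT: {name} ===\n{text.strip()}"
        for name, text in documents.items()
    ]
    prompt = "\n\n".join(sections) + f"\n\n=== QUESTION ===\n{question}"
    return [
        {
            "role": "system",
            "content": "Answer using only the documents provided and cite document names.",
        },
        {"role": "user", "content": prompt},
    ]


# Placeholder documents; real inputs would be full contracts or statements.
docs = {
    "contract_a.txt": "Payment is due within 30 days of invoice.",
    "contract_b.txt": "Payment is due within 45 days of invoice.",
}
messages = build_multi_document_messages(docs, "Where do the payment terms differ?")
print(messages[1]["content"])
```

The resulting `messages` list can be passed to `client.chat.completions.create` with the model string shown in Model details.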