
Vision

Models that support vision can accept images alongside text in messages[].content. Provide one or more image_url parts, optionally with a detail hint, and the model will describe, read, or reason about the visual content.

Endpoint: POST https://api.aifoundryhub.com/v1/chat/completions


curl -X POST "https://api.aifoundryhub.com/v1/chat/completions" \
  -H "Authorization: Bearer $AI_FOUNDRY_HUB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5",
    "max_tokens": 200,
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "Describe this room in two sentences." },
          { "type": "image_url", "image_url": { "url": "https://images.example.com/living-room.jpg", "detail": "auto" } }
        ]
      }
    ]
  }'

Tip: Use detail: "high" when the image contains small text such as fine print. Use detail: "low" when speed or cost matters more than precision.
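The detail hint can be set per image part. As a minimal sketch, the hypothetical helper below (imagePart is not part of the API) builds an image_url part and rejects any detail value other than the three used on this page. The URLs are placeholders.

```javascript
// Hypothetical helper, not part of the API: builds one image_url content part
// and validates the detail hint against the values shown on this page.
const DETAIL_LEVELS = ["auto", "low", "high"];

function imagePart(url, detail = "auto") {
  if (!DETAIL_LEVELS.includes(detail)) {
    throw new RangeError(`detail must be one of: ${DETAIL_LEVELS.join(", ")}`);
  }
  return { type: "image_url", image_url: { url, detail } };
}

// Fine print on a receipt: favor precision.
const receipt = imagePart("https://images.example.com/receipt.jpg", "high");
// Quick triage of a thumbnail: favor speed and cost.
const thumb = imagePart("https://images.example.com/thumb.jpg", "low");

console.log(JSON.stringify([receipt, thumb], null, 2));
```

Validating the hint on the client side surfaces a typo such as "hihg" before a request is sent.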


Pass several image_url parts in a single message to compare or reason across images.

curl -X POST "https://api.aifoundryhub.com/v1/chat/completions" \
  -H "Authorization: Bearer $AI_FOUNDRY_HUB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5",
    "messages": [{
      "role": "user",
      "content": [
        {"type":"text","text":"Are these outfits the same color palette? Explain differences."},
        {"type":"image_url","image_url":{"url":"https://images.example.com/look1.jpg"}},
        {"type":"image_url","image_url":{"url":"https://images.example.com/look2.jpg"}}
      ]
    }]
  }'
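When the number of images varies, the content array can be assembled programmatically. The sketch below uses a hypothetical helper, buildVisionMessage, which places the text prompt first and appends one image_url part per URL, mirroring the request body above.

```javascript
// Hypothetical helper, not part of the API: builds a single user message
// containing one text part followed by one image_url part per URL.
function buildVisionMessage(prompt, imageUrls) {
  if (!Array.isArray(imageUrls) || imageUrls.length === 0) {
    throw new Error("at least one image URL is required");
  }
  return {
    role: "user",
    content: [
      { type: "text", text: prompt },
      ...imageUrls.map((url) => ({ type: "image_url", image_url: { url } })),
    ],
  };
}

const message = buildVisionMessage(
  "Are these outfits the same color palette? Explain differences.",
  [
    "https://images.example.com/look1.jpg",
    "https://images.example.com/look2.jpg",
  ],
);

// Request body equivalent to the curl example above.
const body = { model: "gpt-5", messages: [message] };
console.log(JSON.stringify(body, null, 2));
```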

If you cannot host the image at a public URL, base64-encode it and send it as a data URL in image_url.url.

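import fs from "node:fs";
import OpenAI from "openai";

// Assumption: `client` is an OpenAI-compatible SDK client pointed at this API's base URL.
const client = new OpenAI({
  apiKey: process.env.AI_FOUNDRY_HUB_API_KEY,
  baseURL: "https://api.aifoundryhub.com/v1",
});

// Read the local image and embed it as a base64 data URL.
const b64 = fs.readFileSync("./chart.png", "base64");
const dataUrl = `data:image/png;base64,${b64}`;

const out = await client.chat.completions.create({
  model: "gpt-5",
  messages: [{
    role: "user",
    content: [
      { type: "text", text: "Summarize the main trend in this chart." },
      { type: "image_url", image_url: { url: dataUrl, detail: "auto" } },
    ],
  }],
  max_tokens: 200,
});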
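The example above hard-codes image/png, which would mislabel a JPEG or WebP file. As a sketch, the hypothetical helper toDataUrl below picks the MIME type from the file extension. The extension table is an assumption for illustration; this page does not list which image formats the API accepts.

```javascript
// Hypothetical helper, not part of the API: builds a base64 data URL whose
// MIME type is inferred from the file extension.
// Assumption: this table is illustrative; supported formats are not listed here.
const MIME_BY_EXTENSION = new Map([
  ["png", "image/png"],
  ["jpg", "image/jpeg"],
  ["jpeg", "image/jpeg"],
  ["gif", "image/gif"],
  ["webp", "image/webp"],
]);

function toDataUrl(bytes, filename) {
  // Text after the last dot, lowercased ("Photo.JPG" -> "jpg").
  const ext = filename.slice(filename.lastIndexOf(".") + 1).toLowerCase();
  const mime = MIME_BY_EXTENSION.get(ext);
  if (!mime) {
    throw new Error(`unsupported image extension: ${filename}`);
  }
  // Buffer is a Node.js global; it accepts a Buffer, Uint8Array, or byte array.
  return `data:${mime};base64,${Buffer.from(bytes).toString("base64")}`;
}

// With a real file: toDataUrl(fs.readFileSync("./chart.png"), "./chart.png")
const example = toDataUrl(Uint8Array.of(0x89, 0x50, 0x4e, 0x47), "chart.png");
console.log(example.slice(0, 22)); // the "data:image/png;base64," prefix
```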

Returns a chat.completion object.

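The return shape is the same as for regular chat completions. The model’s textual answer is in choices[0].message.content: a plain string in the standard chat.completion shape or, if content is returned as an array of parts, the text field of the first part (message.content[0].text).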
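Client code may encounter message.content either as a plain string or as an array of parts, so a defensive reader can handle both. The sketch below uses a hypothetical helper, extractText. The sample objects are hand-written mocks for illustration, not real API output.

```javascript
// Hypothetical helper, not part of the API: returns the assistant's text
// whether message.content is a string or an array of parts with a text field.
function extractText(completion) {
  const content = completion?.choices?.[0]?.message?.content;
  if (typeof content === "string") {
    return content;
  }
  if (Array.isArray(content)) {
    return content
      .filter((part) => typeof part?.text === "string")
      .map((part) => part.text)
      .join("");
  }
  return ""; // no textual content (for example, a missing or null message)
}

// Hand-written mock responses for illustration only.
const stringShaped = {
  object: "chat.completion",
  choices: [{ message: { role: "assistant", content: "A bright living room." } }],
};
const partsShaped = {
  object: "chat.completion",
  choices: [{ message: { role: "assistant", content: [{ type: "text", text: "A bright living room." }] } }],
};

console.log(extractText(stringShaped));
console.log(extractText(partsShaped));
```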