Back to API Docs
Vision OCR (Claude AI)
The highest accuracy OCR engine, powered by Claude AI vision models. Handles handwriting, complex multi-column layouts, forms, tables, and degraded scans that traditional OCR engines struggle with. Supports text, Markdown, and JSON output formats.
POST
https://app.alternapdf.com/api/v1/ocr/extract/vision~$0.01/page (Sonnet)~$0.002/page (Haiku)
Content-Type: multipart/form-data
Documents over 3 pages are automatically processed asynchronously. Append ?async=true to force async processing for any request.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
file | file | Yes | PDF or image file (PNG, JPEG, WebP) |
output | string | No | simple or detailed. Default: simple |
format | string | No | Output format: text, markdown, or json. Default: text |
model | string | No | claude-sonnet-4-20250514 (higher accuracy) or claude-haiku-4-20250514 (faster, cheaper). Default: claude-sonnet-4-20250514 |
Code Examples
cURL
curl -X POST "https://app.alternapdf.com/api/v1/ocr/extract/vision" \
-H "X-API-Key: YOUR_API_KEY" \
-F "file=@handwritten-form.pdf" \
-F "format=markdown" \
-F "model=claude-sonnet-4-20250514"cURL (Haiku - faster, cheaper)
curl -X POST "https://app.alternapdf.com/api/v1/ocr/extract/vision" \
-H "X-API-Key: YOUR_API_KEY" \
-F "file=@receipt.png" \
-F "format=json" \
-F "model=claude-haiku-4-20250514"Python
import requests
url = "https://app.alternapdf.com/api/v1/ocr/extract/vision"
headers = {"X-API-Key": "YOUR_API_KEY"}
with open("handwritten-form.pdf", "rb") as f:
files = {"file": ("handwritten-form.pdf", f, "application/pdf")}
data = {
"format": "markdown",
"model": "claude-sonnet-4-20250514",
}
response = requests.post(url, headers=headers, files=files, data=data)
result = response.json()
print(result["data"]["text"])
print(f"Pages: {result['data']['pages']}")
print(f"Words: {result['data']['word_count']}")JavaScript
const fs = require("fs");
const FormData = require("form-data");
const form = new FormData();
form.append("file", fs.createReadStream("handwritten-form.pdf"));
form.append("format", "markdown");
form.append("model", "claude-sonnet-4-20250514");
const response = await fetch("https://app.alternapdf.com/api/v1/ocr/extract/vision", {
method: "POST",
headers: {
"X-API-Key": "YOUR_API_KEY",
...form.getHeaders(),
},
body: form,
});
const result = await response.json();
console.log(result.data.text);
console.log("Pages:", result.data.pages);
console.log("Words:", result.data.word_count);Response
The confidence field is always null for vision OCR since Claude AI does not provide a numeric confidence score.
JSON Response
{
"success": true,
"data": {
"text": "# Patient Intake Form\n\n**Name:** Dr. Jane Smith\n**Date:** January 15, 2024\n**DOB:** 03/22/1985\n\n## Medical History\n\n| Condition | Year Diagnosed | Current Medication |\n|-----------|---------------|-------------------|\n| Hypertension | 2019 | Lisinopril 10mg |\n| Asthma | 2005 | Albuterol PRN |\n\n## Notes\n\nPatient reports occasional shortness of breath during exercise.\nNo known drug allergies.",
"pages": 1,
"confidence": null,
"word_count": 42
},
"metadata": {
"engine": "vision",
"processing_time_ms": 4850,
"filename": "handwritten-form.pdf"
}
}Response Fields
| Field | Type | Description |
|---|---|---|
data.text | string | Extracted text in the requested format (text, markdown, or json) |
data.pages | integer | Number of pages processed |
data.confidence | null | Always null for vision OCR (no numeric confidence from AI models) |
data.word_count | integer | Total number of words extracted |
metadata.engine | string | Always "vision" for this endpoint |
metadata.processing_time_ms | integer | Processing time in milliseconds |
OCR Engine Comparison
Choose the right OCR engine for your use case. All three engines accept the same file types (PDF, PNG, JPEG, WebP) and return the same response structure.
| Feature | Simple (Tesseract) | Advanced (DocTR) | Vision (Claude AI) |
|---|---|---|---|
| Speed | Fastest (~1-2s/page) | Medium (~3-5s/page) | Slowest (~4-8s/page) |
| Accuracy | Good for clean prints | Better for varied layouts | Best overall accuracy |
| Cost | Standard API credits | Standard API credits | ~$0.01/page (Sonnet), ~$0.002/page (Haiku) |
| Handwriting | Poor | Limited | Excellent |
| Bounding boxes | No | Yes (word-level) | No |
| Output formats | Text only | Text only | Text, Markdown, JSON |
| Sync page limit | 20 pages | 10 pages | 3 pages |
| Best for | High-volume, clean printed docs | Structured docs needing position data | Handwriting, complex layouts, forms, tables |