Back to API Docs

Vision OCR (Claude AI)

The highest accuracy OCR engine, powered by Claude AI vision models. Handles handwriting, complex multi-column layouts, forms, tables, and degraded scans that traditional OCR engines struggle with. Supports text, Markdown, and JSON output formats.

POSThttps://app.alternapdf.com/api/v1/ocr/extract/vision
~$0.01/page (Sonnet)~$0.002/page (Haiku)

Content-Type: multipart/form-data

Documents over 3 pages are automatically processed asynchronously. Append ?async=true to force async processing for any request.

Parameters

ParameterTypeRequiredDescription
filefileYesPDF or image file (PNG, JPEG, WebP)
outputstringNosimple or detailed. Default: simple
formatstringNoOutput format: text, markdown, or json. Default: text
modelstringNoclaude-sonnet-4-20250514 (higher accuracy) or claude-haiku-4-20250514 (faster, cheaper). Default: claude-sonnet-4-20250514

Code Examples

cURL
curl -X POST "https://app.alternapdf.com/api/v1/ocr/extract/vision" \
  -H "X-API-Key: YOUR_API_KEY" \
  -F "file=@handwritten-form.pdf" \
  -F "format=markdown" \
  -F "model=claude-sonnet-4-20250514"
cURL (Haiku - faster, cheaper)
curl -X POST "https://app.alternapdf.com/api/v1/ocr/extract/vision" \
  -H "X-API-Key: YOUR_API_KEY" \
  -F "file=@receipt.png" \
  -F "format=json" \
  -F "model=claude-haiku-4-20250514"
Python
import requests

url = "https://app.alternapdf.com/api/v1/ocr/extract/vision"
headers = {"X-API-Key": "YOUR_API_KEY"}

with open("handwritten-form.pdf", "rb") as f:
    files = {"file": ("handwritten-form.pdf", f, "application/pdf")}
    data = {
        "format": "markdown",
        "model": "claude-sonnet-4-20250514",
    }
    response = requests.post(url, headers=headers, files=files, data=data)

result = response.json()
print(result["data"]["text"])
print(f"Pages: {result['data']['pages']}")
print(f"Words: {result['data']['word_count']}")
JavaScript
const fs = require("fs");
const FormData = require("form-data");

const form = new FormData();
form.append("file", fs.createReadStream("handwritten-form.pdf"));
form.append("format", "markdown");
form.append("model", "claude-sonnet-4-20250514");

const response = await fetch("https://app.alternapdf.com/api/v1/ocr/extract/vision", {
  method: "POST",
  headers: {
    "X-API-Key": "YOUR_API_KEY",
    ...form.getHeaders(),
  },
  body: form,
});

const result = await response.json();
console.log(result.data.text);
console.log("Pages:", result.data.pages);
console.log("Words:", result.data.word_count);

Response

The confidence field is always null for vision OCR since Claude AI does not provide a numeric confidence score.

JSON Response
{
  "success": true,
  "data": {
    "text": "# Patient Intake Form\n\n**Name:** Dr. Jane Smith\n**Date:** January 15, 2024\n**DOB:** 03/22/1985\n\n## Medical History\n\n| Condition | Year Diagnosed | Current Medication |\n|-----------|---------------|-------------------|\n| Hypertension | 2019 | Lisinopril 10mg |\n| Asthma | 2005 | Albuterol PRN |\n\n## Notes\n\nPatient reports occasional shortness of breath during exercise.\nNo known drug allergies.",
    "pages": 1,
    "confidence": null,
    "word_count": 42
  },
  "metadata": {
    "engine": "vision",
    "processing_time_ms": 4850,
    "filename": "handwritten-form.pdf"
  }
}

Response Fields

FieldTypeDescription
data.textstringExtracted text in the requested format (text, markdown, or json)
data.pagesintegerNumber of pages processed
data.confidencenullAlways null for vision OCR (no numeric confidence from AI models)
data.word_countintegerTotal number of words extracted
metadata.enginestringAlways "vision" for this endpoint
metadata.processing_time_msintegerProcessing time in milliseconds

OCR Engine Comparison

Choose the right OCR engine for your use case. All three engines accept the same file types (PDF, PNG, JPEG, WebP) and return the same response structure.

FeatureSimple (Tesseract)Advanced (DocTR)Vision (Claude AI)
SpeedFastest (~1-2s/page)Medium (~3-5s/page)Slowest (~4-8s/page)
AccuracyGood for clean printsBetter for varied layoutsBest overall accuracy
CostStandard API creditsStandard API credits~$0.01/page (Sonnet), ~$0.002/page (Haiku)
HandwritingPoorLimitedExcellent
Bounding boxesNoYes (word-level)No
Output formatsText onlyText onlyText, Markdown, JSON
Sync page limit20 pages10 pages3 pages
Best forHigh-volume, clean printed docsStructured docs needing position dataHandwriting, complex layouts, forms, tables