Simple OCR (Tesseract)
Fast text extraction from scanned PDFs and images using Tesseract. Good baseline accuracy for clean, printed documents. No additional cost beyond standard API credits.
POST
https://app.alternapdf.com/api/v1/ocr/extract/simple
Content-Type: multipart/form-data
Documents over 20 pages are automatically processed asynchronously. Append ?async=true to force async processing for any request.
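A minimal sketch of setting the async flag from client code. The helper name `build_ocr_url` is hypothetical, and the shape of an asynchronous response is not described in this section, so the sketch only builds the request URL.

```python
from urllib.parse import urlencode

BASE_URL = "https://app.alternapdf.com/api/v1/ocr/extract/simple"


def build_ocr_url(force_async: bool = False) -> str:
    """Return the endpoint URL, appending ?async=true when requested.

    Hypothetical helper: documents over 20 pages are processed
    asynchronously by the service regardless of this flag.
    """
    if not force_async:
        return BASE_URL
    return f"{BASE_URL}?{urlencode({'async': 'true'})}"


print(build_ocr_url(force_async=True))
```

The resulting URL can be passed to the request examples in place of the plain endpoint.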
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| file | file | Yes | PDF or image file (PNG, JPEG, WebP) |
| output | string | No | simple or detailed. Default: simple |
| enhance | boolean | No | Enable AI post-processing to clean up OCR output. Default: false |
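The parameters in the table can be checked on the client before uploading, which avoids spending a request on an obviously invalid call. The sketch below is an illustration only: `build_form_fields` is a hypothetical helper, the extension list is an assumed mapping of the accepted formats, and sending the boolean as the string "true" or "false" follows the request examples in this page.

```python
from pathlib import Path

# Assumption: file extensions corresponding to PDF, PNG, JPEG, and WebP.
ALLOWED_SUFFIXES = {".pdf", ".png", ".jpg", ".jpeg", ".webp"}
ALLOWED_OUTPUTS = {"simple", "detailed"}


def build_form_fields(path: str, output: str = "simple", enhance: bool = False) -> dict:
    """Validate the inputs and return the non-file form fields."""
    suffix = Path(path).suffix.lower()
    if suffix not in ALLOWED_SUFFIXES:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}")
    if output not in ALLOWED_OUTPUTS:
        raise ValueError(f"output must be one of {sorted(ALLOWED_OUTPUTS)}")
    # Multipart form values are strings, so encode the boolean explicitly.
    return {"output": output, "enhance": "true" if enhance else "false"}


print(build_form_fields("scanned-document.pdf"))
```

The returned dictionary is meant to be passed as the form data alongside the file itself.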
Code Examples
cURL
curl -X POST "https://app.alternapdf.com/api/v1/ocr/extract/simple" \
-H "X-API-Key: YOUR_API_KEY" \
-F "file=@scanned-document.pdf" \
-F "output=simple" \
-F "enhance=false"
Python
import requests

url = "https://app.alternapdf.com/api/v1/ocr/extract/simple"
headers = {"X-API-Key": "YOUR_API_KEY"}

# Send the request while the file is still open.
with open("scanned-document.pdf", "rb") as f:
    files = {"file": ("scanned-document.pdf", f, "application/pdf")}
    data = {"output": "simple", "enhance": "false"}
    response = requests.post(url, headers=headers, files=files, data=data)

result = response.json()
print(result["data"]["text"])
print(f"Pages: {result['data']['pages']}")
print(f"Confidence: {result['data']['confidence']}")
JavaScript
const fs = require("fs");

// Uses the fetch, FormData, and Blob globals available in Node.js 18+.
async function main() {
  const form = new FormData();
  const pdf = new Blob([fs.readFileSync("scanned-document.pdf")], {
    type: "application/pdf",
  });
  form.append("file", pdf, "scanned-document.pdf");
  form.append("output", "simple");
  form.append("enhance", "false");

  // fetch sets the multipart Content-Type header, including the boundary.
  const response = await fetch("https://app.alternapdf.com/api/v1/ocr/extract/simple", {
    method: "POST",
    headers: { "X-API-Key": "YOUR_API_KEY" },
    body: form,
  });
  const result = await response.json();
  console.log(result.data.text);
  console.log("Pages:", result.data.pages);
  console.log("Confidence:", result.data.confidence);
}

main().catch(console.error);
Response
Returns extracted text with page count, confidence score, and word count.
JSON Response (simple output)
{
  "success": true,
  "data": {
    "text": "Invoice #12345\nDate: 2024-01-15\n\nBill To:\nAcme Corporation\n123 Main Street\nNew York, NY 10001\n\nItems:\n- Widget A: $50.00\n- Widget B: $75.00\n\nTotal: $125.00",
    "pages": 1,
    "confidence": 0.92,
    "word_count": 24
  },
  "metadata": {
    "engine": "simple",
    "processing_time_ms": 1250,
    "filename": "scanned-document.pdf"
  }
}
Response Fields
| Field | Type | Description |
|---|---|---|
| data.text | string | Extracted text from all pages |
| data.pages | integer | Number of pages processed |
| data.confidence | float | OCR confidence score (0.0 to 1.0) |
| data.word_count | integer | Total number of words extracted |
| metadata.engine | string | Always "simple" for this endpoint |
| metadata.processing_time_ms | integer | Processing time in milliseconds |
| metadata.filename | string | Original uploaded filename |
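The fields in the table map naturally onto a small typed structure. The sketch below is one possible way to read them; `OcrResult` and `parse_ocr_response` are hypothetical names, the sample payload is a trimmed copy of the documented response, and the 0.80 review threshold is an arbitrary illustration. It also assumes that a failed request sets `success` to false, which this section does not confirm.

```python
from dataclasses import dataclass


@dataclass
class OcrResult:
    text: str
    pages: int
    confidence: float
    word_count: int
    processing_time_ms: int


def parse_ocr_response(payload: dict) -> OcrResult:
    """Convert the decoded JSON body into an OcrResult."""
    # Assumption: failures are signalled by "success": false.
    if not payload.get("success"):
        raise ValueError("OCR request did not succeed")
    data, meta = payload["data"], payload["metadata"]
    return OcrResult(
        text=data["text"],
        pages=data["pages"],
        confidence=data["confidence"],
        word_count=data["word_count"],
        processing_time_ms=meta["processing_time_ms"],
    )


# Sample payload based on the documented response (text trimmed for brevity).
sample = {
    "success": True,
    "data": {
        "text": "Invoice #12345\nDate: 2024-01-15",
        "pages": 1,
        "confidence": 0.92,
        "word_count": 24,
    },
    "metadata": {
        "engine": "simple",
        "processing_time_ms": 1250,
        "filename": "scanned-document.pdf",
    },
}

result = parse_ocr_response(sample)
print(f"{result.pages} page(s), confidence {result.confidence:.2f}")

# Illustrative threshold only: flag low-confidence output for manual review.
if result.confidence < 0.80:
    print("Low confidence; consider reviewing the extracted text.")
```

In a real client, the `sample` dictionary would be replaced by the decoded body of the HTTP response.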