POST https://app.alternapdf.com/api/v1/convert/pdf-to-text
PDF to Text
Extract text from PDF documents with multiple extraction modes. Automatically detects scanned documents and applies OCR when needed. Supports plain text, positional blocks, word-level, and structured JSON output.
Content-Type: multipart/form-data
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| file (required) | file | n/a | The PDF file to extract text from. |
| extraction_mode | string | text | Extraction mode. Options: text, blocks, words, json. |
| preserve_layout | boolean | true | Attempt to preserve the original spatial layout of text on each page. |
| include_metadata | boolean | false | Include PDF metadata (author, title, creation date, etc.) in the response. |
| page_separator | string | \n\n---\n\n | String inserted between pages in the extracted text output. |
| enable_ocr | boolean | true | Automatically apply OCR to scanned or image-based pages. |
| ocr_engine | string | auto | OCR engine to use. Options: tesseract, openai, auto. |
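Every option travels as a multipart form field, so it can help to validate values on the client before uploading. The following is a minimal sketch, not part of the API: `build_form_fields` is a hypothetical helper name, and the allowed values are copied from the table above. It assumes booleans are sent as lowercase `true`/`false`, as in the request examples on this page; other spellings are not documented here.

```python
# Hypothetical client-side helper; not part of the AlternaPDF API.
EXTRACTION_MODES = {"text", "blocks", "words", "json"}
OCR_ENGINES = {"tesseract", "openai", "auto"}


def build_form_fields(
    extraction_mode="text",
    preserve_layout=True,
    include_metadata=False,
    page_separator="\n\n---\n\n",
    enable_ocr=True,
    ocr_engine="auto",
):
    """Validate options against the documented values and return form fields."""
    if extraction_mode not in EXTRACTION_MODES:
        raise ValueError(f"extraction_mode must be one of {sorted(EXTRACTION_MODES)}")
    if ocr_engine not in OCR_ENGINES:
        raise ValueError(f"ocr_engine must be one of {sorted(OCR_ENGINES)}")

    def as_flag(value):
        # Assumption: booleans are sent as lowercase "true"/"false",
        # matching the request examples on this page.
        return "true" if value else "false"

    return {
        "extraction_mode": extraction_mode,
        "preserve_layout": as_flag(preserve_layout),
        "include_metadata": as_flag(include_metadata),
        "page_separator": page_separator,
        "enable_ocr": as_flag(enable_ocr),
        "ocr_engine": ocr_engine,
    }
```

The returned dictionary can be passed as `data=` to `requests.post` alongside the `files=` upload.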
Code Examples
cURL
curl -X POST "https://app.alternapdf.com/api/v1/convert/pdf-to-text" \
  -H "X-API-Key: YOUR_API_KEY" \
  -F "file=@document.pdf" \
  -F "extraction_mode=text" \
  -F "preserve_layout=true" \
  -F "enable_ocr=true" \
  -F "ocr_engine=auto"
Python
import requests

url = "https://app.alternapdf.com/api/v1/convert/pdf-to-text"
headers = {"X-API-Key": "YOUR_API_KEY"}

# Keep the file open until the upload has finished.
with open("document.pdf", "rb") as f:
    files = {"file": ("document.pdf", f, "application/pdf")}
    # Form fields are sent as strings, including booleans.
    data = {
        "extraction_mode": "text",
        "preserve_layout": "true",
        "enable_ocr": "true",
        "ocr_engine": "auto",
    }
    response = requests.post(url, headers=headers, files=files, data=data)

response.raise_for_status()  # stop on HTTP errors before reading the body
result = response.json()
print(result["text"])
JavaScript
import { readFile } from "node:fs/promises";

// Node 18+, run as an ES module. The built-in FormData takes a Blob,
// not a read stream, so the file is read into memory first.
const pdf = new Blob([await readFile("document.pdf")], { type: "application/pdf" });
const formData = new FormData();
formData.append("file", pdf, "document.pdf");
formData.append("extraction_mode", "text");
formData.append("preserve_layout", "true");
formData.append("enable_ocr", "true");
formData.append("ocr_engine", "auto");

const response = await fetch("https://app.alternapdf.com/api/v1/convert/pdf-to-text", {
  method: "POST",
  headers: {
    "X-API-Key": "YOUR_API_KEY",
  },
  body: formData,
});
const result = await response.json();
console.log(result.text);
Response
Returns a JSON object containing the extracted text and processing details.
JSON Response
{
"status": "success",
"filename": "document.pdf",
"extraction_mode": "text",
"text": "Extracted text content from the PDF document...\n\n---\n\nPage 2 content...",
"text_length": 4523,
"ocr_used": true,
"ocr_engine": "tesseract",
"ocr_confidence": 0.94,
"total_pages": 5,
"processing_time_seconds": 2.31
}
| Field | Type | Description |
|---|---|---|
| status | string | Processing status. Always success on 200. |
| filename | string | Name of the uploaded file. |
| extraction_mode | string | The extraction mode that was used. |
| text | string | The extracted text content. |
| text_length | integer | Character count of the extracted text. |
| ocr_used | boolean | Whether OCR was applied to any pages. |
| ocr_engine | string? | OCR engine used, if OCR was applied. |
| ocr_confidence | number? | OCR confidence score (0-1), if OCR was applied. |
| total_pages | integer? | Number of pages processed. |
| processing_time_seconds | number? | Total processing time in seconds. |
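Fields whose type ends in `?` may not always carry a value, so client code is safer reading them defensively. The sketch below is illustrative only: `summarize_result` and `split_pages` are hypothetical helper names, it assumes optional fields are either absent or null when they do not apply, and it assumes the default `page_separator` was left unchanged. The sample dictionary is the JSON response shown above.

```python
DEFAULT_PAGE_SEPARATOR = "\n\n---\n\n"  # default from the parameters table


def split_pages(text, separator=DEFAULT_PAGE_SEPARATOR):
    """Split extracted text into per-page strings using the page separator."""
    return text.split(separator)


def summarize_result(result):
    """Build a one-line summary, tolerating missing optional fields."""
    parts = [f"{result['filename']}: {result['text_length']} chars"]
    pages = result.get("total_pages")
    if pages is not None:
        parts.append(f"{pages} pages")
    if result.get("ocr_used"):
        # Assumption: ocr_engine and ocr_confidence may be missing or null.
        engine = result.get("ocr_engine") or "unknown engine"
        confidence = result.get("ocr_confidence")
        if confidence is not None:
            parts.append(f"OCR via {engine} ({confidence:.0%} confidence)")
        else:
            parts.append(f"OCR via {engine}")
    return ", ".join(parts)


# Sample taken from the JSON response on this page.
sample = {
    "status": "success",
    "filename": "document.pdf",
    "extraction_mode": "text",
    "text": "Extracted text content from the PDF document...\n\n---\n\nPage 2 content...",
    "text_length": 4523,
    "ocr_used": True,
    "ocr_engine": "tesseract",
    "ocr_confidence": 0.94,
    "total_pages": 5,
    "processing_time_seconds": 2.31,
}

print(summarize_result(sample))
print(split_pages(sample["text"]))
```

Splitting on the separator is only reliable when the separator string does not also occur inside a page's own text; a more distinctive `page_separator` value reduces that risk.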