POST https://app.alternapdf.com/api/v1/convert/pdf-to-text

PDF to Text

Extract text from PDF documents using one of four extraction modes. Scanned documents are detected automatically, and OCR is applied when needed. Output can be plain text, positional blocks, word-level data, or structured JSON.

Content-Type: multipart/form-data

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| file (required) | file | | The PDF file to extract text from. |
| extraction_mode | string | text | Extraction mode. Options: text, blocks, words, json. |
| preserve_layout | boolean | true | Attempt to preserve the original spatial layout of text on each page. |
| include_metadata | boolean | false | Include PDF metadata (author, title, creation date, etc.) in the response. |
| page_separator | string | \n\n---\n\n | String inserted between pages in the extracted text output. |
| enable_ocr | boolean | true | Automatically apply OCR to scanned or image-based pages. |
| ocr_engine | string | auto | OCR engine to use. Options: tesseract, openai, auto. |
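
The parameters above are sent as multipart form fields, so every value travels as a string. The sketch below shows one way to build those fields from Python values and reject options the table does not list. `build_form_fields` is a hypothetical helper, not part of the API or any SDK. It assumes booleans are accepted as lowercase `"true"`/`"false"` (the documented examples only show `"true"`) and that the default `page_separator` consists of real newline characters.

```python
# Hypothetical helper for illustration; not part of the AlternaPDF API or an SDK.
EXTRACTION_MODES = {"text", "blocks", "words", "json"}
OCR_ENGINES = {"tesseract", "openai", "auto"}


def build_form_fields(
    extraction_mode="text",
    preserve_layout=True,
    include_metadata=False,
    page_separator="\n\n---\n\n",  # assumption: the default uses real newlines
    enable_ocr=True,
    ocr_engine="auto",
):
    """Return string form fields for the pdf-to-text endpoint."""
    if extraction_mode not in EXTRACTION_MODES:
        raise ValueError(f"unknown extraction_mode: {extraction_mode!r}")
    if ocr_engine not in OCR_ENGINES:
        raise ValueError(f"unknown ocr_engine: {ocr_engine!r}")

    def as_flag(value):
        # Assumption: the API accepts lowercase "true"/"false" for booleans.
        return "true" if value else "false"

    return {
        "extraction_mode": extraction_mode,
        "preserve_layout": as_flag(preserve_layout),
        "include_metadata": as_flag(include_metadata),
        "page_separator": page_separator,
        "enable_ocr": as_flag(enable_ocr),
        "ocr_engine": ocr_engine,
    }


fields = build_form_fields(extraction_mode="words", enable_ocr=False)
print(fields["extraction_mode"], fields["enable_ocr"])  # prints: words false
```

The returned dictionary can be passed as the `data` argument of `requests.post`, alongside the `files` argument that carries the PDF.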

Code Examples

cURL
curl -X POST "https://app.alternapdf.com/api/v1/convert/pdf-to-text" \
  -H "X-API-Key: YOUR_API_KEY" \
  -F "file=@document.pdf" \
  -F "extraction_mode=text" \
  -F "preserve_layout=true" \
  -F "enable_ocr=true" \
  -F "ocr_engine=auto"
Python
import requests

url = "https://app.alternapdf.com/api/v1/convert/pdf-to-text"
headers = {"X-API-Key": "YOUR_API_KEY"}

with open("document.pdf", "rb") as f:
    files = {"file": ("document.pdf", f, "application/pdf")}
    data = {
        "extraction_mode": "text",
        "preserve_layout": "true",
        "enable_ocr": "true",
        "ocr_engine": "auto",
    }
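    response = requests.post(url, headers=headers, files=files, data=data)

# Raise on HTTP error responses so a failed request is not parsed as a result.
response.raise_for_status()
result = response.json()
print(result["text"])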
JavaScript
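// Requires Node.js 18+ (built-in fetch, FormData, and Blob), run as an ES module.
import { readFile } from "node:fs/promises";

// The built-in FormData does not accept fs streams, so wrap the file bytes in a Blob.
const pdf = new Blob([await readFile("document.pdf")], { type: "application/pdf" });

const formData = new FormData();
formData.append("file", pdf, "document.pdf");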
formData.append("extraction_mode", "text");
formData.append("preserve_layout", "true");
formData.append("enable_ocr", "true");
formData.append("ocr_engine", "auto");

const response = await fetch("https://app.alternapdf.com/api/v1/convert/pdf-to-text", {
  method: "POST",
  headers: {
    "X-API-Key": "YOUR_API_KEY",
  },
  body: formData,
});

const result = await response.json();
console.log(result.text);

Response

Returns a JSON object containing the extracted text and processing details.

JSON Response
{
  "status": "success",
  "filename": "document.pdf",
  "extraction_mode": "text",
  "text": "Extracted text content from the PDF document...\n\n---\n\nPage 2 content...",
  "text_length": 4523,
  "ocr_used": true,
  "ocr_engine": "tesseract",
  "ocr_confidence": 0.94,
  "total_pages": 5,
  "processing_time_seconds": 2.31
}
| Field | Type | Description |
|---|---|---|
| status | string | Processing status. Always success on 200. |
| filename | string | Name of the uploaded file. |
| extraction_mode | string | The extraction mode that was used. |
| text | string | The extracted text content. |
| text_length | integer | Character count of the extracted text. |
| ocr_used | boolean | Whether OCR was applied to any pages. |
| ocr_engine | string? | OCR engine used, if OCR was applied. |
| ocr_confidence | number? | OCR confidence score (0-1), if OCR was applied. |
| total_pages | integer? | Number of pages processed. |
| processing_time_seconds | number? | Total processing time in seconds. |
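
As an illustration of working with these fields, the sketch below splits `text` back into pages on the page separator and flags low-confidence OCR results. `split_pages` and `ocr_needs_review` are hypothetical helpers, and the 0.9 threshold is an arbitrary assumption rather than a value recommended by the API. The sample dictionary is a subset of the JSON response shown above.

```python
# Hypothetical helpers for illustration; not part of the AlternaPDF API.

def split_pages(text, page_separator="\n\n---\n\n"):
    """Split extracted text into per-page strings.

    Pass the same separator that was sent as page_separator in the request.
    """
    return text.split(page_separator)


def ocr_needs_review(result, min_confidence=0.9):
    """Return True when OCR ran and its confidence is missing or below the threshold.

    min_confidence is an arbitrary example value, not an API recommendation.
    """
    if not result.get("ocr_used"):
        return False
    confidence = result.get("ocr_confidence")
    return confidence is None or confidence < min_confidence


# Subset of the sample response from this page.
sample = {
    "status": "success",
    "text": "Extracted text content from the PDF document...\n\n---\n\nPage 2 content...",
    "ocr_used": True,
    "ocr_engine": "tesseract",
    "ocr_confidence": 0.94,
}

pages = split_pages(sample["text"])
print(len(pages))                # prints: 2
print(ocr_needs_review(sample))  # prints: False
```

Splitting on the separator miscounts pages if the document's own text contains the same string, so a custom `page_separator` that is unlikely to occur in the content may be the safer choice.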