POST https://app.alternapdf.com/api/v1/convert/pdf-to-text

PDF to Text

Extracts text from PDF documents in one of four extraction modes: plain text (`text`), positional blocks (`blocks`), word-level (`words`), or structured JSON (`json`). Scanned documents are detected automatically, and OCR is applied when needed.

Content-Type: multipart/form-data

Parameters

Parameter	Type	Default	Description
`file` required	file	—	The PDF file to extract text from.
`extraction_mode`	string	`text`	Extraction mode. Options: `text`, `blocks`, `words`, `json`.
`preserve_layout`	boolean	`true`	Attempt to preserve the original spatial layout of text on each page.
`include_metadata`	boolean	`false`	Include PDF metadata (author, title, creation date, etc.) in the response.
`page_separator`	string	`\n\n---\n\n`	String inserted between pages in the extracted text output.
`enable_ocr`	boolean	`true`	Automatically apply OCR to scanned or image-based pages.
`ocr_engine`	string	`auto`	OCR engine to use. Options: `tesseract`, `openai`, `auto`.
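
As a rough sketch of how the parameters above could be assembled on the client, the helper below validates the two option-valued parameters and serializes booleans as lowercase strings, which is how the examples on this page send them. The name `build_form_fields` and the client-side validation are illustrative assumptions, not part of the API. It also assumes the `\n` escapes in the `page_separator` default denote real newline characters.

```python
# Hypothetical helper (not part of the API): builds the non-file form fields.
EXTRACTION_MODES = {"text", "blocks", "words", "json"}
OCR_ENGINES = {"tesseract", "openai", "auto"}


def build_form_fields(
    extraction_mode="text",
    preserve_layout=True,
    include_metadata=False,
    page_separator="\n\n---\n\n",
    enable_ocr=True,
    ocr_engine="auto",
):
    """Return form fields, using the defaults listed in the parameter table."""
    if extraction_mode not in EXTRACTION_MODES:
        raise ValueError(f"unsupported extraction_mode: {extraction_mode!r}")
    if ocr_engine not in OCR_ENGINES:
        raise ValueError(f"unsupported ocr_engine: {ocr_engine!r}")

    def as_flag(value):
        # Assumption: booleans travel as "true"/"false", as in the examples here.
        return "true" if value else "false"

    return {
        "extraction_mode": extraction_mode,
        "preserve_layout": as_flag(preserve_layout),
        "include_metadata": as_flag(include_metadata),
        "page_separator": page_separator,
        "enable_ocr": as_flag(enable_ocr),
        "ocr_engine": ocr_engine,
    }


fields = build_form_fields(extraction_mode="words", enable_ocr=False)
print(fields["extraction_mode"], fields["enable_ocr"])  # prints: words false
```

The returned dict can be passed as the `data=` argument of `requests.post`, alongside the `files=` argument that carries the PDF.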

Code Examples

cURL

curl -X POST "https://app.alternapdf.com/api/v1/convert/pdf-to-text" \
  -H "X-API-Key: YOUR_API_KEY" \
  -F "file=@document.pdf" \
  -F "extraction_mode=text" \
  -F "preserve_layout=true" \
  -F "enable_ocr=true" \
  -F "ocr_engine=auto"

Python

import requests

url = "https://app.alternapdf.com/api/v1/convert/pdf-to-text"
headers = {"X-API-Key": "YOUR_API_KEY"}

with open("document.pdf", "rb") as f:
    files = {"file": ("document.pdf", f, "application/pdf")}
    data = {
        "extraction_mode": "text",
        "preserve_layout": "true",
        "enable_ocr": "true",
        "ocr_engine": "auto",
    }
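    response = requests.post(url, headers=headers, files=files, data=data)

# Raise an exception on 4xx/5xx responses instead of parsing an error body
response.raise_for_status()
result = response.json()
print(result["text"])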

JavaScript

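// Requires Node.js 18+ (built-in fetch, FormData, Blob), run as an ES module
import { readFile } from "node:fs/promises";

// The built-in FormData expects a Blob rather than a file stream
const pdf = new Blob([await readFile("document.pdf")], { type: "application/pdf" });

const formData = new FormData();
formData.append("file", pdf, "document.pdf");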
formData.append("extraction_mode", "text");
formData.append("preserve_layout", "true");
formData.append("enable_ocr", "true");
formData.append("ocr_engine", "auto");

const response = await fetch("https://app.alternapdf.com/api/v1/convert/pdf-to-text", {
  method: "POST",
  headers: {
    "X-API-Key": "YOUR_API_KEY",
  },
  body: formData,
});

const result = await response.json();
console.log(result.text);

Response

Returns a JSON object containing the extracted text and processing details.

JSON Response

{
  "status": "success",
  "filename": "document.pdf",
  "extraction_mode": "text",
  "text": "Extracted text content from the PDF document...\n\n---\n\nPage 2 content...",
  "text_length": 4523,
  "ocr_used": true,
  "ocr_engine": "tesseract",
  "ocr_confidence": 0.94,
  "total_pages": 5,
  "processing_time_seconds": 2.31
}

Field	Type	Description
`status`	string	Processing status. Always `success` on 200.
`filename`	string	Name of the uploaded file.
`extraction_mode`	string	The extraction mode that was used.
`text`	string	The extracted text content.
`text_length`	integer	Character count of the extracted text.
`ocr_used`	boolean	Whether OCR was applied to any pages.
`ocr_engine`	string?	OCR engine used, if OCR was applied.
`ocr_confidence`	number?	OCR confidence score (0-1), if OCR was applied.
`total_pages`	integer?	Number of pages processed.
`processing_time_seconds`	number?	Total processing time in seconds.
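
The fields above could be consumed along these lines. This sketch splits `text` back into pages on the default `page_separator` and treats the optional OCR fields as possibly missing or null. The function names and the 0.8 confidence threshold are illustrative assumptions, not values defined by the API; the sample dict is the example response shown earlier.

```python
DEFAULT_PAGE_SEPARATOR = "\n\n---\n\n"


def split_pages(result, page_separator=DEFAULT_PAGE_SEPARATOR):
    """Split the extracted text into one string per page."""
    return result["text"].split(page_separator)


def ocr_needs_review(result, threshold=0.8):
    """Flag responses where OCR ran with a missing or low confidence score."""
    if not result.get("ocr_used"):
        return False
    confidence = result.get("ocr_confidence")  # optional field, may be None
    return confidence is None or confidence < threshold


sample = {
    "status": "success",
    "filename": "document.pdf",
    "extraction_mode": "text",
    "text": "Extracted text content from the PDF document...\n\n---\n\nPage 2 content...",
    "text_length": 4523,
    "ocr_used": True,
    "ocr_engine": "tesseract",
    "ocr_confidence": 0.94,
    "total_pages": 5,
    "processing_time_seconds": 2.31,
}

pages = split_pages(sample)
print(len(pages))                # prints: 2
print(ocr_needs_review(sample))  # prints: False
```

Splitting on the separator is only reliable when the separator string cannot occur inside a page's own text. If that is a concern, a more distinctive `page_separator` value can be sent with the request and passed to `split_pages`.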