OCR

Optical character recognition to extract text from documents and images

Introduction

The AlphaEdge OCR API extracts usable text from files whose format is accepted by the endpoint (see Supported file formats). The service targets strong recognition accuracy and response times suited to production integrations.

This document describes the request and response contract (multipart, image field), the canonical list of input extensions, the returned JSON structure, and integration best practices.

Base URL, host, and documentation

Use the gateway’s public base URL, e.g. https://api-endpoints.alphaedge-ai.com. Do not call the gateway by raw IP address where a public hostname is required; such requests are rejected with HTTP 403. This documentation is hosted at https://api-docs.alphaedge-ai.com/. The gateway does not expose an interactive Swagger / OpenAPI page (OpenAPI is disabled on the server).
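The 403 case can be caught on the client before any request is sent. The following is a minimal sketch using only the Python standard library; the helper name addresses_host_by_name is hypothetical and not part of the API, and it only checks that the configured URL names a host instead of a literal IP address.

```python
import ipaddress
from urllib.parse import urlparse


def addresses_host_by_name(base_url: str) -> bool:
    """Return True when base_url names a host, False for a literal IP or no host."""
    host = urlparse(base_url).hostname
    if not host:
        return False  # no host component (relative or malformed URL)
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return True  # not parseable as an IP address, so it is a hostname
    return False  # literal IPv4 or IPv6 address


print(addresses_host_by_name("https://api-endpoints.alphaedge-ai.com"))
print(addresses_host_by_name("https://203.0.113.10"))
```

The check does not verify that the hostname is the correct one; a wrong domain would still be answered with 403 by the gateway.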

OCR model slugs

The URL must use a registered slug: alpha-digit-max or alpha-digit-medium (kebab-case). GET /models returns model_slug and type for each entry. An unknown slug yields a 404 error.
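As an illustration, the endpoint URL can be built from a validated slug so that a typo fails locally instead of producing a 404. This is a sketch under assumptions: the helper names are hypothetical, the GET /models payload is assumed to be a JSON list of objects, and the type value "ocr" in the sample is invented for the example.

```python
BASE_URL = "https://api-endpoints.alphaedge-ai.com"
OCR_SLUGS = ("alpha-digit-max", "alpha-digit-medium")  # slugs named in this document


def ocr_endpoint(model_slug: str, base_url: str = BASE_URL) -> str:
    """Build the OCR URL for a known slug; raise ValueError for anything else."""
    if model_slug not in OCR_SLUGS:
        raise ValueError(f"unknown OCR slug: {model_slug!r}")
    return f"{base_url.rstrip('/')}/models/{model_slug}/ocr"


def slugs_of_type(entries, wanted_type):
    """Pick model_slug values from GET /models entries whose type matches."""
    return [e["model_slug"] for e in entries if e.get("type") == wanted_type]


# Hypothetical GET /models payload; only the model_slug and type keys come
# from this document, the list shape and the type values are assumptions.
sample_entries = [
    {"model_slug": "alpha-digit-max", "type": "ocr"},
    {"model_slug": "some-other-model", "type": "other"},
]

print(ocr_endpoint("alpha-digit-medium"))
print(slugs_of_type(sample_entries, "ocr"))
```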

Quick start

Minimal example: multipart POST request with an X-API-Key header and an image file field.

Basic example

python
import requests

url = "https://api-endpoints.alphaedge-ai.com/models/alpha-digit-max/ocr"
headers = {"X-API-Key": "YOUR_API_KEY"}

# Send the file under the exact field name "image"; requests sets the
# multipart Content-Type and boundary itself.
with open("/path/to/image.png", "rb") as f:
    files = {"image": ("image.png", f, "image/png")}
    r = requests.post(url, headers=headers, files=files, timeout=300)

print(r.status_code)
print(r.json())

bash
# -F sends multipart/form-data and implies POST
curl https://api-endpoints.alphaedge-ai.com/models/alpha-digit-max/ocr \
  -H "X-API-Key: YOUR_API_KEY" \
  -F "image=@/path/to/image.png"

javascript
import fs from "node:fs";

// Attach the file under the exact field name "image"
const form = new FormData();
form.append("image", new Blob([fs.readFileSync("/path/to/image.png")]), "image.png");

const res = await fetch("https://api-endpoints.alphaedge-ai.com/models/alpha-digit-max/ocr", {
  method: "POST",
  headers: { "X-API-Key": "YOUR_API_KEY" },
  body: form
});

console.log(res.status, await res.json());

API parameters

Parameters for the POST /models/{model_slug}/ocr endpoint.

The body must be sent as multipart/form-data. The file field must be named exactly image. The model identifier is passed only in the URL path; a model field in the form or using file instead of image is not accepted and returns HTTP 422 (Unprocessable Entity).

Do not set the Content-Type header manually for this request: with multipart, the HTTP library must add multipart/form-data and the boundary (otherwise you risk errors or an unreadable body).

PARAMETER     TYPE    REQUIRED  DEFAULT  DESCRIPTION
image         File    Yes       -        Image or PDF file to analyze (multipart). Exact field name: image.
pdf_password  string  No        -        Optional. Password for a protected PDF.

Any field name other than image and pdf_password is rejected.
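The field rules above can be checked on the client before sending, which avoids a round trip that would end in HTTP 422. The sketch below is only an illustration; check_form_fields is a hypothetical helper, and the messages it returns are not the gateway's own error texts.

```python
ALLOWED_FIELDS = ("image", "pdf_password")  # the two form fields documented above


def check_form_fields(field_names):
    """Return a list of problems; an empty list means the field names look valid."""
    names = list(field_names)
    problems = []
    if "image" not in names:
        problems.append("missing required file field 'image'")
    for name in names:
        if name not in ALLOWED_FIELDS:
            problems.append(f"unexpected field '{name}'")
    return problems


print(check_form_fields(["image", "pdf_password"]))
print(check_form_fields(["file", "model"]))
```

With requests, the file goes in files={"image": ...} and the optional password in data={"pdf_password": ...}; both are then encoded as parts of the same multipart body.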

Supported file formats

Admission is based on file extension: only the values below are recognized by the decoding pipeline for this endpoint. Any other extension must be treated as unsupported.

Native office formats (for example DOCX, XLSX, PPTX) and databases are not ingested by this service. Convert them beforehand to PDF or to an image in an accepted format if you need to extract text via OCR.

allowed_extensions
.apng
.avif
.avifs
.blp
.bmp
.bufr
.bw
.cur
.dcx
.dds
.dib
.emf
.eps
.fit
.fits
.flc
.fli
.ftc
.ftu
.gbr
.gif
.grib
.h5
.hdf
.icb
.icns
.ico
.iim
.im
.j2c
.j2k
.jfif
.jp2
.jpc
.jpe
.jpeg
.jpf
.jpg
.jpx
.mpeg
.mpg
.mpo
.msp
.palm
.pbm
.pcd
.pcx
.pdf
.pfm
.pgm
.png
.pnm
.ppm
.ps
.psd
.pxr
.qoi
.ras
.rgb
.rgba
.sgi
.tga
.tif
.tiff
.vda
.vst
.webp
.wmf
.xbm
.xpm
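Because admission is based on the extension, a client can filter files locally before uploading. The following sketch copies the list above into a set; is_supported is a hypothetical helper, and lowercasing the extension is an assumption about how the server compares extensions.

```python
from pathlib import Path

# Extensions copied from the allowed_extensions list above.
ALLOWED_EXTENSIONS = frozenset("""
.apng .avif .avifs .blp .bmp .bufr .bw .cur .dcx .dds .dib .emf .eps .fit
.fits .flc .fli .ftc .ftu .gbr .gif .grib .h5 .hdf .icb .icns .ico .iim .im
.j2c .j2k .jfif .jp2 .jpc .jpe .jpeg .jpf .jpg .jpx .mpeg .mpg .mpo .msp
.palm .pbm .pcd .pcx .pdf .pfm .pgm .png .pnm .ppm .ps .psd .pxr .qoi .ras
.rgb .rgba .sgi .tga .tif .tiff .vda .vst .webp .wmf .xbm .xpm
""".split())


def is_supported(filename: str) -> bool:
    """Return True when the final extension of filename is in the allowed list."""
    # Assumption: extensions are compared case-insensitively.
    return Path(filename).suffix.lower() in ALLOWED_EXTENSIONS


for name in ("scan.pdf", "photo.JPG", "report.docx", "archive.tar.gz"):
    print(name, is_supported(name))
```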

Input quality and recommended formats

  • Scanning and photos: prefer PNG, TIFF, or JPEG in high definition (target ≥ 300 ppi).
  • Production: PNG, JPEG, WEBP, TIFF, and PDF generally offer the best balance between interoperability and OCR quality. Other listed extensions may be decoded by the pipeline without guaranteeing the same relevance for every use case.
  • Reduce destructive compression and overly low resolutions to limit recognition errors.
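To make the 300 ppi target concrete, the pixel dimensions needed for a given paper size follow from a simple conversion (25.4 mm per inch). The helper below is a hypothetical illustration of that arithmetic, not a requirement enforced by the API.

```python
import math

MM_PER_INCH = 25.4


def min_pixels(width_mm: float, height_mm: float, ppi: int = 300):
    """Smallest whole-pixel dimensions that reach ppi for a page of this size."""
    width_px = math.ceil(width_mm / MM_PER_INCH * ppi)
    height_px = math.ceil(height_mm / MM_PER_INCH * ppi)
    return width_px, height_px


# An A4 page measures 210 mm x 297 mm.
print(min_pixels(210, 297))  # (2481, 3508)
```

A scan of an A4 page that is noticeably smaller than this in pixels is below the recommended resolution.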

Response format

HTTP 200 response: JSON (OcrResponse schema). global_confidence and image_filename may be null; the internal gateway_wall_ms field is not returned to the client. Illustrative example:

json
{
  "model_slug": "alpha-digit-max",
  "text": "26 rue Honore de Balzac",
  "inference_seconds": 0.055,
  "global_confidence": 0.82,
  "words": [
    {
      "w": "26",
      "confidence": 1
    },
    {
      "w": "rue",
      "confidence": 0.97
    },
    {
      "w": "Honore",
      "confidence": 0.47
    },
    {
      "w": "de",
      "confidence": 0.99
    },
    {
      "w": "Balzac",
      "confidence": 0.99
    }
  ],
  "image_filename": "test_manuscrit7.png"
}
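One way to consume this response is to map it onto typed objects so that the two nullable fields are handled explicitly. The sketch below is an assumption-laden illustration: the class and function names are hypothetical, and it assumes every field other than global_confidence and image_filename is always present.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    w: str
    confidence: float


@dataclass
class OcrResult:
    model_slug: str
    text: str
    inference_seconds: float
    global_confidence: Optional[float]  # may be null in the response
    image_filename: Optional[str]       # may be null in the response
    words: List[Word]


def parse_ocr_response(payload: dict) -> OcrResult:
    """Convert a decoded HTTP 200 JSON body into an OcrResult."""
    return OcrResult(
        model_slug=payload["model_slug"],
        text=payload["text"],
        inference_seconds=payload["inference_seconds"],
        global_confidence=payload.get("global_confidence"),
        image_filename=payload.get("image_filename"),
        words=[Word(item["w"], item["confidence"]) for item in payload["words"]],
    )


raw = """{"model_slug": "alpha-digit-max", "text": "26 rue",
          "inference_seconds": 0.055, "global_confidence": null,
          "words": [{"w": "26", "confidence": 1}, {"w": "rue", "confidence": 0.97}],
          "image_filename": null}"""
result = parse_ocr_response(json.loads(raw))
print(result.text, result.global_confidence, len(result.words))
```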

Confidence scores

The OCR response includes a global confidence score and a per-word score. These values are between 0 and 1: the closer the score is to 1, the more confident the model is in the recognized text.

  • global_confidence - average confidence over the extracted text (or null).
  • words[].confidence - confidence associated with each detected word, useful for identifying areas that should be checked manually.
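The per-word scores can drive a manual review queue. The sketch below flags words under a cutoff, using the words from the illustrative response above; the 0.6 threshold and the helper name are assumptions to tune for your documents, not values recommended by the service.

```python
def low_confidence_words(response: dict, threshold: float = 0.6):
    """Return the recognized words whose confidence is below threshold."""
    return [
        item["w"]
        for item in response.get("words", [])
        if item["confidence"] < threshold
    ]


# Words taken from the illustrative response in this document.
example = {
    "global_confidence": 0.82,
    "words": [
        {"w": "26", "confidence": 1},
        {"w": "rue", "confidence": 0.97},
        {"w": "Honore", "confidence": 0.47},
        {"w": "de", "confidence": 0.99},
        {"w": "Balzac", "confidence": 0.99},
    ],
}

print(low_confidence_words(example))  # ['Honore']
```

Since global_confidence may be null, compare it against a threshold only after checking that it is not None.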

Advanced examples

Extraction with structured format

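Send a scanned form and read the structured fields of the response (text, global_confidence, words):

python
import requests

url = "https://api-endpoints.alphaedge-ai.com/models/alpha-digit-max/ocr"
headers = {"X-API-Key": "YOUR_API_KEY"}

with open("/path/to/form.png", "rb") as f:
    files = {"image": ("form.png", f, "image/png")}
    r = requests.post(url, headers=headers, files=files, timeout=300)

r.raise_for_status()  # stop here on any 4xx or 5xx status
result = r.json()

print(result["text"])               # full recognized text
print(result["global_confidence"])  # may be None
for word in result["words"]:        # per-word text and confidence
    print(word["w"], word["confidence"])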

Error handling

Here is how to handle errors properly:

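python
import requests

url = "https://api-endpoints.alphaedge-ai.com/models/alpha-digit-max/ocr"
headers = {"X-API-Key": "YOUR_API_KEY"}

try:
    with open("/path/to/image.png", "rb") as f:
        files = {"image": ("image.png", f, "image/png")}
        r = requests.post(url, headers=headers, files=files, timeout=300)
except requests.RequestException as exc:
    # Network failure, DNS error, or timeout: no HTTP response was received.
    raise SystemExit(f"Request failed: {exc}")

if r.ok:
    print(r.json())
else:
    # 401, 403, 404, 422, 5xx: see Useful HTTP status codes.
    print(f"HTTP {r.status_code}: {r.text}")

javascript
import fs from "node:fs";

const form = new FormData();
form.append("image", new Blob([fs.readFileSync("/path/to/image.png")]), "image.png");

try {
  const res = await fetch("https://api-endpoints.alphaedge-ai.com/models/alpha-digit-max/ocr", {
    method: "POST",
    headers: { "X-API-Key": "YOUR_API_KEY" },
    body: form
  });
  if (res.ok) {
    console.log(await res.json());
  } else {
    // 401, 403, 404, 422, 5xx: see Useful HTTP status codes.
    console.error(`HTTP ${res.status}: ${await res.text()}`);
  }
} catch (err) {
  // Network failure: no HTTP response was received.
  console.error("Request failed:", err);
}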

Use cases

Here are some common use cases for the OCR API:

1. Document digitization

Convert paper documents to digital text for archiving and search.

2. Form data extraction

Automatically extract information from scanned forms (invoices, contracts, etc.).

3. Text recognition in images

Extract text from images, screenshots or document photos.

Limitations and best practices

Limitations

  • Maximum size — 25 MB per file (documented value; a different limit may apply depending on your plan or environment).
  • File types — Restricted to the extensions listed under Supported file formats.
  • Rate limiting — Quotas and throttling according to your plan and the service policy.
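The size limit can be checked before uploading. In the sketch below, fits_size_limit is a hypothetical helper, and reading 25 MB as 25,000,000 bytes is an assumption (the more conservative interpretation); adjust the constant to the limit that applies to your plan.

```python
import os
import tempfile

# Assumption: 25 MB is read as decimal megabytes.
MAX_UPLOAD_BYTES = 25 * 1000 * 1000


def fits_size_limit(path: str, max_bytes: int = MAX_UPLOAD_BYTES) -> bool:
    """Return True when the file at path is no larger than max_bytes."""
    return os.path.getsize(path) <= max_bytes


# Demonstration on a small temporary file standing in for a real scan.
with tempfile.NamedTemporaryFile(suffix=".png", delete=False) as tmp:
    tmp.write(b"\x00" * 1024)
    demo_path = tmp.name

print(fits_size_limit(demo_path))                 # 1024 bytes, within the limit
print(fits_size_limit(demo_path, max_bytes=512))  # over a 512-byte limit
os.remove(demo_path)
```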

Integration best practices

  • Aim for sufficient resolution for scans and photos (e.g. ≥ 300 ppi).
  • Keep strong contrast between text and background on scanned documents.
  • Handle HTTP status codes and errors returned in the JSON response body explicitly.
  • Implement retries with exponential backoff for transient errors (e.g. 5xx responses or overload).
  • Cache results when the source document is unchanged to reduce latency and cost.
  • Monitor usage and quotas in line with your agreement.
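The retry recommendation above can be sketched as a small wrapper. This is an illustration under assumptions: post_with_backoff is a hypothetical helper, send stands for any function that performs the POST and returns (status, body), and only 5xx statuses are retried; whether overload is signalled by another status is not stated in this document.

```python
import time


def post_with_backoff(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-5xx status or attempts run out.

    Waits base_delay, then 2x, 4x, ... between attempts (exponential backoff).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        is_last = attempt == max_attempts - 1
        if not 500 <= status < 600 or is_last:
            return status, body
        sleep(base_delay * 2 ** attempt)


# Demonstration with a fake sender: two 503 responses, then success.
replies = iter([(503, ""), (503, ""), (200, "{}")])
delays = []
outcome = post_with_backoff(lambda: next(replies), sleep=delays.append)
print(outcome, delays)  # (200, '{}') [1.0, 2.0]
```

In production, adding random jitter to each delay helps avoid synchronized retries from many clients.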

Available models

To view all available OCR models with their detailed specifications, visit the Our models page and filter by type.

Useful HTTP status codes

Short reference for integration (typical codes returned by the gateway):

HTTP status  Typical case
401          Missing or invalid X-API-Key.
403          Forbidden host (access by IP or wrong domain).
404          Unknown model or resource (invalid OCR slug).
422          Invalid multipart (missing file field, forbidden field name, etc.).
500          Internal error, often a generic message on the client.
503          Service unavailable or starting up (e.g. GET /status).

For a detailed list of error codes, see also Error codes.
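The status codes in the table above can be turned into a simple client-side decision. The mapping below is one possible reading, not behavior prescribed by the gateway: next_action and its labels are hypothetical, and 5xx codes are treated as retryable in line with the retry recommendation under Integration best practices.

```python
def next_action(status: int) -> str:
    """Suggest what a client should do for a given HTTP status."""
    if 200 <= status < 300:
        return "ok"
    if status == 401:
        return "check the X-API-Key header"
    if status == 403:
        return "call the public hostname, not an IP or another domain"
    if status == 404:
        return "check the model slug in the URL"
    if status == 422:
        return "check the multipart field names"
    if 500 <= status < 600:
        return "retry with backoff"
    return "inspect the response body"


for code in (200, 401, 403, 404, 422, 500, 503):
    print(code, next_action(code))
```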