Audio & Transcription

Real-time audio transcription and audio file analysis

Introduction

The AlphaEdge Audio & Transcription API lets you transcribe audio files to text. This feature is optimized for high performance and accuracy.

This page guides you through using the Audio & Transcription API, from the basics to advanced use cases.

Quick start

Here is a minimal example to get started with the Audio & Transcription API:

Basic example

python
import requests

url = "https://api-endpoints.alphaedge-ai.com/models/alpha-audio-v1/transcript"
headers = {"X-API-Key": "YOUR_API_KEY"}

with open("/path/to/audio.wav", "rb") as f:
    files = {"audio": ("audio.wav", f, "audio/wav")}
    data = {
        "enable_diarization": "true",
        "enable_postcorrect": "true",
    }
    r = requests.post(url, headers=headers, files=files, data=data, timeout=300)

print(r.status_code)
print(r.json())
bash
curl https://api-endpoints.alphaedge-ai.com/models/alpha-audio-v1/transcript \
  -H "X-API-Key: YOUR_API_KEY" \
  -F "file=@audio.mp3" \
  -F "model=alphaedge-audio-3"
javascript
import fs from "node:fs";

const form = new FormData();
form.append("audio", new Blob([fs.readFileSync("/path/to/audio.wav")]), "audio.wav");
form.append("enable_diarization", "true");
form.append("enable_postcorrect", "true");

const res = await fetch("https://api-endpoints.alphaedge-ai.com/models/alpha-audio-v1/transcript", {
  method: "POST",
  headers: { "X-API-Key": "YOUR_API_KEY" },
  body: form
});

console.log(res.status, await res.json());

API parameters

Here are the available parameters for the Audio & Transcription API:

PARAMETER | TYPE | REQUIRED | DEFAULT | DESCRIPTION
model | string | Yes | - | Name of the model to use (e.g. alphaedge-audio-3)
file | File | Yes | - | The audio file to transcribe
enable_diarization | boolean | No | false | Enables speaker diarization
enable_postcorrection | boolean | No | false | Enables post-correction of the transcription

Supported file formats

The AlphaEdge Audio & Transcription API supports a wide variety of audio formats for transcription. Here is the full list of supported formats:

Compressed audio formats

  • MP3 (.mp3) - Most common format, lossy compression
  • AAC (.aac, .m4a) - Apple format, good quality at low bitrate
  • OGG Vorbis (.ogg) - Open source format, efficient compression
  • OPUS (.opus) - Voice-optimized format, excellent for calls
  • WMA (.wma) - Windows Media Audio

Uncompressed audio formats

  • WAV (.wav) - Uncompressed PCM format, maximum quality
  • FLAC (.flac) - Lossless compression, high quality
  • AIFF (.aiff, .aif) - Uncompressed Apple format

Streaming audio formats

  • WebM Audio (.webm) - Modern web format
  • M4A (.m4a) - Apple container format

Technical specifications

  • Sampling rate: 8 kHz to 48 kHz (recommended: 16 kHz or 44.1 kHz)
  • Bit depth: 16 bit or 24 bit
  • Channels: Mono, stereo, or multi-channel (auto-converted to mono)
  • Maximum duration: 25 minutes per file
  • Maximum size: 25 MB per file
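
These limits can be checked locally before uploading. The sketch below relies on Python's standard `wave` module, so it covers WAV files only. The function name `check_wav` and the constants are illustrative and not part of the API, and the 25 MB limit is assumed to mean 25 × 1024 × 1024 bytes.

```python
import os
import wave

# Limits taken from the technical specifications listed above
MIN_RATE_HZ = 8_000
MAX_RATE_HZ = 48_000
ALLOWED_SAMPLE_WIDTHS = {2, 3}      # bytes per sample: 16 bit or 24 bit
MAX_DURATION_S = 25 * 60
MAX_SIZE_BYTES = 25 * 1024 * 1024   # assumption: 1 MB = 1024 * 1024 bytes


def check_wav(path):
    """Return a list of problems; an empty list means the file is within limits."""
    problems = []
    if os.path.getsize(path) > MAX_SIZE_BYTES:
        problems.append("file is larger than 25 MB")
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        duration = w.getnframes() / rate
        if not MIN_RATE_HZ <= rate <= MAX_RATE_HZ:
            problems.append(f"sampling rate {rate} Hz is outside 8-48 kHz")
        if w.getsampwidth() not in ALLOWED_SAMPLE_WIDTHS:
            problems.append(f"bit depth {8 * w.getsampwidth()} bit is not 16 or 24")
        if duration > MAX_DURATION_S:
            problems.append(f"duration {duration:.0f} s exceeds 25 minutes")
    return problems
```

Calling `check_wav("/path/to/audio.wav")` before the request avoids spending an upload on a file that is likely to be rejected.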

Video formats (audio extraction)

The API can also extract and transcribe audio from video files:

  • MP4 (.mp4) - Video with audio track
  • AVI (.avi) - Video container format
  • MOV (.mov) - QuickTime format
  • MKV (.mkv) - Open source container format
  • WebM (.webm) - Web video format
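
A client can screen files by extension before sending them. This is a minimal sketch built from the format lists on this page; `classify_media` is a hypothetical helper, and an extension alone does not guarantee that the file content is valid.

```python
from pathlib import Path

# Extensions taken from the format lists above
AUDIO_EXTENSIONS = {".mp3", ".aac", ".m4a", ".ogg", ".opus", ".wma",
                    ".wav", ".flac", ".aiff", ".aif", ".webm"}
VIDEO_EXTENSIONS = {".mp4", ".avi", ".mov", ".mkv", ".webm"}


def classify_media(path):
    """Return "audio", "video", or None based on the file extension only."""
    ext = Path(path).suffix.lower()
    if ext in AUDIO_EXTENSIONS:
        return "audio"   # .webm appears in both lists; treated as audio here
    if ext in VIDEO_EXTENSIONS:
        return "video"
    return None
```

For example, `classify_media("talk.MP3")` returns `"audio"` and `classify_media("notes.txt")` returns `None`.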

Recommendations

  • For voice: MP3 at 128 kbps or WAV 16 kHz mono offer a good quality/size trade-off
  • For music with vocals: WAV or FLAC to preserve quality
  • For phone calls: OPUS or MP3 at 64 kbps mono
  • Avoid very low quality audio files (< 16 kHz) for best results
  • For long files (> 25 min), split them into segments
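
Splitting a long recording can be done without third-party tools when the source is a WAV file. The sketch below uses the standard `wave` module; `split_wav` and its parameters are illustrative names, and cutting at fixed intervals may split a word across two segments.

```python
import wave

MAX_SEGMENT_S = 25 * 60  # maximum duration per file, as stated on this page


def split_wav(path, out_prefix, max_seconds=MAX_SEGMENT_S):
    """Write consecutive segments of at most max_seconds each; return their paths."""
    out_paths = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_segment = int(src.getframerate() * max_seconds)
        while True:
            frames = src.readframes(frames_per_segment)
            if not frames:
                break
            out_path = f"{out_prefix}_{len(out_paths):03d}.wav"
            with wave.open(out_path, "wb") as dst:
                dst.setparams(params)    # same channels, bit depth, and rate
                dst.writeframes(frames)  # frame count in the header is fixed on close
            out_paths.append(out_path)
    return out_paths
```

Note that uncompressed audio usually hits the size limit before the duration limit: 16 kHz mono 16-bit WAV takes 32,000 bytes per second, so 25 MB holds roughly 13 minutes. Passing a smaller `max_seconds` or converting segments to a compressed format keeps each part under both limits.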

Response format

The Audio & Transcription API returns a JSON response. Here is an example of the response structure:

json
{
  "id": "req_abc123",
  "object": "audio.response",
  "created": 1677652288,
  "model": "alphaedge-audio-3",
  "text": "The text transcribed from the audio...",
  "usage": {
    "total_tokens": 60
  }
}
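
Reading the fields of this structure is straightforward with the standard `json` module. The sketch below uses a sample body with the same fields as the example above; `summarize_response` is a hypothetical helper, and the `.get` calls are a defensive choice in case a field is absent.

```python
import json

# Sample body with the same fields as the response structure shown above
sample_body = """
{
  "id": "req_abc123",
  "object": "audio.response",
  "created": 1677652288,
  "model": "alphaedge-audio-3",
  "text": "Hello and welcome to the meeting.",
  "usage": {"total_tokens": 60}
}
"""


def summarize_response(body):
    """Return (transcript text, total token count) from a JSON response body."""
    payload = json.loads(body)
    text = payload.get("text", "")
    total_tokens = payload.get("usage", {}).get("total_tokens")
    return text, total_tokens


text, total_tokens = summarize_response(sample_body)
print(text)
print(total_tokens)
```

With `requests`, the same fields are available directly from `r.json()`, which returns the parsed dictionary.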

Advanced examples

Transcription with timestamps

Get a transcription with timestamps for each segment:

python
import requests

url = "https://api-endpoints.alphaedge-ai.com/models/alpha-audio-v1/transcript"
headers = {"X-API-Key": "YOUR_API_KEY"}

with open("/path/to/audio.wav", "rb") as f:
    files = {"audio": ("audio.wav", f, "audio/wav")}
    data = {
        "enable_diarization": "true",
        "enable_postcorrect": "true",
    }
    r = requests.post(url, headers=headers, files=files, data=data, timeout=300)

print(r.status_code)
print(r.json())
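
The response structure documented on this page shows only the full `text`, not per-segment timing. As a sketch of how timestamped segments could be consumed, the example below assumes a hypothetical `segments` list whose items carry `start` and `end` (in seconds) and `text`. These field names are assumptions, not confirmed response fields. The helper formats such segments as SRT subtitle cues.

```python
def srt_timestamp(seconds):
    """Format a number of seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Build SRT text from segments with hypothetical start/end/text keys."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # A blank line separates consecutive cues
    return "\n".join(cues)
```

If the actual response uses different field names, only the three dictionary lookups in `segments_to_srt` need to change.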

Error handling

Here is how to handle errors properly:

python
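import requests

url = "https://api-endpoints.alphaedge-ai.com/models/alpha-audio-v1/transcript"
headers = {"X-API-Key": "YOUR_API_KEY"}

try:
    with open("/path/to/audio.wav", "rb") as f:
        files = {"audio": ("audio.wav", f, "audio/wav")}
        data = {
            "enable_diarization": "true",
            "enable_postcorrect": "true",
        }
        r = requests.post(url, headers=headers, files=files, data=data, timeout=300)
    # Raise requests.exceptions.HTTPError for 4xx and 5xx responses
    r.raise_for_status()
    print(r.json())
except FileNotFoundError as e:
    print(f"Audio file not found: {e}")
except requests.exceptions.Timeout:
    print("The request timed out after 300 seconds")
except requests.exceptions.HTTPError as e:
    print(f"HTTP error {e.response.status_code}: {e.response.text}")
except requests.exceptions.RequestException as e:
    # Any other network or protocol failure raised by requests
    print(f"Request failed: {e}")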
javascript
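import fs from "node:fs";

const form = new FormData();
form.append("audio", new Blob([fs.readFileSync("/path/to/audio.wav")]), "audio.wav");
form.append("enable_diarization", "true");
form.append("enable_postcorrect", "true");

try {
  const res = await fetch("https://api-endpoints.alphaedge-ai.com/models/alpha-audio-v1/transcript", {
    method: "POST",
    headers: { "X-API-Key": "YOUR_API_KEY" },
    body: form
  });

  // fetch only rejects on network failures, so check the HTTP status explicitly
  if (!res.ok) {
    throw new Error(`HTTP error ${res.status}: ${await res.text()}`);
  }

  console.log(await res.json());
} catch (err) {
  // Covers network failures, non-2xx statuses, and invalid JSON bodies
  console.error("Transcription request failed:", err.message);
}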

Use cases

Here are some common use cases for the Audio & Transcription API:

1. Meeting transcription

Automatically transcribe meetings for archiving and search.

2. Video subtitling

Generate automatic subtitles for your video content.

3. Podcast transcription

Create transcriptions to improve accessibility and SEO.

Limitations and best practices

Limitations

  • File size: Files must not exceed 25 MB
  • Supported formats: MP3, WAV, M4A, FLAC, AAC, OGG, OPUS, WMA, AIFF, WebM, and video formats (MP4, AVI, MOV, MKV, WebM)
  • Maximum duration: 25 minutes per file
  • Rate limiting: 60 requests per minute by default (can be increased depending on your plan)
  • Tokens: 4096 token limit for combined prompts and responses
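
Staying under the default rate limit can be enforced on the client side. The sketch below is a simple sliding-window limiter for a single process. `RateLimiter` is an illustrative class, not part of any AlphaEdge SDK, and how the server actually counts requests is not documented here.

```python
import time
from collections import deque


class RateLimiter:
    """Client-side sliding window: at most max_calls per period seconds."""

    def __init__(self, max_calls=60, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self.calls = deque()  # timestamps of recent calls, oldest first

    def wait(self):
        """Block until another call is allowed, then record it."""
        now = self.clock()
        # Forget calls that have left the window
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            # Sleep until the oldest recorded call leaves the window
            self.sleep(self.period - (now - self.calls[0]))
            now = self.clock()
            self.calls.popleft()
        self.calls.append(now)
```

Calling `limiter.wait()` immediately before each request keeps a batch job at or below 60 requests per minute. The `clock` and `sleep` arguments exist so the behavior can be tested without real delays.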

Best practices

  • Use good quality audio files (minimum 16 kHz) for best results
  • For long files, split them into segments of at most 25 minutes
  • Handle errors properly with try/except blocks
  • Implement a retry mechanism to handle temporary errors
  • Cache results when possible to reduce costs
  • Monitor your usage to avoid exceeding your limits
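
A retry mechanism for temporary errors can be sketched as exponential backoff around the request. `with_retries` is a hypothetical helper, and the set of status codes treated as temporary (429 and common 5xx codes) is an assumption, since this page does not list the API's error codes.

```python
import time


def with_retries(send, max_attempts=4, base_delay=1.0,
                 retry_statuses=(429, 500, 502, 503, 504), sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    send must return an object with a status_code attribute, such as a
    requests.Response. The retryable statuses are an assumption.
    """
    for attempt in range(1, max_attempts + 1):
        response = send()
        if response.status_code not in retry_statuses or attempt == max_attempts:
            return response
        # Exponential backoff: base_delay, then 2x, 4x, ...
        sleep(base_delay * 2 ** (attempt - 1))
```

The `send` callable should open the audio file itself on every call, because a file object that has already been read by a previous attempt would otherwise be uploaded empty.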

Available models

To view all available audio and transcription models with their detailed specifications, visit the Our models page and filter by type.