Audio Transcription Simple

Tutorial Video

Pipeline

Pipeline Components

🌐 Web Hook

Receives uploaded audio files via HTTP

📄 Data – Parser

Extracts audio content from various file formats

🎵 Audio – Transcribe

Converts audio to text using Whisper models

🧠 Text – Summarization: LLM

Creates intelligent summaries using Gemini

📤 HTTP Results

Returns structured JSON output

How to Use the Pipeline

Start the Pipeline

Log into Aparavi and create a new project
Run your pipeline in the Aparavi Engine
Look for the Webhook URL message in the Project Log
Copy the webhook URL (e.g., http://localhost:8080/webhook/...)

Configure API Testing Tool

We recommend using Talend API Tester for easy testing:

Download: Talend API Tester Extension
Alternative: Postman, curl, or any HTTP client

Request Configuration

Field	Value	Description
Method	`PUT`	HTTP method for file upload
URL	Your webhook URL	The URL from Step 1
Content-Type	`Auto`	Automatically set based on file type
Authorization	Your API key	Found in the webhook URL parameters

Body Configuration

Type: File
Upload Method: Drag & drop or click to browse
Supported Formats: MP3, WAV, M4A, FLAC, MP4, AVI, MOV

Send and Process

Upload your audio file to the request body
Click “Send” to submit the request
Wait for processing (typically 10-60 seconds depending on audio length)
Check response status:

✅ 200 OK
Success

❌ Error codes
Check file format and size

Extract Results

Response Structure

{
  "data": {
    "objects": {
      "cce2fa78-f7fb-5a2e-b391-7c896aeda5cf": {
        "text": "Your processed content here...",
        "summaries": [...],
        "key_points": [...],
        "entities": [...]
      }
    }
  }
}

Extracting Content

Open the response JSON
Navigate to: data/objects/[object-id]/text
Copy the text content – this contains your transcript, summary, key points, and entities
Use a simple script to format the output for better readability

Component Details

1. Web Hook Connector

Purpose: Receives HTTP audio file uploads and triggers pipeline processing

Configuration:

Protocol: webhook://
Class Type: source
Capabilities: noinclude
Register: endpoint

Supported Input Types: tags, text, audio, video, image

2. Data – Parser Connector

Purpose: Extracts structured audio content from uploaded files for downstream processing

Configuration:

Protocol: parse://
Class Type: data
Register: filter

Supported Input/Output:

Input: tags
Output: text, table, image, video, audio

3. Audio – Transcribe Connector

Purpose: Converts audio and video content to text using Whisper models

Configuration:

Protocol: audio_transcribe://
Class Type: audio
Register: filter

Model Options

Model	Speed	Accuracy	Use Case
Tiny	Fastest	Lowest	Quick processing
Base	Fast	Low	General use (Default)
Small	Medium	Medium	Balanced
Medium	Slow	High	Quality focus
Large	Slowest	Highest	Best quality

Advanced Settings

Setting	Default	Description	Effect
Silence Threshold	0.25 seconds	Silence detection threshold	Lower = more sensitive to silence
Minimum Seconds	240 seconds	Minimum audio chunk size	Controls minimum processing batch
Maximum Seconds	300 seconds	Maximum audio buffer size	Controls maximum processing batch
VAD Level	Balanced	Voice Activity Detection	Noise filtering aggressiveness

4. Text – Summarization: LLM Connector

Purpose: Creates intelligent summaries, key points, and entity extraction using Gemini LLM

Configuration:

Protocol: summarization://
Class Type: text
Register: filter
Invoke: Requires LLM connection

Configuration Options

Setting	Description	Default
Number of Summaries	Chunks to summarize after document split	Optional
Summary Words	Words per summary (0 = disabled)	Optional
Key Point Words	Words per key point (0 = disabled)	Optional
Entities	Number of entities to extract (0 = disabled)	Optional

5. HTTP Results Connector

Purpose: Returns structured JSON responses with processed content

Configuration:

Protocol: response://
Class Type: infrastructure
Register: filter

Lane Configuration: Text, Audio, Video, Image, Documents, Questions, Answers

Supported Audio Formats

🎵 Audio Files

MP3 (.mp3) – Most common, good compression
WAV (.wav) – Uncompressed, high quality
M4A (.m4a) – Apple format, good quality
FLAC (.flac) – Lossless compression
OGG (.ogg) – Open source format

🎬 Video Files with Audio

MP4 (.mp4) – Common video format
AVI (.avi) – Windows video format
MOV (.mov) – Apple video format
MKV (.mkv) – Open source video format

📏 File Size Recommendations

Optimal: 1-50 MB
Maximum: 100 MB
Processing Time: 10-60 seconds per file

Output Format

Structured Response

The pipeline outputs four types of content:

1. Transcript

The full text transcription of the audio

2. Summary

Condensed version of the content

3. Key Points

Bullet-point summary of main topics

4. Entities

Named entities mentioned in the content

Example Output

{
  "data": {
    "objects": {
      "cce2fa78-f7fb-5a2e-b391-7c896aeda5cf": {
        "text": "TRANSCRIPT:\nThis is the full transcript of the audio content...\n\nSUMMARY:\nThis audio discusses the main topics covered in the content...\n\nKEY POINTS:\n• First key point about the content\n• Second key point about the content\n• Third key point about the content\n\nENTITIES:\n• Entity 1: Description\n• Entity 2: Description"
      }
    }
  }
}

Error Handling

Common HTTP Status Codes

Code	Meaning	Solution
200	Success	✅ Processing completed
400	Bad Request	Check file format and size
401	Unauthorized	Verify API key
404	Not Found	Check webhook URL
500	Server Error	Restart pipeline

Troubleshooting Tips

File Size: Ensure files are under 100MB
Format: Use supported audio/video formats
API Key: Verify authorization header
Pipeline: Ensure all components are running
LLM Connection: Check LLM configuration and API key
Network: Check connectivity to webhook endpoint

Performance Considerations

Processing Times

Audio Length	File Size	Estimated Time
1-5 minutes	1-10 MB	10-30 seconds
5-15 minutes	10-30 MB	30-60 seconds
15-30 minutes	30-50 MB	1-2 minutes
30+ minutes	50-100 MB	2-5 minutes

Optimization Tips

Audio Quality: Use clear, high-quality audio files
Format Selection: Prefer MP3 or WAV for best compatibility
Size Management: Keep files under 50MB for optimal performance
Batch Timing: Allow 30-60 seconds between requests
Model Selection: Use “Base” model for speed, “Large” for accuracy

Security and Authentication

API Key Management

Location: Found in webhook URL parameters
Format: Long alphanumeric string
Security: Keep private and secure

Request Validation

The pipeline validates:

File format compatibility
File size limits
API key authenticity
Request method (PUT only)