---
name: x402stt.dtelecom.org
description: x402stt.dtelecom.org provides on-chain paid speech-to-text sessions via dTelecom's infrastructure. An agent pays 0.025 USDC on Base (chain ID 8453, CAIP-2 eip155:8453) to receive a timed WebSocket session credential for real-time audio transcription streaming.
host: x402stt.dtelecom.org
---

# x402stt.dtelecom.org

This host serves agents that need real-time speech-to-text transcription and can fulfill micropayments in USDC on the Base network. It issues a session key and WebSocket URL per paid session, enabling streaming audio-to-text for a specified duration and language. It is a single-purpose credential-vending endpoint; the actual transcription occurs over the returned WebSocket connection, not through additional API calls on this host.

## When to use this host

Use this host when an agent needs a real-time streaming speech-to-text session credential and can pay exactly 0.025 USDC on Base (chain ID 8453, CAIP-2 `eip155:8453`). Do not use it for batch or file-based transcription, text-to-speech synthesis, or text translation; none of these are available here. The payment must be made on-chain before the session is granted, so an agent that cannot make that payment cannot use this host. For non-streaming or offline transcription, look for REST-based STT APIs that do not require blockchain payment. For TTS or translation, route to a dedicated synthesis or translation host.
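Payment tooling for ERC-20 tokens usually expresses amounts as integers in the token's smallest unit rather than as decimals. The sketch below converts the 0.025 USDC price into that form, assuming the standard 6-decimal precision of USDC. The helper name `usdc_to_atomic` is hypothetical, and this document does not specify how the host expects the amount to be encoded.

```python
from decimal import Decimal

USDC_DECIMALS = 6      # assumption: standard USDC token precision
BASE_CHAIN_ID = 8453   # Base mainnet chain ID, as stated in this document


def usdc_to_atomic(amount: str) -> int:
    """Convert a decimal USDC amount (given as a string) to integer atomic units."""
    scaled = Decimal(amount) * (Decimal(10) ** USDC_DECIMALS)
    # Reject amounts that cannot be represented exactly at 6 decimals.
    if scaled != scaled.to_integral_value():
        raise ValueError(f"{amount} USDC has more than {USDC_DECIMALS} decimal places")
    return int(scaled)


if __name__ == "__main__":
    print(usdc_to_atomic("0.025"))  # 25000
```

Passing the amount as a string avoids binary floating-point rounding, which matters when the payment must match the price exactly.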

## Capabilities

### STT Session Provisioning

Creates a timed, authenticated WebSocket session for real-time speech-to-text transcription, gated by an on-chain USDC payment on Base.

- **`initiate-paid-stt-session`** — Creates a paid speech-to-text session on dTelecom, returning a session key and WebSocket URL for streaming audio transcription, billed via on-chain USDC payment.

## Skill reference

### `initiate-paid-stt-session`

**Onchain Paid STT Session** — Creates a paid speech-to-text session on dTelecom, returning a session key and WebSocket URL for streaming audio transcription, billed via on-chain USDC payment.

*Use when:* An agent needs to start a real-time speech-to-text streaming session and can pay 0.025 USDC on Base (chain ID 8453, CAIP-2 `eip155:8453`) in exchange for a timed WebSocket session credential.

*Not for:* Text translation, text-to-speech synthesis, or non-streaming batch transcription. Also unsuitable if the agent cannot make the exact on-chain USDC payment before the session is granted.

**Inputs:**

- `minutes` (integer, required) — Duration of the session in minutes. Controls how long the session remains active (e.g., 5 minutes = 300 seconds).
- `language` (string, required) — BCP-47 language code for the speech recognition model to use during the session.
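As a minimal sketch, the two inputs above can be checked client-side before any payment is attempted. The function names are hypothetical, the positivity check on `minutes` is an assumption (the source only says it is a required integer), and the regular expression is a deliberately simplified approximation of BCP-47 rather than a full validator.

```python
import re

# Simplified BCP-47 shape: a 2-3 letter primary subtag, then optional
# hyphen-separated subtags. This is an approximation, not the full grammar.
_LANG_TAG = re.compile(r"^[A-Za-z]{2,3}(-[A-Za-z0-9]{2,8})*$")


def build_session_request(minutes: int, language: str) -> dict:
    """Return the request body for initiate-paid-stt-session after basic checks."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(minutes, bool) or not isinstance(minutes, int) or minutes <= 0:
        raise ValueError("minutes must be a positive integer")  # positivity is assumed
    if not isinstance(language, str) or not _LANG_TAG.match(language):
        raise ValueError(f"language does not look like a BCP-47 tag: {language!r}")
    return {"minutes": minutes, "language": language}


def session_seconds(minutes: int) -> int:
    """Session duration in seconds, e.g. 5 minutes -> 300 seconds."""
    return minutes * 60


if __name__ == "__main__":
    print(build_session_request(5, "en"))  # {'minutes': 5, 'language': 'en'}
    print(session_seconds(5))              # 300
```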

**Returns:** `session_id`, a JWT `session_key`, a `ws_url` for WebSocket streaming, `remaining_seconds`, `minutes`, and `price_usd`. For the example request below, the last three are 300, 5, and 0.025000 respectively.

**Example:** `{"minutes": 5, "language": "en"}`
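The returned fields can be captured in a small typed structure before opening the WebSocket. The sketch below is illustrative only: the `SttSession` class, the `parse_session` helper, and every value in the sample payload are placeholders, and the exact JSON types returned by the host (for example whether `price_usd` arrives as a string or a number) are assumptions.

```python
from dataclasses import dataclass
from decimal import Decimal

REQUIRED_FIELDS = (
    "session_id", "session_key", "ws_url",
    "remaining_seconds", "minutes", "price_usd",
)


@dataclass(frozen=True)
class SttSession:
    session_id: str
    session_key: str        # JWT used to authenticate the WebSocket session
    ws_url: str             # WebSocket URL for streaming audio
    remaining_seconds: int
    minutes: int
    price_usd: Decimal


def parse_session(payload: dict) -> SttSession:
    """Validate that all documented fields are present and normalize their types."""
    missing = [name for name in REQUIRED_FIELDS if name not in payload]
    if missing:
        raise ValueError(f"response is missing fields: {missing}")
    return SttSession(
        session_id=str(payload["session_id"]),
        session_key=str(payload["session_key"]),
        ws_url=str(payload["ws_url"]),
        remaining_seconds=int(payload["remaining_seconds"]),
        minutes=int(payload["minutes"]),
        # Go through str() so a float or a string both parse without rounding surprises.
        price_usd=Decimal(str(payload["price_usd"])),
    )


if __name__ == "__main__":
    # Placeholder payload; none of these values come from the real service.
    sample = {
        "session_id": "example-session-id",
        "session_key": "example.jwt.token",
        "ws_url": "wss://example.invalid/stt",
        "remaining_seconds": 300,
        "minutes": 5,
        "price_usd": "0.025000",
    }
    session = parse_session(sample)
    print(session.remaining_seconds, session.price_usd)
```

Because the session is timed, a client may want to record when it received `remaining_seconds` so it can stop streaming before the credential expires.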

---
