NAME

OpenAPI::Client::OpenAI::Path::realtime-transcription_sessions - Documentation for the /realtime/transcription_sessions path.

OPERATIONS

POST /realtime/transcription_sessions

create-realtime-transcription-session

$client->create_realtime_transcription_session({
    body => { ... },
});

Create an ephemeral API token for use in client-side applications with the Realtime API, specifically for realtime transcription. The session can be configured with the same parameters as the transcription_session.update client event.

Returns the created Realtime transcription session object, plus a client_secret key containing an ephemeral API token that can be used to authenticate browser clients for the Realtime API.

Responses

200 - Session created successfully.

Content-Type: application/json

Example:

{
   "client_secret" : null,
   "expires_at" : 1742188264,
   "id" : "sess_BBwZc7cFV3XizEyKGDCGL",
   "input_audio_format" : "pcm16",
   "input_audio_transcription" : {
      "language" : null,
      "model" : "gpt-4o-transcribe",
      "prompt" : ""
   },
   "modalities" : [
      "audio",
      "text"
   ],
   "object" : "realtime.transcription_session",
   "turn_detection" : {
      "prefix_padding_ms" : 300,
      "silence_duration_ms" : 200,
      "threshold" : 0.5,
      "type" : "server_vad"
   }
}
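
Because the response body is plain JSON, it can be inspected with the core JSON::PP module. The sketch below decodes an abbreviated copy of the example above and reads a few fields. JSON null values (client_secret and language here) arrive as undef in Perl. Treating expires_at as Unix epoch seconds is an assumption that this page does not state.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Abbreviated copy of the example response above (modalities omitted).
my $json = <<'END_JSON';
{
   "client_secret" : null,
   "expires_at" : 1742188264,
   "id" : "sess_BBwZc7cFV3XizEyKGDCGL",
   "input_audio_format" : "pcm16",
   "input_audio_transcription" : {
      "language" : null,
      "model" : "gpt-4o-transcribe",
      "prompt" : ""
   },
   "object" : "realtime.transcription_session",
   "turn_detection" : {
      "prefix_padding_ms" : 300,
      "silence_duration_ms" : 200,
      "threshold" : 0.5,
      "type" : "server_vad"
   }
}
END_JSON

my $session = decode_json($json);

# JSON null decodes to undef, so optional fields need a definedness check.
my $transcription = $session->{input_audio_transcription};
my $language      = $transcription->{language} // 'unset';
my $vad           = $session->{turn_detection};

# Assumption: expires_at is a Unix timestamp in seconds.
my $expires = gmtime $session->{expires_at};

printf "%s: model=%s language=%s vad=%s (silence %d ms), expires %s UTC\n",
    $session->{id}, $transcription->{model}, $language,
    $vad->{type}, $vad->{silence_duration_ms}, $expires;
```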

SCHEMAS

AudioTranscription

Properties:

delay (string) - Controls how long the model waits before emitting transcription text. Higher values can improve transcription accuracy at the cost of latency. Only supported with gpt-realtime-whisper in GA Realtime sessions.

Allowed values: minimal, low, medium, high, xhigh
language (string) - The language of the input audio. Supplying the input language in ISO-639-1 format (e.g. en) will improve accuracy and latency.
model (anyOf) - The model to use for transcription. Current options are whisper-1, gpt-4o-mini-transcribe, gpt-4o-mini-transcribe-2025-12-15, gpt-4o-transcribe, gpt-4o-transcribe-diarize, and gpt-realtime-whisper. Use gpt-4o-transcribe-diarize when you need diarization with speaker labels.
prompt (string) - An optional text to guide the model's style or continue a previous audio segment. For whisper-1, the prompt is a list of keywords. For gpt-4o-transcribe models (excluding gpt-4o-transcribe-diarize), the prompt is a free text string, for example "expect words related to technology". Prompt is not supported with gpt-realtime-whisper in GA Realtime sessions.
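
The constraints in this schema can be checked before a request is sent. The sketch below uses a hypothetical helper, build_audio_transcription, which is not part of OpenAPI::Client::OpenAI. It applies the delay and prompt rules for gpt-realtime-whisper unconditionally, although the schema scopes them to GA Realtime sessions, and its two-letter check on language is an assumption about how ISO-639-1 codes are passed.

```perl
use strict;
use warnings;

# Allowed values for "delay", as listed in the AudioTranscription schema.
my %ALLOWED_DELAY = map { $_ => 1 } qw(minimal low medium high xhigh);

# Hypothetical helper: build an input_audio_transcription hashref and reject
# combinations the schema description says are unsupported.
sub build_audio_transcription {
    my (%args) = @_;
    my %config;
    my $model = $args{model} // '';

    if (defined $args{delay}) {
        die "unsupported delay '$args{delay}'\n"
            unless $ALLOWED_DELAY{ $args{delay} };
        die "delay is only supported with gpt-realtime-whisper\n"
            unless $model eq 'gpt-realtime-whisper';
        $config{delay} = $args{delay};
    }
    if (defined $args{prompt}) {
        die "prompt is not supported with gpt-realtime-whisper\n"
            if $model eq 'gpt-realtime-whisper';
        $config{prompt} = $args{prompt};
    }
    if (defined $args{language}) {
        # Assumption: a bare two-letter lowercase ISO-639-1 code such as "en".
        die "language should be an ISO-639-1 code\n"
            unless $args{language} =~ /\A[a-z]{2}\z/;
        $config{language} = $args{language};
    }
    $config{model} = $model if length $model;
    return \%config;
}

my $cfg = build_audio_transcription(
    model    => 'gpt-4o-transcribe',
    language => 'en',
    prompt   => 'expect words related to technology',
);
```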

AudioTranscriptionResponse

Properties:

language (string) - The language of the input audio.
model (anyOf) - The model used for transcription. Current options are whisper-1, gpt-4o-mini-transcribe, gpt-4o-mini-transcribe-2025-12-15, gpt-4o-transcribe, gpt-4o-transcribe-diarize, and gpt-realtime-whisper.
prompt (string) - The prompt configured for input audio transcription, when present.

NoiseReductionType

Type of noise reduction. near_field is for close-talking microphones such as headphones, far_field is for far-field microphones such as laptop or conference room microphones.

RealtimeTranscriptionSessionCreateRequest

Properties:

include (array of string) - The set of items to include in the transcription. Currently available items are: item.input_audio_transcription.logprobs
input_audio_format (string) - The format of input audio. Options are pcm16, g711_ulaw, or g711_alaw. For pcm16, input audio must be 16-bit PCM at a 24 kHz sample rate, single channel (mono), and little-endian byte order.

Allowed values: pcm16, g711_ulaw, g711_alaw

Default: pcm16
input_audio_noise_reduction (object) - Configuration for input audio noise reduction. Set it to null to turn noise reduction off. Noise reduction filters audio added to the input audio buffer before it is sent to VAD and the model. Filtering the audio can improve VAD and turn detection accuracy (reducing false positives) and model performance by improving perception of the input audio.

Default: null
input_audio_transcription (AudioTranscription) - Configuration for input audio transcription. The client can optionally set the language and prompt for transcription; these offer additional guidance to the transcription service.

See "AudioTranscription" above for shape.
turn_detection (object) - Configuration for turn detection. Can be set to null to disable it. Server VAD means that the model detects the start and end of speech based on audio volume and responds at the end of user speech.
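
A request body for this endpoint is an ordinary Perl hashref passed as body to create_realtime_transcription_session. The sketch below uses two hypothetical helpers: transcription_session_body applies the documented input_audio_format values and default, and floats_to_pcm16 packs samples into the pcm16 byte layout (signed 16-bit, little-endian). The { type => 'near_field' } shape for input_audio_noise_reduction is an assumption based on NoiseReductionType; the turn_detection values are copied from the example response.

```perl
use strict;
use warnings;

# Allowed values for input_audio_format; pcm16 is the documented default.
my %AUDIO_FORMATS = map { $_ => 1 } qw(pcm16 g711_ulaw g711_alaw);

# Hypothetical helper: assemble a RealtimeTranscriptionSessionCreateRequest body.
sub transcription_session_body {
    my (%opt) = @_;
    my $format = $opt{input_audio_format} // 'pcm16';
    die "unsupported input_audio_format '$format'\n"
        unless $AUDIO_FORMATS{$format};

    my %body = (input_audio_format => $format);
    # Copy optional keys only when given. An explicit undef is kept, so a JSON
    # encoder emits null for it (which turns the feature off).
    for my $key (qw(include input_audio_noise_reduction
                    input_audio_transcription turn_detection)) {
        $body{$key} = $opt{$key} if exists $opt{$key};
    }
    return \%body;
}

# Hypothetical helper: pack float samples in [-1, 1] as pcm16, i.e. signed
# 16-bit little-endian ("s<"). Assumes the input is already 24 kHz mono.
sub floats_to_pcm16 {
    my @ints = map {
        my $s = $_ > 1 ? 1 : $_ < -1 ? -1 : $_;    # clamp out-of-range samples
        int($s * 32767);                           # scale; int truncates toward zero
    } @_;
    return pack 's<*', @ints;
}

my $body = transcription_session_body(
    input_audio_transcription   => { model => 'gpt-4o-transcribe', language => 'en' },
    input_audio_noise_reduction => { type => 'near_field' },    # assumed shape
    turn_detection              => {
        type                => 'server_vad',
        threshold           => 0.5,
        prefix_padding_ms   => 300,
        silence_duration_ms => 200,
    },
    include => ['item.input_audio_transcription.logprobs'],
);

# 10 ms of silence at 24 kHz is 240 samples, i.e. 480 bytes of pcm16.
my $chunk = floats_to_pcm16((0) x 240);
```

The resulting $body can be passed as $client->create_realtime_transcription_session({ body => $body }). The "s<" template fixes the byte order to little-endian regardless of the host, which plain "s" would not. Resampling to 24 kHz is not shown.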

RealtimeTranscriptionSessionCreateResponse

Properties:

client_secret (object, required) - Ephemeral key returned by the API. Only present when the session is created on the server via REST API.
input_audio_format (string) - The format of input audio. Options are pcm16, g711_ulaw, or g711_alaw.
input_audio_transcription (AudioTranscriptionResponse) - Configuration of the transcription model.

See "AudioTranscriptionResponse" above for shape.
modalities (unknown) - The set of modalities the model can respond with. To disable audio, set this to ["text"].
turn_detection (object) - Configuration for turn detection. Can be set to null to disable it. Server VAD means that the model detects the start and end of speech based on audio volume and responds at the end of user speech.
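
Since client_secret carries the ephemeral key meant for browser clients, a server typically only needs to pass that token on. The sketch below assumes, beyond what this page documents, that the token sits under a value key inside client_secret; the helper name ephemeral_token and the ek_example placeholder are hypothetical. It returns undef when client_secret is null, as in the example response.

```perl
use strict;
use warnings;

# Hypothetical helper. Assumption: client_secret is an object holding the
# ephemeral token under "value"; this page only says it is an object.
sub ephemeral_token {
    my ($session) = @_;
    my $secret = $session->{client_secret};
    return undef unless ref $secret eq 'HASH';    # null in the example response
    return $secret->{value};
}

# "ek_example" is a placeholder, not a real token format.
my $token = ephemeral_token({
    object        => 'realtime.transcription_session',
    client_secret => { value => 'ek_example', expires_at => 1742188264 },
});

# Hand only this ephemeral token to the browser client.
print defined $token ? "token: $token\n" : "no client_secret returned\n";
```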

COPYRIGHT AND LICENSE

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.14.0 or, at your option, any later version of Perl 5 you may have available.

