Transcribe FAQs | Singapore Government Developer Portal


It usually takes less than an hour to transcribe an hour of audio, though the duration varies with audio quality. If the speech is soft or mumbled, it could take slightly longer.

Yes, there is a two-hour limit for each live transcription session.

No, Transcribe does not set a limit on the number of transcriptions a user can submit. In general, users should use the service appropriately and within fair usage. If you have a large number of transcription tasks to be processed in batch, please contact us separately at

You may refer to the tips for online meetings for more information on the setup.

The performance and accuracy of speech-to-text transcription is typically measured by Word Error Rate (WER), which summarises the errors in a transcript: insertions, deletions and substitutions. In practice, accuracy varies by use case and depends on factors such as speakers’ fluency, enunciation and accents. For a formal speech setting with a professional audio system and articulate speakers, the WER could be as low as 10%. For meetings and focus group discussions that involve many speakers with different accents and levels of fluency, a poor audio system and significant background noise, the WER could go as high as 30% to 40%. Accuracy therefore varies from one transcription task to another, and one transcript may contain more errors than another.
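
To make the WER figures above concrete, here is a minimal sketch of how WER is commonly computed: the number of substituted, deleted and inserted words (found with a word-level edit distance) divided by the number of words in the reference transcript. This is an illustration only and not Transcribe's own scoring code; the function name `word_error_rate` and the simple lowercase, whitespace-based tokenisation are assumptions made for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference words.

    Hypothetical helper for illustration; tokenisation is a plain
    lowercase whitespace split, which real evaluations usually refine.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


reference = "the quick brown fox jumps over the lazy dog today"
hypothesis = "the quick brown fox jumped over lazy dog today"
print(f"WER: {word_error_rate(reference, hypothesis):.0%}")
```

In this example the ten-word reference has one substituted word and one deleted word, so the WER is 2/10, or 20%. Because insertions are counted against the reference length, WER can exceed 100% for very poor transcripts.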

Transcribe does not support automatic language detection or language switching within a task. Once a language is selected for a task, the transcript will be in that language only. Speech in any other language in the same task will not be transcribed.

Last updated 21 December 2023
