🦜 Audio Transcription
This application transcribes audio files into text using open source AI tools. It supports noisy inputs, timestamps and translation all-in-one!
Features
- Process any number of audio files using Whisper + argostranslate
- Transcribe from a wide array of languages, with auto-detection of the source language
- Translate into many languages
- Filter by keyword to keep only the relevant parts of your audio
- Timestamps for each sentence!
- Exposes a processor for the Octostar data processing pipeline to transcribe or translate audio files at /api/nifi
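The features above amount to a two-stage pipeline: Whisper produces timestamped segments, and argostranslate translates their text. Below is a minimal sketch of that idea. It assumes the `openai-whisper` and `argostranslate` Python packages and is not the app's actual code. The names `Segment`, `translate_segments` and `transcribe_and_translate`, as well as the `"base"` model size, are hypothetical. Translating segment by segment keeps each sentence's timestamps. Note that Whisper segments often, but not always, line up with sentences.

```python
from typing import Callable, TypedDict


class Segment(TypedDict):
    start: float  # seconds from the beginning of the audio
    end: float
    text: str


def translate_segments(
    segments: list[Segment], translate: Callable[[str], str]
) -> list[Segment]:
    """Translate each segment on its own so the translation keeps the original timestamps."""
    return [
        {"start": s["start"], "end": s["end"], "text": translate(s["text"].strip())}
        for s in segments
    ]


def transcribe_and_translate(path: str, to_code: str, model_size: str = "base") -> list[Segment]:
    """Hypothetical glue: transcribe with Whisper, then translate with Argos Translate.

    Assumes openai-whisper and argostranslate are installed, and that an Argos Translate
    language package for (detected language -> to_code) is already installed.
    """
    # Lazy imports: both packages are heavy, optional dependencies.
    import argostranslate.translate
    import whisper

    model = whisper.load_model(model_size)
    result = model.transcribe(path)  # auto-detects the source language
    from_code = result["language"]  # language code detected by Whisper, e.g. "it"
    segments: list[Segment] = [
        {"start": s["start"], "end": s["end"], "text": s["text"]} for s in result["segments"]
    ]
    return translate_segments(
        segments, lambda text: argostranslate.translate.translate(text, from_code, to_code)
    )
```

Whisper's built-in `task="translate"` option only translates into English, which is one reason to pair it with a separate translator when other target languages are needed.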
Installation (v0.1.0-dev.2)
The application requires an App folder (Your Workspace→Create→App→Empty App Folder) with the following minimal file structure:
.
├── manifest.yaml
└── readme.MD (this file)
Please ensure that the manifest points to the right application and declares the correct application version, like so:
image: octostar/app.entity-extract-and-graph:0.1.0-dev.1
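The image reference ends in the version tag, so the manifest can be checked mechanically before deploying. The sketch below is a hypothetical helper and not part of the app or of Octostar tooling. It scans for a top-level `image:` line instead of parsing YAML, and all function names are made up for illustration.

```python
from typing import Optional


def split_image_ref(ref: str) -> tuple[str, Optional[str]]:
    """Split 'repo/name:tag' into (name, tag). The tag is None when absent.

    Only the part after the last '/' is searched for ':', so a registry port such as
    'host:5000/repo/name' is not mistaken for a tag. Digests (@sha256:...) are not handled.
    """
    name_start = ref.rfind("/") + 1
    colon = ref.find(":", name_start)
    if colon == -1:
        return ref, None
    return ref[:colon], ref[colon + 1 :]


def manifest_image(manifest_text: str) -> Optional[str]:
    """Return the value of the first top-level 'image:' line (naive scan, not a YAML parser)."""
    for line in manifest_text.splitlines():
        if line.startswith("image:"):
            return line.split(":", 1)[1].strip().strip("\"'")
    return None


def declares_version(manifest_text: str, expected_version: str) -> bool:
    """True if the manifest's image tag equals the expected application version."""
    image = manifest_image(manifest_text)
    if image is None:
        return False
    _, tag = split_image_ref(image)
    return tag == expected_version
```

Applied to the example line above, `declares_version(text, "0.1.0-dev.1")` is `True` and `declares_version(text, "0.1.0-dev.2")` is `False`. The name part returned by `split_image_ref` can be compared the same way to confirm that the manifest points at the right application.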
The app can be deployed from the App Editor (App Folder→Open With→App Editor→Deploy App).
To launch the application, open it directly, or use the context menu option Open With→🎤Transcribe Audio on an audio file or a folder.
User Guide
The app is straightforward to use. Once opened, it lists which of the input files are supported for transcription. Select the Include Audio Translation checkbox to perform transcription and translation in a single run.
Once the Transcribe! button is pressed, the AI model begins transcribing the audio files. Processing time depends on the hardware: without a GPU it is typically around 1 second per second of audio, while a GPU speeds transcription up considerably.
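With the rough CPU figure above (about 1 second of processing per second of audio), total processing time can be estimated from total audio duration. This sketch assumes uncompressed WAV input, because Python's standard `wave` module cannot read compressed formats. The constant and function names are hypothetical, and real timings vary with hardware and model size.

```python
import wave
from pathlib import Path

# Rough CPU throughput quoted in this guide: ~1 second of processing per second of audio.
CPU_SECONDS_PER_AUDIO_SECOND = 1.0


def wav_duration(path: Path) -> float:
    """Duration in seconds of an uncompressed WAV file."""
    with wave.open(str(path), "rb") as w:
        return w.getnframes() / w.getframerate()


def estimate_cpu_seconds(paths: list[Path]) -> float:
    """Estimated CPU-only processing time for all files, in seconds."""
    total_audio = sum(wav_duration(p) for p in paths)
    return total_audio * CPU_SECONDS_PER_AUDIO_SECOND
```

At that rate, a 45-minute recording takes on the order of 45 minutes on CPU. With a GPU it would be considerably faster.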
Once transcription is complete, the application shows the transcription (and, optionally, the translation) of the first audio file. The audio can be replayed from the player at the top of the screen, and below it is a series of transcribed sentences paired with timestamps. The transcriptions are fully editable, so mistakes can be corrected by hand, and clicking a timestamp plays the audio from the start of that segment. At the bottom is a list of entities extracted with Named Entity Recognition (NER), which can be saved alongside the transcription.
The left sidebar contains a filtering tool: the user can enter any number of keywords, and the audio file(s) will be filtered accordingly on a per-sentence basis. Finally, the user can save either the transcription or the translation back into the original audio file or as a separate file, with or without timestamps.
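The per-sentence keyword filter and the "with or without timestamps" export can be sketched as follows. The app's actual matching rules and export format are not documented here. This sketch assumes that a segment is kept when its text contains any keyword (case-insensitive substring match) and uses an `HH:MM:SS.mmm` timestamp layout. `Segment`, `filter_segments`, `format_timestamp` and `render_transcript` are hypothetical names.

```python
from collections.abc import Iterable
from typing import TypedDict


class Segment(TypedDict):
    start: float  # seconds
    end: float
    text: str


def filter_segments(segments: Iterable[Segment], keywords: Iterable[str]) -> list[Segment]:
    """Keep segments whose text contains at least one keyword (case-insensitive substring)."""
    needles = [k.strip().casefold() for k in keywords if k.strip()]
    if not needles:
        return list(segments)  # no keywords: nothing is filtered out
    return [s for s in segments if any(n in s["text"].casefold() for n in needles)]


def format_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS.mmm."""
    millis = round(seconds * 1000)
    hours, rem = divmod(millis, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def render_transcript(segments: Iterable[Segment], with_timestamps: bool = True) -> str:
    """One line per segment, optionally prefixed with its [start --> end] range."""
    lines = []
    for s in segments:
        text = s["text"].strip()
        if with_timestamps:
            lines.append(f"[{format_timestamp(s['start'])} --> {format_timestamp(s['end'])}] {text}")
        else:
            lines.append(text)
    return "\n".join(lines)
```

Substring matching is the simplest choice, but it also matches inside longer words (for example, "ship" matches "shipment"). A word-boundary regular expression would be stricter.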
FAQ
I cannot open the app! ("App not Available", "404 not found", "500 service not available")
Please ensure the application is correctly deployed. If you are not an admin, please contact one.
My files are not loaded into the application?
Files may take a while to be recognized by the application. Until then, the application will keep asking the user to upload files.
The transcription quality is poor / The detected language is wrong / There are missing sentences!
This application uses an AI model to detect the source language and transcribe the audio, so quality can vary. Supported languages are (according to https://platform.openai.com/docs/guides/speech-to-text/supported-languages):
Afrikaans, Arabic, Armenian, Azerbaijani, Belarusian, Bosnian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kannada, Kazakh, Korean, Latvian, Lithuanian, Macedonian, Malay, Marathi, Maori, Nepali, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Tagalog, Tamil, Thai, Turkish, Ukrainian, Urdu, Vietnamese, and Welsh.
Any language not on this list will not be recognized, and transcription may fail outright. Even a supported language may be misidentified, in which case most of the audio will likely be left untranscribed or transcribed incorrectly.
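A quick way to check whether a recording's language is on the supported list is a case-insensitive set lookup. The sketch below simply encodes the list above. `SUPPORTED_LANGUAGES` and `is_supported` are hypothetical names, and membership only means the language is supported, not that it will be detected correctly.

```python
# Languages listed above as supported, stored case-folded for case-insensitive lookup.
SUPPORTED_LANGUAGES = frozenset(
    """
    Afrikaans Arabic Armenian Azerbaijani Belarusian Bosnian Bulgarian Catalan
    Chinese Croatian Czech Danish Dutch English Estonian Finnish French Galician
    German Greek Hebrew Hindi Hungarian Icelandic Indonesian Italian Japanese
    Kannada Kazakh Korean Latvian Lithuanian Macedonian Malay Marathi Maori
    Nepali Norwegian Persian Polish Portuguese Romanian Russian Serbian Slovak
    Slovenian Spanish Swahili Swedish Tagalog Tamil Thai Turkish Ukrainian Urdu
    Vietnamese Welsh
    """.casefold().split()
)


def is_supported(language_name: str) -> bool:
    """Case-insensitive check of an English language name against the supported list."""
    return language_name.strip().casefold() in SUPPORTED_LANGUAGES
```

When calling the `openai-whisper` package directly, the detected language code is returned as `result["language"]`, and `whisper.tokenizer.LANGUAGES` maps codes to lowercase English names (for example, `"en"` to `"english"`). If the language is known in advance, passing it explicitly (for example, `model.transcribe(path, language="it")`) skips auto-detection.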
Developer Guide
TODO
Known Limitations
- No multi-language support: when auto-detecting the language of audio in which multiple languages are spoken, the application may fail outright, skip sentences or produce garbage text
- No support for languages outside the supported list: the application may fail, produce garbage text or detect a different language (typically English)
- No multi-speaker support: the application transcribes all speakers as if they were one