DoctorAI - Speech to Text and Text to Speech

DoctorAI is a Python-based tool that uses offline models and dictionaries to perform speech-to-text (STT) and text-to-speech (TTS), with support for health-related terminology. Because it runs offline, audio does not have to be sent to an external service, which helps protect privacy, and its medical-specific vocabulary improves accuracy in healthcare applications.
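This README does not say how the medical vocabulary is applied. One common approach, shown here only as a sketch, is to post-correct transcribed words against a list of known terms using fuzzy matching from Python's standard `difflib` module. The repository ships a `diagnoses_symptoms_drugs.txt` file, but its format is not documented; the one-term-per-line format and the helper names `load_vocabulary` and `correct_transcript` are hypothetical.

```python
import difflib
from pathlib import Path


def load_vocabulary(path: str) -> list[str]:
    """Read one term per line, skipping blanks and '#' comments.

    The file format is an assumption, not documented by DocAi_STT.
    """
    terms = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            terms.append(line)
    return terms


def correct_transcript(text: str, vocabulary: list[str], cutoff: float = 0.8) -> str:
    """Replace each word with its closest vocabulary term when similarity >= cutoff.

    Hypothetical post-processing step; only single-word terms are handled.
    """
    # Map lowercase forms to the canonical spelling so matching ignores case.
    canonical = {term.lower(): term for term in vocabulary}
    candidates = list(canonical)
    corrected = []
    for word in text.split():
        match = difflib.get_close_matches(word.lower(), candidates, n=1, cutoff=cutoff)
        corrected.append(canonical[match[0]] if match else word)
    return " ".join(corrected)
```

For example, with the vocabulary `["amoxicillin", "hypertension", "tachycardia"]`, the misrecognized phrase `"patient has hypertenshun and takes amoxicilin"` becomes `"patient has hypertension and takes amoxicillin"`. Punctuation attached to a word prevents a match, so a real implementation would need more careful tokenization.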

Requirements

Python 3.10
Offline STT/TTS libraries (see requirements.txt)
FFmpeg (for audio processing)
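A small preflight check can fail fast when the interpreter is too old. This sketch assumes that "Python 3.10" means 3.10 or newer; `python_is_supported` is a hypothetical helper, not part of DocAi_STT.py.

```python
import sys

# Minimum version from the Requirements list (assumed to mean "3.10 or newer").
MINIMUM_PYTHON = (3, 10)


def python_is_supported(version_info=sys.version_info, minimum=MINIMUM_PYTHON) -> bool:
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    if not python_is_supported():
        found = sys.version.split()[0]
        sys.exit(f"Python {MINIMUM_PYTHON[0]}.{MINIMUM_PYTHON[1]}+ is required, found {found}")
    print("Python version OK")
```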

Installation

Install Python Dependencies

To install the required Python packages, navigate to the project directory and run:

pip install -r requirements.txt

Install FFmpeg

Make sure FFmpeg is installed and available either in the current directory or on your system's PATH. You can download it from the official FFmpeg website.

If you're on Linux or macOS, you can install FFmpeg with your system's package manager:

On Ubuntu/Debian

sudo apt-get install ffmpeg

On macOS (using Homebrew)

brew install ffmpeg

On Windows

Download the FFmpeg executable and either place it in the current directory or add its folder to your system's PATH.
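The lookup order described above (current directory first, then PATH) can be sketched with the standard library. `find_ffmpeg` is a hypothetical helper, and whether DocAi_STT.py resolves FFmpeg exactly this way is an assumption.

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def find_ffmpeg(search_dir: str = ".") -> Optional[str]:
    """Return a path to the FFmpeg executable, or None if it cannot be found.

    Assumed lookup order: `search_dir` first, then the directories on PATH.
    """
    # The Windows build is named ffmpeg.exe; other platforms use plain ffmpeg.
    name = "ffmpeg.exe" if os.name == "nt" else "ffmpeg"
    local = Path(search_dir) / name
    if local.is_file():
        return str(local.resolve())
    # shutil.which searches PATH (and honours PATHEXT on Windows).
    return shutil.which("ffmpeg")


if __name__ == "__main__":
    path = find_ffmpeg()
    print(path if path else "FFmpeg not found: install it or add it to PATH")
```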

Usage

To run the DoctorAI STT system, use the following command:

python DocAi_STT.py

This will start the speech-to-text process using offline resources.
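This README does not document how DocAi_STT.py preprocesses audio. To illustrate FFmpeg's role listed under Requirements, this sketch converts an input recording to mono, 16 kHz, 16-bit PCM WAV, a format many offline speech recognizers accept. The sample rate and the helper names `build_wav_command` and `convert_to_wav` are assumptions; the FFmpeg options used (`-nostdin`, `-y`, `-i`, `-ac`, `-ar`, `-c:a`) are standard.

```python
import subprocess


def build_wav_command(src: str, dst: str, ffmpeg: str = "ffmpeg",
                      sample_rate: int = 16000) -> list[str]:
    """Build an FFmpeg command converting `src` to mono 16-bit PCM WAV."""
    return [
        ffmpeg,
        "-nostdin",               # never wait for keyboard input
        "-y",                     # overwrite dst if it already exists
        "-i", src,                # input file in any format FFmpeg can decode
        "-ac", "1",               # downmix to one (mono) channel
        "-ar", str(sample_rate),  # resample to the target rate in Hz
        "-c:a", "pcm_s16le",      # 16-bit little-endian PCM audio
        dst,
    ]


def convert_to_wav(src: str, dst: str, ffmpeg: str = "ffmpeg") -> None:
    """Run the conversion; raises subprocess.CalledProcessError if FFmpeg fails."""
    subprocess.run(build_wav_command(src, dst, ffmpeg), check=True, capture_output=True)
```

For example, `convert_to_wav("visit.m4a", "visit.wav")` writes `visit.wav`, provided FFmpeg is on the PATH. Passing the path returned by an FFmpeg lookup as `ffmpeg` covers the case where the executable sits in the current directory instead.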

Available Models and Languages

This framework offers five model sizes, each trading off speed against accuracy so you can match the model to your application's needs. Four of them are also available as English-only versions, which tend to perform better on English-only tasks. The models differ in memory requirements and relative speed, so you can choose one that fits your hardware.

Below is a list of available models, their parameter sizes, memory requirements, and relative speeds:

Model Size	Parameters	English-only Model	Multilingual Model	Required VRAM	Relative Speed
Tiny	39M	`tiny.en`	`tiny`	~1 GB	~32x
Base	74M	`base.en`	`base`	~1 GB	~16x
Small	244M	`small.en`	`small`	~2 GB	~6x
Medium	769M	`medium.en`	`medium`	~5 GB	~2x
Large	1550M	N/A	`large`	~10 GB	1x
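Using the approximate VRAM figures from the table, picking the largest model that fits a given GPU can be sketched as follows. The figures are rough (the table marks them with ~), so leave some headroom in practice; `largest_model_that_fits` is a hypothetical helper.

```python
from typing import Optional

# (size, approximate required VRAM in GB), ordered smallest to largest, from the table.
MODELS = [
    ("tiny", 1),
    ("base", 1),
    ("small", 2),
    ("medium", 5),
    ("large", 10),
]


def largest_model_that_fits(vram_gb: float) -> Optional[str]:
    """Return the largest (slowest, most accurate) size whose VRAM need fits."""
    fitting = [size for size, need in MODELS if need <= vram_gb]
    return fitting[-1] if fitting else None
```

For example, a GPU with 4 GB of VRAM gets `small`, while 12 GB is enough for `large`.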

For English-only tasks, we recommend the `.en` models, which typically perform better, particularly at the smaller sizes (`tiny.en`, `base.en`). The accuracy difference becomes less significant for larger models such as `small.en` and `medium.en`.
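The naming rule from the table (append `.en` for an English-only model; `large` has no English-only variant) can be expressed as a small function. `model_name` is a hypothetical helper; how DocAi_STT.py itself selects a model is not documented in this README.

```python
# Sizes that have an English-only ".en" variant, per the table ("large" does not).
ENGLISH_ONLY_SIZES = {"tiny", "base", "small", "medium"}
ALL_SIZES = ENGLISH_ONLY_SIZES | {"large"}


def model_name(size: str, english_only: bool) -> str:
    """Map a size and language choice to a model name such as 'base.en' or 'base'."""
    if size not in ALL_SIZES:
        raise ValueError(f"unknown model size: {size!r}")
    if english_only and size in ENGLISH_ONLY_SIZES:
        return f"{size}.en"
    # Multilingual was requested, or no English-only variant exists.
    return size
```

Falling back to the multilingual `large` model when `english_only=True` is a design choice here; raising an error instead would be equally reasonable.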

Repository: javaidiqbal11/DocAi_STT