changed description
NavodPeiris committed Jan 19, 2024
1 parent 6081738 commit 91b660e
Showing 3 changed files with 61 additions and 23 deletions.
41 changes: 30 additions & 11 deletions README.md
@@ -1,30 +1,49 @@
This library do speaker diarization, speaker recognition, transcription on a single wav file to provide a transcript with actual speaker names. This library will also return an array containing result information. This library also contain audio preprocessor functions.
This library does speaker diarization, speaker recognition, and transcription on a single wav file to provide a transcript with actual speaker names. This library will also return an array containing result information.

Transcriptor takes 4 arguments. file to transcribe, log_folder, language used for transcribing, voices folder
This library contains following audio preprocessing functions:

voices_folder should contain subfolders named with speaker names and their voice samples. This will be used for speaker recognition to identify speaker.
1. convert mp3 to wav

if voice_folder is not provided then speaker tags will be arbitrary.
2. convert stereo wav file to mono

log_folder is to store final transcript as a text file.
3. re-encode the wav file to have 16-bit PCM encoding

Transcriptor method takes 5 arguments.

1. file to transcribe

2. log_folder to store transcription

3. language used for transcribing

4. model size ("tiny", "medium", or "large")

5. voices_folder (contains speaker voice samples for speaker recognition)

voices_folder should contain subfolders named with speaker names. Each subfolder belongs to a speaker and it can contain many voice samples. This will be used for speaker recognition to identify the speaker.

if voices_folder is not provided then speaker tags will be arbitrary.

log_folder is to store the final transcript as a text file.

transcript will also indicate the timeframe in seconds where each speaker speaks.

### Transcription example:

```
from speechlib import Transcriptor
file = "obama.wav"
voice_folder = "voices"
file = "obama1.wav"
voices_folder = "voices"
language = "english"
log_folder = "logs"
modelSize = "medium"
transcriptor = Transcriptor(file, log_folder, language, voice_folder)
transcriptor = Transcriptor(file, log_folder, language, modelSize, voices_folder)
res = transcriptor.transcribe()
print(res)
--> [["start", "end", "text", "speaker"], ["start", "end", "text", "speaker"]...]
res --> [["start", "end", "text", "speaker"], ["start", "end", "text", "speaker"]...]
```

start: starting time of speech
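
The README example above ends with `res` as a list of `[start, end, text, speaker]` rows. As a minimal sketch of how such rows could be consumed, the helper below turns them into readable transcript lines. `format_segments` and the sample rows are hypothetical and not part of speechlib; the sketch also assumes `start` and `end` are numbers (or numeric strings) in seconds, which the diff implies but does not state outright.

```python
from typing import List, Sequence


def format_segments(res: Sequence[Sequence]) -> List[str]:
    """Render [start, end, text, speaker] rows as readable transcript lines."""
    lines = []
    for start, end, text, speaker in res:
        # float() tolerates both numeric values and numeric strings (assumption).
        lines.append(f"{speaker} ({float(start):.1f}s - {float(end):.1f}s): {text}")
    return lines


# Hypothetical rows in the documented shape; not real library output.
sample = [
    [0.0, 4.2, "Hello everyone.", "obama"],
    [4.2, 7.5, "Thank you.", "SPEAKER_01"],
]
for line in format_segments(sample):
    print(line)
```

Unpacking each row into four names keeps the helper tied to the documented column order, so a row with a different shape fails loudly instead of being silently misread.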
41 changes: 30 additions & 11 deletions library.md
@@ -1,30 +1,49 @@
This library do speaker diarization, speaker recognition, transcription on a single wav file to provide a transcript with actual speaker names. This library will also return an array containing result information. This library also contain audio preprocessor functions.
This library does speaker diarization, speaker recognition, and transcription on a single wav file to provide a transcript with actual speaker names. This library will also return an array containing result information.

Transcriptor takes 4 arguments. file to transcribe, log_folder, language used for transcribing, voices folder
This library contains following audio preprocessing functions:

voices_folder should contain subfolders named with speaker names and their voice samples. This will be used for speaker recognition to identify speaker.
1. convert mp3 to wav

if voice_folder is not provided then speaker tags will be arbitrary.
2. convert stereo wav file to mono

log_folder is to store final transcript as a text file.
3. re-encode the wav file to have 16-bit PCM encoding

Transcriptor method takes 5 arguments.

1. file to transcribe

2. log_folder to store transcription

3. language used for transcribing

4. model size ("tiny", "medium", or "large")

5. voices_folder (contains speaker voice samples for speaker recognition)

voices_folder should contain subfolders named with speaker names. Each subfolder belongs to a speaker and it can contain many voice samples. This will be used for speaker recognition to identify the speaker.

if voices_folder is not provided then speaker tags will be arbitrary.

log_folder is to store the final transcript as a text file.

transcript will also indicate the timeframe in seconds where each speaker speaks.

### Transcription example:

```
from speechlib import Transcriptor
file = "obama.wav"
voice_folder = "voices"
file = "obama1.wav"
voices_folder = "voices"
language = "english"
log_folder = "logs"
modelSize = "medium"
transcriptor = Transcriptor(file, log_folder, language, voice_folder)
transcriptor = Transcriptor(file, log_folder, language, modelSize, voices_folder)
res = transcriptor.transcribe()
print(res)
--> [["start", "end", "text", "speaker"], ["start", "end", "text", "speaker"]...]
res --> [["start", "end", "text", "speaker"], ["start", "end", "text", "speaker"]...]
```

start: starting time of speech
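
The `voices_folder` layout described in this diff (one subfolder per speaker, each holding any number of voice samples) can be checked before running a transcription. The following is only a sketch using the standard library: `list_voice_samples` is a hypothetical helper, not a speechlib function, and the speaker and file names in the demo are made up.

```python
import tempfile
from pathlib import Path
from typing import Dict, List


def list_voice_samples(voices_folder: str) -> Dict[str, List[str]]:
    """Map each speaker subfolder name to the sorted sample file names inside it."""
    root = Path(voices_folder)
    speakers: Dict[str, List[str]] = {}
    for sub in sorted(p for p in root.iterdir() if p.is_dir()):
        speakers[sub.name] = sorted(f.name for f in sub.iterdir() if f.is_file())
    return speakers


# Demo on a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    layout = [("obama", "sample1.wav"), ("obama", "sample2.wav"), ("another_speaker", "sample1.wav")]
    for speaker, sample in layout:
        folder = Path(tmp) / speaker
        folder.mkdir(exist_ok=True)
        (folder / sample).touch()
    print(list_voice_samples(tmp))
```

A speaker subfolder with an empty sample list is a quick sign that recognition has nothing to match against for that name.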
2 changes: 1 addition & 1 deletion setup.py
@@ -5,7 +5,7 @@

setup(
name="speechlib",
version="1.0.4",
version="1.0.5",
description="speechlib is a library that can do speaker diarization, transcription and speaker recognition on an audio file to create transcripts with actual speaker names. This library also contain audio preprocessor functions.",
packages=find_packages(),
long_description=long_description,
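
The package description above mentions audio preprocessor functions, and the README diff lists stereo-to-mono conversion among them. The commit does not show how the library implements this, so the block below is only an illustrative sketch of the idea using Python's standard `wave` and `array` modules. `stereo_to_mono` is a hypothetical name, and the sketch assumes 16-bit PCM stereo input.

```python
import array
import sys
import wave


def stereo_to_mono(src_path: str, dst_path: str) -> None:
    """Average the two channels of a 16-bit PCM stereo wav into one mono channel."""
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 2 or src.getsampwidth() != 2:
            raise ValueError("expected a 16-bit PCM stereo wav file")
        framerate = src.getframerate()
        frames = src.readframes(src.getnframes())

    # Interleaved signed 16-bit samples: L0, R0, L1, R1, ...
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # wav sample data is little-endian

    # Average each left/right pair into a single mono sample.
    mono = array.array(
        "h",
        ((samples[i] + samples[i + 1]) // 2 for i in range(0, len(samples), 2)),
    )
    if sys.byteorder == "big":
        mono.byteswap()

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(framerate)
        dst.writeframes(mono.tobytes())
```

Averaging the channels is one common choice; keeping only the left or right channel would also yield a mono file, at the cost of discarding audio present only in the other channel.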