Panel_discussion_summarization

My Approach

The input audio file is received in .mp3 format

The file is converted to .wav format

An .rttm file containing the timestamps of each speaker is generated with the 'pyannote/speaker-diarization@2.1' pipeline from Hugging Face

The .rttm file is then converted to CSV format

A separate audio file is then generated for each timestamped segment

The text is extracted from each audio file, speaker-wise

The overall summary of the discussion is then obtained using Hugging Face's "knkarthick/MEETING_SUMMARY" model
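
The RTTM-to-CSV and audio-splitting steps above can be sketched with the Python standard library alone. This is a minimal illustration, not the notebook's actual code: the function names (`parse_rttm`, `write_csv`, `slice_wav`) and the CSV column names are assumptions, and it relies on the standard RTTM layout in which each `SPEAKER` line carries the onset in field 4, the duration in field 5, and the speaker label in field 8.

```python
import csv
import wave


def parse_rttm(rttm_path):
    """Return a list of (speaker, start_sec, end_sec) tuples from an RTTM file."""
    segments = []
    with open(rttm_path) as f:
        for line in f:
            fields = line.split()
            # Skip blank lines and anything that is not a SPEAKER record.
            if len(fields) < 8 or fields[0] != "SPEAKER":
                continue
            start = float(fields[3])     # onset in seconds
            duration = float(fields[4])  # duration in seconds
            segments.append((fields[7], start, start + duration))
    return segments


def write_csv(segments, csv_path):
    """Write the segments to a CSV file (column names are hypothetical)."""
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["speaker", "start", "end"])
        writer.writerows(segments)


def slice_wav(wav_path, start, end, out_path):
    """Copy the audio between start and end (in seconds) into a new .wav file."""
    with wave.open(wav_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        total = src.getnframes()
        # Clamp to the file length so a late timestamp cannot raise an error.
        first = min(int(start * rate), total)
        last = min(int(end * rate), total)
        src.setpos(first)
        frames = src.readframes(max(last - first, 0))
    with wave.open(out_path, "wb") as dst:
        dst.setparams(params)
        dst.writeframes(frames)  # the header is corrected when the file closes
```

Calling `slice_wav` once per row returned by `parse_rttm` produces one clip per speaker turn, which can then be passed to the transcription step.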

How to run the code?

The easiest way to run the code is in Google Colab: upload the .mp3 file and run all the cells

Results

On a CPU, it took 45 minutes to run the entire pipeline on the uploaded audio

On a GPU, it took less than a minute to run the entire pipeline on the same audio