This project aims to develop a system for early detection of Alzheimer's disease by applying natural language processing (NLP) techniques to speech recognition data. By analyzing speech patterns, we aim to identify early signs of cognitive decline associated with Alzheimer's.
The app predicts the likelihood of Alzheimer's for potential patients by analyzing speech inputs.
- Project Overview
- Installation
- Usage
- Features
- Dataset
- Model
- Results
- Contributing
- License
- Acknowledgments
- Screenshots
To set up the project locally, follow these steps:
- Clone the repository:
  git clone https://github.com/42bismuth/Alzheimer-Detection.git
  cd Alzheimer-Detection
- Create and activate a virtual environment:
  python -m venv venv
  source venv/bin/activate # On Windows use `venv\Scripts\activate`
- Install the required packages:
  pip install -r requirements.txt
- Install the frontend dependencies:
  cd client && npm i
To run the project, follow these steps:
- To run the backend, run the `app.py` file.
- To run the frontend:
  cd client
  npm start
- Speech Recognition: Transcribes audio recordings to text using the AssemblyAI speech-to-text model.
- NLP Analysis: Analyzes the transcriptions to identify linguistic patterns and markers indicative of Alzheimer's disease, using several natural language processing techniques to improve the accuracy and reliability of the analysis.
- React UI: A responsive, user-friendly React interface for interacting with the system and for clearly displaying the results of the speech recognition and NLP analysis.
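To make the idea of "linguistic markers" more concrete, here is a minimal sketch of how a few simple lexical statistics could be computed from one transcript. The feature set (`type_token_ratio`, `filler_rate`, `repetition_rate`), the `FILLERS` list, and the function name `lexical_features` are assumptions chosen for illustration; they are not necessarily the markers this project uses.

```python
import re
from collections import Counter

# Hypothetical filler words; the project's actual marker list is not documented here.
FILLERS = {"uh", "um", "er", "hmm"}


def lexical_features(transcript: str) -> dict:
    """Compute a few simple lexical statistics for one transcript."""
    # Lowercase and keep alphabetic tokens (apostrophes allowed, e.g. "it's").
    tokens = re.findall(r"[a-z']+", transcript.lower())
    if not tokens:
        return {"n_tokens": 0, "type_token_ratio": 0.0,
                "filler_rate": 0.0, "repetition_rate": 0.0}

    counts = Counter(tokens)
    filler_count = sum(counts[word] for word in FILLERS)
    # Count immediate word repetitions such as "the the".
    repeats = sum(1 for a, b in zip(tokens, tokens[1:]) if a == b)

    return {
        "n_tokens": len(tokens),
        # Vocabulary richness: distinct words divided by total words.
        "type_token_ratio": len(counts) / len(tokens),
        "filler_rate": filler_count / len(tokens),
        "repetition_rate": repeats / len(tokens),
    }


# Invented example sentence, not taken from the dataset.
sample = "um the boy is uh taking the the cookie"
features = lexical_features(sample)
print(features)
```

Statistics like these could be fed to a classifier alongside, or instead of, bag-of-words features; which ones are actually informative would have to be checked on the data.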
Details about the dataset used in this project:
- Source: DementiaBank
- Description: DementiaBank contains audio recordings of 117 individuals with Alzheimer's Disease and 93 healthy individuals describing an image. The task is to classify these groups based only on the audio, as no text features are included.
- Data extraction: Plain text was extracted from the speech transcripts using `pylangacq`.
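The extraction step might look roughly like the sketch below. It assumes the transcripts are CHAT (`.cha`) files and that the patient's speech is tagged with the participant code `PAR`. It uses `pylangacq.read_chat` and `Reader.words`; the helper names `tokens_to_text` and `extract_participant_text` are hypothetical, and this is not necessarily how the project's own script is written.

```python
import string


def tokens_to_text(tokens):
    """Join word tokens into plain text, dropping punctuation-only tokens."""
    kept = [
        token for token in tokens
        if token and not all(ch in string.punctuation for ch in token)
    ]
    return " ".join(kept)


def extract_participant_text(chat_path, participant="PAR"):
    """Return the plain text spoken by one participant in a CHAT file.

    Assumes the speaker of interest uses the code "PAR"; adjust if the
    files use a different participant code.
    """
    # Imported here so tokens_to_text stays usable without pylangacq installed.
    import pylangacq

    reader = pylangacq.read_chat(chat_path)
    # words() returns a flat list of tokens, including punctuation tokens.
    return tokens_to_text(reader.words(participants=participant))


# Example with tokens shaped like pylangacq's word output (invented sentence).
print(tokens_to_text(["the", "boy", "is", "falling", "."]))
```

With a transcript on disk, a call such as `extract_participant_text("path/to/transcript.cha")` (hypothetical path) would return that speaker's words as a single string.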
Description of the model training process:
- Model Training: Machine learning models are trained to detect early signs of Alzheimer's from speech data. Models used:
  - Random Forest
  - SVC with grid search
  - Naive Bayes SVC (NB_SVC)
  - LSTM
  - Bidirectional LSTM
- Evaluation: Model performance is assessed on the test set using accuracy and F1 scores.
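As an illustration of the SVC with grid search and the accuracy/F1 evaluation, here is a minimal scikit-learn sketch. The TF-IDF features, the parameter grid, and the toy transcripts are all assumptions made for the example; the project's actual features and hyperparameters are not documented in this README.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Toy transcripts invented for illustration only (1 = Alzheimer's, 0 = control).
train_texts = [
    "um the boy is uh the boy is on the thing",
    "the the lady is uh washing um something",
    "uh there is a a cookie and uh a boy",
    "um she is uh she is doing the the water",
    "the boy uh the boy um falls the thing",
    "uh um the girl is is wanting uh that",
    "the boy is standing on a stool reaching for the cookie jar",
    "the mother is drying dishes while the sink overflows",
    "the girl is asking her brother for a cookie",
    "water is spilling from the sink onto the kitchen floor",
    "the stool is tipping over as the boy reaches up",
    "the mother does not notice the children behind her",
]
train_labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

test_texts = [
    "um the uh boy is the the cookie",
    "the boy is taking cookies from the jar on the shelf",
]
test_labels = [1, 0]

# Text features and classifier chained so the grid search tunes them together.
pipeline = Pipeline([("tfidf", TfidfVectorizer()), ("svc", SVC())])

# Hypothetical search space; real values would be chosen per dataset.
param_grid = {"svc__C": [0.1, 1, 10], "svc__kernel": ["linear", "rbf"]}

search = GridSearchCV(pipeline, param_grid, cv=3, scoring="f1")
search.fit(train_texts, train_labels)

predictions = search.predict(test_texts)
accuracy = accuracy_score(test_labels, predictions)
f1 = f1_score(test_labels, predictions, zero_division=0)

print("best parameters:", search.best_params_)
print("accuracy:", accuracy, "F1:", f1)
```

Wrapping the vectorizer and the classifier in one `Pipeline` keeps the TF-IDF vocabulary from being fitted on validation folds during the search, which avoids leaking information into the cross-validation scores.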
Summary of the project's results:
- Accuracy: Overall accuracy of the model is 84%.
- ROC-AUC: The ROC curve obtained for the NB_SVC model is:
We welcome contributions to improve this project. To contribute:
- Fork the repository.
- Create a new branch.
- Make your changes and commit them.
- Push your changes to your fork.
- Create a pull request.
Please ensure your code adheres to our coding standards and includes relevant tests.
This project is licensed under the MIT License. See the LICENSE file for details.
We would like to express our gratitude to the following:
- DementiaBank: Special thanks to Dr. Brian MacWhinney for providing access to the DementiaBank dataset.
- AssemblyAI: For their speech-to-text API and SDK, which facilitated the transcription of audio data.
Here are some screenshots of the website UI: