Image to Story and Audio using Hugging Face

Project Overview

This project takes an image as input, generates a caption for it, writes a short story based on the caption, and converts the story into an audio file. The application uses Streamlit for the user interface, Hugging Face models for image captioning and text-to-speech, and a Groq-hosted language model (accessed through LangChain) for story generation.
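
The pipeline described above is a straight chain: image, then caption, then story, then audio. A minimal sketch of that data flow, with each stage passed in as a plain function; `run_pipeline`, `PipelineResult`, and the stage signatures are hypothetical names for illustration, not code taken from app.py. In the real app the three stages would call the captioning model, ChatGroq, and the text-to-speech model.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures (not from app.py):
CaptionFn = Callable[[bytes], str]   # image bytes -> caption
StoryFn = Callable[[str], str]       # caption -> story
SpeechFn = Callable[[str], bytes]    # story -> audio bytes


@dataclass
class PipelineResult:
    caption: str
    story: str
    audio: bytes


def run_pipeline(image_bytes: bytes,
                 caption_fn: CaptionFn,
                 story_fn: StoryFn,
                 speech_fn: SpeechFn) -> PipelineResult:
    """Run image -> caption -> story -> audio, each stage feeding the next."""
    caption = caption_fn(image_bytes)
    story = story_fn(caption)
    audio = speech_fn(story)
    return PipelineResult(caption=caption, story=story, audio=audio)


if __name__ == "__main__":
    # Stub stages so the flow can be exercised without any model or API key.
    result = run_pipeline(
        b"fake-image-bytes",
        caption_fn=lambda img: "a dog running on a beach",
        story_fn=lambda cap: f"Once upon a time, {cap}.",
        speech_fn=lambda text: text.encode("utf-8"),
    )
    print(result.caption)
    print(result.story)
```

Passing the stages in as functions keeps the flow testable with stubs, as in the `__main__` block, without downloading models or spending API credits.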

Features

Image Captioning: Uses the Salesforce/blip-image-captioning-base model from Hugging Face to generate a caption for the uploaded image.
Story Generation: Uses LangChain's ChatGroq integration to have a Groq-hosted language model write a short story based on the caption.
Text-to-Speech: Converts the generated story into an audio file using the facebook/mms-tts-eng model from Hugging Face.
Streamlit UI: Provides an easy-to-use interface for uploading images, viewing the generated caption and story, and listening to or downloading the audio file.
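
The captioning step has to end with a plain caption string. Assuming the app calls BLIP either through the transformers `image-to-text` pipeline or through the Hugging Face Inference API, both return a list shaped like `[{"generated_text": "..."}]`. The hypothetical helper below extracts the first caption and rejects anything else, such as an `{"error": ...}` object, which the hosted API may return while a model is still loading.

```python
import json
from typing import Any


def extract_caption(response: Any) -> str:
    """Return the first caption from an image-to-text result.

    Accepts either parsed JSON (a list of dicts) or the raw JSON text/bytes.
    Expected shape: [{"generated_text": "..."}].
    """
    if isinstance(response, (str, bytes)):
        response = json.loads(response)
    if not isinstance(response, list) or not response:
        # e.g. an error object instead of a list of results
        raise ValueError(f"unexpected captioning response: {response!r}")
    first = response[0]
    if not isinstance(first, dict) or "generated_text" not in first:
        raise ValueError(f"no 'generated_text' in response: {response!r}")
    return first["generated_text"].strip()


if __name__ == "__main__":
    print(extract_caption('[{"generated_text": "a cat sitting on a sofa "}]'))
```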

Tech Stack

Hugging Face Transformers: For the image-to-text and text-to-speech models.
LangChain: For LLMChain and ChatGroq integration.
Streamlit: For the user interface.
PIL (Pillow): For image processing.
Requests: For API calls.
Python Dotenv: For loading environment variables.
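
Requests is listed for API calls, and the setup asks for a HUGGINGFACE_API_KEY, which suggests some models are called over HTTP. A minimal sketch of building such a call, assuming the classic serverless Inference API URL pattern `https://api-inference.huggingface.co/models/<model-id>`; Hugging Face has been moving hosted inference to newer router endpoints, so check the current docs before relying on this URL. It uses the standard library's `urllib.request` so it runs without extra packages, and `build_inference_request` is a hypothetical name.

```python
import json
import urllib.request
from typing import Optional

# Assumption: classic serverless Inference API base URL; may have changed.
API_BASE = "https://api-inference.huggingface.co/models"


def build_inference_request(model_id: str, api_key: str, *,
                            image_bytes: Optional[bytes] = None,
                            text: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST request to a hosted Hugging Face model."""
    if (image_bytes is None) == (text is None):
        raise ValueError("pass exactly one of image_bytes or text")
    headers = {"Authorization": f"Bearer {api_key}"}
    if image_bytes is not None:
        # Image-to-text: send the raw image bytes as the request body.
        data = image_bytes
    else:
        # Text-to-speech: send the text as a JSON body {"inputs": ...}.
        data = json.dumps({"inputs": text}).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(f"{API_BASE}/{model_id}", data=data,
                                  headers=headers, method="POST")


if __name__ == "__main__":
    req = build_inference_request("facebook/mms-tts-eng", "hf_xxx",
                                  text="Once upon a time...")
    print(req.get_method(), req.full_url)
```

Sending it with `urllib.request.urlopen(req).read()` returns the response body (JSON for captioning, typically audio bytes for text-to-speech). With Requests, the equivalent is `requests.post(url, headers=headers, data=data)` using the same URL, headers, and body.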

Installation

Clone the Repository:
```
git clone https://github.com/AbhishekSharma-17/HuggingFace.git
cd HuggingFace
```

Create a Virtual Environment:
```
python -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`
```

Install the Required Packages:
```
pip install -r requirements.txt
```
Set Up Environment Variables: Create a .env file in the project directory and add your API keys:
```
GROQ_API_KEY=your_groq_api_key
HUGGINGFACE_API_KEY=your_huggingface_api_key
```
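
The app reads these keys with python-dotenv. A small sketch that loads `.env` and fails early with a readable message when a key is missing; `missing_keys` and `load_api_keys` are hypothetical helpers, and the `load_dotenv` import is guarded so the check still works when the variables are exported in the shell instead of stored in `.env`.

```python
import os
from typing import Dict, Iterable, List, Mapping, Optional

REQUIRED_KEYS = ("GROQ_API_KEY", "HUGGINGFACE_API_KEY")


def missing_keys(env: Optional[Mapping[str, str]] = None,
                 required: Iterable[str] = REQUIRED_KEYS) -> List[str]:
    """Return the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [key for key in required if not env.get(key)]


def load_api_keys() -> Dict[str, str]:
    """Load .env (if python-dotenv is installed) and return the required keys."""
    try:
        from dotenv import load_dotenv  # provided by the python-dotenv package
        load_dotenv()  # copies .env entries into os.environ; existing vars win
    except ImportError:
        pass  # fall back to variables already set in the environment
    missing = missing_keys()
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {key: os.environ[key] for key in REQUIRED_KEYS}
```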

Usage

Run the Streamlit Application:
```
streamlit run app.py
```
Open the Application: Open the URL provided by Streamlit (usually http://localhost:8501) in your web browser.
Upload an Image:
- Click on the "Upload an Image" button and select an image file (png, jpg, or jpeg).
View the Generated Caption:
- The app will display the generated caption for the uploaded image.
Generate a Story:
- The app will create a short story based on the caption and display it.
Convert Story to Audio:
- The app will convert the story to an audio file. You can listen to it directly in the app or download it.
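
Text-to-speech models such as facebook/mms-tts-eng output a waveform of float samples, which must be packed into a file before it can be played or downloaded. A standard-library sketch of writing such samples as a 16-bit mono WAV; `float_to_wav_bytes` is a hypothetical helper, and the 16 kHz rate is an assumption based on the MMS-TTS model configuration, so read the rate from the loaded model rather than hard-coding it.

```python
import io
import struct
import wave
from typing import Sequence

# Assumption: MMS-TTS produces mono audio at 16 kHz.
SAMPLE_RATE = 16_000


def float_to_wav_bytes(samples: Sequence[float], sample_rate: int = SAMPLE_RATE) -> bytes:
    """Encode mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file."""
    clipped = (max(-1.0, min(1.0, s)) for s in samples)  # guard against overflow
    frames = b"".join(struct.pack("<h", int(s * 32767)) for s in clipped)
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)        # mono
        wav.setsampwidth(2)        # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(frames)
    return buffer.getvalue()
```

In Streamlit, the resulting bytes could be passed to `st.audio(wav_bytes, format="audio/wav")` for playback and to `st.download_button("Download audio", wav_bytes, file_name="audio.wav", mime="audio/wav")` for the download.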

File Structure

app.py: The main Streamlit application file.
requirements.txt: List of required Python packages.
.env: Environment variables file (not included in the repository; create it as described under Installation).
README.md: This README file.

Acknowledgements

Hugging Face for providing state-of-the-art models.
LangChain for the LLMChain and ChatGroq integration.
Streamlit for the user-friendly UI framework.

Feel free to contribute to this project by submitting issues or pull requests. Happy coding!
