
🎥 Building a simple RAG (Retrieval-Augmented Generation) application using Pinecone and OpenAI's API. The application will allow you to ask questions about any YouTube video.


yanliu1111/youtube-rag


Building a RAG application from scratch

This is a step-by-step guide to building a simple RAG (Retrieval-Augmented Generation) application using Pinecone and OpenAI's API. The application will allow you to ask questions about any YouTube video.
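The "augmented generation" half of RAG simply means pasting the retrieved transcript chunks into the prompt ahead of the question. A minimal sketch of that step follows. The template wording and the `build_prompt` helper are hypothetical illustrations, not the tutorial's actual prompt.

```python
# Hypothetical prompt template: retrieved context first, then the question,
# so the model is steered to answer from the video transcript.
TEMPLATE = """Answer the question using only the context below.
If the context does not contain the answer, reply "I don't know".

Context:
{context}

Question: {question}
"""


def build_prompt(chunks: list[str], question: str) -> str:
    """Join the retrieved chunks and fill the template (hypothetical helper)."""
    context = "\n\n".join(chunk.strip() for chunk in chunks)
    return TEMPLATE.format(context=context, question=question.strip())


if __name__ == "__main__":
    print(build_prompt(["The speaker introduces RAG."], "What is the video about?"))
```

The resulting string is what gets sent to the OpenAI model; the instruction to answer "I don't know" is one common way to discourage answers that are not grounded in the transcript.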

Training Video HERE

Tech Stack

  • OpenAI
  • LangChain
  • openai-whisper
  • scikit-learn
  • langchain-pinecone (vector store)
  • Google Colab

Setup

  1. The tutorial uses an in-memory vector store, which needs an extra installation, pip install "langchain[docarray]", plus the Microsoft C++ Build Tools (https://visualstudio.microsoft.com/visual-cpp-build-tools/). I didn't follow this part; I skipped it and used Pinecone directly instead.
  2. Create a virtual environment and install the required packages:
$ python3 -m venv .venv
$ source .venv/bin/activate
$ pip install -r requirements.txt
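Step 1 skips the in-memory vector store, but it helps to see what such a store does before handing the job to Pinecone. The sketch below is a hypothetical, dependency-free stand-in (not the DocArray or LangChain API): it keeps (text, vector) pairs in a list and returns the texts whose vectors have the highest cosine similarity to a query vector. Real embeddings would come from an embedding model such as OpenAI's; the three-dimensional vectors in the example are made up.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Hypothetical minimal vector store: a list of (text, embedding) pairs."""

    def __init__(self) -> None:
        self._items: list[tuple[str, list[float]]] = []

    def add(self, text: str, embedding: list[float]) -> None:
        self._items.append((text, embedding))

    def search(self, query: list[float], k: int = 2) -> list[str]:
        # Score every stored vector against the query, then keep the k best.
        ranked = sorted(
            self._items,
            key=lambda item: cosine_similarity(query, item[1]),
            reverse=True,
        )
        return [text for text, _ in ranked[:k]]


if __name__ == "__main__":
    store = InMemoryVectorStore()
    store.add("chunk about Pinecone", [0.9, 0.1, 0.0])
    store.add("chunk about Whisper", [0.0, 0.2, 0.9])
    store.add("chunk about LangChain", [0.5, 0.5, 0.1])
    print(store.search([1.0, 0.0, 0.0], k=1))
```

This brute-force scan is fine for one video's transcript; a hosted store like Pinecone does the same ranking job at scale and keeps the vectors between runs.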

To install Whisper, use pip install git+https://github.com/openai/whisper.git instead of pip install whisper (the PyPI package named whisper is an unrelated project).
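Once Whisper is installed, transcribing the video's audio and cutting the transcript into chunks for embedding can be sketched as below. `whisper.load_model` and `model.transcribe(...)["text"]` are openai-whisper's Python API (it also needs ffmpeg on the system). The `transcribe` and `chunk_text` helpers, the `"base"` model choice, and the 1000/20 character sizes are illustrative assumptions, and the chunker is a simple stand-in for LangChain's text splitters, not their exact algorithm.

```python
def transcribe(audio_path: str, model_name: str = "base") -> str:
    """Transcribe an audio file with openai-whisper (imported lazily)."""
    import whisper  # installed from git+https://github.com/openai/whisper.git

    model = whisper.load_model(model_name)
    return model.transcribe(audio_path)["text"]


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 20) -> list[str]:
    """Split text into fixed-size character windows that overlap slightly.

    The overlap keeps a sentence that straddles a boundary visible in both chunks.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop before a window would start inside the previous window's overlap only.
    return [text[i : i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]


if __name__ == "__main__":
    # "video.mp3" would be a hypothetical local audio file:
    # transcript = transcribe("video.mp3")
    print(chunk_text("abcdefghij", chunk_size=4, overlap=1))
```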

  3. Create a free Pinecone account and get your API key from here. If you were not given a choice of region, your project is probably hosted in Iowa, US (as mine was),
    so set PINECONE_API_ENV="us-central1-gcp".

  4. Create a .env file with the following variables:

OPENAI_API_KEY = [ENTER YOUR OPENAI API KEY HERE]
PINECONE_API_KEY = [ENTER YOUR PINECONE API KEY HERE]
PINECONE_API_ENV = [ENTER YOUR PINECONE API ENVIRONMENT HERE]
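A quick check that these three variables actually reach your Python process can save debugging time, since an authorization error is often just a missing or mistyped key. A sketch, assuming the python-dotenv package is what loads the file; the `missing_keys` helper is hypothetical.

```python
import os

# Variable names taken from the .env file above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "PINECONE_API_KEY", "PINECONE_API_ENV")


def missing_keys(env=None) -> list[str]:
    """Return the required variables that are unset or blank."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]


if __name__ == "__main__":
    try:
        from dotenv import load_dotenv  # provided by the python-dotenv package

        # Copies .env entries into os.environ (existing variables are not overridden).
        load_dotenv()
    except ImportError:
        print("python-dotenv not installed; checking the current environment only")
    print("missing:", missing_keys() or "none")
```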
  5. Bug fix: with PineconeVectorStore I got an unauthorized error (I reported the issue on the author's GitHub, HERE), so I use Pinecone directly instead:
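import os

from dotenv import load_dotenv
from langchain_openai import OpenAIEmbeddings
from langchain_pinecone import Pinecone

# Load OPENAI_API_KEY and PINECONE_API_KEY from .env into os.environ, where
# langchain_pinecone looks for the key. Assigning the literal string
# "PINECONE_API_KEY" to os.environ would send that text as the key.
load_dotenv()

index_name = "youtube-index"  # create this index in your Pinecone project first
embeddings = OpenAIEmbeddings()

# `documents` is the list of transcript chunks produced earlier in the tutorial.
pinecone = Pinecone.from_documents(documents=documents,
                                   embedding=embeddings,
                                   index_name=index_name)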
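With the index populated, answering a question starts with a similarity search. This sketch assumes the `pinecone` object from the snippet above; `similarity_search(query, k=...)` is the standard LangChain vector-store method and returns `Document` objects with a `page_content` attribute. The `retrieve_context` helper and the `FakeStore` stand-in (there so the sketch runs without API keys) are hypothetical.

```python
from types import SimpleNamespace


def retrieve_context(store, question: str, k: int = 4) -> str:
    """Fetch the k most similar chunks and join their text into one context string.

    `store` is any LangChain vector store, e.g. the `pinecone` object built above.
    """
    docs = store.similarity_search(question, k=k)
    return "\n\n".join(doc.page_content for doc in docs)


class FakeStore:
    """Offline stand-in mimicking similarity_search, for trying the helper."""

    def similarity_search(self, query: str, k: int = 4):
        return [SimpleNamespace(page_content=f"chunk {i}") for i in range(k)]


if __name__ == "__main__":
    # With the real index: retrieve_context(pinecone, "What is the video about?")
    print(retrieve_context(FakeStore(), "What is RAG?", k=2))
```

LangChain vector stores also provide `as_retriever()`, which wraps the same search so it can be plugged into a chain instead of being called by hand.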

💖 Conclusion: this is a good tutorial for getting started with LangChain, Whisper, audio transcription, and RAG. I recommend it.
