From the course: Hands-On AI: Building LLM-Powered Apps

Challenge: Indexing documents into a vector database

- [Instructor] In the previous lab, we enabled document loading and developed a chunking strategy to split the documents into smaller sub-documents. Now we need to index them into our search engine so that we can use the processed documents in our chat-with-PDF application. This is the retrieval portion of the RAG architecture. In this lab, we will implement the embedding model and index the documents we processed previously into the database using LangChain. For our search engine of choice, we will be using Chroma DB. Chroma is a lightweight vector database that can live in memory, similar to SQLite. Let's go to app/app.py. We have prepared some exercises for you to walk through. We are building another function called create_search_engine. This function takes in a list of documents and an embedding model, and it returns a vector database with the documents and embeddings indexed. We will also initialize an OpenAIEmbeddings model as the encoder. The standard one to use is text-embedding-ada-002. Please initialize it using LangChain. We can then pass the model and the list of processed documents into our create_search_engine function and get back a vector database with all of our data inside. Let's get to work.
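The transcript does not show the finished code, so here is a minimal sketch of what the exercise might look like, assuming a LangChain release where OpenAIEmbeddings and the Chroma vector store are importable from the paths below (these paths have moved between LangChain versions). The `docs` placeholder stands in for the chunked documents produced in the previous lab, and running it requires the chromadb package plus an OpenAI API key:

```python
# A minimal sketch, not the course's solution file. Assumes a LangChain
# version exposing these import paths; they vary across releases.
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.schema import Document
from langchain.vectorstores import Chroma


def create_search_engine(docs: list[Document], encoder: OpenAIEmbeddings) -> Chroma:
    """Index the chunked documents into an in-memory Chroma vector database."""
    return Chroma.from_documents(documents=docs, embedding=encoder)


# Initialize the embedding model the transcript names as the encoder.
encoder = OpenAIEmbeddings(model="text-embedding-ada-002")

# Placeholder for the chunked sub-documents from the previous lab.
docs = [Document(page_content="Example chunk from the loaded PDF.")]

# Index the documents; the returned store supports similarity search.
search_engine = create_search_engine(docs, encoder)
```

Because Chroma can run entirely in memory, no external database server is needed, which is what makes it a convenient choice for this lab.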
