From the course: Hands-On AI: Building LLM-Powered Apps
Solution: Enabling load PDF to Chainlit app - Python Tutorial
- [Instructor] Welcome back. Hopefully you enjoyed the exercise of adding PDF loading and processing capability to our chat with PDF application. Let's go to app/app.py. First, we load the PDF file. We will use PDFPlumberLoader to load the temporary file that the user has uploaded, and then we can call loader.load. There is a wide variety of PDF loaders in the LangChain library, and I am picking PDFPlumberLoader only because I am familiar with it. You are free to use other PDF loaders as well.

After we load the PDF file, because the file is long, we have to chunk it into smaller documents, as we mentioned previously. We will use one of LangChain's document transformers, RecursiveCharacterTextSplitter, to split our document into pieces. As we mentioned in the previous videos, we do have to budget for the large language model's context length. Since we're using GPT-3.5 16K, it has a 16K context length. So if we set the chunk size to 3,000 tokens per chunk and we retrieve the top five documents, that means at most we will have 15,000 tokens in the context. This leaves around 1,000 tokens for our input question and output answer. And to retain better context, let's set the chunk overlap to, say, 100, which means every single chunk will share a hundred tokens with the previous and the next chunk. Now that we have built our text splitter, let's use it to split the documents we loaded previously with our PDF loader. This concludes the document loading and document chunking process of our chat with PDF application.

Since our chat with PDF application allows the user to chat with a PDF, we have to ask the user to upload a PDF before the chat session starts. So at chat start, we will send the user an AskFileMessage to ask for a file, and we will say, please upload the content.
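To make the chunking and budgeting concrete, here is a minimal plain-Python sketch of fixed-size splitting with overlap, along with the context-budget arithmetic from above. Note this is an illustration of the chunk_size/chunk_overlap idea, not LangChain's actual RecursiveCharacterTextSplitter (which splits recursively on separators like paragraphs and sentences), and it counts characters as a stand-in for tokens.

```python
def split_with_overlap(text: str, chunk_size: int = 3000, chunk_overlap: int = 100) -> list[str]:
    """Split `text` into chunks of at most `chunk_size` characters, where each
    chunk repeats the last `chunk_overlap` characters of the previous chunk
    so that context is preserved across chunk boundaries."""
    step = chunk_size - chunk_overlap  # advance by size minus overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

# The context budget from the transcript: with a 16K-context model,
# retrieving the top 5 chunks of 3,000 tokens each uses at most
# 5 * 3000 = 15,000 tokens, leaving about 1,000 for question + answer.
CONTEXT_LENGTH = 16_000
TOP_K = 5
CHUNK_SIZE = 3_000
budget_left = CONTEXT_LENGTH - TOP_K * CHUNK_SIZE  # 1,000 tokens

# Example: a 7,000-character text yields three chunks, and each adjacent
# pair shares a 100-character overlap.
text = "0123456789" * 700
chunks = split_with_overlap(text, chunk_size=3000, chunk_overlap=100)
```

The overlap means adjacent chunks share their boundary text, so a sentence cut in half at a chunk boundary is still fully present in at least one chunk.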
We will accept PDF documents only, so application/pdf, and then we will set the max file size to 10 megabytes, just for safety and security. Once we create the AskFileMessage, we send it so the user can see our message. With this, we have built in the capability to ask the user to upload a file, and then we load the PDF document and chunk it.

Now let's give the application a spin. We can do so with chainlit run app/app.py -w, and now you can see our prompt: "Please upload the PDF file you want to ask questions against." We have provided a simple PDF here, so please download it to your local directory. In our chat application, we will browse the files and upload the simple document we provided, and you will see that it says it's processing, which means it's done once that finishes. So now we have uploaded and processed a document. In the next exercise we will ingest this into our search engine so we can search against the document.
Contents
- Retrieval augmented generation (3m 30s)
- Search engine basics (2m 32s)
- Embedding search (3m)
- Embedding model limitations (3m 15s)
- Challenge: Enabling load PDF to Chainlit app (48s)
- Solution: Enabling load PDF to Chainlit app (5m 4s)
- Challenge: Indexing documents into a vector database (1m 50s)
- Solution: Indexing documents into a vector database (1m 43s)
- Challenge: Putting it all together (1m 10s)
- Solution: Putting it all together (3m 17s)
- Trying out your chat with the PDF app (2m 15s)