From the course: Hands-On AI: Building LLM-Powered Apps
Solution: Enabling load PDF to Chainlit app - Python Tutorial
- [Instructor] Welcome back. Hopefully you enjoyed the exercise of adding PDF loading and processing capability to our chat with PDF application. Let's go to app/app.py. First, we load the PDF file. We will use PDFPlumberLoader to load the temporary file that the user has uploaded, and then we can call loader.load. There is a wide variety of PDF loaders in the LangChain library, and I am picking PDFPlumberLoader only because I am familiar with it. You are free to use other PDF loaders as well.

After we load the PDF file, because the file is long, we have to chunk it into smaller documents, as we mentioned previously. We will use one of LangChain's document transformers, RecursiveCharacterTextSplitter, to split our document into pieces. As we mentioned in the previous videos, we do have to budget for the large language model's context length. Since we're using GPT-3.5 16K, it has a 16K context length. So if we set the chunk size to 3,000 tokens per chunk and we retrieve the top five documents, that means at most we will have 15,000 tokens in the context. This leaves around 1,000 tokens for our input question and output answer. And to retain better context, let's set the chunk overlap to, say, 100, which means every single chunk will share a hundred tokens with the previous and the next chunk. Now that we have built our text splitter, let's use it to split the documents we loaded previously with our PDF loader. This concludes the document loading and document chunking process of our chat with PDF application.

Since our chat with PDF application allows the user to chat with a PDF, we have to ask the user to upload a PDF before the chat session starts. So at chat start, we will send the user an AskFileMessage to ask for a file, and we will say, please upload the content.
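To make the chunking and budgeting concrete, here is a minimal plain-Python sketch of fixed-size splitting with overlap, along with the context-budget arithmetic from above. Note this is an illustration of the chunk_size/chunk_overlap idea, not LangChain's actual RecursiveCharacterTextSplitter (which splits recursively on separators like paragraphs and sentences), and it counts characters as a stand-in for tokens.

```python
def split_with_overlap(text: str, chunk_size: int = 3000, chunk_overlap: int = 100) -> list[str]:
    """Split `text` into chunks of at most `chunk_size` characters, where each
    chunk repeats the last `chunk_overlap` characters of the previous chunk
    so that context is preserved across chunk boundaries."""
    step = chunk_size - chunk_overlap  # advance by size minus overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

# The context budget from the transcript: with a 16K-context model,
# retrieving the top 5 chunks of 3,000 tokens each uses at most
# 5 * 3000 = 15,000 tokens, leaving about 1,000 for question + answer.
CONTEXT_LENGTH = 16_000
TOP_K = 5
CHUNK_SIZE = 3_000
budget_left = CONTEXT_LENGTH - TOP_K * CHUNK_SIZE  # 1,000 tokens

# Example: a 7,000-character text yields three chunks, and each adjacent
# pair shares a 100-character overlap.
text = "0123456789" * 700
chunks = split_with_overlap(text, chunk_size=3000, chunk_overlap=100)
```

The overlap means adjacent chunks share their boundary text, so a sentence cut in half at a chunk boundary is still fully present in at least one chunk.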
We will accept PDF documents only, so application/pdf, and then we will set the max file size to 10 megabytes, just for safety and security. Once we create the AskFileMessage, we send it so the user can see our message. With this, we have built in the capability to ask the user to upload a file, and then we load the PDF document and chunk it.

Now let's give the application a spin. We can do so with chainlit run app/app.py -w, and now you can see our prompt: "Please upload the PDF file you want to ask questions against." We have provided a simple PDF here, so please download it to your local directory. In our chat application, we will browse the files and upload the simple document we provided, and you will see that it says it's processing, which means it's done once that finishes. So now we have uploaded and processed a document. In the next exercise we will ingest this into our search engine so we can search against the document.
Contents
- Retrieval augmented generation (3m 30s)
- Search engine basics (2m 32s)
- Embedding search (3m)
- Embedding model limitations (3m 15s)
- Challenge: Enabling load PDF to Chainlit app (48s)
- Solution: Enabling load PDF to Chainlit app (5m 4s)
- Challenge: Indexing documents into a vector database (1m 50s)
- Solution: Indexing documents into a vector database (1m 43s)
- Challenge: Putting it all together (1m 10s)
- Solution: Putting it all together (3m 17s)
- Trying out your chat with the PDF app (2m 15s)