From the course: Hands-On AI: RAG using LlamaIndex
Choosing an LLM and embeddings provider - LlamaIndex Tutorial
- [Instructor] Let's talk about choosing an LLM and embeddings provider. There are a ton of options available to you. You could choose from companies that build and serve their own LLMs, like OpenAI, Cohere, Mistral, or Google, or you could choose from companies that host and serve open source models via an API, for example, Fireworks, Together, Hugging Face, Replicate, and so on. It's up to you which one you choose. For this course, I'm sticking primarily with Cohere and OpenAI. There are so many providers entering the market and trying to capture share, but luckily LlamaIndex has integrations with the vast majority of them, and you can see all of them right here. If you look in the LlamaIndex GitHub repository under integrations, then llms, you can see all the different LLM providers that are available to you. And the installation process for every one of those is pretty much the same: pip install llama-index-llms-, followed by whatever provider you choose. So for example, if you wanted to use Nvidia, it'd be pip install llama-index-llms-nvidia. It's the same pattern for installing all of them. And when you want to import that LLM, you follow the same pattern: from llama_index.llms., then the provider name, import that provider's LLM class. So it's going to be the same pattern regardless. Again, I'm using OpenAI and Cohere predominantly throughout this course. For the first half of the course I use mostly Cohere because it's free; you don't have to enter a credit card or put in any information whatsoever. You just sign up with GitHub, Google, or an email, and I'll show you how to do the sign up in the next video, but you just sign up and you're good to go. They've got the Command-R and Command-R-Plus models, which work really well for RAG tasks.
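The naming pattern described above can be sketched in a couple of lines; the helper functions here are purely illustrative (not part of LlamaIndex), and actually calling a model would still require installing the integration package and supplying an API key:

```python
def install_command(provider: str) -> str:
    """pip install command for a LlamaIndex LLM integration package."""
    return f"pip install llama-index-llms-{provider}"

def import_path(provider: str) -> str:
    """Module path used in the matching import statement."""
    return f"llama_index.llms.{provider}"

# The same pattern holds across providers:
print(install_command("nvidia"))  # pip install llama-index-llms-nvidia
print(import_path("openai"))      # llama_index.llms.openai

# The real import/usage then looks like (requires the package + an API key):
#   from llama_index.llms.openai import OpenAI
#   llm = OpenAI(model="gpt-4o")
#   print(llm.complete("Hello").text)
```

The commented-out lines at the bottom show the shape of the actual import; swap in `cohere`/`Cohere`, `nvidia`/`NVIDIA`, and so on for other providers.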
There's something to keep in mind, though: because it is a free API, there are rate limits. With Cohere, at the time of my recording this video, the limitations that I ran up against are no more than five requests per minute, no more than 100 requests per hour, and no more than 1,000 requests per month. So you have those limitations imposed on you. OpenAI also has rate limits, with a tiered structure. If you search for OpenAI usage tiers, you'll see the different tiers that they have: a Free tier, Tier 1, 2, 3, and so on. As you progress along the tiers, you can simply make more API calls; in other words, you get higher rate limits. For better or worse, I've been using OpenAI for a very long time and I am on Tier 4. And so in the second half of this course, I switch gears to using OpenAI, because I have to make a lot of calls to the API for embeddings and a lot of calls for generating responses. When I was trying to create the course using Cohere, I ran up against those limits, so I just used OpenAI for the latter half of the course. Again, that's just because my higher usage tier gave me more flexibility and freedom to experiment. There is probably a better way to think about choosing the right LLM and embeddings provider than which provider gives you more freedom to hack around, so let's talk about that right now. Here are a few points to help guide your thinking about which LLM provider to choose. First, identify your business objective and use case: understand your business's present and future objectives, and ensure that the LLM you're using is going to fulfill those needs. Determine what the particular LLM you're considering is good at.
Is it good at content generation, sentiment analysis, and so on? Just consider that. You also want to evaluate different LLMs and their capabilities, so try to hack around with as many language models as you can. One thing that I really like is a platform called Poe. Poe is cool because with one login and one user interface, you've got access to a wide variety of different language models that you can hack around with. So I highly recommend playing around with as many different language models as you possibly can. The other thing you want to consider is the license. Different providers are going to have different restrictions on how you can use the generations from their models. Some licenses might limit the use of an LLM or embeddings to non-commercial applications, require attribution, or restrict the types of applications for which the model can be used. Just make sure you understand the license and the impact it's going to have on your use case. If we scroll back up to the top here and look at some of these big labs, the provider that I've noticed has the most permissive license is Mistral. So that's just something to consider. You also want to consider the language support and multilingual capabilities of a model, and evaluate security and privacy compliance: look for features like data encryption, secure data handling, and compliance with the relevant privacy regulations. Of course, cost is always an issue. OpenAI's GPT-4o has a blended rate of about $10 per million tokens: $5 per million input tokens and $15 per million output tokens. If you go to the OpenAI pricing page, you can verify that input tokens are $5 per million and output tokens are $15 per million, which gives you a blended rate of about 10 bucks per million tokens. Mistral is quite cheap as well.
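The blended rate above is just a weighted average of the input and output prices. Here's a quick sketch of that arithmetic; note that the 50/50 input:output mix behind the "$10 per million" figure is an assumption, and real RAG workloads are usually input-heavy because the retrieved context goes into the prompt:

```python
def blended_rate(input_per_m: float, output_per_m: float,
                 input_share: float = 0.5) -> float:
    """Blended $ per 1M tokens for a given input:output token mix."""
    return input_per_m * input_share + output_per_m * (1.0 - input_share)

# GPT-4o prices cited in the video: $5/M input, $15/M output
print(blended_rate(5.0, 15.0))                    # 10.0 at a 50/50 mix
print(blended_rate(5.0, 15.0, input_share=0.75))  # 7.5 for an input-heavy mix
```

As the second call shows, the more your token volume skews toward input (typical for RAG), the closer your effective cost sits to the cheaper input rate.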
Cohere, like I mentioned, is free, but if you want to use Cohere in production, you've got to get in touch with their sales team. Another thing to consider is community support and resources. With everything I do in the dev tools space, I look for frameworks that have active communities. Generative AI is moving at a very, very fast pace, and it's hard to keep up with things on your own. You're inevitably going to run into issues or questions where you don't have the answers, and having an active, welcoming community is super important. OpenAI has got their OpenAI Slack channel and the OpenAI Developer Forum. Cohere has a Discord channel. Mistral also has a Discord channel. At the moment, my good friend Sophia Yang is the head of developer relations at Mistral, and she's awesome. In any event, look for an active, strong community, a place that's welcoming, and a place that has a number of learning resources for you. So that's all I have to say about choosing the right LLM for your use case. Like I mentioned, throughout this course we're going to make use of primarily Cohere and OpenAI.