From the course: Introduction to Large Language Models
GitHub Models: Comparing LLMs
- [Instructor] Imagine asking two teachers to explain the same mathematics problem to a class. Each of them might have a different approach to solving the problem, and it's like that with large language models: you can provide the same text or prompt and get a different response from each model. GitHub Models lets us easily compare two large language models.

You need a GitHub account, and you can sign up for one by going to github.com. Once you have an account, head over to github.com/marketplace. Here you can select Models over on the left, and then filter models by provider, capability, and functionality, so things like whether you need a model with low latency, whether you want a model that can handle multiple languages, and so on. Now, the OpenAI models are generally pretty good, so I'm going to use GPT-4o mini as my benchmark, and I want to compare it to one of the smaller models. Let's pick one from the Low latency list. I'm going to pick Phi-3.5-mini-instruct, select Playground, and then compare it to the GPT-4o mini model. So let me go ahead and select that.

Now, this is the task: I want to get the models to explain a joke. This isn't as trivial as it sounds, because it checks the model's ability to work with the nuances of the English language. Here's the joke. I was going to fly to visit my family on May the 3rd, and my mom said, "Oh, great, your dad's poetry reading is that night." So now I'm flying in on May the 4th. I've intentionally included the reference to May the 4th to see if I can trick some of the language models into making a "Star Wars" reference. The explanation of this word play is that this person doesn't want to attend their dad's poetry reading. So let's see how the different models explain this. Go ahead and pause the video here and compare the outputs from the two models. After reading them, I think you'll agree with me that GPT-4o mini produces the better result.

Remember, this illustration only compares the models on a single task. You'll want to compare models on hundreds or thousands of tasks to see which one performs best for your business needs. Ultimately, you are the best judge of which model works best for your business and your situation. All right, so we've seen that GitHub Models is a great way to prototype with large language models, letting you easily work with and compare models from different providers.
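
Once you've compared models in the Playground, you can also run the same side-by-side comparison programmatically. Below is a minimal sketch in Python, assuming the OpenAI-compatible inference endpoint that GitHub Models exposes. The endpoint URL, model IDs, and use of a GitHub personal access token in the GITHUB_TOKEN environment variable are assumptions based on the GitHub Models documentation; check the code samples in the Playground for the exact values for your account.

```python
# A minimal sketch: send the same prompt to two models via the
# GitHub Models OpenAI-compatible endpoint and print both answers.
# The base_url and model IDs below are assumptions -- verify them
# against the Playground's code samples before running.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://models.inference.ai.azure.com",  # assumed GitHub Models endpoint
    api_key=os.environ["GITHUB_TOKEN"],                # a GitHub personal access token
)

JOKE_PROMPT = (
    "Explain this joke: I was going to fly to visit my family on May the 3rd, "
    'and my mom said, "Oh, great, your dad\'s poetry reading is that night." '
    "So now I'm flying in on May the 4th."
)

for model in ["gpt-4o-mini", "Phi-3.5-mini-instruct"]:  # assumed model IDs
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": JOKE_PROMPT}],
        temperature=0.7,
    )
    print(f"--- {model} ---")
    print(response.choices[0].message.content)
```

The same loop idea scales to the broader evaluation mentioned above: swap the single joke prompt for a list of prompts that reflect your own business tasks and compare the outputs model by model.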