From the course: Generative AI: Working with Large Language Models

Going further with Transformers

- [Jonathan] We've covered a ton of material in this course. We've looked at many of the large language models since GPT-3. Let's review them really quickly. We saw how Google reduced training and inference costs by using a sparse mixture-of-experts model, GLaM. A month later, Microsoft teamed up with Nvidia to create the Megatron-Turing NLG model, which was three times larger than GPT-3 with 530 billion parameters. In the same month, the DeepMind team released Gopher, whose largest 280 billion parameter version was their best performing model. A few months later, the DeepMind team introduced Chinchilla, which turned a lot of our understanding of large language models on its head. The main takeaway was that large language models up to this point had been undertrained. Google released the 540 billion parameter model PaLM in April, training it on their Pathways infrastructure, and it has been the best performing model to date. Up to this point, large language models had been exclusive to big tech. In an attempt to give other researchers access to these models, Meta released the Open Pre-trained Transformer (OPT) model, and Hugging Face went one step further with BLOOM, sharing datasets, weights, and checkpoints with anyone who is interested. If you haven't had enough of transformers, I've got some more resources for you. If you want a hands-on, code-centric look at transformers where we train a model to do text classification using BERT, then check out my other course in the LinkedIn Learning library. I hope you've enjoyed this course. Thanks for watching, and I'd love to hear back from you and to connect via LinkedIn.
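
As a pointer toward that hands-on follow-up, below is a minimal sketch (not the course's own code) of what text classification with a BERT-family model looks like using the Hugging Face transformers pipeline. The checkpoint name is just an illustrative choice; any fine-tuned text-classification model from the Hub would work.

```python
# Minimal sketch: running text classification with a BERT-family model
# via the Hugging Face transformers pipeline.
from transformers import pipeline

# The checkpoint below is an illustrative assumption, not the course's model:
# a DistilBERT model fine-tuned for sentiment classification on SST-2.
classifier = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

# The pipeline handles tokenization and inference and returns labels with scores.
print(classifier("Transformers made this course a joy to follow."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```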
