The BERT Transformer
From the course: Applied AI: Getting Started with Hugging Face Transformers
- [Instructor] Having looked at the need for pre-trained transformers, let's quickly review some of the popular pre-trained transformer architectures available today. We start with BERT. BERT stands for Bidirectional Encoder Representations from Transformers. It is a pre-trained transformer architecture that was created by a team at Google and shared with the community. It is extremely popular and has inspired a number of use cases and variants. BERT uses only the encoder stack of the original transformer architecture; it has no decoder stack. Depending on the specific variant of BERT, the number of encoder layers varies. For training and inference, a standard feed-forward layer and a softmax layer are added on top to take the hidden state and predict the desired output. This architecture is then trained using tasks like masked language modeling and next sentence prediction. BERT results in very large…
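To make this concrete, here is a minimal sketch using the Hugging Face Transformers library. It assumes the `bert-base-uncased` checkpoint (the clip does not name a specific model) and shows that BERT is an encoder-only stack whose hidden states feed a prediction head, illustrated here with masked language modeling.

```python
# Minimal sketch, assuming the `transformers` library and the
# `bert-base-uncased` checkpoint (not named in the clip itself).
import torch
from transformers import AutoTokenizer, BertForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

# BERT is encoder-only: this base variant has 12 encoder layers and no decoder stack.
print(model.config.num_hidden_layers)  # 12 for bert-base-uncased

# Masked language modeling: BERT predicts the token hidden behind [MASK].
inputs = tokenizer("Paris is the [MASK] of France.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Take the logits at the masked position and pick the highest-scoring token.
mask_index = (inputs.input_ids == tokenizer.mask_token_id)[0].nonzero(as_tuple=True)[0]
predicted_id = logits[0, mask_index].argmax(dim=-1)
print(tokenizer.decode(predicted_id))  # likely "capital"
```

The number of encoder layers differs by variant (for example, BERT-base uses 12 and BERT-large uses 24), which is why the sketch reads the layer count from the model's config rather than hard-coding it.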