From the course: Applied AI: Getting Started with Hugging Face Transformers

The BERT Transformer

- [Instructor] Having looked at the need for pre-trained transformers, let's quickly review some of the popular pre-trained transformer architectures available today. We start with BERT. BERT stands for Bidirectional Encoder Representations from Transformers. It is a pre-trained transformer architecture that was created by a team at Google and shared with the community. It is extremely popular and has inspired a number of use cases and variants. BERT uses only the encoder stack of the original transformer architecture; it has no decoder stack. Depending on the specific variant of BERT, the number of encoder layers may vary. For training and inference, a classical neural network layer and a softmax layer are added at the end to use the hidden state and predict the desired outcomes. This architecture is then trained using tasks like masked language modeling and next sentence prediction. BERT results in very large…
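As a rough sketch of these ideas in code, the snippet below assumes the Hugging Face transformers library and the commonly used bert-base-uncased checkpoint (the specific checkpoint and label count are illustrative choices, not part of the transcript). The sequence-classification head stands in for the classical neural network layer and softmax mentioned above, and the fill-mask pipeline illustrates the masked language modeling pre-training objective.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

# Illustrative choices: the bert-base-uncased checkpoint and a 2-label task.
model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load the pre-trained BERT encoder stack and attach a randomly initialized
# classification head (a linear layer whose scores are turned into
# probabilities with softmax). The head still needs fine-tuning on labeled data.
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# The head consumes the encoder's hidden state and predicts the desired outcome.
inputs = tokenizer("Transformers are easy to use with Hugging Face.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits.shape)  # torch.Size([1, 2]) -> one score per label

# BERT was pre-trained with masked language modeling; the fill-mask pipeline
# exposes that objective directly by predicting the [MASK] token.
fill_mask = pipeline("fill-mask", model=model_name)
print(fill_mask("BERT uses only the [MASK] stack of the transformer."))
```

In practice the classification example would be followed by a fine-tuning step on a labeled dataset; the sketch only shows how the encoder and the task-specific head fit together.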
