From the course: Applied AI: Getting Started with Hugging Face Transformers
Challenges with building Transformers
- [Instructor] Machine learning technologies for NLP have grown leaps and bounds in the last few years. Transformers are the state of the art, but does that make building and serving transformer models easy? Building transformer models from scratch is not the same as building classical machine learning models. Transformer models pose some unique challenges, and overcoming them is critical for building successful NLP applications. Let's begin with language modeling challenges. Irrespective of the specific application task, all NLP models need to represent human languages in some form. Human languages are complex in terms of how they are spoken and interpreted. While all languages have a syntax or grammar, general usage by humans may not follow it. There are semantic relationships between the words in a language, like synonyms, antonyms, and others. For example, if we have the word king, what is its relationship with other words like queen, boy, and emperor? Unless these relationships are modeled explicitly, it is hard to interpret words correctly. Also, the same word may have different meanings based on the context where it is used. For example, the word file may mean a physical cardboard file or a computer file, and file can also be a verb. Capturing all these relationships in a single model usually results in a huge model with extensive training requirements. The next set of challenges is related to training the model itself. First, the training data sizes are huge, partly because it is text data, as opposed to numeric data, and also because a large corpus is needed to capture all contexts and relationships. Labeling text data is also difficult and resource intensive. There are heavy pre-processing and cleansing requirements to prepare text data for machine learning. The resulting transformer models are usually huge, sometimes a few gigabytes in size. Compute requirements, like CPU, memory, and disk, are significantly high for both training and inference, and transformer models typically need GPUs to train and predict. In general, NLP use cases with transformers are more expensive to develop and maintain than classical machine learning. Because of these challenges, building every transformer model from scratch is not cost-effective. But since all transformer models are trained on general language characteristics, it is possible to develop pre-trained models for them. Using a pre-trained model and then customizing it for the specific task through transfer learning is more effective and less time-consuming, and this approach is becoming increasingly popular. Hugging Face and its Transformers library provide us with these pre-trained models, and we will explore how to use and customize them in the rest of the course.
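As a preview of the approach the rest of the course takes, here is a minimal sketch in Python of loading a pre-trained model through the Hugging Face Transformers pipeline API instead of training from scratch. It assumes the transformers library and a backend such as PyTorch are installed; the sentiment-analysis task and the sample sentence are illustrative choices, not something prescribed in this video.

    # Minimal sketch: use a pre-trained Transformer via the pipeline API
    # rather than building and training a model from scratch.
    from transformers import pipeline

    # Downloads a default pre-trained checkpoint for this task on first use.
    classifier = pipeline("sentiment-analysis")

    # Run inference with no task-specific training of our own.
    result = classifier("Transformers make NLP applications much easier to build.")
    print(result)  # e.g. [{'label': 'POSITIVE', 'score': 0.99...}]

Because the expensive language modeling has already happened during pre-training, the remaining work for a specific use case is typically light customization, or fine-tuning, of a model like this on your own data.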