From the course: AI Text Summarization with Hugging Face

Sequence-to-sequence models - Hugging Face Tutorial

Sequence-to-sequence models

Deep learning models that perform natural language processing come in many different types, but the kind of language model we'll use for text summarization is typically a sequence-to-sequence model. Let's consider the essence of a summarization task. A model takes in the original text as input and produces another text as output, one that is much shorter and contains the important concepts from the original text. And this is where the term sequence-to-sequence comes from: we start with a sequence, the input text, and we produce another sequence, the output text. If you were using a language model for sentiment analysis, that would be an example of a sequence-to-vector model: it takes in an input text, which is a sequence, and produces a class or a category for the sentiment. That is sequence-to-vector. We'll keep our focus here on text summarization, which is an NLP task that involves generating a new, shorter sequence from an original, longer sequence, and this involves the use of sequence-to-sequence models.

To get a big-picture understanding of how sequence-to-sequence models work, let's take a slightly different example of a sequence-to-sequence model: a model for language translation. It's pretty clear why this is considered a sequence-to-sequence model: you feed in an input sequence in one language and you get the translation in another language as the output. Sequence-to-sequence models are generally made up of two separate components referred to as an encoder and a decoder. The encoder works on a sequential input, and the decoder produces a sequential output. The encoder and the decoder have their own roles to play in different deep learning models; in fact, there are models that use an encoder alone or a decoder alone, but sequence-to-sequence models use the two together.

The encoder is responsible for learning a representation of the input sentence that you feed in; no matter what the sentence is, it will try to represent it in some way. And it's this representation that the decoder uses to produce a translation. The encoder is usually a recurrent neural network where you feed in the input words in a sequence, one word at each time instance. So here, the word "I" will be fed in at the first time instance, "ate" at the second time instance, "an" at the third time instance, and so on. Just a heads-up: in a sequence-to-sequence model, the input isn't necessarily fed in as a unidirectional sequence, as you see here on screen. Different models work in different ways, but this is a good representation of the overall picture. At the end of the input sequence, you usually feed in a special token, the EOS (end-of-sequence) token, indicating that the sequence has terminated. The representation of the entire input sentence is captured in the hidden activation of the last layer of the neural network. You can say that this has all of the information contained in the input sequence.
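To make the encoding step concrete, here is a minimal sketch using a small pretrained Hugging Face checkpoint (t5-small is just an illustrative choice; any encoder-decoder checkpoint would do). One small difference from the RNN picture on screen: a Transformer encoder keeps one hidden vector per input token rather than a single final hidden state, but the role is the same, a learned representation of the entire input that the decoder will consume.

```python
# A minimal sketch of the encoding step with a small pretrained
# sequence-to-sequence model (t5-small is an illustrative choice).
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# The tokenizer appends the special end-of-sequence token for us.
inputs = tokenizer("I ate an apple", return_tensors="pt")
print(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0]))
# e.g. ['▁I', '▁ate', '▁an', '▁apple', '</s>']

# The encoder turns the whole input sequence into hidden representations,
# one vector per input token.
encoder_outputs = model.get_encoder()(**inputs)
print(encoder_outputs.last_hidden_state.shape)
# expected: torch.Size([1, 5, 512]) for t5-small on this input
```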
It is this representation that is then fed into a second recurrent network, the decoder, which is responsible for generating the translated sentence. The decoder makes use of this hidden representation to generate the translation: it takes the hidden representation as an input and produces a word at each time instance. The word produced at each time instance becomes the input at the next time instance, and so on, until the entire sentence has been translated. The translation continues until the decoder produces an EOS token, indicating that the output has terminated. Notice that the decoder starts off with the hidden state obtained from the encoder, generates words in sequence, and uses the previous word in the sequence to produce the next word in the sequence. This is a very high-level overview of how encoder-decoder sequence-to-sequence models work. There are, of course, lots of nitty-gritty details and lots of improvements that have been made to produce the amazing models you see today.
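And here is a minimal sketch of that decoding loop written out by hand, again with t5-small as an illustrative checkpoint: the decoder starts from a special start token, consumes the encoder's representation, predicts one token at a time, feeds each prediction back in as input, and stops when it emits the EOS token. In practice you would simply call model.generate(), which implements this loop (along with refinements such as beam search) for you.

```python
# A hand-rolled greedy decoding loop to show the encoder-to-decoder hand-off.
# Illustrative sketch only; in real code you would use model.generate().
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# t5-small was trained with task prefixes, so we reuse its translation task here.
inputs = tokenizer("translate English to French: I ate an apple.", return_tensors="pt")
encoder_outputs = model.get_encoder()(**inputs)  # representation of the whole input

# Start the decoder from the model's designated start token.
decoder_ids = torch.tensor([[model.config.decoder_start_token_id]])
for _ in range(40):  # safety cap on output length
    logits = model(encoder_outputs=encoder_outputs,
                   attention_mask=inputs["attention_mask"],
                   decoder_input_ids=decoder_ids).logits
    next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)  # greedy pick
    decoder_ids = torch.cat([decoder_ids, next_token], dim=-1)  # feed it back in
    if next_token.item() == model.config.eos_token_id:          # stop at EOS
        break

print(tokenizer.decode(decoder_ids[0], skip_special_tokens=True))
# e.g. "J'ai mangé une pomme."
```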
