From the course: AI Text Summarization with Hugging Face
Generating summaries using BART
We are now ready to use the last of the transformer models that we'll work with in this course, the BART model for text summarization. The BartTokenizer class will allow us to load the tokenizer used by BART, and BartForConditionalGeneration will allow us to load the pretrained model. The path to the BART model is here as you see on screen: facebook/bart-large-cnn.

BART is a sequence-to-sequence encoder-decoder model with a bidirectional encoder. Bidirectional encoders were a huge improvement over directional encoders. Directional encoders read the input text sequentially, from left to right or right to left. Bidirectional encoders, on the other hand, read the entire sequence of words at once, so even though we call them bidirectional, they are really non-directional. BART also has an autoregressive decoder, which means the decoder generates text by looking at what it generated in its previous timesteps. This particular checkpoint of the BART model has been fine-tuned on the CNN/DailyMail dataset, the dataset that we worked with in one of our previous demos, which means it should work very well for summarizing news articles.

I've loaded in both the tokenizer and the model. Let's print out the contents of the tokenizer just to get an overview of what it's all about. You can see that its vocabulary size, 50,265, is smaller than the Pegasus tokenizer's. The model max length is 1,024; that's the maximum number of tokens the model will accept as input. Let's take a look at the model itself by printing out its string representation to see what the layers look like. You can see it has an embedding layer, which is the shared layer, and an encoder, the BartEncoder. And if you scroll down below, it has a decoder layer as well.

Let's first generate a summary of the example text that we had set up earlier. If you remember, this is an article about movies from 2004. At this point, you should be completely comfortable instantiating pipelines for your transformer models and then using them to get predictions. I have a pipeline for summarization here; the model is the BART model, with truncation=True. I invoke the summarizer on the example text. Note that I do not need to specify a prefix for this model, because it was explicitly created for the purpose of summarization.

Now, when I look at the summary generated by this model, I can tell at a glance that it's much better than the summary we got from the Pegasus model for the same text. Let's see if the summary is better by objective measures. Let's take the reference summary that we get from the dataset and compute the ROUGE score by comparing the candidate summary from the model against that reference summary. Here you can see ROUGE-1 is 0.57. With the Pegasus model, our ROUGE-1 score was around 0.3334, whereas here it is 0.57. ROUGE-2 is also quite high, at 0.55. All of the ROUGE metrics seem to indicate that this summarizer is far better than the Pegasus model we used previously, at least for this article that we've chosen.
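For reference, here is a minimal sketch of the loading and inspection steps described above, assuming the Hugging Face transformers library is installed; the variable names are illustrative and not necessarily those used in the course notebook.

```python
from transformers import BartTokenizer, BartForConditionalGeneration

# Checkpoint fine-tuned on the CNN/DailyMail dataset
model_name = "facebook/bart-large-cnn"

# Load the tokenizer and the pretrained seq2seq model
tokenizer = BartTokenizer.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name)

# The string representations show the tokenizer's vocab_size (50,265)
# and model_max_length (1,024), and the model's shared embedding layer,
# BartEncoder, and BartDecoder
print(tokenizer)
print(model)
```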
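The pipeline step might look like the following sketch; `example_text` is assumed to hold the 2004 movies article set up earlier in the course.

```python
from transformers import pipeline

# Wrap the BART model and tokenizer in a summarization pipeline;
# truncation=True clips articles longer than the 1,024-token input limit
summarizer = pipeline(
    "summarization",
    model=model,
    tokenizer=tokenizer,
    truncation=True,
)

# No task prefix is needed: this checkpoint was built for summarization
result = summarizer(example_text)
candidate_summary = result[0]["summary_text"]
print(candidate_summary)
```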
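And one way to compute the ROUGE scores is with the Hugging Face evaluate library, shown below; `reference_summary` is assumed to hold the dataset's reference summary, and the exact helper used in the course may differ.

```python
import evaluate

# Load the ROUGE metric (requires the rouge_score package)
rouge = evaluate.load("rouge")

# Compare the model's candidate summary against the dataset's
# reference summary; scores come back as a dict of floats
scores = rouge.compute(
    predictions=[candidate_summary],
    references=[reference_summary],
)
print(scores)  # e.g. rouge1 around 0.57, rouge2 around 0.55 for this article
```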