From the course: AI Text Summarization with Hugging Face
Generating summaries using BART
We are now ready to use the last of the transformer models that we'll work with in this course, the BART model for text summarization. The BartTokenizer class will allow us to load the tokenizer used by BART, and BartForConditionalGeneration will allow us to load the pretrained model. The path to the BART model is here as you see on screen: facebook/bart-large-cnn.

BART is a sequence-to-sequence encoder-decoder model with a bidirectional encoder. Bidirectional encoders were a huge improvement over directional encoders. Directional encoders read the input text sequentially, from left to right or right to left. Bidirectional encoders, on the other hand, read the entire sequence of words at once, so even though we call them bidirectional, they are really non-directional. BART also has an autoregressive decoder, which means the decoder generates text by looking at what it generated in its previous timesteps. This particular checkpoint of the BART model has been fine-tuned on the CNN/DailyMail dataset, the dataset that we worked with in one of our previous demos, which means it should work very well for summarizing news articles.

I've loaded in both the tokenizer and the model. Let's print out the contents of the tokenizer just to get an overview of what it's all about. You can see that its vocabulary size, 50,265, is smaller than the Pegasus tokenizer's. The model max length is 1,024; that's the maximum number of tokens the model will accept as input. Let's take a look at the model itself by printing out its string representation to see what the layers look like. You can see it has an embedding layer, which is the shared layer, and an encoder, the BartEncoder. And if you scroll down below, it has a decoder layer as well.

Let's first generate a summary of the example text that we had set up earlier. If you remember, this is an article about movies from 2004. At this point, you should be completely comfortable instantiating pipelines for your transformer models and then using them to get predictions. I have a pipeline for summarization here; the model is the BART model, with truncation=True. I invoke the summarizer on the example text. Note that I do not need to specify a prefix for this model, because it was explicitly created for the purpose of summarization.

Now, when I look at the summary generated by this model, I can tell at a glance that it's much better than the summary we got from the Pegasus model for the same text. Let's see if the summary is better by objective measures. Let's take the reference summary that we get from the dataset and compute the ROUGE score by comparing the candidate summary from the model against that reference summary. Here you can see ROUGE-1 is 0.57. With the Pegasus model, our ROUGE-1 score was around 0.3334, whereas here it is 0.57. ROUGE-2 is also quite high, at 0.55. All of the ROUGE metrics seem to indicate that this summarizer is far better than the Pegasus model we used previously, at least for this article that we've chosen.
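For reference, here is a minimal sketch of the loading and inspection steps described above, assuming the Hugging Face transformers library is installed; the variable names are illustrative and not necessarily those used in the course notebook.

```python
from transformers import BartTokenizer, BartForConditionalGeneration

# Checkpoint fine-tuned on the CNN/DailyMail dataset
model_name = "facebook/bart-large-cnn"

# Load the tokenizer and the pretrained seq2seq model
tokenizer = BartTokenizer.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name)

# The string representations show the tokenizer's vocab_size (50,265)
# and model_max_length (1,024), and the model's shared embedding layer,
# BartEncoder, and BartDecoder
print(tokenizer)
print(model)
```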
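The pipeline step might look like the following sketch; `example_text` is assumed to hold the 2004 movies article set up earlier in the course.

```python
from transformers import pipeline

# Wrap the BART model and tokenizer in a summarization pipeline;
# truncation=True clips articles longer than the 1,024-token input limit
summarizer = pipeline(
    "summarization",
    model=model,
    tokenizer=tokenizer,
    truncation=True,
)

# No task prefix is needed: this checkpoint was built for summarization
result = summarizer(example_text)
candidate_summary = result[0]["summary_text"]
print(candidate_summary)
```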
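And one way to compute the ROUGE scores is with the Hugging Face evaluate library, shown below; `reference_summary` is assumed to hold the dataset's reference summary, and the exact helper used in the course may differ.

```python
import evaluate

# Load the ROUGE metric (requires the rouge_score package)
rouge = evaluate.load("rouge")

# Compare the model's candidate summary against the dataset's
# reference summary; scores come back as a dict of floats
scores = rouge.compute(
    predictions=[candidate_summary],
    references=[reference_summary],
)
print(scores)  # e.g. rouge1 around 0.57, rouge2 around 0.55 for this article
```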