Computing ROUGE metrics for a set of summaries
From the course: AI Text Summarization with Hugging Face
Let's do the same thing we did with the previous Pegasus model. I'm going to take the first 50 articles from our BBC Summaries dataset and generate summaries for all 50 of them using this BART model. We'll append each generated candidate summary to the candidate summaries list. Once we have those, we'll compute the aggregate ROUGE scores for all 50 candidate summaries. This will give us a better picture of how this model performs versus the Pegasus model (a sketch of this loop appears in the first code example below).

Here are the aggregated results, and just for reference, the results from the Pegasus model. Observe that the rouge1 score has improved to 0.39; it was previously 0.33. rouge2 is 0.27 for the new model versus 0.23 for the Pegasus model. The rougeL and rougeLsum scores have also improved. Overall, the BART model works much better on our dataset than the Pegasus model.

Next, let's get the unaggregated ROUGE scores for the individual summaries that were generated, and then find the indices of the summaries with the best and the worst rougeLsum scores (see the second sketch below). The best candidate summary is at index 8 and the worst is at index 46.

Now, to view the candidate summaries alongside the reference summaries, let's put both into a DataFrame: predicted summaries and reference summaries (see the third sketch below). Let's take a look at the summary with the best rougeLsum score, the one at index 8. I'm going to quickly update the indices here so that we look at the candidate summary and the reference summary at index 8, and a quick look tells you that, yes, the candidate is very close to the reference: there is a lot of overlap between the words in the reference and the words in the summary. Finally, let's look at the summary with the worst rougeLsum score, the one at index 46. Here, even a glance tells you there is not a lot of overlap between the words in the candidate summary and the words in the reference text.
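For reference, here is a minimal sketch of the generation-and-scoring loop described above, using Hugging Face's pipeline and evaluate libraries. The toy articles, the facebook/bart-large-cnn checkpoint, and the variable names are illustrative assumptions, not taken from the course notebook:

```python
from transformers import pipeline
import evaluate

# Toy stand-ins for the first 50 BBC articles and their reference
# summaries; in the course these come from the BBC Summaries dataset.
articles = [
    "The first full BBC news article would go here ...",
    "The second full BBC news article would go here ...",
]
reference_summaries = [
    "Reference summary for the first article.",
    "Reference summary for the second article.",
]

# BART summarization pipeline; facebook/bart-large-cnn is a common
# choice, but the exact checkpoint used in the video is an assumption.
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

candidate_summaries = []
for article in articles:
    # truncation=True clips articles that exceed the model's input window
    output = summarizer(article, max_length=128, truncation=True)
    candidate_summaries.append(output[0]["summary_text"])

# Aggregate ROUGE over all candidate summaries at once
rouge = evaluate.load("rouge")
aggregate_scores = rouge.compute(
    predictions=candidate_summaries,
    references=reference_summaries,
)
print(aggregate_scores)  # keys: rouge1, rouge2, rougeL, rougeLsum
```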
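Finding the best- and worst-scoring summaries could look like the following sketch, which continues from the previous one. Passing use_aggregator=False to the evaluate library's ROUGE metric returns one score per prediction instead of a single aggregate:

```python
import numpy as np

# Per-summary (unaggregated) ROUGE: one score per candidate/reference pair
individual_scores = rouge.compute(
    predictions=candidate_summaries,
    references=reference_summaries,
    use_aggregator=False,
)

rougelsum = individual_scores["rougeLsum"]
best_index = int(np.argmax(rougelsum))   # index 8 in the video
worst_index = int(np.argmin(rougelsum))  # index 46 in the video
print(best_index, worst_index)
```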
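And the side-by-side DataFrame comparison might look like this sketch; the column names here are my own choice, not necessarily those used in the video:

```python
import pandas as pd

# One row per article: model output next to the reference summary
summaries_df = pd.DataFrame({
    "predicted_summary": candidate_summaries,
    "reference_summary": reference_summaries,
})

# Inspect the best- and worst-scoring rows found above
print(summaries_df.iloc[best_index])
print(summaries_df.iloc[worst_index])
```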
Contents
- Accessing the BBC dataset on Google Drive (3m 34s)
- Instantiating and cleaning the BBC News summaries dataset (3m 48s)
- Generating summaries using Pegasus (4m 55s)
- Generating multiple summaries and computing aggregate ROUGE scores (2m 49s)
- Generating summaries using BART (3m 19s)
- Computing ROUGE metrics for a set of summaries (2m 9s)