Computing ROUGE metrics for a set of summaries
From the course: AI Text Summarization with Hugging Face
Let's do the same thing we did with the previous Pegasus model. I'm going to take the first 50 articles from our BBC Summaries dataset and generate summaries for all 50 of them using this BART model. We'll append each generated candidate summary to the candidate summaries list. Once we have those, we'll compute the aggregate ROUGE scores for all 50 candidate summaries. This will give us a better picture of how this model performs versus the Pegasus model (a sketch of this loop appears in the first code example below).

Here are the aggregated results, and just for reference, the results from the Pegasus model. Observe that the rouge1 score has improved to 0.39; it was previously 0.33. rouge2 is 0.27 for the new model versus 0.23 for the Pegasus model. The rougeL and rougeLsum scores have also improved. Overall, the BART model works much better on our dataset than the Pegasus model.

Next, let's get the unaggregated ROUGE scores for the individual summaries that were generated, and then find the indices of the summaries with the best and the worst rougeLsum scores (see the second sketch below). The best candidate summary is at index 8 and the worst is at index 46.

Now, to view the candidate summaries alongside the reference summaries, let's put both into a DataFrame: predicted summaries and reference summaries (see the third sketch below). Let's take a look at the summary with the best rougeLsum score, the one at index 8. I'm going to quickly update the indices here so that we look at the candidate summary and the reference summary at index 8, and a quick look tells you that, yes, the candidate is very close to the reference: there is a lot of overlap between the words in the reference and the words in the summary. Finally, let's look at the summary with the worst rougeLsum score, the one at index 46. Here, even a glance tells you there is not a lot of overlap between the words in the candidate summary and the words in the reference text.
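For reference, here is a minimal sketch of the generation-and-scoring loop described above, using Hugging Face's pipeline and evaluate libraries. The toy articles, the facebook/bart-large-cnn checkpoint, and the variable names are illustrative assumptions, not taken from the course notebook:

```python
from transformers import pipeline
import evaluate

# Toy stand-ins for the first 50 BBC articles and their reference
# summaries; in the course these come from the BBC Summaries dataset.
articles = [
    "The first full BBC news article would go here ...",
    "The second full BBC news article would go here ...",
]
reference_summaries = [
    "Reference summary for the first article.",
    "Reference summary for the second article.",
]

# BART summarization pipeline; facebook/bart-large-cnn is a common
# choice, but the exact checkpoint used in the video is an assumption.
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

candidate_summaries = []
for article in articles:
    # truncation=True clips articles that exceed the model's input window
    output = summarizer(article, max_length=128, truncation=True)
    candidate_summaries.append(output[0]["summary_text"])

# Aggregate ROUGE over all candidate summaries at once
rouge = evaluate.load("rouge")
aggregate_scores = rouge.compute(
    predictions=candidate_summaries,
    references=reference_summaries,
)
print(aggregate_scores)  # keys: rouge1, rouge2, rougeL, rougeLsum
```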
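Finding the best- and worst-scoring summaries could look like the following sketch, which continues from the previous one. Passing use_aggregator=False to the evaluate library's ROUGE metric returns one score per prediction instead of a single aggregate:

```python
import numpy as np

# Per-summary (unaggregated) ROUGE: one score per candidate/reference pair
individual_scores = rouge.compute(
    predictions=candidate_summaries,
    references=reference_summaries,
    use_aggregator=False,
)

rougelsum = individual_scores["rougeLsum"]
best_index = int(np.argmax(rougelsum))   # index 8 in the video
worst_index = int(np.argmin(rougelsum))  # index 46 in the video
print(best_index, worst_index)
```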
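And the side-by-side DataFrame comparison might look like this sketch; the column names here are my own choice, not necessarily those used in the video:

```python
import pandas as pd

# One row per article: model output next to the reference summary
summaries_df = pd.DataFrame({
    "predicted_summary": candidate_summaries,
    "reference_summary": reference_summaries,
})

# Inspect the best- and worst-scoring rows found above
print(summaries_df.iloc[best_index])
print(summaries_df.iloc[worst_index])
```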
Contents
- Accessing the BBC dataset on Google Drive (3m 34s)
- Instantiating and cleaning the BBC News summaries dataset (3m 48s)
- Generating summaries using Pegasus (4m 55s)
- Generating multiple summaries and computing aggregate ROUGE scores (2m 49s)
- Generating summaries using BART (3m 19s)
- Computing ROUGE metrics for a set of summaries (2m 9s)