From the course: AI Text Summarization with Hugging Face

Summarizing text using the fine-tuned model

If you remember, earlier on in this demo we used an example article to generate a summary from the T5 model using zero-shot learning. This was the example text, an article about a Labrador retriever that was cloned. Now let's generate a summary for this article using our fine-tuned model. We'll use the Hugging Face pipeline object to instantiate this model, but we'll instantiate our fine-tuned model, not the original T5 Small model. Let's see how we do that.

On lines 5 through 9, I instantiate the pipeline, I specify the task I want to perform, summarization, and I specify the model, but this time it points to the model that we've just fine-tuned, cloud-user/cnn_news_summary_model_trained_on_reduced_data. Now your model path is likely to be a little different because your username on Hugging Face will be different. Again, truncation=True so that long inputs are truncated. Let's generate the summary text by invoking the summarizer on the example text; make sure you add the prefix, and let's see what the summary looks like. Here is the summary for the original article. The summary does capture the essence of the article, but whether it's better than the previous summary we got using zero-shot learning is hard to tell.

We saw how we could use our fine-tuned pipeline directly, but we can also access the tokenizer and model separately from our repository. Here, I use AutoTokenizer.from_pretrained and point to the model and tokenizer that we saved out to our repository. Then on line 5, I pass the example text along with the prefix to the tokenizer; return_tensors="pt" will return tensors in the PyTorch format, and then I access the input IDs. The input IDs stored in the inputs variable are the form in which the model accepts data, and this is what we'll pass into our model.

Next, we use AutoModelForSeq2SeqLM.from_pretrained to load our model in. Again, specify the path to the repository where the model was saved, cloud-user/cnn_news_summary_model_trained_on_reduced_data. We then get predictions from the model by calling model.generate and passing in the input tokens, that is, the inputs variable. model.generate will generate new text using the input text that we have specified. max_new_tokens is set to 100, which means the maximum number of tokens generated by the model will be 100. do_sample=False means the model should not use sampling while generating text; it will use an algorithm called greedy decoding. While generating word sequences, greedy decoding selects the word that has the highest probability at each step: it acts greedily. The outputs variable that contains the predictions from our model is in the form of tokens, not in a human-readable format, so we need to call tokenizer.decode to decode those tokens and get the resulting summary as a string. And here we get the same summary that we got earlier when we used our fine-tuned model in a pipeline directly.

We can now compute ROUGE metrics on summaries generated by our fine-tuned model. Let's look at the reference text for our example article first; this is the reference summary for the article on the cloned Labrador. Now let's compute the ROUGE score for the summary generated by our fine-tuned model by comparing it with this reference text. Based on the ROUGE scores here, this summary is much better than the summary we got using zero-shot learning.
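Taken together, the steps above look roughly like the sketch below. This is a minimal sketch rather than the exact notebook code: the repository name follows the transcript (your Hugging Face username will differ), and example_text, reference_summary, and the "summarize: " prefix are placeholders standing in for the Labrador article, its reference summary, and the T5 task prefix used earlier in the course.

```python
from transformers import pipeline, AutoTokenizer, AutoModelForSeq2SeqLM
import evaluate

model_path = "cloud-user/cnn_news_summary_model_trained_on_reduced_data"
prefix = "summarize: "
example_text = "..."        # placeholder for the cloned-Labrador article
reference_summary = "..."   # placeholder for its reference summary

# Use the fine-tuned model through a summarization pipeline
summarizer = pipeline("summarization", model=model_path, truncation=True)
pipeline_summary = summarizer(prefix + example_text)[0]["summary_text"]

# Or load the tokenizer and model separately from the same repository
tokenizer = AutoTokenizer.from_pretrained(model_path)
inputs = tokenizer(prefix + example_text, return_tensors="pt").input_ids

model = AutoModelForSeq2SeqLM.from_pretrained(model_path)
# do_sample=False means greedy decoding; at most 100 new tokens are generated
outputs = model.generate(inputs, max_new_tokens=100, do_sample=False)

# The model returns token IDs, so decode them back into a readable string
summary = tokenizer.decode(outputs[0], skip_special_tokens=True)

# Score the generated summary against the reference summary with ROUGE
rouge = evaluate.load("rouge")
print(rouge.compute(predictions=[summary], references=[reference_summary]))
```

The compute call returns the rouge1, rouge2, rougeL, and rougeLsum scores for the single prediction-reference pair, which is what the comparison against the zero-shot summary is based on.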
You can see the older scores, from when we used the pre-trained model directly to generate a summary, at the bottom of your screen, along with the current scores. Notice that the rouge1 score is now 0.25; it was just 0.13 earlier. Also notice the rougeL score: it's 0.153 now with the fine-tuned model, where previously it was 0.07. So clearly this model generates summaries that are far better than what we got with just zero-shot learning.

Rather than evaluate the fine-tuned model on just one article, let's generate summaries for the first 50 articles in our test dataset. We do this using a for loop in exactly the same way as before, and we'll compare the results with our original pre-trained model. Let's compute the aggregated ROUGE scores on the 50 summaries generated by our fine-tuned model and compare them with the model where we did not perform any fine-tuning. So here are the ROUGE scores. Compared with the previous ROUGE scores that we received, you can see that these scores have improved. Previously, the average rouge score for the first 50 summaries was 0.324; now it's 0.342. The rouge2 score went from 0.139 to 0.157. And you can see that both the rougeL and rougeLsum scores have also improved. Our fine-tuned model, even with just three epochs of training on a small training set, is clearly better.

Let's follow some steps that we've seen before. I'll get the unaggregated ROUGE scores for each of the 50 articles that were summarized by this model. Now that I have the unaggregated scores, I'm going to extract the rougeLsum values and find the summaries with the best and the worst rougeLsum scores. The summary at index 35 was the best and the one at index 3 was the worst.

Let's now get a DataFrame of the predicted summaries from our fine-tuned model and the reference summaries from the actual dataset. Now that we have the predicted and reference summaries in this DataFrame format, we can compare the summary at index 35, the one with the best rougeLsum score, with its reference summary. A quick glance at the predicted summary compared with the reference summary seems to indicate that the predicted summary is pretty good. Now let's look at the worst one, index 3, and compare the predicted summary with the reference summary. This had the worst rougeLsum score, and you can see that the two summaries are indeed very different. The model did not do a very good job with this one.
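As a rough sketch of that batch evaluation, something like the following would work. It assumes the summarizer, prefix, and rouge objects from the previous sketch, plus a test split named test_dataset with CNN/DailyMail-style "article" and "highlights" columns; those variable and column names are assumptions, so rename them to match your own notebook and dataset.

```python
import pandas as pd

# Assumed column names for a CNN/DailyMail-style test split; adjust to yours
test_articles = test_dataset["article"][:50]
reference_summaries = test_dataset["highlights"][:50]

# Generate summaries for the first 50 test articles with the fine-tuned pipeline
predicted_summaries = [
    summarizer(prefix + article)[0]["summary_text"] for article in test_articles
]

# Aggregated ROUGE scores across all 50 summaries
print(rouge.compute(predictions=predicted_summaries, references=reference_summaries))

# Unaggregated scores: one rougeLsum value per article
per_article = rouge.compute(predictions=predicted_summaries,
                            references=reference_summaries,
                            use_aggregator=False)
rougelsum = per_article["rougeLsum"]
best_index = max(range(len(rougelsum)), key=rougelsum.__getitem__)
worst_index = min(range(len(rougelsum)), key=rougelsum.__getitem__)

# Side-by-side DataFrame of predicted and reference summaries
df = pd.DataFrame({"predicted": predicted_summaries,
                   "reference": reference_summaries})
print(df.loc[best_index])    # summary with the best rougeLsum score
print(df.loc[worst_index])   # summary with the worst rougeLsum score
```

Passing use_aggregator=False is what gives the per-article scores, so the best and worst summaries can be looked up by index in the DataFrame and compared against their references.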
