From the course: AI Text Summarization with Hugging Face

Extractive text summarization - Hugging Face Tutorial

Extractive text summarization

In this movie, we'll discuss the basics of extractive text summarization. If you remember, we discussed that summarization models can be categorized based on the kind of output that they generate. They can be extractive models or abstractive models. Extractive models do not produce any new text when they generate the summary. They simply identify important sections of the original text and reproduce those sections verbatim. Extractive models work exactly as their name suggests. They depend only on the extraction of sentences from the original content.
All extractive summarizers perform three basic tasks. First, they construct an intermediate representation of the input text, which tries to capture the main aspects of the text. They then score the sentences based on the intermediate representation. And finally, once scores have been generated, they select a summary comprising a number of sentences which have the highest scores.
As you might imagine, there are several different techniques that can be used to generate the intermediate representation of the input text. This representation is used to find important portions of the text, and summaries are generated based on it. In the next movie, we'll discuss the different ways in which intermediate representations can be generated from input text. Intermediate representations can be divided into two broad categories: topic representation and indicator representation. The topic representation tries to identify the important topics in the original text and then generates a summary which includes those topics. The indicator representation represents every sentence as a list of features that indicate its importance and then uses those features to generate a summary.
Once we have the intermediate representation of the content, a score is assigned to every sentence based on its importance. There are, of course, different techniques that you can use to assign this importance score to each sentence.
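To make the three tasks concrete, here is a minimal sketch in Python. It assumes a very simple topic representation (word frequencies over the whole text) and scores each sentence by the average frequency of its words. The function names, the tiny stop-word list, and the naive sentence splitter are all illustrative assumptions, not something taken from the course or from any library.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for this illustration only.
STOP_WORDS = {"the", "a", "an", "is", "are", "and", "of", "to", "in", "on", "it", "that"}

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence):
    """Lowercase word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP_WORDS]

def build_representation(sentences):
    """Task 1: intermediate representation as word frequencies over the text."""
    return Counter(w for s in sentences for w in tokenize(s))

def score_sentences(sentences, word_freq):
    """Task 2: score each sentence by the average frequency of its words."""
    scores = []
    for s in sentences:
        words = tokenize(s)
        scores.append(sum(word_freq[w] for w in words) / len(words) if words else 0.0)
    return scores

def summarize(text, k=2):
    """Task 3: keep the k highest-scoring sentences, in their original order."""
    sentences = split_sentences(text)
    scores = score_sentences(sentences, build_representation(sentences))
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]

if __name__ == "__main__":
    print(summarize("Cats sleep. Cats purr and cats sleep. Dogs bark.", k=1))
```

Averaging rather than summing the word frequencies is one possible design choice. It keeps long sentences from winning purely because they contain more words.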
Once every sentence has an importance score, the algorithm selects the top-k most important sentences to generate the summary. There are different techniques that can be used for this as well. You can use a greedy approach, where you pick the sentences whose importance score is above a certain threshold. Or the summarizer can use some kind of optimization technique, so that the selected sentences maximize overall importance and coherence and minimize redundancy.
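The two selection styles can be sketched as follows. The first function is the threshold-based greedy pick. The second is only a stand-in for the optimization idea: it is a greedy heuristic in the spirit of maximal marginal relevance that penalizes word overlap with sentences already chosen, not a true global optimizer, and it does not model coherence. The function names, the Jaccard overlap measure, and the `penalty` parameter are assumptions made for this example.

```python
def select_by_threshold(scores, threshold):
    """Greedy selection: keep every sentence whose score meets the threshold."""
    return [i for i, s in enumerate(scores) if s >= threshold]

def jaccard(a, b):
    """Word-overlap similarity between two token sets, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def select_with_redundancy_penalty(token_sets, scores, k, penalty=1.0):
    """Pick up to k sentences one at a time, discounting overlap with earlier picks.

    `penalty` must be on the same scale as `scores`, since scores are not normalized.
    """
    selected = []
    candidates = set(range(len(scores)))
    while candidates and len(selected) < k:
        def adjusted(i):
            # Highest overlap with any already selected sentence (0.0 if none yet).
            overlap = max((jaccard(token_sets[i], token_sets[j]) for j in selected),
                          default=0.0)
            return scores[i] - penalty * overlap
        best = max(sorted(candidates), key=adjusted)
        selected.append(best)
        candidates.remove(best)
    return sorted(selected)  # indices in original document order

if __name__ == "__main__":
    token_sets = [{"cats", "sleep"}, {"cats", "sleep", "purr"}, {"dogs", "bark"}]
    scores = [2.5, 2.25, 1.0]
    print(select_by_threshold(scores, 2.0))
    print(select_with_redundancy_penalty(token_sets, scores, k=2, penalty=2.0))
```

With these made-up scores, the threshold pick keeps the two near-duplicate sentences about cats, while the penalized pick swaps the second one for the sentence about dogs. That is the redundancy trade-off described above.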
