From the course: AI Text Summarization with Hugging Face

The sumy library for extractive summarization - Hugging Face Tutorial

In this demo, we'll see how you can perform extractive text summarization using Hugging Face. Extractive text summarization is a technique where we generate a summary of a longer text by selecting and extracting the most important sentences or phrases directly from the original text. We do not generate new sentences. Instead, we identify the existing sentences that best represent the key points and condense the text down to those sentences, taken unchanged from the original. We've discussed that extractive text summarization can be performed using a variety of different techniques.

You could try to implement those techniques yourself, but it turns out that someone interested in text summarization has already implemented them and made those summarizers available to us here on GitHub. This is the GitHub repo for the Sumy module, which allows automatic summarization of text content and HTML pages. Sumy is primarily available as a Python module. Note that the author created Sumy as part of his diploma thesis because he wanted to reduce the length of articles he was reading in the Czechoslovak language. Sumy grew from there, and it's now used by a variety of different projects. The Sumy library implements a number of different summarizers, all of them using extractive techniques. Sumy's documentation describes every summarizer it implements, and the library works with a number of different languages. We'll, of course, work with text summarization in English. If you scroll a little further down, under Usage, you can see that Sumy is available on Hugging Face at https://huggingface.co/spaces/issam9/sumy_space. This makes the Sumy library available on Hugging Face as a very simple app that we can just use, and that's how we'll be working with Sumy in this demo. Before we actually use the Sumy summarizer on Hugging Face, let's head over to the documentation and see which summarizers have been implemented.
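To make the idea of selecting sentences rather than generating them concrete, here is a minimal sketch of one simple extractive approach: a word-frequency scorer in plain Python. This is not how Sumy works internally. The function names (`split_sentences`, `content_words`, `frequency_summarize`), the naive regex sentence splitter, and the tiny stop-word list are all hypothetical choices made for illustration. Each sentence is scored by the average frequency of its content words across the whole text, and the top-scoring sentences are returned verbatim, in their original order.

```python
import re
from collections import Counter

# Illustrative stop-word list (an assumption; real libraries ship much larger ones).
STOP_WORDS = {
    "a", "an", "and", "are", "as", "be", "by", "for", "in", "is",
    "it", "of", "on", "that", "the", "this", "to", "was", "with",
}


def split_sentences(text: str) -> list[str]:
    """Naive splitter (hypothetical): break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def content_words(sentence: str) -> list[str]:
    """Lowercased words with the stop words above removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP_WORDS]


def frequency_summarize(text: str, sentence_count: int) -> list[str]:
    """Hypothetical helper: return the highest-scoring sentences, unchanged."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence: str) -> float:
        words = content_words(sentence)
        # Average (not total) frequency, so long sentences aren't favoured by length alone.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Rank sentence indices by score, breaking ties by earlier position in the text.
    ranked = sorted(range(len(sentences)), key=lambda i: (-score(sentences[i]), i))
    # Keep the top sentences but restore their original order.
    return [sentences[i] for i in sorted(ranked[:sentence_count])]


if __name__ == "__main__":
    text = (
        "Sumy is a Python library for summarization. "
        "Sumy implements several extractive summarizers. "
        "The weather was pleasant yesterday. "
        "Extractive summarizers select sentences from the text."
    )
    for sentence in frequency_summarize(text, 2):
        print(sentence)
    # Sumy implements several extractive summarizers.
    # Extractive summarizers select sentences from the text.
```

Every sentence in the output is copied from the input, which is the defining property of an extractive summary; the off-topic sentence about the weather shares no words with the others, so it scores lowest and is dropped.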
As you scroll down and look at the summarization techniques, you can see that all of them are extractive summarizers, such as latent semantic analysis (LSA), LexRank, TextRank, and KL-Sum. We'll see how these techniques work once we head over to Hugging Face. Now head over to Spaces on Hugging Face. As we discussed earlier, this is where developers host applications that use machine learning. In Spaces, simply search for Sumy. The first hit, sumy_space, is a very simple app with a UI that, behind the scenes, uses the Sumy library for extractive summarization. The app itself is very straightforward. It's just a web frontend that lets you pick the summarization method you want to use, the language of your input text, and the number of sentences you want in your summary, and then it generates a summary based on the input you provide. The input can be plain text or a URL pointing to an HTML web page.
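LexRank and TextRank, two of the summarizers listed in Sumy's documentation, both rank sentences by how central they are in a graph whose nodes are sentences and whose edges are sentence similarities. The sketch below is a simplified, pure-Python illustration of that graph idea, not Sumy's code. It assumes plain bag-of-words cosine similarity (no TF-IDF weighting, no stemming), a PageRank-style damping factor of 0.85, and a fixed 50 iterations; `graph_summarize` and its parameters are hypothetical. Its `sentence_count` argument plays the same role as the "number of sentences" setting in the Space's UI.

```python
import math
import re
from collections import Counter

# Illustrative stop-word list (an assumption, not Sumy's list).
STOP_WORDS = {
    "a", "an", "and", "are", "as", "be", "by", "for", "in", "is",
    "it", "of", "on", "that", "the", "this", "to", "was", "with",
}


def split_sentences(text: str) -> list[str]:
    """Naive splitter (hypothetical): break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def bag_of_words(sentence: str) -> Counter:
    """Term counts for one sentence, stop words removed (no stemming, no TF-IDF)."""
    return Counter(w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP_WORDS)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def graph_summarize(
    text: str, sentence_count: int, damping: float = 0.85, iterations: int = 50
) -> list[str]:
    """Hypothetical LexRank/TextRank-style summarizer: PageRank over a sentence graph."""
    sentences = split_sentences(text)
    n = len(sentences)
    if n == 0:
        return []
    bags = [bag_of_words(s) for s in sentences]
    # Edge weights: similarity between every pair of distinct sentences (no self-loops).
    weights = [[cosine(bags[i], bags[j]) if i != j else 0.0 for j in range(n)] for i in range(n)]
    out_weight = [sum(row) for row in weights]

    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Each sentence passes its score to neighbours in proportion to edge weight;
        # a sentence with no similar neighbours only receives the (1 - damping) / n term.
        scores = [
            (1 - damping) / n
            + damping * sum(
                weights[j][i] / out_weight[j] * scores[j]
                for j in range(n)
                if out_weight[j] > 0
            )
            for i in range(n)
        ]

    # Most central sentences first (ties go to the earlier sentence), then original order.
    ranked = sorted(range(n), key=lambda i: (-scores[i], i))
    return [sentences[i] for i in sorted(ranked[:sentence_count])]


if __name__ == "__main__":
    text = (
        "Sumy is a Python library for summarization. "
        "Sumy implements several extractive summarizers. "
        "The weather was pleasant yesterday. "
        "Extractive summarizers select sentences from the text."
    )
    print(graph_summarize(text, 2))
```

Compared with a frequency count, the graph approach rewards sentences that overlap with many other sentences: the sentence linking "Sumy" to "extractive summarizers" is similar to both of its neighbours, so it ends up the most central.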