From the course: NLP with Python for Machine Learning Essential Training
Using lemmatizing - Python Tutorial
- [Instructor] Now that we've learned what lemmatizing means, we're going to put it to use. We'll do this in two steps. First, we'll test the lemmatizer on specific words to understand how it works, and then we'll apply it to the SMS Spam Collection Data Set to clean it up further. This is the same process we followed in the stemming notebook. Just as with stemmers, there are a few different lemmatizers that handle words in slightly different ways. We're going to use the WordNet lemmatizer, which is probably the most popular one. WordNet is a collection of nouns, verbs, adjectives, and adverbs that are grouped together in sets of synonyms, each expressing a distinct concept. The lemmatizer runs off of this corpus of synonyms, so given a word, it will track that word to its synonyms, and then to the distinct concept that the group of words represents. You can read more about it at the WordNet website right here. We're going to import the nltk package…
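The two steps described above could be sketched roughly as follows. This is a minimal illustration, not the instructor's notebook: the helper names `build_wordnet_lemmatizer`, `lemmatize_tokens`, and `lemmatize_message` are hypothetical, and the regex tokenization is an assumption made for the example. It does rely on NLTK's `WordNetLemmatizer` class and on `nltk.download` to fetch the WordNet corpus; some NLTK versions also expect the `omw-1.4` resource, so the sketch requests both.

```python
import re


def build_wordnet_lemmatizer():
    """Return NLTK's WordNet lemmatizer, fetching the corpus if it is missing."""
    # nltk is a third-party package (pip install nltk); it is imported here so
    # the helpers below can be used with any object that has .lemmatize().
    import nltk
    from nltk.stem import WordNetLemmatizer

    nltk.download("wordnet", quiet=True)
    nltk.download("omw-1.4", quiet=True)  # expected by some NLTK versions
    return WordNetLemmatizer()


def lemmatize_tokens(tokens, lemmatizer):
    """Step 1: lemmatize individual words to see how the lemmatizer behaves."""
    return [lemmatizer.lemmatize(token) for token in tokens]


def lemmatize_message(text, lemmatizer):
    """Step 2: lowercase and tokenize one message, then lemmatize each token.

    Splitting on non-word characters is an assumption for this sketch,
    not necessarily the tokenization used in the course.
    """
    tokens = [t for t in re.split(r"\W+", text.lower()) if t]
    return lemmatize_tokens(tokens, lemmatizer)
```

To try it, create the lemmatizer once with `wn = build_wordnet_lemmatizer()`, call `lemmatize_tokens(["goose", "geese"], wn)` on a few test words, and then map `lemmatize_message` over each message in the data set. By default `lemmatize` treats a word as a noun; its optional `pos` argument selects another part of speech.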