Creating text representations - Deep Learning: Getting Started Video Tutorial | LinkedIn Learning, formerly Lynda.com

From the course: Deep Learning: Getting Started

Creating text representations

- [Instructor] Let us proceed to create text representations for the spam data. The code for this preprocessing is available in section 5.2 of the notebook. The data for this example is available in the CSV file Spam.Classification.csv in the Exercise Files folder. We load this data into a pandas data frame and print its contents to check it. We then split the feature and target attributes into separate variables. Let's run this code. As we can see, the spam messages contain a lot of special characters and words that need to be cleaned.

To perform the required preprocessing, we first create a custom tokenizer function. This function first splits the sentences into tokens using the tokenizer in the NLTK library. Then it filters out stopwords. Finally, it lemmatizes the words and returns them as an array of lemmatized tokens. We create a TfidfVectorizer model using the custom tokenizer. We fit the model on the spam messages attribute and also transform the messages into TF-IDF vectors. We then convert these vectors into a NumPy array. The feature variables are now ready for deep learning.

For the target variable, we first convert it into numeric values using a label encoder. This encoder provides an encoding for the two classes. Then we create a one-hot encoding vector using keras.utils. The target variable is now ready. We print the shapes of the feature and target variables. We then split the dataset into training and test sets. Let's run this code now. The feature variables have 4,566 columns, and the target variable has two. We can now proceed to build the deep learning model.

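The target preparation and train/test split can be sketched the same way, again with invented labels and a random placeholder matrix standing in for the TF-IDF features. The transcript only says the one-hot vector is created with keras.utils; keras.utils.to_categorical is the usual function for this, and indexing a NumPy identity matrix, as below, produces the same kind of float32 one-hot matrix without requiring Keras. The 80/20 split and the random_state value are assumptions, since the course does not state them here.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Hypothetical labels and a placeholder feature matrix (10 samples, 5 features).
labels = ["ham", "spam", "ham", "ham", "spam", "ham", "spam", "ham", "ham", "spam"]
X = np.random.default_rng(0).random((len(labels), 5))

# Map class names to integers; classes are sorted, so "ham" -> 0, "spam" -> 1.
encoder = LabelEncoder()
y_int = encoder.fit_transform(labels)

# One-hot encode: row i of the identity matrix represents class i.
num_classes = len(encoder.classes_)
Y = np.eye(num_classes, dtype="float32")[y_int]

print(X.shape, Y.shape)

# Assumed 80/20 split with a fixed seed for reproducibility.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)
```

Because LabelEncoder sorts the class names, each one-hot row has a single 1.0 in the column of its class, which is why the target ends up with two columns for a two-class problem like spam versus non-spam.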