From the course: Artificial Intelligence Foundations: Neural Networks
Data checks and data preparation
- [Instructor] In this video, we load and check the data, then perform exploratory data analysis. Here we use a Jupyter Notebook to run our code. A Jupyter Notebook lets you run Python, an open source programming language, interactively. You'll find out more in the challenge instructions, where you'll only need to focus on building the neural network, which means writing six lines of code to train your network. So let's assume you've already imported the necessary libraries and modules in Python.

The first step is to load the data. Here, the dataset is being pulled from the course's GitHub repo. Once the data is pulled in, you can check the first 10 rows. Note that the last column holds the target that we would like to predict, which is sales.

Shown here is the .info method, which gives you information about your dataset: the number of columns, the column labels, the column data types, memory usage, the index range, and the number of non-null values in each column. Shown here is the .describe method, which computes and displays summary statistics for the dataset. Shown here is the .shape attribute, which gives you the number of rows and columns in the dataset. Here, you have 1,200 rows and five columns; keep in mind that the row index starts at zero, so it runs from 0 to 1,199. Shown here is the .isnull method, which marks any null values in the dataset.

Shown here is a heat map of the correlation matrix. This visual shows how each feature correlates with every other feature, so you would use it to see how sales correlates with the other four features. Shown here is another heat map that clearly shows the top two features related to sales, which are TV and radio. And again, this is just another way of seeing very quickly, using a gradient color scheme, how strongly each media channel is correlated with sales. This is the same data in table form. Shown here are scatterplots that show the relationship between sales and each of digital, TV, newspaper, and radio.
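Here's a minimal pandas sketch of the data checks and correlation analysis described in this video. It assumes the data sits in a DataFrame whose columns are named `digital`, `TV`, `newspaper`, `radio`, and `sales`; those exact labels are an assumption. The course's GitHub path isn't given here, so instead of calling `pd.read_csv`, the block builds a hypothetical synthetic dataset with 1,200 rows and five columns. Its numbers are made up, with sales deliberately constructed to depend on TV and radio, so they say nothing about the course data.

```python
import numpy as np
import pandas as pd

# HYPOTHETICAL stand-in for the course dataset. In the course the CSV is read
# from the course's GitHub repo (path not given here), e.g. df = pd.read_csv(path).
rng = np.random.default_rng(seed=0)
n_rows = 1200
df = pd.DataFrame({
    "digital": rng.uniform(0, 100, n_rows),
    "TV": rng.uniform(0, 300, n_rows),
    "newspaper": rng.uniform(0, 100, n_rows),
    "radio": rng.uniform(0, 50, n_rows),
})
# Made-up target placed in the last column: driven mostly by TV and radio, plus noise.
df["sales"] = 0.05 * df["TV"] + 0.2 * df["radio"] + rng.normal(0, 1, n_rows)

print(df.head(10))        # first 10 rows; the target "sales" is the last column
df.info()                 # labels, dtypes, non-null counts, memory usage, RangeIndex
print(df.describe())      # count, mean, std, min, quartiles, max per numeric column
print(df.shape)           # .shape is an attribute, not a method: (rows, columns)
print(df.isnull().sum())  # isnull() gives a True/False mask; sum() counts nulls per column

# Pairwise (Pearson) correlation matrix: the numbers behind the heat maps
corr = df.corr()

# Features ranked by their correlation with sales, excluding sales itself
sales_corr = corr["sales"].drop("sales").sort_values(ascending=False)
print(sales_corr)
```

To reproduce the visuals, `corr` could be passed to `seaborn.heatmap(corr, annot=True)`, shown with a gradient color scheme in a notebook via `corr.style.background_gradient()`, and each feature plotted against sales with, for example, `df.plot.scatter(x="TV", y="sales")`. Those calls need matplotlib (and seaborn for the heat map), so they're left out of the block.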