From the course: Artificial Intelligence Foundations: Neural Networks
Train the neural network using Keras
- [Instructor] Since we will be using the Keras Sequential model, we merely need to describe the layers in sequence. model = Sequential means that we will store our model in the variable model and describe it sequentially, layer by layer, between the square brackets. Our first layer is a Dense layer with four neurons and ReLU activation. Defining the input shape, or the number of inputs, is optional, so we did not add it. Recall that Dense refers to a fully connected layer. The second layer is a Dense hidden layer with three neurons and ReLU activation. Note that we do not have to describe the input shape, since Keras can infer it from the output of our first layer. The third layer is a Dense output layer with one neuron and linear activation. And that's it: the model architecture is built.

Now that we've built the model architecture, we need to configure the model by adding an optimization algorithm. Here, we use adam. For the loss function, we use mean squared error. Configuring the model with these settings requires us to call the function model.compile.

Training on the data is straightforward and requires only one line of code. The function is called fit, as we are fitting the parameters to the data. You specify the data you are training on, which is X_train and y_train, and then specify your validation_data so the model can report how it is doing on the validation data after each epoch. This function outputs a history, which you save under the variable history. You then specify the number of epochs, or how many iterations you want the model to go through during training; for this simple network, we'll use 32. If you had a very large dataset, you would also want to specify a batch size. Once you run the code in the cell, the model starts training. (A consolidated code sketch of these steps appears below.)

What we might want to do next is plot the training loss and the validation loss over the number of epochs. In the first cell of the Jupyter Notebook, you may have noticed code that imports matplotlib, which will help us create some really nice graphs of our results. Shown here is an example visualizing the training loss and the validation loss; a sketch of this plotting code also appears below. The first two lines of the code say that we want to plot the loss and the validation loss. The third line sets the title of the graph, Model loss. The fourth and fifth lines label the y and x axes, respectively. The sixth line adds a legend to the graph, placed in the upper right. And the seventh line displays the graph.

Since improvements on the training set track improvements on the validation set fairly closely, overfitting does not appear to be a serious problem in our model. Each curve shows that loss is decreasing through the iterations. To sum it up, you use matplotlib to visualize the training and validation loss over time to see if there is overfitting in your model. Overfitting is a common and serious problem because it affects the ability of the model to generalize to new, unseen test data. Shown here is an image of an overfitting model. While the training curve moves downward as loss decreases, the validation curve, representing our unseen test data, does not. The model is overfitting. In this example, the model tries to fit the training data so closely that it memorizes data patterns and any random fluctuations; note how the peaks and valleys have similar shapes. We cover ways to mitigate overfitting in our next chapter.
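Here is a minimal sketch of the model-building, compiling, and fitting steps described above, assuming a TensorFlow Keras setup and that X_train, y_train, X_val, and y_val are NumPy arrays prepared earlier in the notebook (the validation variable names are assumptions, as the transcript does not name them):

```python
from tensorflow import keras
from tensorflow.keras import layers

# Describe the model sequentially, layer by layer, between the square brackets.
model = keras.Sequential([
    layers.Dense(4, activation='relu'),    # first hidden layer: 4 neurons, ReLU
    layers.Dense(3, activation='relu'),    # second hidden layer: 3 neurons, ReLU
    layers.Dense(1, activation='linear'),  # output layer: 1 neuron, linear
])

# Configure the model with the adam optimizer and mean squared error loss.
model.compile(optimizer='adam', loss='mean_squared_error')

# Fit the parameters to the training data for 32 epochs, reporting the
# validation loss after each epoch; the returned History object records
# the loss values so we can plot them later.
history = model.fit(X_train, y_train,
                    validation_data=(X_val, y_val),  # assumed variable names
                    epochs=32)
```

Note that no input shape is given to the first Dense layer; Keras infers it from the data the first time the model is called, which is why the transcript calls it optional.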
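And here is a sketch of the seven lines of plotting code the transcript walks through, using the History object returned by model.fit; the exact legend labels are assumptions:

```python
import matplotlib.pyplot as plt

plt.plot(history.history['loss'])        # line 1: training loss per epoch
plt.plot(history.history['val_loss'])    # line 2: validation loss per epoch
plt.title('Model loss')                  # line 3: graph title
plt.ylabel('Loss')                       # line 4: y-axis label
plt.xlabel('Epoch')                      # line 5: x-axis label
plt.legend(['Train', 'Validation'],      # line 6: legend, placed in the
           loc='upper right')            #         upper right
plt.show()                               # line 7: display the graph
```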
After training, the model outputs a list of predicted values from the test set. You can print those test predictions, shown here on the left. You can also show the true value and the predicted value side by side, shown here on the right. Notice that the two red boxes highlight the same numbers for the predicted values. You can also visualize the true and predicted values graphically, as shown here. Let's evaluate our model. First, note the code circled in red. Here, we take the square root of the mean squared error by using the NumPy square root function, a fast and efficient way to calculate the square root of an array or a single value in Python. Recall that the square root of the mean squared error gives us the root mean squared error metric. Our model achieves stable performance, with little variance between the train and test set root mean squared error values: 4.67 versus 4.72.
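Here is a minimal sketch of that evaluation step, assuming scikit-learn's mean_squared_error is available and that X_train, y_train, X_test, and y_test are already defined; the prediction variable names are hypothetical:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Generate predictions for the train and test sets (names are hypothetical).
train_preds = model.predict(X_train)
test_preds = model.predict(X_test)

# RMSE is the square root of the mean squared error; np.sqrt works on
# arrays or single values, which makes it a convenient way to compute it.
train_rmse = np.sqrt(mean_squared_error(y_train, train_preds))
test_rmse = np.sqrt(mean_squared_error(y_test, test_preds))

print(f"Train RMSE: {train_rmse:.2f}")  # the transcript reports 4.67
print(f"Test RMSE:  {test_rmse:.2f}")   # versus 4.72 on the test set
```

Train and test RMSE values this close together suggest the model performs about as well on unseen data as on the data it was trained on, which is the stable behavior the transcript describes.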