From the course: Artificial Intelligence Foundations: Neural Networks
Recurrent neural networks (RNN)
- [Instructor] What are recurrent neural networks? Let's begin by describing a few use cases. Our brains can easily process and understand the words that make up the sentences, and the sentences that make up the language, used in this book. But getting a machine to understand the text and language in this book is a complicated task. Similarly, translating text between different languages is also a very complicated task for machines. This is a picture of an audio wave. Our brains and ears can easily detect these sounds and understand what is being transmitted. Getting a machine to detect these sounds is a difficult task. These images show time series data. Think of time series data as timestamped data: a sequence of data points ordered by time. Sequential data is unstructured data, as compared to structured data. This image shows examples of both unstructured and structured data. The use cases just presented are examples of sequential data because the points in the dataset depend on the other points in the dataset, and the order matters. Sequential data includes text streams, audio clips, video clips, speech recognition, time series data, et cetera. Different data types need to be processed differently in order to train machine learning models. In an earlier video, you saw that convolutional neural networks are used primarily for image or voice data. And you learned that structured data, data in a table of rows and columns, can be fed into a multilayer perceptron to output predictions. But what do you do if you have qualitative sequential data with no predefined structure? And what if patterns in your data change with time? Earlier, we learned about the feed-forward neural network, but this type of network is not suitable for the use cases previously mentioned. Why is that? A feed-forward neural network allows information to flow only in the forward direction, from the input nodes, through the hidden layers, and to the output nodes. 
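The statelessness of a feed-forward network can be sketched in a few lines of NumPy. All layer sizes, weights, and the tanh activation below are illustrative assumptions, not taken from the course; the point is only that each call is independent of every other call.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))   # input (3 features) -> hidden (4 units)
b1 = np.zeros(4)
W2 = rng.normal(size=(2, 4))   # hidden (4 units) -> output (2 values)
b2 = np.zeros(2)

def feed_forward(x):
    """One strictly forward pass: input -> hidden -> output, no loops."""
    h = np.tanh(W1 @ x + b1)   # hidden layer
    return W2 @ h + b2         # output layer

# The network keeps no state between inputs: the same input always
# produces the same output, so it cannot model order in a sequence.
out_a = feed_forward(np.array([1.0, 0.0, 0.0]))
out_b = feed_forward(np.array([1.0, 0.0, 0.0]))
assert np.allclose(out_a, out_b)
```

Because nothing is carried over between calls, this architecture has no way to let an earlier word in a sentence influence its prediction for a later one.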
There are no cycles or loops in the network. Feed-forward networks cannot memorize previous inputs, cannot handle sequential data, and consider only the current input. To process data such as text, audio, and time series, a different network type is required, because information about past data needs to be kept in memory to predict the next word, sound, or timestamp in the sequence. The word recur means to go or come back again, to happen again, and in the case of an RNN, to loop back onto the previous layer. Think of this as a hidden layer that remembers information through the passage of time. The idea behind an RNN is to train a network by passing the training data through it as a sequence, where each example is an ordered sequence. Once trained, new test data can be evaluated. For example, to determine the next word in the sequence shown in this training set, the probability given the previous words must be known. The word certification is a noun and an entity of this sentence, and in these two examples, the word my precedes it. So the probability that the letter C would follow the letter Y would be considered high. Just like feed-forward and convolutional neural networks, recurrent neural networks use training data to learn. While traditional deep learning networks assume that inputs and outputs are independent of each other (the image shown on the left), the outputs of recurrent neural networks depend on the prior elements within the sequence (the image shown on the right). There are five types of RNNs. The one-to-one is the simplest type of RNN, allowing a single input and a single output. It has fixed input and output sizes and acts as a traditional neural network. The one-to-many gives multiple outputs for a single input. It takes a fixed input size and gives a sequence of data outputs. An example is music generation. 
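The "loop back" idea can be sketched as a minimal recurrent cell: at each step, the new hidden state is computed from the current input and the previous hidden state, so the past is carried forward. The sizes, weights, and tanh activation here are made-up illustrations, not details from the course.

```python
import numpy as np

x_dim, h_dim = 3, 4            # illustrative input and hidden sizes
rng = np.random.default_rng(1)
Wx = rng.normal(scale=0.5, size=(h_dim, x_dim))  # input -> hidden
Wh = rng.normal(scale=0.5, size=(h_dim, h_dim))  # hidden -> hidden (the loop)
b = np.zeros(h_dim)

def rnn_forward(sequence):
    """Process a sequence in order, carrying the hidden state forward."""
    h = np.zeros(h_dim)        # initial memory: nothing seen yet
    states = []
    for x in sequence:         # one step per element; order matters
        h = np.tanh(Wx @ x + Wh @ h + b)  # h_t depends on h_{t-1}
        states.append(h)
    return states

seq = [rng.normal(size=x_dim) for _ in range(5)]
states = rnn_forward(seq)      # one hidden state per time step
```

The single line `Wh @ h` is the recurrence: it is what lets information about earlier elements influence the prediction at the current step, which the feed-forward network could not do.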
In music generation models, RNN models are used to generate a music piece with multiple outputs from a single musical note, which is a single input. The many-to-one RNN type is used when multiple inputs are required to give a single output. An example is sentiment analysis, where a sentence is classified as expressing positive or negative sentiment. Another example is movie rating, where review text is the input used to produce a rating. In the many-to-many equal type, the numbers of input and output units are the same. An example is machine translation, where the RNN reads a sentence in English and outputs the sentence in French. In the many-to-many unequal type, the numbers of input and output units are not the same. An example is video classification, where every frame of the video should be labeled, and the number of inputs and outputs may differ. Traditional neural networks have independent input and output layers, which makes them inefficient when dealing with sequential data. Recurrent neural networks are used in sequence models because they have the power to remember what they have learned in the past and apply it to future predictions. This allows them to be used in applications like pattern detection, speech and voice recognition, natural language processing, time series prediction, image captioning, and language translation.
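The many-to-one type described above can be sketched by running the recurrence over a whole sequence and reading a single prediction off the final hidden state, as in sentiment analysis. Every dimension, weight, and the sigmoid readout below is an illustrative assumption; real models would learn these weights from data.

```python
import numpy as np

x_dim, h_dim = 3, 4            # illustrative sizes
rng = np.random.default_rng(2)
Wx = rng.normal(scale=0.5, size=(h_dim, x_dim))  # input -> hidden
Wh = rng.normal(scale=0.5, size=(h_dim, h_dim))  # hidden -> hidden
Wy = rng.normal(scale=0.5, size=(1, h_dim))      # readout to one score

def many_to_one(sequence):
    """Many inputs in, one output out (e.g. a sentiment probability)."""
    h = np.zeros(h_dim)
    for x in sequence:                 # consume the whole sequence...
        h = np.tanh(Wx @ x + Wh @ h)
    score = (Wy @ h)[0]                # ...then emit a single score
    return 1.0 / (1.0 + np.exp(-score))  # squash to (0, 1)

review = [rng.normal(size=x_dim) for _ in range(6)]  # 6 "word" vectors
p = many_to_one(review)                # single prediction for the review
assert 0.0 < p < 1.0
```

The other types differ only in where inputs enter and outputs leave: one-to-many would emit an output at every step from one starting input, and many-to-many would emit one output per input step.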