Basics of Artificial Intelligence – VIII

Neural networks are an incredibly useful method for teaching computers to recognize complex relationships in data. However, to get them working properly, you need to know a little more about how they work and how to tune them. This week, we’ll look at the two key settings for neural networks in scikit-learn: the shape of the network and the activation function.

What Is a Neural Network?

Before we get into those settings, it’s useful to understand what a neural network is. Neural networks are an attempt to model computer intelligence on the way the human brain works. In our brains, a neuron receives electrical impulses from other neurons and, depending on the signal, may transmit impulses to neurons further along. Those neurons in turn decide how to act on the signal, and the process continues. Our brains have an estimated 86 billion neurons, all wired into this network to receive and process data.

In the computer, the same idea is replicated with the neural network. The input values for the network form the first layer of neurons in the artificial brain. From there, one or more hidden layers are created, each connected to the outputs of the previous layer. Finally, one or more output neurons provide the user with the answer from the digital brain. Of course, this assumes the network has been trained to identify the data.

So, for the developer, the first step in creating a neural network is to determine the number of layers and the number of neurons in each layer. Next, the developer selects an ‘activation function’ that defines when a neuron fires. In scikit-learn, the available options are the identity function (identity), the logistic sigmoid function (logistic), the hyperbolic tangent function (tanh) and the rectified linear unit function (relu). Various other parameters can also be set to further tune the network.
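
If you are unsure which activation function suits your data, you can simply try each one and compare the results. The snippet below is a minimal sketch that assumes the X_train/X_test split from earlier installments in this series; the small (64,) network and the iteration limit are arbitrary choices for illustration only.

# A quick way to compare activation functions on the same data
# (X_train, y_train, X_test, y_test are assumed from the earlier train/test split;
# the network shape and max_iter here are arbitrary illustration values)
from sklearn.neural_network import MLPClassifier

for act in ['logistic', 'tanh', 'relu']:
    clf = MLPClassifier(hidden_layer_sizes=(64,), activation=act, max_iter=1000)
    clf.fit(X_train, y_train)
    print(act, clf.score(X_test, y_test))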

Back to the Code

from sklearn.neural_network import MLPClassifier

# Create a Neural Network (AKA Multilayer Perceptron or MLP)
# In this example, we will create 3 hidden layers
# The first layer has 512 neurons, the second 128 and the third 16
# Use the rectified linear unit function for activation (relu)
# For training, iterate no more than 5000000 times
mlp = MLPClassifier(
    hidden_layer_sizes=(512, 128, 16),
    activation='relu',
    max_iter=5000000
)

You can see in the code above that we are trying three hidden layers. This is simply a starting guess; we will want to repeatedly try different network configurations until we come upon a model that performs to the required specification.

# Train the neural network
mlp.fit(X_train, y_train)

# Get metrics
from sklearn.metrics import precision_score, recall_score

train_metric = mlp.score(X_train, y_train)
test_metric = mlp.score(X_test, y_test)

pred = mlp.predict(X_test)
recall_metric = recall_score(y_test, pred)
precision_metric = precision_score(y_test, pred)

With the above code, we can retrieve scores indicating how well the model did. With a perfect network, all four values would be 1 – meaning 100% accuracy. That is rarely the case with real data, so you will need to determine what level of accuracy is required for your application. For some problems, 80% may be the practical limit.
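
When you are comparing runs, it helps to print the four scores side by side. Here is a small sketch that simply reports the variables computed above:

# Print the scores computed above for easy comparison between runs
print(f"Training accuracy: {train_metric:.3f}")
print(f"Test accuracy:     {test_metric:.3f}")
print(f"Recall:            {recall_metric:.3f}")
print(f"Precision:         {precision_metric:.3f}")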

Armed with this information, you should now be able to repeatedly train your network until you get the desired results. With a large dataset and a large number of configurations, that can take a substantial amount of time. In fact, the training and testing phase of AI development is by far the most time-consuming.
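
If you would rather not adjust each configuration by hand, scikit-learn can automate part of the search with GridSearchCV. The sketch below shows one way to set that up; the candidate layer shapes and activations are illustrative assumptions, not recommendations.

# One way to automate the configuration search: GridSearchCV tries every
# combination in param_grid with cross-validation and keeps the best one.
# (X_train and y_train are assumed from the earlier train/test split;
# the candidate values below are illustrative only)
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

param_grid = {
    'hidden_layer_sizes': [(512, 128, 16), (256, 64), (128,)],
    'activation': ['logistic', 'tanh', 'relu'],
}
search = GridSearchCV(MLPClassifier(max_iter=5000000), param_grid, cv=3)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)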

What’s Next?

Next week, we will look at the final part of developing an AI solution – the Confusion Matrix. It will give us a better understanding of how our network is performing than the single-number metrics we calculated above.
