Basics of Artificial Intelligence – VIII

Neural Networks are an incredibly useful method for teaching computers how to recognize complex relationships in data. However, in order to get them working properly, you need to know a little more about how they work and how to tune them. This week, we’ll be looking at the two key settings for Neural Networks in scikit-learn.

What Is a Neural Network?

But before we go into those settings, it’s useful to understand what a neural network is. Neural networks are an attempt at modeling computer intelligence on how the human brain works. In our brains, neurons receive electrical impulses from other neurons and, optionally, transmit impulses to other neurons. From there, the process continues with those neurons again deciding how to act on the signal from the previous neuron. Our brains have an estimated 100 billion neurons, all connected to the network to receive and process data.

In the computer, this same idea is replicated with the neural network. The input values for the network form the first layer of neurons in the artificial brain. From there, one or more hidden layers are created, each connected to the outputs of the previous layer. Finally, one or more output neurons provide the user with the answer from the digital brain. Of course, this assumes the network has been trained to identify the data.

So, for the developer, the first step to creating the neural network is to determine the number of layers for the network and the number of neurons in each layer. Next, the developer will select from a group of ‘activation functions’ that define when the neuron fires. In scikit-learn, the available options are the identity function (identity), the logistic sigmoid function (logistic), the hyperbolic tangent function (tanh), and the rectified linear unit function (relu). Various other parameters can also be set to further tune the network.
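To make the idea of a neuron "firing" a little more concrete, here is a minimal sketch in plain Python. The weights, biases, and input values are made up for illustration, and the helper names (relu, logistic, layer_output) are my own rather than anything from scikit-learn. It passes two inputs through a hidden layer of two relu neurons and then into a single logistic output neuron.

```python
import math

def relu(x):
    # Rectified linear unit: pass positive values through, clamp negatives to 0
    return max(0.0, x)

def logistic(x):
    # Logistic sigmoid: squash any value into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def layer_output(inputs, weights, biases, activation):
    # Each neuron computes a weighted sum of its inputs plus a bias,
    # then applies the activation function to decide how strongly it fires
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(activation(total))
    return outputs

# Hypothetical values chosen only for illustration
inputs = [1.0, 2.0]
hidden = layer_output(inputs, [[0.5, -0.5], [0.25, 0.75]], [0.0, 0.0], relu)
output = layer_output(hidden, [[1.0, -1.0]], [0.0], logistic)

print(hidden)  # [0.0, 1.75] - the first hidden neuron does not fire at all
print(output)  # a single value between 0 and 1
```

Training a real network amounts to adjusting those weights and biases automatically, which is the part scikit-learn handles for us.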

Back to the Code

# Create a Neural Network (AKA Multilayer Perceptron or MLP)
# In this example, we will create 3 hidden layers
# The first layer has 512 neurons, the second 128 and the third 16
# Use the rectified linear unit function for activation (RELU)
# For training, iterate no more than 5000000 times
mlp = MLPClassifier( 
    hidden_layer_sizes=(512,128,16),
    activation='relu',
    max_iter=5000000
)

You can see in the above code that we are going to try with 3 layers in the network. This is simply a guess, and we will want to repeatedly attempt different network configurations until we come upon a model that performs to the required specifications.
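The trial-and-error process described above can be automated with a simple loop. The following is only a sketch: the data is synthetic (generated with make_classification as a stand-in for your own X and y), and the candidate layer sizes are arbitrary guesses, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data; replace with your own X and y
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Hypothetical candidate layouts, from small to larger
candidates = [(16,), (32, 16), (64, 32, 16)]

# Train one network per layout and record its accuracy on the test set
results = {}
for layers in candidates:
    mlp = MLPClassifier(hidden_layer_sizes=layers, activation='relu',
                        max_iter=1000, random_state=0)
    mlp.fit(X_train, y_train)
    results[layers] = mlp.score(X_test, y_test)

for layers, score in results.items():
    print('{}: test accuracy {:.2f}'.format(layers, score))

# Pick the layout with the highest test accuracy
best_layers = max(results, key=results.get)
print('Best configuration:', best_layers)
```

Strictly speaking, choosing a configuration by its test score makes that score a little optimistic. A separate validation set, or cross_val_score, is a common way around that.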

# Train the neural network
mlp.fit(X_train,y_train)

# Get metrics 
train_metric = mlp.score(X_train, y_train)
test_metric = mlp.score(X_test, y_test)

pred = mlp.predict(X_test)
recall_metric = recall_score(y_test, pred)
precision_metric = precision_score(y_test, pred)

With the above code, we can retrieve scores indicating how well the model did. With a perfect network, all values would be 1 – meaning they were 100% accurate. However, this is rarely the case with actual data. So, you will need to determine what level of accuracy is required. For some networks, 80% may be the limit.
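If the difference between accuracy, precision, and recall is unclear, it can help to compute them by hand once. The labels below are made up for illustration; the formulas are the standard definitions for a two-class problem where 1 is the positive class.

```python
# Made-up labels: 1 = positive class, 0 = negative class
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 1, 1, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # correctly flagged positives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed positives
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # correctly rejected negatives

accuracy = (tp + tn) / len(pairs)   # share of all predictions that were right
precision = tp / (tp + fp)          # of everything flagged positive, how much was right
recall = tp / (tp + fn)             # of all real positives, how many were found

print('accuracy={:.3f} precision={:.3f} recall={:.3f}'.format(accuracy, precision, recall))
# accuracy=0.625 precision=0.600 recall=0.750
```

Notice that the three numbers disagree. A model can look acceptable on one metric and poor on another, which is why it is worth printing all of them.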

Armed with this information, you should now be able to repeatedly train your network until you have the desired output. With a large dataset, and a large number of configurations, that may take a substantial amount of time. In fact, the training and testing part of AI development is by far the most time consuming.

What’s Next?

Next week, we will look at the final part of developing an AI solution – the Confusion Matrix. This chart will give us a better understanding of how our network is performing than the simple metrics we calculated above.

Basics of Artificial Intelligence – VII

Last week, we used Python libraries to import the data, set the input and output values for the computer to learn, and split the data into groups. Next, we will actually train the computer to learn the relationships. For this, we can use a variety of different tools. While each one has its pros and cons, the novice can simply run each one and determine which one provides the best results. In addition, we will print the results for analysis.

Logistic Regression

# train the model
logreg = LogisticRegression()
logreg.fit(X_train, y_train)

# print accuracy
train_metric = logreg.score(X_train, y_train)
test_metric = logreg.score(X_test, y_test)
print('Accuracy of Logistic regression classifier on training set: {:.2f}'.format(train_metric))
print('Accuracy of Logistic regression classifier on test set: {:.2f}'.format(test_metric))

# print recall and precision
# (recall_average is assumed to be defined earlier, e.g. 'macro' or 'weighted')
pred = logreg.predict(X_test)
recall_metric = recall_score(y_test, pred, average=recall_average)
precision_metric = precision_score(y_test, pred, average=recall_average)
print('Recall of Logistic regression classifier on test set: {:.2f}'.format(recall_metric))
print('Precision of Logistic regression classifier on test set: {:.2f}'.format(precision_metric))

Decision Tree Classifier

# train the model
clf = DecisionTreeClassifier().fit(X_train, y_train)

# print overall accuracy
train_metric = clf.score(X_train, y_train)
test_metric = clf.score(X_test, y_test)
print('Accuracy of Decision Tree classifier on training set: {:.2f}'.format(train_metric))
print('Accuracy of Decision Tree classifier on test set: {:.2f}'.format(test_metric))

# print recall/precision
pred = clf.predict(X_test)
recall_metric = recall_score(y_test, pred, average=recall_average)
precision_metric = precision_score(y_test, pred, average=recall_average)
print('Recall of Decision Tree classifier on test set: {:.2f}'.format(recall_metric))
print('Precision of Decision Tree classifier on test set: {:.2f}'.format(precision_metric))

Linear Discriminant Analysis

# train the model
lda = LinearDiscriminantAnalysis()
lda.fit(X_train, y_train)

# print overall accuracy
train_metric = lda.score(X_train, y_train)
test_metric = lda.score(X_test, y_test)
print('Accuracy of LDA classifier on training set: {:.2f}'.format(train_metric))
print('Accuracy of LDA classifier on test set: {:.2f}'.format(test_metric))

# print recall/precision
pred = lda.predict(X_test)
recall_metric = recall_score(y_test, pred, average=recall_average)
precision_metric = precision_score(y_test, pred, average=recall_average)
print('Recall of LDA classifier on test set: {:.2f}'.format(recall_metric))
print('Precision of LDA classifier on test set: {:.2f}'.format(precision_metric))

Neural Network

# activation - ‘identity’, ‘logistic’, ‘tanh’, ‘relu’

mlp = MLPClassifier( 
    hidden_layer_sizes=(512,768,1024,512,128,16),
    activation='relu',
    learning_rate='adaptive',
    max_iter=5000000
)

mlp.fit(X_train,y_train)

# print overall accuracy
train_metric = mlp.score(X_train, y_train)
test_metric = mlp.score(X_test, y_test)
print('Accuracy of Neural Network classifier on training set: {:.2f}'.format(train_metric))
print('Accuracy of Neural Network classifier on test set: {:.2f}'.format(test_metric))

# print recall
pred = mlp.predict(X_test)
recall_metric = recall_score(y_test, pred, average=recall_average)
precision_metric = precision_score(y_test, pred, average=recall_average)
print('Recall of Neural Network classifier on test set: {:.2f}'.format(recall_metric))
print('Precision of Neural Network classifier on test set: {:.2f}'.format(precision_metric))

What We Did

You will notice that much of the code above is very similar. This is part of what makes Scikit-Learn such an amazing framework – it’s relatively easy to change between Artificial Intelligence algorithms. In addition to the above algorithms, you can also use Support Vector Machines, Naive Bayes, K-Nearest Neighbor, and many more.

Once you’ve run the training, the scores show how each algorithm performed after it was trained. On any given data set, a different algorithm may work better. This is another benefit to Scikit-Learn – the easy access to a variety of models allows for experimentation to find the best model.
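Because every scikit-learn classifier exposes the same fit and score methods, the repeated blocks above can be collapsed into a single loop. This is a sketch under a few assumptions: the data is synthetic (make_classification stands in for your own X and y), and the choice of models and their settings is arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; replace with your own X and y
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Every model shares the same fit/score interface
models = {
    'Logistic regression': LogisticRegression(max_iter=1000),
    'Decision Tree': DecisionTreeClassifier(random_state=0),
    'LDA': LinearDiscriminantAnalysis(),
    'K-Nearest Neighbor': KNeighborsClassifier(),
    'Naive Bayes': GaussianNB(),
}

# Train each model and record (training accuracy, test accuracy)
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = (model.score(X_train, y_train), model.score(X_test, y_test))
    print('{}: train {:.2f}, test {:.2f}'.format(name, *scores[name]))
```

A large gap between the training and test numbers for a model is usually a sign that it has memorized the training data rather than learned the relationship.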

What Next?

While much of the underlying math for these algorithms is well outside the scope of knowledge for most, it is useful to understand how Neural Networks operate. They are one of the more interesting implementations of AI, and can be tuned to work with lots of data. However, that tuning requires some knowledge of what a Neural Network is and how it works. That’s what we’ll look at next week.

Getting an IT Job Without a Degree

I frequently talk to high school students or young adults who are hoping to land a lucrative IT job without a degree. Unfortunately, few of these individuals have the skills necessary to get the job they want. While many high schools now offer an increasing number of computer courses, rarely do they provide the depth or breadth of skills required by employers. However, this does not mean you need a degree to work in IT. In fact, some of the best techies I know started their career without a degree.

If it is possible to get a job without a degree, how do you do it? First, it’s important to recognize that IT jobs are broadly divided into two groups – system management and software development. System management jobs involve the management of computer systems, networks, servers, and other computer hardware. Additionally, cybersecurity professionals fall into this category (although there is often some overlap with software development skills). Software development jobs include web developers, software engineers, mobile application developers, and a variety of other jobs focused on using computer code to create applications for users.

Information Technology Certifications

Typically, individuals with system management jobs have degrees in Information Technology Management. However, those without a degree can show their competence with a variety of tech certifications. Some of the most widely known certifications are from the Computing Technology Industry Association better known as CompTIA. This includes CompTIA’s most well known certification for desktop maintenance and support – A+ certification. However, CompTIA offers a variety of other entry-level certifications as well. Network+ certification shows competency with network management and Security+ demonstrates basic security knowledge.

In addition to CompTIA certifications, a variety of other organizations provide IT certifications such as Cisco’s CCNA, Amazon’s AWS Certified Solutions Architect, and Google’s Associate Cloud Engineer. These certifications – unlike those from CompTIA – are vendor specific. However, the skills these certifications demonstrate are highly valuable to businesses.

Software Development Projects

Software developers typically have a bachelor’s degree in Computer Science. And, while there are some certifications available for programmers, they are not as widely desired as those for IT management. As such, it’s more difficult to demonstrate programming skills to a potential employer. However, this can be overcome by providing sample code on GitHub or BitBucket. If you want a job as a developer, spend some time creating professional-quality software applications that demonstrate your knowledge. Then, be sure to include a link to your repository in your resume. While you learn to code, don’t neglect learning SQL, HTML, and JavaScript. During the last decade, these skills have become standard for nearly all software development jobs.

I’ve talked to many young men who would like to become game developers. For them, I would recommend you consider your background in math and physics first. While there are libraries that make game programming easier, it’s hard to get far without some knowledge of matrix manipulation, trigonometry, gravity, and other topics that require a solid background in math and science.

Conclusion

While most people enter the IT world with a bachelor’s degree, it is possible to find good jobs without a formal education. If you want to work in the system management field, focus on certifications. If you want to work in software, focus on projects you can demo to show your ability. While either of the above will require effort, there really are no shortcuts in the IT world. Furthermore, if you are expecting an employer to pay you the high salaries common to the IT world, your efforts will be well compensated.

Basics of Artificial Intelligence – VI

Last week, we looked at languages used for artificial intelligence development. While there are numerous options available, Python has some of the best tools and is the easiest for the beginner to get started with quickly. However, setup can be quite a bit of work. First, set up Python and a development environment – I strongly recommend Jupyter, but VS Code is OK too. Next, begin installing all the necessary libraries – numpy, pandas, and scikit-learn (imported as sklearn). You may also wish to install matplotlib and seaborn. When you’ve got all the libraries installed, you can create a block of code in Jupyter to include all the necessary imports in your project such as what I have below. Some of these libraries are large, so you can prune the list to include only the tools you need.

Of particular interest are the sklearn modules. In this section, you will see imports for a variety of different AI algorithms including logistic regression, decision trees, nearest neighbors, linear discriminant analysis, naïve Bayes, and neural networks. These libraries will do the bulk of the work for us with little effort.

Import Libraries

import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
from matplotlib import cm
import seaborn as sns
import pandas as pd
import patsy

import itertools as it
import collections as co
import functools as ft
import os.path as osp

import glob
import textwrap

from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.ensemble import VotingClassifier, AdaBoostClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.ensemble import BaggingClassifier
from sklearn.svm import SVC
from sklearn.metrics import classification_report
from sklearn.metrics import precision_score, recall_score
from sklearn.metrics import f1_score, accuracy_score
from sklearn.metrics import confusion_matrix
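# plot_confusion_matrix was removed in scikit-learn 1.2; ConfusionMatrixDisplay is its replacement
from sklearn.metrics import ConfusionMatrixDisplay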

Load Data

The next step for any AI project is to import the data and manipulate it as needed.

# import the data file from CSV format
data = pd.read_csv("data.csv")

# show the number of records
recordCount = len(data.index)
print("Record Count: {:d}".format(recordCount))

# optional removal of data 
# this will remove all records with a FIELD_VALUE for FIELD_NAME
# data = data.drop(data[data.FIELD_NAME == 'FIELD_VALUE'].index)

# add optional flags for processing
# add a boolean column named COLUMN_NAME that is True where the mood field equals 'VALUE'
data.insert(loc=0, column='COLUMN_NAME', value=(data.mood == 'VALUE'))

# show how many records were filtered out
newCount = len(data.index)
print("Filtered Count: {:d}".format(recordCount - newCount))
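Since the snippet above uses placeholders, here is the same sequence on a tiny made-up table so you can see what each step does. The column names, values, and the is_happy flag are all invented for illustration; only the pandas calls mirror the code above.

```python
import pandas as pd

# Made-up records; 'mood' mirrors the column used in the snippet above
data = pd.DataFrame({
    'mood': ['happy', 'sad', 'unknown', 'happy'],
    'score': [7, 3, 5, 9],
})
recordCount = len(data.index)

# Remove every record whose mood is 'unknown'
data = data.drop(data[data.mood == 'unknown'].index)

# Add a boolean flag column that is True where mood equals 'happy'
data.insert(loc=0, column='is_happy', value=(data.mood == 'happy'))

# Report how many records were removed, then show what is left
newCount = len(data.index)
print("Filtered Count: {:d}".format(recordCount - newCount))
print(data)
```

One record is dropped, and the new flag lands in the first column because of loc=0.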

Set Prediction Field & Input Fields

Now that you have loaded the data and manipulated it as necessary, it’s time to set up the information for prediction. That will consist of two parts – the field to predict and the values to use for the prediction. So, if I want to determine the value of a house, the prediction value would be the cost and the input fields would include square footage, yard size, number of rooms, etc. In the code snippet below, I will set the fields for predicting home price.

# CSV field to predict
predictionField = 'home_value'

# CSV fields to use for prediction
feature_names = ['square_footage', 'yard_size', 'num_room', 'num_bath']

# extract data into feature set and prediction value (X,y)
X = data[feature_names]
y = data[predictionField]

Split Into Groups

The next important step is to split the data into two groups – training data and test data. The training data will be used by the AI algorithm to ‘learn’ the data. Then, the test data is used to see how well the algorithm actually did in learning the data relationships.

# split into groups
X_train, X_test, y_train, y_test = train_test_split(X, y)

# scale data
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
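Note that the scaler above is fit on the training data only and then merely applied to the test data. The sketch below reproduces the min-max calculation by hand to show why that matters. The square-footage numbers are made up, and the helper functions are my own illustration of the formula (value - min) / (max - min), not scikit-learn code.

```python
def fit_min_max(values):
    # Learn the minimum and maximum from the training data only
    return min(values), max(values)

def apply_min_max(values, lo, hi):
    # Rescale so that lo maps to 0 and hi maps to 1
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical square-footage values
train_sqft = [1000, 2000, 3000]
test_sqft = [1500, 3500]

lo, hi = fit_min_max(train_sqft)
print(apply_min_max(train_sqft, lo, hi))  # [0.0, 0.5, 1.0]
print(apply_min_max(test_sqft, lo, hi))   # [0.25, 1.25]
```

The test value of 1.25 falls outside the 0 to 1 range because that house is larger than anything in the training data. That is expected: the test set should be treated as unseen data, so it must not influence the scaling.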

Next Steps

So far, we have loaded the necessary libraries, loaded the data, updated the data to exclude any records we don’t want, added fields as necessary to augment the data, separated the data into features and prediction fields, and broken the data into groups for training. The next step is where the magic happens – the artificial intelligence algorithm. We’ll look at that next week…

Crypto Currency Problems

Even with the recent decline in cryptocurrency prices, enthusiasm remains high among blockchain supporters. However, after more than a decade, several key problems still remain before widespread adoption can be expected.

Investment / Currency Dilemma

The first problem is the investment/currency dilemma. Blockchain evangelists repeatedly tell us what an amazing investment crypto currencies are. Then, they tell us how crypto is replacing fiat currencies. Unfortunately, however, it’s not possible to be both an investment and a currency. Why? Because the two are – for the most part – mutually exclusive. Investments require volatility – something we see in abundance with crypto currencies. However, an actual currency requires stability. Nobody wants to be paid for work done this month at a wage that is wildly fluctuating. So, we need to decide which it is – a currency or an investment.

While some currencies – known as stablecoins – strive to maintain a 1-to-1 relationship with the dollar, this seems to fly in the face of the argument that fiat currencies should be replaced with crypto currencies. While these stablecoins may work great for purchasing goods and services, why not simply use the dollar instead and save yourself the transaction costs?

Energy Consumption

I have previously blogged about the crypto energy issue. In short, crypto currencies consume massive amounts of electricity while many around the globe are arguing that we need to reduce energy usage to prevent climate change. However, even if you reject climate change, it’s no secret that many places around the world suffer from energy shortages. Even in the US, brownouts are not uncommon in many communities on the hottest days of summer. Is it really reasonable to consume massive quantities of energy to create digital money?

Cyber Terrorism

Crypto has a long history of being used for money laundering, drugs, hacking, and other nefarious uses. While many will argue that this represents only a very small portion of the crypto market, it nonetheless is a real concern that the crypto community needs to address. This is particularly obvious with the growth of ransomware demanding payment in Bitcoin. Regardless of the actual percentage of illicit usage, it still reflects poorly on crypto currencies and will cause increasing oversight by government entities, which could negatively impact the crypto markets and the long-term viability of blockchain technologies.

Quantum Computing

Nobody seems to talk about it much, but quantum computing could unravel the entire blockchain in the blink of an eye. Since crypto currencies depend on encryption, it is absolutely essential that the encryption algorithms used be unhackable. Given the history of encryption protocols, that seems unlikely. However, it becomes even more unlikely when quantum computing enters the mainstream. While it may be years off, the introduction of a large quantum computer would allow the owner to rewrite the entire blockchain by simply having the majority of computing power on the internet – something not unreasonable with a modest quantum computer. This would rapidly shift financial power into the hands of a single individual.

Conclusion

While I continue to hear people talk about all the great things crypto currencies have to offer, few are interested in addressing the issues that will either prevent widespread adoption or create growing threats to commerce moving forward. If, indeed, nobody is interested in resolving these issues to support the long-term growth of blockchain technologies, then doesn’t it support the notion that this is really nothing more than a Ponzi scheme?