Hello! Random-man reporting. So this is gonna be a quick write-up on some basic neural network stuff, and I’ll probably divide it into 2 parts. First off, let me say that I’m not a professional computer scientist or mathematician. I want to head down that route when I get to that point in life, but for now, I’m just an amateur. Just a disclaimer: I’m learning this as I write about it.
What is a Neural Network?
A neural network is sort of a man-made “emulation” of a real neural network, that is, a subset of an actual brain. Now, if you asked ME whether or not a complex neural network would produce a real “intelligence”, that is, a being with true consciousness, I’d probably say no, at least for now. However, they could definitely be used to produce something along the same level as a plant or a simple-minded animal :D. That’s very useful.
So as we all know in life, not ALL problems are easy to solve with a computer. At least, the way to programmatically solve them isn’t always obvious. If we have a set of numbers, we know that sorting them is definitely a task for a computer, and there are many ways to do this. Of course this is obvious to all who have read some of @oaktree’s great stuff on sorting, and taken a look at Bogo sort, perhaps one of mankind’s most beautiful creations. But ask a computer whether or not this is a picture of a dog.
Is there an obvious solution? Probably not. This is where neural networks come into use in our endeavors. The idea of a neural network is that we set up a system that will take some input. It will most likely produce an arbitrary outcome based on its initial conditions. Then the output is judged on how close it is to our desired outcome, and based on that feedback, the system will shift its conditions. So a classical computational problem may be something like…
Programmer: Hey, if I ask you for the arithmetic mean of a set of numbers, I want you to return the ratio of the sum of all the elements to the total cardinality, or number of elements in the set. Got it?
Computer: Yeah seems legit.
But let us now consider a different style of problem.
Programmer: Hey, I have a situation that you can never truly understand because you are only a computer, but I need you to produce a correct outcome based on such a situation. Since you can’t “really” understand the situation, I’m gonna give you some input, you can first give me some random output, and I’ll tell you how correct you are based on that. That way, you’ll start to get a feel for what kinds of results come from different kinds of inputs.
Computer: …
So that’s a really simplified set of ideas to digest. Let’s take a look at a bit of theory to understand the structure of a simple neural network setup. With a neural network, we have three fundamental parts: we have the inputs, then we have a “black box” or “hidden layer” in the middle, where the input is processed, and then we have the output. This may as well describe any program though, huh? Neural networks are special, though.
Now let’s briefly look at an EXTREMELY simple neural net, one that produces a very basic outcome.
As you can see, we have three inputs, which are all connected to the one neuron in the middle layer. But notice how each input connection has a “weight” to it. The end outcome, y, is simply the sum of the products of the input values and their weights. Well, actually, it’s probably a normalized result from an activation function, but we’ll get there. Let’s write up a quick function for the simplest kind of neural network, one dubbed the “perceptron”. Let’s take a quick look first.
So, let’s take this progressively. The inputs are gonna be real numbers. Each is multiplied by its weight, then they are summed for the final outcome in the end layer. So, we can describe any outcome for a given set with this function, expressed in symbolic and code form: y = (I1 × w1) + (I2 × w2), where I stands for input and w is the weight for that input.
def perceptron(_in1, _in2, _w1, _w2):
    # weighted sum of the two inputs
    return _in1*_w1 + _in2*_w2
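Just to see it in action, here’s a quick call using the perceptron function we just wrote (the numbers are completely arbitrary, purely for illustration):

print(perceptron(1.0, 0.5, 0.8, -0.4))   # 1.0*0.8 + 0.5*(-0.4) = 0.6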
Now, let’s briefly generalize this situation and simplify for N inputs :).
def gen_perceptron(_in, _w):
    ret = 0
    # weighted sum of all the inputs
    for i in range(len(_in)):
        ret = ret + _in[i]*_w[i]
    # divide by the absolute value to squash the sum to +1 or -1
    # (note: this blows up if the sum is exactly 0)
    return ret/abs(ret)
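A quick sanity check of the generalized version with three inputs (again, made-up example numbers, just to show the squashing):

print(gen_perceptron([1, 0, 1], [0.5, -0.3, 0.2]))   # weighted sum is 0.7, so we get 1.0
print(gen_perceptron([1, 1, 0], [0.5, -0.9, 0.2]))   # weighted sum is -0.4, so we get -1.0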
That way, we get either +1 or -1 as an answer. Now, the step that takes the sum down to this last binary value is called our “activation function”. In this case, we have simplified it. In many other applications, though, the activation function takes other forms. In this article, we are gonna do something a little different and talk about the sigmoid activation function…
Now, since we are just starting out in neural networks, we are not gonna tackle a true A.I. problem yet; for now, let’s tackle something simple and logical. We will use the techniques of neural networks, though. Now, since we have a basic understanding of some of these concepts, let’s try and build a neural network and train it for a simple problem I will describe soon. This is definitely not a problem that needs a neural net to solve, but let’s tackle it to practice our skills and get a better understanding of things. Sometimes, instead of trying to understand the theory and delve deeply, we may first want to take what we know, roll with it, and DO something with it. In future articles, I’ll definitely go into more thorough and detailed explanations of things. For now, we may just want to get used to some of the ideas of a neural network, and practice a little application of one, just to get our feet a little wet.
Now, in trying to make a neural network for our problem, we shall look at a new activation function. That is, a sigmoid function. It takes the mathematical form of 1 / (1 + e^(-x)).
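If you’d rather see that formula in code, here’s a minimal Python sketch (the name sigmoid is just my own label for this write-up):

import math

def sigmoid(x):
    # squashes any real number into the open interval (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

# a few sample values to see the squashing in action
print(sigmoid(-5))   # ~0.0067, close to 0
print(sigmoid(0))    # exactly 0.5
print(sigmoid(5))    # ~0.9933, close to 1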
Why would we use a function like this? Well, I couldn’t tell you all of the reasons as I’m still learning this too, but I can tell you that sigmoid functions are great for squashing values. This function will take any x as input and toss it somewhere between 0 and 1. If you are unfamiliar with the “e”, that is Euler’s number. It is a very special number, like pi, and trying to explain its significance here would take us on quite a tangent. e is a transcendental number, meaning it can never be a root of a polynomial with rational coefficients, and it is widely known as a common exponential and logarithmic base. If you are not familiar with all of the mathematical terms, don’t worry, we will look at them as we encounter them more in the future.

To solve the problem I will describe shortly, we are going to first train the neural network! We shall first randomly set its initial conditions, then we will give it input, let it guess, then give it feedback on whether or not it was correct. Based on this, it will adjust its weights to better optimize for the situation! This is gonna be an example of what is called “back propagation”. Now, let’s look at a goal or problem… We want to train the neural net to emulate the following truth table logic…
| Input 1 | Input 2 | Input 3 | Output |
|---------|---------|---------|--------|
| 0 | 0 | 1 | 0 |
| 1 | 1 | 1 | 1 |
| 1 | 0 | 1 | 1 |
| 0 | 1 | 1 | 0 |
What’s the secret sauce? The output just takes the value in the leftmost column. So, we know we’re gonna need 3 inputs. They are gonna be weighted as well. Our game plan is to take these three inputs, multiply each by its weight, pass the sum of those through the sigmoid function, evaluate the error, and adjust the weighted values. Now notice that since all our input values are either zero or one, the results that are being passed to the sigmoid are really just gonna be the sums of a subset of the weights. Let’s take a look at some of the mathematical operations we will partake in with respect to this particular problem.
- To get the value we will push through the sigmoid function: value = (input1 × weight1) + (input2 × weight2) + (input3 × weight3)
Simple enough, we take the sum of the products of the weights and inputs.
- Sigmoid function to “normalize” the outputs of the neuron: sigmoid(x) = 1 / (1 + e^(-x))
(note- what I mean by normalization is that we are bringing the outputs to be between 0 and 1)
For the next formula, the weight adjustment, I’m just gonna refer to the sigmoid function as S(x) for simplicity.
- Adjustment to each weight: adjustment = e × I × S’(O)
Now, the S’(O) is the derivative of the sigmoid function, taking the neural network output as an input. I will not go into the calculus of the derivative, but I will briefly try to give an intuition in case you are not familiar with the concept of a derivative. Consider a linear function, that is, a line, like y = mx + b. Notice that the slope of the line remains the same at any point along the line. However, in a non-linear function, or a curve, the slope changes as we move along the function. So, the derivative is an algebraic formula we can use to find the slope of the curve at any particular point on it.
You might be asking, why the hell are we multiplying by that?!? Well just chill out and put down the bat. I can explain. So take another look at the curve of the sigmoid function. Notice that the closer the y-value is to either 0 or 1, the smaller the derivative, or slope, at that point is. Makes sense, because if the value is really close to the right answer, which in this case is either 0 or 1, we don’t want to adjust the weight as much. We’re just using some of the mathematical forces of nature and of this function here to our advantage is all. The e-value, in this formula, is not Euler’s number! In this formula, it is an error value, the difference between the result and the desired output.
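To see that “smaller slope near 0 and 1” idea numerically, here’s a tiny check (my own addition; it uses the standard closed form of the sigmoid’s derivative, S’(O) = O × (1 − O), where O is the output the sigmoid already produced):

def sigmoid_derivative(output):
    # slope of the sigmoid, written in terms of the sigmoid's own output
    return output * (1 - output)

print(sigmoid_derivative(0.5))    # 0.25   -> steep, big adjustments
print(sigmoid_derivative(0.9))    # 0.09   -> flatter, smaller adjustments
print(sigmoid_derivative(0.99))   # 0.0099 -> nearly flat, tiny adjustments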
Why multiply by the error value? Because we want to make our change to the weights proportional to the error, or how badly the neural net screwed up on that particular try. Now we also multiply by the input value, which in this situation is either a 1 or a 0. Why? I’m gonna be completely honest with you, I am not completely sure why. However, when/if I find that out, I will definitely try to explain it in part 2 of this.
So, the main idea is that we are gonna adjust the weights with a value that is proportional to the error, which makes sense since we’re trying to optimize this baby little by little…
(note- In part 2, or in a future part, I will attempt to better explain the back propagation algorithm)
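If you want to peek ahead, here’s a rough, self-contained Python sketch of how the pieces above could fit together for this exact truth table: weighted sum, sigmoid, then the e × I × S’(O) adjustment. This is just my own illustration of the formulas we walked through, not necessarily the exact code we’ll write in part 2:

import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(output):
    # derivative of the sigmoid, expressed in terms of its own output
    return output * (1 - output)

# training data from the truth table (the output just mirrors the first input)
training_inputs = [[0, 0, 1], [1, 1, 1], [1, 0, 1], [0, 1, 1]]
training_outputs = [0, 1, 1, 0]

# start with small random weights, one per input
random.seed(1)
weights = [random.uniform(-1, 1) for _ in range(3)]

for _ in range(10000):
    for inputs, desired in zip(training_inputs, training_outputs):
        # forward pass: weighted sum, then squash with the sigmoid
        total = sum(i * w for i, w in zip(inputs, weights))
        output = sigmoid(total)
        # how far off were we?
        error = desired - output
        # adjust each weight by error * input * S'(output)
        for j in range(3):
            weights[j] += error * inputs[j] * sigmoid_derivative(output)

print(weights)
print([round(sigmoid(sum(i * w for i, w in zip(inputs, weights)))) for inputs in training_inputs])

If you run it, you should see the first weight grow large and positive while the other two stay small or go negative, which is exactly the “just take the leftmost column” behavior we’re after.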
What’s left after this? Well, we gotta write this thing and train it a bunch of times! I’m gonna wrap this post up so the two parts are in smaller, more digestible chunks. I’m gonna start on it tonight, but I don’t know if it will be posted by tomorrow. Hopefully soon. I hope you had a fun, or at least an interesting, time reading this, and I am excited to continue on this topic!
(Note: For the first part of this, I didn’t want to be too mathematically rigorous. It was more meant to get you used to some of the aspects of neural networks. For part 2, I plan on discussing how we will actually train this thing. Then, in future parts, I’ll probably start off from the basics again, but with more rigorous explanations. I hope I haven’t made any errors writing this late at night :P, and if I did, please feel free to point them out. Thanks for reading.)