Adventuring into Neural Nets (Part 1)

Hello! Random-man reporting. So this is gonna be a quick write-up on some basic neural network stuff, and I’ll probably divide it into 2 parts. First off, let me say that I’m not a professional computer scientist or mathematician. I want to head down that route when I get to that point in life, but for now, I’m just an amateur. Just a disclaimer :wink: . I’m learning this as I write about it.

What is a Neural Network?

A neural network is sort of a man-made “emulation” of a biological neural network, that is, a piece of an actual brain. Now, if you asked ME whether or not a complex neural network would produce a real “intelligence”, that is, a being with true consciousness, I’d probably say no, at least for now. However, they could definitely be used to produce something on the same level as a plant or a simple-minded animal :D. That’s very useful.

So as we all know in life, not ALL problems are easy to solve with a computer. At least, the way to programmatically solve them isn’t always obvious. If we have a set of numbers, we know that sorting them is definitely a task for a computer, and there are many ways to do this. Of course this is obvious to all who have read some of @oaktree’s great stuff on sorting, and taken a look at Bogo sort, perhaps one of mankind’s most beautiful creations. But ask a computer whether or not a given image is a picture of a dog.

Is there an obvious solution? Probably not. This is where neural networks come in handy in our endeavors. The idea of a neural network is that we set up a system that will take some input. It will most likely produce an arbitrary outcome based on its initial conditions. Then the output is judged against the criterion of how close it is to our desired outcome. Based on that feedback, the system will shift its conditions. So a classical computational problem may be something like…

Programmer: Hey, if I ask you for the arithmetic mean of a set of numbers, I want you to return the ratio of the sum of all the elements to the total cardinality, or number of elements, in the set. Got it?
Computer: Yeah seems legit.

But let us now consider a different style of problem.

Programmer: Hey, I have a situation that you can never truly understand because you are only a computer, but I need you to produce a correct outcome based on such a situation. Since you can’t “really” understand the situation, I’m gonna give you some input, you can first give me some random output, and I’ll tell you how correct you are based on that. That way, you’ll start to get a feel for what kinds of results come from different kinds of inputs.
Computer: …

So that’s a real simplified set of ideas to digest. Let’s take a look at a bit of theory to understand the structure of a simple neural network setup. With a neural network, we have three fundamental parts: the inputs, then a “black box” or “hidden layer” in the middle, where the input is processed, and then the output. This may as well describe any program, though, huh? But neural networks are special.

Now let’s briefly look at an EXTREMELY simple neural net, one that produces a very basic outcome.

As you can see, we have three inputs, which are all connected to the one neuron in the middle layer. But notice how each input connection has a “weight” to it. The end outcome, y, is simply the sum of the products of the input values and their weights. Well, actually, it’s probably a normalized result from an activation function, but we’ll get there. Let’s write up a quick function for the simplest kind of neural network, one dubbed the “perceptron”. Let’s take a quick look first.

So, let’s take this progressively. The inputs are gonna be real numbers. Each is multiplied by its weight, then they are summed for the final outcome in the end layer. So, we can describe any outcome for a given set with this function, expressed in symbolic and code form, where I stands for an input and w is the weight for that input: y = I1*w1 + I2*w2.

def perceptron(_in1, _in2, _w1, _w2):
    # Weighted sum of the two inputs
    return _in1 * _w1 + _in2 * _w2
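For example (the numbers here are just arbitrary values I picked to show a call):

print(perceptron(1.0, 0.5, 0.8, -0.2))  # 1.0*0.8 + 0.5*(-0.2) = 0.7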

Now, let’s briefly generalize this situation and simplify for N inputs :).

def gen_perceptron(_in, _w):
    # Weighted sum of all the inputs
    total = sum(i * w for i, w in zip(_in, _w))
    # Step activation: +1 or -1 depending on the sign of the sum
    return 1 if total >= 0 else -1
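Just to see it run (the inputs and weights here are arbitrary numbers I picked, nothing special):

print(gen_perceptron([1.0, 0.5, -2.0], [0.4, 0.3, 0.1]))  # 0.4 + 0.15 - 0.2 = 0.35, so we get +1
print(gen_perceptron([1.0, 0.5, -2.0], [0.1, 0.1, 0.5]))  # 0.1 + 0.05 - 1.0 = -0.85, so we get -1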

That way, we get either +1 or -1 as an answer. This last step of mapping the sum to a binary value is called our “activation function”. In this case, we have simplified it. In many other applications, though, the activation function takes other forms. In this article, we are gonna do something a little different and talk about the sigmoid activation function…

Now, since we are just starting out in neural networks, we are not gonna tackle a true A.I. problem yet; for now, let’s tackle something simple and logical, while still using the techniques of neural networks. Now that we have a basic understanding of some of these concepts, let’s try and build a neural network and train it for a simple problem I will describe soon. This is definitely not a problem that needs a neural net to solve, but let’s tackle it to practice our skills and get a better understanding of things :smiley:. Sometimes, instead of trying to understand the theory and delve deeply, we may first want to take what we know, roll with it, and DO something with it. In future articles, I’ll definitely go into more thorough and detailed explanations. For now, we just want to get used to some of the ideas of a neural network and practice a little application of one, just to get our feet a little wet.

Now, in trying to make a neural network for our problem, we shall look at a new activation function. That is, a sigmoid function. It takes the mathematical form of…

    S(x) = 1 / (1 + e^(-x))
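In code, the sigmoid is basically a one-liner. Here’s a minimal sketch using Python’s math module (the sample values in the print are just there to show the squashing):

import math

def sigmoid(x):
    # Squashes any real number x into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

print(sigmoid(-5), sigmoid(0), sigmoid(5))  # roughly 0.007, 0.5, 0.993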
Why would we use a function like this? Well, I couldn’t tell you all of the reasons as I’m still learning this too, but I can tell you that sigmoid functions are great for squashing values. This function will take any x as input and toss it somewhere between 0 and 1. If you are unfamiliar with the “e”, that is Euler’s number. It is a very special number, like pi, and trying to explain its significance here would take us on quite a tangent. e is a transcendental number, meaning it is never a root of a polynomial with rational coefficients, and it is widely used as a base for exponentials and logarithms. If you are not familiar with all of the mathematical terms, don’t worry, we will look at them as we encounter them more in the future.

To solve the problem I will describe shortly, we are going to first train the neural network! We shall first randomly set its initial conditions, then we will give it input, let it guess, then give it feedback on whether or not it was correct. Based on this, it will adjust its weights to better optimize for the situation! This is gonna be an example of what is called “Back Propagation”. Now, let’s look at a goal or problem… We want to train the neural net to emulate the following truth table logic…

Input 1 | Input 2 | Input 3 | Output
   0    |    0    |    1    |   0
   1    |    1    |    1    |   1
   1    |    0    |    1    |   1
   0    |    1    |    1    |   0

What’s the secret sauce? The output just copies the value in the left-most column. So, we know we’re gonna need 3 inputs, and they are gonna be weighted as well. Our gameplan is to take these three inputs, multiply each by its weight, pass the sum through the sigmoid function, evaluate the error, and adjust the weights. Now notice that since all our input values are either zero or one, the values being passed to the sigmoid are really just gonna be sums of a subset of the weights. Let’s take a look at some of the mathematical operations we will partake in with respect to this particular problem.

  1. To get the value we will push through the sigmoid function:

    x = I1*w1 + I2*w2 + I3*w3

Simple enough, we take the sum of the products of the weights and inputs.

  2. Sigmoid function to “normalize” the outputs of the neuron:
    (note- what I mean by normalization is that we are bringing the outputs to be between 0 and 1)

  3. The adjustment we will make to each weight after a guess. For this formula, I’m just gonna refer to the sigmoid function as S(x) for simplicity :stuck_out_tongue:

    adjustment = e * Input * S'(O)
Now, the S'(O) is the derivative of the sigmoid function, taking the neural network output as an input. I will not go into the calculus of the derivative, but I will briefly try to give an intuition in case you are not familiar with the concept. Consider a linear function, that is, a line, like y = mx + b. Notice that the slope of the line remains the same at any point along the line. However, in a non-linear function, or a curve, the slope changes as we move along the function. So, the derivative is an algebraic formula we can use to find the slope of the curve at a particular point along it.
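For the sigmoid specifically, the slope has a very convenient closed form: S'(x) = S(x) * (1 - S(x)). So if we already have the neuron’s output O = S(x), we can get the slope straight from it. A tiny sketch (the helper name here is just my own choice):

def sigmoid_derivative(o):
    # Slope of the sigmoid at the point whose output value is o,
    # using the identity S'(x) = S(x) * (1 - S(x))
    return o * (1 - o)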

You might be asking, why the hell are we multiplying by that?!!? Well just chill out and put down the bat. I can explain. So take another look at the curve of the sigmoid function. Notice that the closer the y-value is to either 0 or 1, the smaller the derivative, or slope, at that point is. Makes sense, because if the value is really close to the right answer, which in this case is either 0 or 1, we don’t want to adjust the weight as much. We’re just using some of the mathematical forces of nature and of this function to our advantage is all. The e-value in this formula is not Euler’s number! Here, it is an error value, the difference between the result and the desired output.

Why multiply by the error value? Because we want to make our change to the weights proportional to the error, or how badly the neural net screwed up on that particular try. Now we also multiply by the Input value, which in this situation, is either a 1 or a 0. Why? I’m gonna be completely honest with you, I am not completely sure why. However, when/if I find that out, I will definitely try to explain it in part 2 of this.

So, the main idea is that we are gonna adjust the weights with a value that is proportional to the error, which makes sense since we’re trying to optimize this baby little by little…
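To tie it together, here’s a rough sketch of what training this little network could look like in Python. Treat it as a preview rather than the final code: the variable names, the number of training passes, and the use of O * (1 - O) as the sigmoid’s derivative are my own choices for illustration, and part 2 will walk through the real thing properly.

import math
import random

def sigmoid(x):
    # Squash the weighted sum into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

# The truth table from above: three inputs, and the output just copies the left-most one
training_inputs  = [[0, 0, 1], [1, 1, 1], [1, 0, 1], [0, 1, 1]]
training_outputs = [0, 1, 1, 0]

random.seed(1)
weights = [random.uniform(-1, 1) for _ in range(3)]  # random initial conditions

for _ in range(10000):  # train it a bunch of times
    for row, desired in zip(training_inputs, training_outputs):
        O = sigmoid(sum(i * w for i, w in zip(row, weights)))  # the guess
        e = desired - O                                        # the error value
        for j in range(3):
            # adjustment = error * input * slope of the sigmoid at the output
            weights[j] += e * row[j] * O * (1 - O)

# After training, the four guesses should come out very close to 0, 1, 1, 0
print([round(sigmoid(sum(i * w for i, w in zip(row, weights))), 3) for row in training_inputs])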

(note- In part 2, or in a future part, I will attempt to better explain the back propagation algorithm :slight_smile: )

What’s left after this? Well, we gotta write this thing and train it a bunch of times! I’m gonna wrap this post up so the two parts come in smaller, more digestible chunks. I’m gonna start on it tonight, but I don’t know if it will be posted by tomorrow. Hopefully soon. I hope you had a fun, or at least an interesting, time reading this, and I am excited to continue on this topic!

(Note: For the first part of this, I didn’t want to be too mathematically rigorous. It was more meant to get you used to some of the aspects of neural networks. For part 2, I plan on discussing how we will actually train this thing. Then, in future parts, I’ll probably start off from the basics again, but with more rigorous explanations. I hope I haven’t made any errors writing this late at night :P, and if I did, please feel free to point them out. Thanks for reading.)


Some formatting issues. Sorry…


Sweeeeeeet article @random-man! Really good job! I had no idea where to start with Neural networks, this is really cool!

Nice share :wink:


Nice. I think we should move this to a new category, though: How-To -> Artificial Intelligence.


Love it!

As you asked for comments.

  • Regarding the training, I would say you are talking about supervised learning. This is indeed how perceptrons are trained, but there are other networks that can be trained unsupervised (Kohonen networks, for instance).
  • It might be worth mentioning that you may have multiple hidden layers and that the layers do not need to be fully connected.
  • For the second part… would you mention SVM?..

Hope the comments are useful, and please keep going. This is a really interesting topic.

P.S.: You may mention the Hopfield networks, the training method is a lot simpler ;)… but I guess you are aiming towards deep learning, so that may not be very useful :stuck_out_tongue:


Thanks @0x00pf ! And yes, I will look into SVMs! I’d never heard of Hopfield networks, but I will check them out also!


And yeah, I guess deep learning is a goal of mine, but the Artificial Intelligence section would benefit from any machine learning algorithms I’d say :). Including the Hopfield networks. I’ll definitely have to familiarize myself with them.


Thanks! I like that idea.

Thank you! I like the section change too; this is a nice new section :slight_smile:

That’s good. In a sense, SVMs have substituted for MLPs in many fields…

Do not spend much time with the Hopfield networks. They are nice because of their simplicity (structural and training-wise), so they make a nice case for an introduction to the topic. Their main problem is their limited capacity, which prevents their use in real applications. I like them because they behave as an associative memory, and they were the first ones I implemented!

I haven’t followed the topic for a while, but I do not think there has been any further research on them. Now that computers are more powerful, all the recurrent networks that couldn’t be used many years ago are coming to life (including the convolutional ones that all of us have seen recently).


Ah I see. Yeah, like the one neural network doing all the image manipulation stuff. I think it’s Google’s.

Yes. Now everything is using convolutional networks

Take a look at the Caffe project (http://caffe.berkeleyvision.org/). Google released TensorFlow recently (https://www.tensorflow.org/)… there is quite some material to work with… if you have the time!


Man I just got into summer break so time is pretty plentiful luckily :slight_smile:


Lucky you!

Guess we will see some cool AI stuff coming soon!


I’m not an expert in deep learning or neural networks, but from my experience, using ReLU as an activation instead of sigmoid gives more accurate results. The more non-linear the function is, the more discriminative the model is.
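For reference, since ReLU came up, it is even simpler to write than the sigmoid. A minimal sketch:

def relu(x):
    # Rectified Linear Unit: passes positive values through, clips negatives to 0
    return max(0.0, x)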
