History of Neural Networks and Deep Learning

By Fabian Hertwig, @FabianHertwig on Twitter
11 July 2018

In a way, deep learning has been around for decades in the form of neural networks; what is new are the many additional tricks that make very deep neural networks possible.

In this blog post, I'll look at why neural networks failed to become established during their first blooms in the 1960s and 1980s, and why things will hopefully be different this time.

The brain cell as a model for electrical circuits

The history of neural networks begins in the 1940s. At that time, researchers investigated how neurons in the brain function and tried to simulate this with electrical circuits. The first breakthrough came in 1957 with the Mark I Perceptron, a machine that used a single "neuron" to divide input data into two classes. It learned by minimizing the error it made on previous examples. In the following years, researchers tried to solve many different problems with artificial intelligence, for example, understanding natural language.
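The perceptron's learning rule is simple enough to sketch in a few lines. The following is a minimal illustration in modern Python (the original Mark I was hardware, not software): whenever the single "neuron" misclassifies an example, its weights are nudged toward the correct answer.

```python
def predict(weights, bias, x):
    """Single neuron: weighted sum of the inputs, then a step function."""
    s = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if s > 0 else 0

def train_perceptron(data, labels, lr=0.1, epochs=20):
    """Perceptron learning rule: adjust weights by the error on each example."""
    weights = [0.0] * len(data[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(data, labels):
            error = y - predict(weights, bias, x)  # -1, 0, or +1
            bias += lr * error
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
    return weights, bias

# Learn the logical AND function (linearly separable, so learnable).
data = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]
w, b = train_perceptron(data, labels)
print([predict(w, b, x) for x in data])  # -> [0, 0, 0, 1]
```

Because AND is linearly separable, the single neuron converges; the famous limitation, shown by Minsky and Papert, is that it can never learn a non-separable function like XOR.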

The program "STUDENT" could understand mathematical word problems posed in natural language. However, its vocabulary was very limited, and the algorithm consisted of an optimized graph search rather than a learning model. It was similar with the robot Shakey, which got its name from its trembling movements. The A* algorithm, which efficiently calculates a path between a starting point and a target point, emerged from this work and is still widely used today. None of this would be called artificial intelligence anymore, but it helped raise expectations for the field.

That's what the robot Shakey looked like.
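To give a flavor of what such search-based "intelligence" looks like, here is a minimal A* sketch in Python. The grid, the wall layout, and the Manhattan-distance heuristic are my own illustration, not Shakey's actual implementation:

```python
import heapq

def a_star(start, goal, walls, width, height):
    """A* on a 4-connected grid. Manhattan distance is the heuristic:
    it never overestimates the true cost, which keeps A* optimal."""
    def h(p):
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])

    # Priority queue ordered by f = g (cost so far) + h (estimated remainder).
    frontier = [(h(start), 0, start, [start])]
    seen = set()
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        if node in seen:
            continue
        seen.add(node)
        x, y = node
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < width and 0 <= ny < height and (nx, ny) not in walls:
                heapq.heappush(
                    frontier,
                    (g + 1 + h((nx, ny)), g + 1, (nx, ny), path + [(nx, ny)]))
    return None  # no path exists

# A 5x5 grid with a partial wall; the shortest path visits 9 cells.
walls = {(2, 0), (2, 1), (2, 2), (2, 3)}
path = a_star((0, 0), (4, 4), walls, 5, 5)
print(len(path))  # -> 9
```

The heuristic is what distinguishes A* from plain Dijkstra search: it steers exploration toward the goal, so far fewer cells are expanded.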

Twice into hibernation and back

The 1970s saw the arrival of the "AI winter". Further research into and funding of artificial intelligence were cut after the exaggerated expectations could not be met. As early as the 1950s, researchers had believed that human-like artificial intelligence could be achieved within a few decades. The thinking was that if you could teach a machine to play chess, you could also teach it other tasks.

A TV show with interviews with some of the Artificial Intelligence pioneers


In the 1980s, with the development of the backpropagation algorithm, neural networks received more attention once again: with this method, multi-layer neural networks are trained by propagating the network's errors backwards through its layers. In the 1990s, Yann LeCun, one of the fathers of deep learning, developed the first convolutional neural networks to recognize handwritten digits.

This new form of neural network is particularly suitable for recognizing objects in images. His networks read hundreds of thousands of checks every year. However, another forced hibernation followed: neural networks could hardly be trained for larger problems, while other machine learning methods, such as support vector machines, showed good results and took the lead on the artificial intelligence stage.

Until 2012, neural networks were hardly noticed; then another breakthrough occurred. Geoffrey Hinton and his team presented a model that almost halved the error rate in the ImageNet Large Scale Visual Recognition Challenge. Several fundamental innovations in deep learning contributed to this development: among other things, algorithms were developed to train networks on graphics cards. Graphics cards are especially fast at parallel matrix multiplications, and since the training of neural networks consists mainly of matrix multiplications, training time could be reduced by a factor of 1,000; trainings that were previously infeasible suddenly finished in an acceptable time.
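To see why this matters, consider the forward pass of a single dense layer: it is just one matrix multiplication, and every output cell is an independent dot product that a GPU can compute in parallel. A minimal pure-Python sketch (the toy matrices are my own made-up values):

```python
def matmul(A, B):
    """Naive matrix product. Each output cell is an independent dot
    product -- exactly the kind of work a GPU parallelizes."""
    rows, inner, cols = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(inner))
             for j in range(cols)] for i in range(rows)]

# Forward pass of a dense layer: 2 input samples with 3 features each,
# multiplied by a 3x2 weight matrix -> 2 samples with 2 outputs each.
X = [[1, 2, 3],
     [4, 5, 6]]
W = [[1, 0],
     [0, 1],
     [1, 1]]
print(matmul(X, W))  # -> [[4, 5], [10, 11]]
```

On a GPU, all four output cells (and in real networks, millions of them) are computed simultaneously, which is where the factor-of-1,000 speedup comes from.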

TensorFlow and Co.: Graphics Cards as Calculators

Today, frameworks like TensorFlow, Theano, or PyTorch make it very easy to execute code on graphics cards. Easy means: a developer does not have to write specialized code that runs on the graphics card; the framework translates it and executes it either on the graphics card or on an ordinary CPU. The frameworks also transfer data between the graphics card and the CPU. In addition, they offer automatic differentiation, i.e. the automatic calculation of the derivatives of functions. This is essential for the backpropagation algorithm and relieves developers and researchers of a very error-prone task when developing deep learning models.
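The principle behind automatic differentiation can be sketched in a few lines. The following is a toy reverse-mode implementation on scalars, purely for illustration; the real TensorFlow and PyTorch APIs look different and operate on whole tensors:

```python
class Var:
    """Toy reverse-mode automatic differentiation on scalars: each
    operation records its inputs and their local gradients, so the
    chain rule can be replayed backwards through the expression."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (parent_var, local_gradient)
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    def backward(self, grad=1.0):
        # Chain rule: accumulate the upstream gradient, then pass
        # grad * local_gradient on to each parent.
        self.grad += grad
        for parent, local in self.parents:
            parent.backward(grad * local)

x = Var(3.0)
y = Var(4.0)
z = x * y + x          # z = x*y + x, so dz/dx = y + 1, dz/dy = x
z.backward()
print(x.grad, y.grad)  # -> 5.0 3.0
```

This is exactly the bookkeeping that backpropagation needs: the forward pass builds the expression graph, and `backward` sweeps through it once to obtain every derivative.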

Over-the-top expectations?

Deep learning currently sits at the Peak of Inflated Expectations in Gartner's hype cycle. The hype cycle captures the observation that new technologies usually fail to meet the expectations raised by the first hype, the so-called innovation trigger; they must then cross the Trough of Disillusionment before ultimately finding application in industry. According to this model, deep learning is thus on the verge of becoming a disappointment, again.

In my view, however, the underlying neural networks have already left the trough of disillusionment behind at least twice: in the 1960s, funding for research on them and other models of artificial intelligence was halted after extremely high expectations could not be met. After their resurrection in the 1980s, neural networks could not prevail against other methods and took a back seat. However, some researchers did not give up hope for the nature-inspired algorithm: if our brains give rise to intelligence, then it must be possible to reconstruct this structure in machines, or so the thinking went.

The persistence of those researchers has paid off: even if the singularity is still far off and human-like intelligence remains out of reach, neural networks are showing incredible progress in many areas. Whether it is recognizing things in pictures, understanding or generating natural language, or translating between languages, all of this is feasible today. Just recently it was demonstrated that computers with deep learning algorithms can be trained to beat human opponents in complex team-based strategy games. And it is not only the tech giants that use deep learning to solve complex problems; the technology has already been adopted in many other companies.

I am curious whether deep learning will have to pass through the trough of disillusionment once more, or whether the past periods of hibernation of neural networks will suffice.