Artificial neural network/History



Warren McCulloch and Walter Pitts[1] (1943) opened the subject by creating a computational model for neural networks.[2] In the late 1940s, D. O. Hebb[3] created a learning hypothesis based on the mechanism of neural plasticity that became known as Hebbian learning. Farley and Wesley A. Clark[4] (1954) first used computational machines, then called "calculators", to simulate a Hebbian network. In 1958, psychologist Frank Rosenblatt invented the perceptron, the first artificial neural network,[5][6][7][8] funded by the United States Office of Naval Research.[9] The first functional networks with many layers were published by Ivakhnenko and Lapa in 1965, as the Group Method of Data Handling.[10][11][12] The basics of continuous backpropagation[10][13][14][15] were derived in the context of control theory by Kelley[16] in 1960 and by Bryson in 1961,[17] using principles of dynamic programming. Research then stagnated following Minsky and Papert (1969),[18] who showed that basic perceptrons could not compute the exclusive-or function and argued that computers of the time lacked sufficient power to process useful neural networks.
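The exclusive-or limitation can be checked directly. The following Python sketch is illustrative only and is not taken from Minsky and Papert: it assumes a single threshold unit with two binary inputs, and the names threshold_unit, representable and GRID are hypothetical. It searches a coarse grid of weights and finds a setting that computes AND but none that computes XOR. The grid search is only suggestive; the general result follows from XOR not being linearly separable.

```python
from itertools import product

# The four possible binary input pairs and the target outputs for AND and XOR.
INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
AND = [0, 0, 0, 1]
XOR = [0, 1, 1, 0]


def threshold_unit(w1, w2, b, x1, x2):
    """Single perceptron-style unit: fires (1) when the weighted sum exceeds 0."""
    return 1 if w1 * x1 + w2 * x2 + b > 0 else 0


def representable(targets, grid):
    """Return True if some (w1, w2, b) drawn from `grid` reproduces `targets`."""
    for w1, w2, b in product(grid, repeat=3):
        outputs = [threshold_unit(w1, w2, b, x1, x2) for x1, x2 in INPUTS]
        if outputs == targets:
            return True
    return False


# Assumed search grid: weights and bias from -4.0 to 4.0 in steps of 0.5.
GRID = [i / 2 for i in range(-8, 9)]

and_ok = representable(AND, GRID)  # a single unit can compute AND
xor_ok = representable(XOR, GRID)  # no single unit on this grid computes XOR

print("AND representable:", and_ok)
print("XOR representable:", xor_ok)
```

Adding a hidden layer removes this limitation, which is why methods for training multi-layer networks mattered for the later history of the field.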

In 1970, Seppo Linnainmaa published the general method for automatic differentiation (AD) of discrete connected networks of nested differentiable functions.[19][20] In 1973, Dreyfus used backpropagation to adapt parameters of controllers in proportion to error gradients.[21] Werbos's (1975) backpropagation algorithm enabled practical training of multi-layer networks. In 1982, he applied Linnainmaa's AD method to neural networks in the way that became widely used.[13][22]

The development of metal–oxide–semiconductor (MOS) very-large-scale integration (VLSI), in the form of complementary MOS (CMOS) technology, enabled increasing MOS transistor counts in digital electronics. This provided more processing power for the development of practical artificial neural networks in the 1980s.[23]

In 1986, Rumelhart, Hinton and Williams showed that backpropagation learned interesting internal representations of words as feature vectors when trained to predict the next word in a sequence.[24]

From 1988 onward,[25][26] the use of neural networks transformed the field of protein structure prediction, in particular when the first cascading networks were trained on profiles (matrices) produced by multiple sequence alignments.[27]

In 1992, max-pooling was introduced to help with least-shift invariance and tolerance to deformation to aid 3D object recognition.[28][29][30] Schmidhuber adopted a multi-level hierarchy of networks (1992) pre-trained one level at a time by unsupervised learning and fine-tuned by backpropagation.[31]

Early successes of neural networks included predicting the stock market and, in 1995, a (mostly) self-driving car.[32]

Geoffrey Hinton et al. (2006) proposed learning a high-level representation using successive layers of binary or real-valued latent variables with a restricted Boltzmann machine[33] to model each layer. In 2012, Ng and Dean created a network that learned to recognize higher-level concepts, such as cats, only from watching unlabeled images.[34] Unsupervised pre-training and increased computing power from GPUs and distributed computing allowed the use of larger networks, particularly in image and visual recognition problems, which became known as "deep learning".[35]

Ciresan and colleagues (2010)[36] showed that despite the vanishing gradient problem, GPUs make backpropagation feasible for many-layered feedforward neural networks.[37] Between 2009 and 2012, ANNs began winning prizes in image recognition contests, approaching human-level performance on various tasks, initially in pattern recognition and handwriting recognition.[38][39] For example, the bi-directional and multi-dimensional long short-term memory (LSTM)[40][41] of Graves et al. won three competitions in connected handwriting recognition in 2009 without any prior knowledge of the three languages to be learned.

Ciresan and colleagues built the first pattern recognizers to achieve human-competitive/superhuman performance[42] on benchmarks such as traffic sign recognition (IJCNN 2012).

Learning Tasks

  • Analyze the history of artificial neural networks (ANNs) and identify the drivers for their application.
  • Analyze the scientific results in the history of ANNs. How do these results contribute to specific properties of learning algorithms?


  1. McCulloch, Warren; Walter Pitts (1943). "A Logical Calculus of Ideas Immanent in Nervous Activity". Bulletin of Mathematical Biophysics 5 (4): 115–133. doi:10.1007/BF02478259. 
  2. Kleene, S.C. (1956). "Representation of Events in Nerve Nets and Finite Automata". Annals of Mathematics Studies. No. 34. Princeton University Press. pp. 3–41. Retrieved 17 June 2017.
  3. Hebb, Donald (1949). The Organization of Behavior. New York: Wiley. ISBN 978-1-135-63190-1. 
  4. Farley, B.G.; W.A. Clark (1954). "Simulation of Self-Organizing Systems by Digital Computer". IRE Transactions on Information Theory 4 (4): 76–84. doi:10.1109/TIT.1954.1057468. 
  5. Haykin (2008) Neural Networks and Learning Machines, 3rd edition
  6. Rosenblatt, F. (1958). "The Perceptron: A Probabilistic Model For Information Storage And Organization in the Brain". Psychological Review 65 (6): 386–408. doi:10.1037/h0042519. PMID 13602029. 
  7. Werbos, P.J. (1975). Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences. 
  8. Rosenblatt, Frank (1957). "The Perceptron—a perceiving and recognizing automaton". Report 85-460-1 (Cornell Aeronautical Laboratory). 
  9. Olazaran, Mikel (1996). "A Sociological Study of the Official History of the Perceptrons Controversy". Social Studies of Science 26 (3): 611–659. doi:10.1177/030631296026003005. 
  10. Schmidhuber, J. (2015). "Deep Learning in Neural Networks: An Overview". Neural Networks 61: 85–117. doi:10.1016/j.neunet.2014.09.003. PMID 25462637. 
  11. Ivakhnenko, A. G. (1973). Cybernetic Predicting Devices. CCM Information Corporation. 
  12. Ivakhnenko, A. G.; Grigorʹevich Lapa, Valentin (1967). Cybernetics and forecasting techniques. American Elsevier Pub. Co. 
  13. Schmidhuber, Jürgen (2015). "Deep Learning". Scholarpedia 10 (11): 85–117. doi:10.4249/scholarpedia.32832. 
  14. Dreyfus, Stuart E. (1 September 1990). "Artificial neural networks, back propagation, and the Kelley-Bryson gradient procedure". Journal of Guidance, Control, and Dynamics 13 (5): 926–928. doi:10.2514/3.25422. ISSN 0731-5090. 
  15. Mizutani, E.; Dreyfus, S.E.; Nishio, K. (2000). "On derivation of MLP backpropagation from the Kelley-Bryson optimal-control gradient formula and its application". Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks. IJCNN 2000. Neural Computing: New Challenges and Perspectives for the New Millennium (IEEE): 167–172 vol.2. doi:10.1109/ijcnn.2000.857892. ISBN 0-7695-0619-4. 
  16. Kelley, Henry J. (1960). "Gradient theory of optimal flight paths". ARS Journal 30 (10): 947–954. doi:10.2514/8.5282. 
  17. Bryson, A. E. (April 1961). "A gradient method for optimizing multi-stage allocation processes". Proceedings of the Harvard Univ. Symposium on digital computers and their applications.
  18. Minsky, Marvin; Papert, Seymour (1969). Perceptrons: An Introduction to Computational Geometry. MIT Press. ISBN 978-0-262-63022-1. 
  19. Linnainmaa, Seppo (1970). The representation of the cumulative rounding error of an algorithm as a Taylor expansion of the local rounding errors (Masters) (in Finnish). University of Helsinki. pp. 6–7.
  20. Linnainmaa, Seppo (1976). "Taylor expansion of the accumulated rounding error". BIT Numerical Mathematics 16 (2): 146–160. doi:10.1007/bf01931367. 
  21. Dreyfus, Stuart (1973). "The computational solution of optimal control problems with time lag". IEEE Transactions on Automatic Control 18 (4): 383–385. doi:10.1109/tac.1973.1100330. 
  22. Werbos, Paul (1982). "Applications of advances in nonlinear sensitivity analysis". System modeling and optimization. Springer. pp. 762–770. Retrieved 2 July 2017. 
  23. Mead, Carver A.; Ismail, Mohammed (8 May 1989). Analog VLSI Implementation of Neural Systems. The Kluwer International Series in Engineering and Computer Science. 80. Norwell, MA: Kluwer Academic Publishers. doi:10.1007/978-1-4613-1639-8. ISBN 978-1-4613-1639-8. Retrieved 24 January 2020. 
  24. Rumelhart, David E.; Hinton, Geoffrey E.; Williams, Ronald J. (1986). "Learning representations by back-propagating errors". Nature 323: 533–536.
  25. Qian, Ning; Sejnowski, Terrence J. (1988). "Predicting the secondary structure of globular proteins using neural network models". Journal of Molecular Biology 202 (4): 865–884.
  26. Bohr, Henrik; Bohr, Jakob; Brunak, Søren; Cotterill, Rodney M. J.; Lautrup, Benny; Nørskov, Leif; Olsen, Ole H.; Petersen, Steffen B. (1988). "Protein secondary structure and homology by neural networks The α-helices in rhodopsin". FEBS Letters 241: 223–228.
  27. Rost, Burkhard; Sander, Chris (1993). "Prediction of protein secondary structure at better than 70% accuracy". Journal of Molecular Biology 232 (2): 584–599.
  28. Weng, J.; Ahuja, N.; Huang, T. S. (June 1992). "Cresceptron: a self-organizing neural network which grows adaptively". Proc. International Joint Conference on Neural Networks, Baltimore, Maryland. vol. I, pp. 576–581.
  29. Weng, J.; Ahuja, N.; Huang, T. S. (May 1993). "Learning recognition and segmentation of 3-D objects from 2-D images". Proc. 4th International Conf. Computer Vision, Berlin, Germany. pp. 121–128.
  30. Weng, J.; Ahuja, N.; Huang, T. S. (Nov. 1997). "Learning recognition and segmentation using the Cresceptron". International Journal of Computer Vision 25 (2): 105–139.
  31. Schmidhuber, J. (1992). "Learning complex, extended sequences using the principle of history compression". Neural Computation 4: 234–242.
  32. Domingos, Pedro (September 22, 2015). "chapter 4". The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World. Basic Books. ISBN 978-0465065707. 
  33. Smolensky, P. (1986). "Information processing in dynamical systems: Foundations of harmony theory". In D. E. Rumelhart. Parallel Distributed Processing: Explorations in the Microstructure of Cognition. 1. pp. 194–281. ISBN 978-0-262-68053-0. 
  34. Ng, Andrew; Dean, Jeff (2012). "Building High-level Features Using Large Scale Unsupervised Learning". arXiv:1112.6209 [cs.LG].
  35. Goodfellow, Ian; Bengio, Yoshua; Courville, Aaron (2016). Deep Learning. MIT Press. Retrieved 1 June 2016. 
  36. Cireşan, Dan Claudiu; Meier, Ueli; Gambardella, Luca Maria; Schmidhuber, Jürgen (21 September 2010). "Deep, Big, Simple Neural Nets for Handwritten Digit Recognition". Neural Computation 22 (12): 3207–3220. doi:10.1162/neco_a_00052. ISSN 0899-7667. PMID 20858131. 
  37. Scherer, Dominik; Müller, Andreas C.; Behnke, Sven (2010). "Evaluation of Pooling Operations in Convolutional Architectures for Object Recognition". 20th International Conference on Artificial Neural Networks (ICANN). pp. 92–101. doi:10.1007/978-3-642-15825-4_10.
  38. 2012 Kurzweil AI Interview with Jürgen Schmidhuber on the eight competitions won by his Deep Learning team 2009–2012. Archived from the original on 31 August 2018.
  39. "How bio-inspired deep learning keeps winning competitions | KurzweilAI". Archived from the original on 31 August 2018. Retrieved 16 June 2017.
  40. Graves, Alex; Schmidhuber, Jürgen (2009). "Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks". Advances in Neural Information Processing Systems 21 (NIPS 2008). Neural Information Processing Systems (NIPS) Foundation. pp. 545–552. ISBN 9781605609492. 
  41. Graves, A.; Liwicki, M.; Fernandez, S.; Bertolami, R.; Bunke, H.; Schmidhuber, J. (May 2009). "A Novel Connectionist System for Unconstrained Handwriting Recognition". IEEE Transactions on Pattern Analysis and Machine Intelligence 31 (5): 855–868. doi:10.1109/tpami.2008.137. ISSN 0162-8828. PMID 19299860. Archived from the original on 2 January 2014. Retrieved 30 July 2014. 
  42. Ciresan, Dan; Meier, U.; Schmidhuber, J. (June 2012). Multi-column deep neural networks for image classification. 3642–3649. doi:10.1109/cvpr.2012.6248110. ISBN 978-1-4673-1228-8. Bibcode: 2012arXiv1202.2745C. 
