Creating AI By Using Brain Theory
It stands to reason that, since the only human-level intelligence we know of resides in humans, brain science should lead the way in creating artificial intelligence (AI). Patom theory was first described in the 1990s and has since proved effective at solving problems in natural language understanding (NLU). Its terse description: in Patom theory (PT), all a brain does is store, match and use hierarchical, bidirectional linkset patterns[i]. That is, sets and lists are sufficient to explain everything a human brain is capable of. And because the specific patterns we experience decompose into their atomic parts, general patterns are recognized not by processing, but because they are the same.
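Patom theory has no published reference implementation, but the "sets and lists are sufficient" claim can be sketched in code. In the minimal Python sketch below, every class and method name is my own illustrative invention, not part of the theory: a snapshot pattern is a stored set of constituents, matched by being the same set rather than by computation, with reverse links making the hierarchy bidirectional.

```python
# Illustrative sketch only: a "Patom" as a stored set of constituent
# patterns, matched by identity rather than computed.
# All names here are hypothetical choices, not from Patom theory itself.

class Patom:
    def __init__(self, name, constituents):
        self.name = name
        self.constituents = frozenset(constituents)  # a snapshot is a set
        self.parents = []  # reverse links make the hierarchy bidirectional
        for c in constituents:
            if isinstance(c, Patom):
                c.parents.append(self)  # backward projection to the whole

    def matches(self, observed):
        # A stored pattern matches when the observed set is the same set:
        # recognition because they "are the same", not by processing.
        return self.constituents == frozenset(observed)

# Atomic sensory patterns combine into a higher-level pattern.
fur = Patom("soft-fur", ["touch:soft", "touch:warm"])
shape = Patom("dog-shape", ["vision:four-legs", "vision:tail"])
dog = Patom("dog", [fur, shape])

assert dog.matches([shape, fur])  # order-free set match
assert dog in fur.parents         # backward link from part to whole
```

The point of the sketch is only that storage plus set equality already yields recognition; no program "runs" over the pattern.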
Patom theory answers the question:
“how does our relatively slow brain outperform super computers at tasks like language, vision and motor control?”
It is theoretical neuroscience in that it models the function of the human brain without regard to any particular brain[ii]. A Pattern-ATOM (Patom) answers the question of why someone experiences particular things in reaction to brain stimulation (such as during brain surgery). The Patom is the thing stimulated — a small brain region — and it represents a collection of patterns through forward and backward projections; when that material is used, the backward projections raise the sensory experience into awareness. The memory of the sensory experience remains where it is matched in the first place, and that pattern is then used in the hierarchy to match further complexity. In language, for example, the relationships of Patoms align with semantics, the linguistic theory that explains meaning.
Today’s article compares Patom theory with three current brain models: (a) the neuroscience-led model from Jeff Hawkins[iii] (the Thousand Brains Theory of Intelligence), (b) the latest model by the Deep Learning developer, Geoffrey Hinton[iv] (GLOM), and (c) the neuroscientist’s brain model[v] of prediction, in which “…the perceiving brain is fundamentally an engine of prediction.” The fundamental difference between those three models and Patom theory comes down to process, or computation.
Brains don’t process. Brains are not devices of computation. A student’s learning is therefore biased by these current models.
The paradigm of computation is unhelpful in describing intelligence because it doesn’t explain where the program comes from, nor how a prediction is created in the neuroscience model. These paradigms leave out the questions of how and why brains learn, which are intricately linked with the appearance of intelligence itself.
Let’s see why.
Evolution — Consciousness
Patom theory starts with evolutionary theory. Early brains enable the survival of an animal: awareness of its body makes the animal protect all of its cells rather than allow others to eat away its limbs. Brains develop (a) to move the animal and (b) to sense its environment. (c) Awareness (or, in the more human term, consciousness) creates a powerful controlling mechanism, since a single entity will protect itself, while a mere collection of cells needn’t bother with its other parts. New senses and improved motor control all start with these building blocks.
In a human brain, the evolutionary improvements extend the brain stem with improved motor coordination, emotional centers and the large cerebrum that caters to improved sensory recognition, multisensory recognition, and even human language.
The point is our brain starts with the brain stem in evolution, and our most human capabilities are extensions. Many suggest that human consciousness is some kind of pinnacle of evolution, but isn’t it more likely that awareness is the earliest of beneficial mutations in animals and human language is the pinnacle?
Pattern Matching and Use
Brains recognize things (pattern matching) and can then produce motion as a response (pattern use, such as moving or speaking). Recognition requires (a) the storage of the pattern and (b) the matching of the pattern. Taking action from recognition is called ‘use’. Motion is enabled by storing sequences of sets of patterns, as in muscle control. Feedback from the results of muscle control creates a loop that, over time, allows successful motion to be connected to desired motion. In this loop, when a sensory pattern is recognized, a motor pattern can be initiated. The learning of the linkset pattern comes from storing and matching connected patterns. In fact, the philosopher David Hume’s basic elements included resemblance and contiguity, which can be thought of as snapshot pattern similarity (resemblance) and sequential pattern similarity (contiguity).
The relevance of this theoretical brain model is that brain function should be viewed as a set of extensions to the most basic brains, and even human capabilities of sensory recognition, memory and even language can be explained with short- and long-term memory as extensions.
The model sets the starting point for brain function. We will refer back to this when looking at alternative theories.
What does it mean to predict? Is the brain “essentially a probabilistic prediction engine”? Does the brain anticipate what will happen next probabilistically? Where is the evidence for that? It seems valid to claim that a brain anticipates what will happen in the world, but the next question is ‘why’, or ‘how’?
How does a brain anticipate what will happen next? One explanation is that a brain lives in a black box (a skull) and cannot experience the world directly. That’s a narrow definition of a brain. If we think of inputs, a brain starts with nerve endings in our skin, our eyes, our sense of balance in the semicircular canals and so forth, and of outputs: muscle contractions from nerves. It’s best to think of all those elements as making up our brain, since they operate as an integrated unit with the rest of the body.
“Brains predict” aligns perfectly with the idea that brains store patterns and match them. As the brain receives input, it compares the experience with what it expects, if it has stored the expected patterns previously. Discrepancies are detected, and so is consistency. At that level it is ‘predicting’ that current experience matches stored experience, but really it is just following pattern sequences and alerting if something different happens.
More important than that level of description is what a brain must do to predict at all. First, it must recognize sensory experience. Second, it must recognize objects received in one sense across the other senses. Third, it must recognize the way the objects can be manipulated, and their properties. These latter two steps mean a brain consolidates sensory patterns in a hierarchy, since we know a brain stores sensory patterns in a hierarchy for each sense and then combines them.
For example, one sequence of patterns has a person strike a match and fire emerge from the match tip. Another has only smoke emerge from the match. We know both sequences and neither is surprising. Context can help select the more appropriate pattern. In a rainstorm, the match is unlikely to ignite, while in the dry desert it will. As we have stored both patterns, it is just context that determines which is likely.
Brains recognize pattern sequences in context, just like humans recognize sentences in context. It is a pattern matching capability that can be modelled as a prediction capability, once the sequence is recognized. If there is only one experienced sequence, its continuation will be a strong prediction!
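One way to read “prediction as pattern matching” is sequence continuation: match the observed prefix against stored sequences and report what followed before. A minimal sketch, using my own invented names and the match example from the text, not any published Patom algorithm:

```python
# Illustrative sketch: "prediction" as looking up the stored
# continuations of a matched prefix, not as probabilistic computation.

def continuations(stored_sequences, observed_prefix):
    """Return what followed this prefix in previously stored sequences."""
    n = len(observed_prefix)
    return {
        seq[n]
        for seq in stored_sequences
        if len(seq) > n and list(seq[:n]) == list(observed_prefix)
    }

stored = [
    ("strike-match", "flame"),  # the usual continuation
    ("strike-match", "smoke"),  # the rainstorm continuation
]

# Both experiences are stored, so both continuations are available;
# context (rain vs. dry desert) would select between them.
assert continuations(stored, ["strike-match"]) == {"flame", "smoke"}

# With only one stored sequence, its continuation is a strong "prediction".
assert continuations(stored[:1], ["strike-match"]) == {"flame"}
```

Nothing here computes a probability; the apparent prediction is just the retrieval of stored sequence endings.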
The hierarchy needs to be bidirectional as well. To learn that a dog has soft fur, our brain must connect the touch sense to the visual sense that recognizes the dog. As the senses are in different parts of the brain, connecting brain material needs to link them together. The connected brain sensory patterns are called linkset patterns for short. The bidirectional hierarchy allows the multisensory representation of the dog to be connected back to its relevant sensory patterns — the visual recognition of the dog and the soft fur.
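The dog example above can be sketched as a bidirectional linkset: the visual pattern and the touch pattern live in different sensory “regions”, and a multisensory node links them so either sense can recall the other. The names and structure below are my own illustration, not part of the theory:

```python
# Illustrative sketch: bidirectional links joining patterns stored in
# different sensory "regions" through a multisensory node.
# All names are hypothetical.

class Linkset:
    def __init__(self, name):
        self.name = name
        self.links = set()  # bidirectional edges

def link(a, b):
    a.links.add(b)
    b.links.add(a)  # one association, traversable in both directions

visual_dog = Linkset("vision:dog-shape")
touch_fur = Linkset("touch:soft-fur")
dog = Linkset("multisensory:dog")

link(dog, visual_dog)
link(dog, touch_fur)

# Seeing the dog reaches the touch memory via the multisensory node...
assert touch_fur in {n for hub in visual_dog.links for n in hub.links}
# ...and the reverse projection holds with no extra machinery.
assert visual_dog in {n for hub in touch_fur.links for n in hub.links}
```

A single bidirectional association is all the structure needed; each memory stays in its own sensory region and is only referenced from above.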
This leads to the storage method that is unlike today’s computational models.
In PT, brains store the memories of objects in their sensory areas, and only reference those memories through reverse links when needed. A digital computer today is the opposite: it stores a high-level abstraction as the starting point, and sensory data is not a primary consideration.
Prediction as a description leaves out the key function of a brain: storing patterns and connecting them together for future recognition. Some would call this learning, but today the word ‘learning’ introduces a number of invalid connotations. Does our brain learn to see? Sure. There is support from experiments on cat vision. It also learns to speak a language. It learns to throw a ball. These are all combinations of sequences and sets of patterns.
The scientific theory that a brain is a prediction engine is fine, but the next step is surely to explain how and why it works, isn’t it?
Hinton — GLOM model
Hinton’s recent theory starts with the idea that vision is a good starting point for a brain because of “evidence that people parse visual scenes into part-whole hierarchies…”[vi]. Taken literally, this suggests that the human visual system recognizes part-whole relationships, but in practice this relationship is stored in the temporal and parietal lobes (as validated by brain damage), not just the occipital lobe (where visual input is received in the cortex). Brain damage causing ‘simultanagnosia’ or ‘form agnosia’ leaves patients perceiving only object parts, or only single objects at once rather than groups. These and other kinds of brain damage suggest that brains have specialized regions for part-whole hierarchies, rather than a common operating model.
Patom theory explains the hierarchy as the relationship between objects — multi-sensory objects — not just a single modality like vision.
The evolution of a brain justifies Patom theory’s position on this. How would a visual system evolve to recognize a visual object’s composition independently of its other properties? The simplest of animals rely on multi-modal experience — the smell of a predator is sufficient for the animal to move away or freeze. As evolution progresses, the expansion of brain specialization appears simpler if it improves multi-sensory interactions, rather than just visual interactions, because of the survival benefits.
This distinction invalidates the GLOM model as human-like because it factors out the reality that a human brain operates with both (a) layers of neurons including the forward and reverse projections and (b) regions of speciality — such as visual motion, independent senses, and multi-sensory areas.
In language, it isn’t the semiotic signs (i.e. the sound recognition in a human brain, typically in the left temporal region) that matter, but their association with multisensory objects (temporal lobe) and their interactions (frontal lobe).
As Hinton writes: “If we want to make neural networks that understand images in the same way as people do, we need to figure out how neural networks can represent part-whole hierarchies.” Patom theory agrees and has a solution for that based on human-brain observations but we need to jump out of the visual modality into the multisensory regions.
Hinton attempts to integrate the symbolic model used in semantics into GLOM. He calls it “Good Old-Fashioned Artificial Intelligence (GOFAI)”[vii], but it is really better called mainline modern linguistics as practiced by semanticists. And while hypernym (IS-A) and meronym (HAS-A) relations are used for generalization in human language (NLU), instead of making a single bidirectional connection, Hinton’s proposal is to make “different entities correspond to different vectors of activity on the same set of neurons.” Ockham’s Razor suggests that this additional complexity, needed to fit Hinton’s model of artificial neural networks, is unwarranted, since a single association is a simpler model. It forces a change in the deep learning model rather than questioning whether deep learning is the right starting point. A model based on multisensory objects and context integration accounts for more observations.
Considering what science shows the brain does already, why not try to replicate that? I do not see GLOM achieving this.
Neuroscience: Brains are Prediction Machines?!
Neuroscientists use the paradigm of prediction to explain what a brain does, a paradigm that has been followed for some time. And while it is consistent with observations of brain function, it explains what a brain does, not why or how it does it. A human brain quickly detects any contrast with experience. In language, it calls out a missing syllable in a sentence, for example. In NLU work, we see native speakers’ remarkable recognition of errors as they review language output.
In the Nave article[xii], the use of stored patterns, as in Patom theory, is supported (i.e. we tend to experience what we have learned/believe):
“The novel addition this theory makes to traditional, feedforward-dominated perception research is that perception is not explained by incoming signals alone, but crucially also includes active top-down predictions about their shape…”
In my favourite scientific folly, the geocentric model, observers on Earth see the sun go around the Earth once per day. That explains what is seen. But why does it do that? Because the Earth rotates daily, and that motion creates the appearance of a moving sun. To those observers, the rotating-Earth model appeared ridiculous: if the Earth were moving, our feeling of stillness would be wrong, and why wouldn’t we fall off? It took a few scientists, like Newton, to explain why.
The next step up from ‘prediction’ is Patom theory. There are two kinds of patterns in Patom theory: (i) snapshot patterns, like the set of active items in a room at a point in time, and (ii) sequential patterns, like the changes in that set over time. (In NLU, the set of participants in a conversation is tracked in context to allow us to know what others have heard. This common-ground knowledge is critical to avoid re-explaining known facts to people with shared experiences, or leaving them out for those without the background.)
Prediction in Patom theory is therefore the tracking of changes in patterns over time. When changes are unexpected, the new patterns are learned and then dealt with. For example, when someone says a new word like ‘writch’, was it a slip of the tongue, a coded message, or a new word? As children we constantly experience new situations. Some are repeated throughout life, some are rare, some are infrequent. That tracking of stored patterns is what a brain does. It explains why a brain appears to predict what will happen: it already has experience of what can happen next.
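The tracking-and-learning loop above can be sketched in a few lines. This is my own illustration of the idea, reusing the ‘writch’ example from the text; the function name and structure are invented:

```python
# Illustrative sketch: follow experience against stored patterns,
# flag deviations, and store them as newly learned patterns.
# Names are hypothetical, not from Patom theory.

def track(known_patterns, experience):
    """Return the unexpected items in an experience, learning them as we go."""
    novel = [item for item in experience if item not in known_patterns]
    known_patterns.update(novel)  # unexpected patterns are learned...
    return novel                  # ...and flagged for further handling

vocabulary = {"witch", "watch"}
surprises = track(vocabulary, ["watch", "writch"])

# The new word is detected (slip of the tongue? code? new word?) and stored.
assert surprises == ["writch"]
assert "writch" in vocabulary
```

On this view, “intelligence” lives in what the system does with the flagged deviations, not in the routine matching.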
Patom theory is more effective as a scientific explanation because it explains what goes on before prediction can take place and it sets out the research needed to validate it, or falsify it. It also sets out what a brain is doing with pattern matching in contrast to the processing model: storing patterns in a hierarchy for future rapid matching from constituents of pattern atoms — sensory and multisensory snapshot experiences and sequences of them.
Jeff Hawkins’ Thousand Brains Theory of Intelligence
Jeff’s work is excellent research into what the human brain’s neocortex does, but it also focuses on prediction, today’s leading neuroscientists’ view of what brains do, which, as mentioned above, does not explain how and why a brain predicts. As the peer-reviewed paper says: “How the neocortex works is a mystery.”
Patom theory explains that a brain tracks sequential patterns and detects deviations from known sequences. Intelligence, if it is a thing to discuss, takes place during those deviations. Should a brain learn this new pattern, should it initiate the flight response? Should it initiate the fight response? Should it do the same thing it did the last time it found an unexpected sequence?
In their 2004 book[viii], Hawkins and Blakeslee describe the act of catching a ball, something that Patom theory similarly explained on the radio in 2000[ix]. While PT explains the learning of sequential patterns of sets of muscle contractions in order to throw a ball (muscle coordination with visual and balance sensors), Hawkins framed it as “the difference between computing a solution to a problem and using memory to solve the same problem. Consider the task of catching a ball. …”[x] Hawkins then compares the processing approach with the memory-based approach. The steps in the memory model point out that the memory to catch a ball was “not programmed into your brain; it was learned over years of repetitive practice, and it is stored, not calculated, in your neurons”.
With Patom theory, the visual representations are stored and the muscle motion sequences are stored. The use of the stored patterns enables the catching of a ball, or as described in my work, the example of throwing a ball (a simpler model to describe).
But in the same book, Hawkins claims[xi]: “Prediction is … the primary function of the neocortex, and the foundation of intelligence. The cortex is the organ of prediction.”
The Hawkins 2019 peer-reviewed article says:
“We showed how long-range associative connections in the “object” layer allow multiple columns to vote on what object they are currently observing.”
The non-statistical model, which my experience shows is needed in NLU to deal with ambiguity, would take the reported observations and, rather than model them as a set of “voting” columns, de-personify them by taking the best, complete match. That brings the explanation into line with PT.
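The contrast between column “voting” and taking the best complete match can be illustrated with a toy sketch. The candidate objects, feature sets, and function below are entirely invented for illustration; they stand in for neither Hawkins’ model nor PT’s actual mechanism:

```python
# Illustrative contrast: instead of a population of columns "voting",
# select the stored object whose pattern completely covers the
# observation. All data and names here are invented.

def best_complete_match(candidates, observed):
    """Pick the stored object that accounts for every observed feature;
    among complete matches, prefer the most specific (smallest) pattern."""
    observed = set(observed)
    complete = [c for c, feats in candidates.items() if observed <= feats]
    return min(complete, key=lambda c: len(candidates[c])) if complete else None

candidates = {
    "coffee-cup": {"handle", "cylinder", "rim"},
    "mug":        {"handle", "cylinder", "rim", "thick-wall"},
    "bowl":       {"rim", "hemisphere"},
}

# Every observed feature is accounted for; nothing "votes".
assert best_complete_match(candidates, {"handle", "cylinder", "rim"}) == "coffee-cup"
assert best_complete_match(candidates, {"wing"}) is None
```

The design choice is the point: a complete, deterministic match removes the need to personify columns as voters.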
The improved model that not only explains what a brain does, but how and why it does it is Patom theory.
This has been an extremely high-level comparison of three brain models today — the one many neuroscientists use today (prediction), Geoffrey Hinton’s 2019 model, and Jeff Hawkins’ 2019 model — with Patom theory, my 1990s theoretical brain model.
While the prediction model appears consistent with observation of brain functions, it lacks the description of how and why that can be. It lacks Patom theory’s explanation of sets and lists as the building blocks of brains. And, in any case, prediction requires someone to predict. Pattern matching does away with the personification of a brain’s function.
[i] For the evidence for snapshot and sequential patterns, the 2016 version, which includes the distinction with Geoffrey Hinton’s work on parallel systems, is available here: https://www.amazon.com/Machine-Intelligence-Death-Artificial-ebook/dp/B01E9NM1XM
[iii] Hawkins, J. and Maver, C., The Thousand Brains Theory of Intelligence, https://numenta.com/blog/2019/01/16/the-thousand-brains-theory-of-intelligence/ Jan 16, 2019 and Hawkins, J. et al., A Framework for Intelligence and Cortical Function Based on Grid Cells in the Neocortex https://www.frontiersin.org/articles/10.3389/fncir.2018.00121/full, 11 January 2019.
[v] Nave, K. et al. Wilding the predictive brain, 9 Sept. 2020, https://onlinelibrary.wiley.com/doi/full/10.1002/wcs.1542 is an example of the prediction paradigm used in neuroscience. “The Predictive Processing (PP) framework casts the brain as a probabilistic prediction engine…”
[vi] Hinton, p1.
[vii] Hinton, p29.
[viii] Hawkins J, and Blakeslee S., On Intelligence: How a New Understanding of the Brain Will Lead to the Creation of Truly Intelligent Machines, 2004, https://www.amazon.com.au/Intelligence-Jeff-Hawkins/dp/0805078533
[ix] “Our Brain, The Patom-matcher”, Radio National, broadcast on Sun 16 Jan 2000, https://www.abc.net.au/radionational/programs/ockhamsrazor/our-brain-the-patom-matcher/3562700
[x] Hawkins and Blakeslee, p68–69.
[xi] Hawkins and Blakeslee, p89.
[xii] Nave, K. et al. p3.