Seat of Knowledge: AI Systems with Deeply Structured Knowledge
How an information-centric classification of AI architectures can facilitate the construction of task-optimized AI systems
In a series on the choices for capturing information and using knowledge in AI systems, I introduced the concept of an information-centric classification of AI systems as a complementary view to a processing-based classification such as Henry Kautz’s taxonomy for neural symbolic computing. The classification emphasizes the high-level architectural choice related to information in the AI system. This blog will outline the third class in this classification and its promising role in supporting machine understanding, context-based decision making, and other aspects of higher machine intelligence.
The proposed information-centric classification includes three key classes of AI systems, based on how information is architecturally partitioned and used on the fly at inference time:
· Class 1 — Fully Encapsulated Information: Training data and relations are incorporated into the parametric memory of the neural network (NN). There is no access to additional information at test time. Examples include recent end-to-end deep learning (DL) systems and language models (e.g., GPT-3).
· Class 2 — Semi-Structured Adjacent Information (in retrieval-based systems): These systems rely on retrieving information from a repository (e.g., Wikipedia) in addition to the NN parametric memory (e.g., retrieval-augmented generation).
· Class 3 — Deeply Structured Knowledge (in retrieval-based systems): Retrieval-based systems that interact closely with a deep knowledge base, as defined in the dimensions of knowledge for higher intelligence.
The major distinction between Class 1 and Class 2 AI systems lies in the choice of information placement — encapsulated in the NN model (Class 1) or an auxiliary repository (Class 2).
The major distinction between Class 2 and Class 3 AI systems is in where the deeper knowledge resides — whether in the NN parametric memory (Class 2) or in the form of deeply structured knowledge in a knowledge base (knowledge graph in Class 3).
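The Class 1 vs. Class 2 distinction can be sketched in code. The following toy (all names hypothetical; a minimal sketch, not a real architecture) contrasts a Class 1 system, which answers purely from what it memorized at training time, with a Class 2 system, which retrieves a passage from an adjacent repository at inference time and reads the answer out of it:

```python
import re

class ToyReader:
    """Stand-in for an NN: a little 'parametric memory' plus naive reading."""
    def __init__(self, memorized):
        self.memorized = memorized  # facts absorbed during training

    def generate(self, prompt):
        # 1) Answer from parametric memory if the fact was seen in training.
        for fact, answer in self.memorized.items():
            if fact in prompt:
                return answer
        # 2) Otherwise, try to read an answer out of any retrieved context.
        m = re.search(r"capital of (\w+) is (\w+)", prompt)
        return m.group(2) if m else "unknown"

class ToyRetriever:
    """Stand-in for retrieval over a repository (e.g., Wikipedia passages)."""
    def __init__(self, passages):
        self.passages = passages

    def search(self, query, k=1):
        # Rank passages by naive lexical overlap with the query.
        words = set(query.lower().split())
        ranked = sorted(self.passages,
                        key=lambda p: len(words & set(p.lower().split())),
                        reverse=True)
        return ranked[:k]

def class1_answer(model, question):
    # Class 1: no access to external information at test time.
    return model.generate(question)

def class2_answer(model, retriever, question):
    # Class 2: retrieve first, then condition the model on the passage.
    context = " ".join(retriever.search(question))
    return model.generate(question + " " + context)
```

A model that never memorized a given fact fails as a Class 1 system but succeeds as a Class 2 system once a passage stating the fact is retrieved from the repository.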
What Deep Knowledge is Required for Machine Intelligence?
First, let’s define data, information and knowledge:
· Data are raw, unorganized facts from which information is derived once processed, e.g., pixels in an image.
· Information is created when data is processed, organized, structured, or presented in a context that makes it useful. Information provides structure and context for data.
· Knowledge refers to the relevant and objective information gained through experience. Knowledge operationalizes the data and makes it a helpful resource for making predictions or deciding on actions. For example, when the autonomous vehicle recognizes that the light is turning red, it can assess the learned practices of safely stopping the car and select a braking action.
A detailed discussion on the types of knowledge considered to be deep and structured can be found in the article Understanding of and by Deep Knowledge — How knowledge constructs can transform AI from surface correlation to comprehension of the world. This article outlines the various classes of deep information and discusses descriptive knowledge and the role of models of the world, stories, values, and priorities in capturing the full spectrum of knowledge needed for machine comprehension and higher machine intelligence. Concept References are introduced to support disambiguation and unified linkage across modalities and dimensions. I will refer to a representation that reflects the relations and complexities of multiple types of knowledge (of the kinds depicted in Figure II) as Deep Knowledge.
In Class 2 systems, the repository contains information, but much of the complex relationships and insights related to the information are encapsulated in the embedded space of the NN. In systems with deeply structured knowledge (Class 3 systems), most of the dependencies and relationships are explicitly represented in the knowledge base.
Class 3: AI Systems with Deeply Structured Knowledge
In an AI system with deeply structured knowledge (Class 3), an NN has an adjacent knowledge base with an explicit structure that conveys the relations and dependencies that constitute deep knowledge. The auxiliary knowledge base is accessed during training and inference/test time. Some of the deep knowledge still resides in the NN parametric memory, but in this class of systems, most of the knowledge resides outside the NN.
A great deal of effort has been devoted to extracting information and storing it in structured knowledge bases. Abstract Meaning Representation (AMR), for example, is used in semantic parsing to generate a graph that captures the semantics of a sentence regardless of its syntactic representation. The extracted entities and relations can be linked and mapped onto an ontology and stored in a structured knowledge base.
Wikidata is a knowledge base with multiple elements of deep knowledge. In addition to a hierarchical ontology, Wikidata provides “special” relations or attributes, such as temporal or spatial annotations of relations and entities. These attributes bridge the gap between static data and events that alter the state of the knowledge. They reflect the dynamic nature of the knowledge and enable dedicated reasoning (such as temporal, spatial, causal) on the knowledge base. These attributes do not exist systematically in all knowledge bases.
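To make this concrete, here is a minimal sketch, loosely mirroring Wikidata's statement/qualifier model (the data layout and dates are illustrative, not Wikidata's actual API), of how temporal qualifiers on a relation enable temporal reasoning over the knowledge base:

```python
from datetime import date

# Each statement carries qualifiers: a claim (subject, property, value)
# annotated with, e.g., the start and end of its validity interval.
statements = [
    {"subject": "Germany", "property": "capital", "value": "Bonn",
     "qualifiers": {"start": date(1949, 1, 1), "end": date(1990, 10, 2)}},
    {"subject": "Germany", "property": "capital", "value": "Berlin",
     "qualifiers": {"start": date(1990, 10, 3), "end": None}},
]

def value_at(statements, subject, prop, when):
    """Temporal reasoning: return the value whose validity interval covers `when`."""
    for s in statements:
        if s["subject"] == subject and s["property"] == prop:
            start = s["qualifiers"]["start"]
            end = s["qualifiers"]["end"] or date.max  # open-ended interval
            if start <= when <= end:
                return s["value"]
    return None
```

Without the qualifiers, the two statements would simply contradict each other; with them, a query about 1980 and a query about today yield different, correct answers.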
Most of the knowledge captured in Wikidata is descriptive. Different knowledge representations can combine descriptive knowledge with other knowledge dimensions. A knowledge base with causal models can support more powerful reasoning and enable counterfactual explorations. Adding context can enable a more refined use of information in its proper intent and cases where it applies. Adding source attribution and provenance can allow systems to understand bias in the data, and analyze the information (e.g., controversial political events), through a more informed perspective.
A key aspect of utilizing knowledge is the interplay between knowledge representation and reasoning. Knowledge and reasoning are points on a continuum rather than two completely distinct functions. Reasoning can tease out required knowledge or outcomes when they are not already fully represented and ready for as-is retrieval. Applying reasoning on an explicit knowledge base can be done as part of the NN, as a separate operation performed on the knowledge base, or a combination of both. This architectural choice has major implications for the nature of training, knowledge representation, and the types of computations performed during inference/test time.
In an NN-only reasoning system, the knowledge base serves as a repository. A Class 3 system of this kind uses an explicit knowledge base during inference; however, reasoning functions such as sorting, selection, neighbor identification, and others are conducted by the NN within the embedded space, as found in examples of QA systems operating over knowledge graphs.
Other Class 3 systems have an active functionality of selecting information or performing parts of the reasoning on top of the KB. We will refer to such a mechanism as reasoned extraction. One example is Neuro-Symbolic Question Answering (NSQA). A key advantage of reasoned extraction over NN-only reasoning is that the answer returned by the system can change dynamically as the KG is updated without needing to retrain the model.
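A toy sketch of this advantage (hypothetical triples and names): the NN's job reduces to mapping a question onto a structured query, while the answer itself is resolved against the live KG, so updating the graph changes the answer with no retraining:

```python
# Toy knowledge graph: a set of (subject, relation, object) triples.
kg = {
    ("aspirin", "treats", "headache"),
    ("aspirin", "treats", "fever"),
}

def reasoned_extraction(kg, subject, relation):
    # The NN would map a natural-language question to (subject, relation);
    # the answer set is then resolved against the KG at inference time.
    return {obj for subj, rel, obj in kg if subj == subject and rel == relation}

# Updating the KG changes the answer immediately -- no retraining required.
kg.add(("aspirin", "treats", "inflammation"))
```

In an NN-only reasoning system, by contrast, the new fact would have to make its way into the parametric memory through another round of training before it could affect answers.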
Key Elements of a Class 3 System
Figure III depicts the high-level architecture of a Class 3 system and its key components:
The term Knowledge refers to the relevant and objective information gained through experience. Deep Knowledge describes knowledge that has multiple dimensions, with complex relations captured within each domain. A knowledge base implements structured interactive knowledge as a repository in a particular solution, primarily implemented as knowledge graphs (e.g., Google’s Knowledge Graph). Finally, an AI system with Deeply Structured Knowledge is a system with a knowledge base that captures deep knowledge and reflects its structure through extraction schemes.
The Neural Network is the primary functional part of a Class 3 system. It may include all perception elements, such as image recognition and scene segmentation, or a language model for processing syntax, placement-based relations, and the core of common semantics. It will likely learn an embedding space that represents the key dimensions of the incoming data; in multimodal systems, it will reflect both the image space and the language space. Similar to Class 2, an AI system with deeply structured knowledge incorporates only a portion of the data/information within the NN. However, unlike Class 2, where the complex knowledge structure resides within the NN, Class 3 architectures rely on the adjacent structured knowledge base for much of the deep relations in their semantic space.
Similar to Class 2 systems, the neural network system in Class 3 can engage with the structured knowledge base during inference/test time, and extract the information needed for completing its task successfully. In this architecture, the training of the NN needs to be done together with the extraction mechanism and some representation of the knowledge base to allow the NN to learn how to extract the required knowledge during inference.
The Knowledge Base contains facts and information that might be required for future inference and some or all of the deep knowledge structures depicted in Figure II. These include descriptive knowledge, models of the world dynamics, stories, context and source attribution, value and priorities, and Concept References.
The knowledge base gets populated ahead of being utilized and can be further enhanced during training or inference time (in online or continuous learning systems). It is populated primarily by a Knowledge Acquisition function that extracts facts, information, taxonomies, functional models, and other knowledge elements from sources external to the AI system and structures them in a way amenable to extraction and reasoning. When it accrues additional information based on training or inference runs, it can be seen as a memory extension for the NN, structuring and retaining additional learned information and knowledge.
The knowledge base can change after training and can include additional data and knowledge. As long as the nature of knowledge and information is similar to what was encountered by the NN during training, the modified knowledge base should be fully utilizable during inference based on its latest incarnation.
Finally, the Reasoned Extraction block is mediating between the NN and its external source of knowledge. In the simplest case, it is a direct mapping from embedding vectors through some indexed links to the knowledge base. However, in that case, the knowledge base structure does not bring any additional value compared with a flat information repository. In the more general case, reasoned extraction will extract its information using libraries based on queries or APIs. An example of this is AMR, which extracts and stores information as semantic graphs that are insensitive to the original syntactic expression. One application is the ability to later retrieve and paraphrase or summarize the information from that representation.
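The idea of a syntax-insensitive semantic record can be illustrated with a toy normalizer (hypothetical patterns and names; a real AMR parser is a trained model, not two regular expressions): two different surface forms of the same fact map onto one (subject, relation, object) record, from which a surface form can later be regenerated.

```python
import re

def normalize(sentence):
    """Map active and passive phrasings of 'founding' onto one semantic record."""
    s = sentence.strip().rstrip(".")
    m = re.fullmatch(r"(\w+) founded (\w+)", s)
    if m:
        return (m.group(1), "founder-of", m.group(2))
    m = re.fullmatch(r"(\w+) was founded by (\w+)", s)
    if m:
        # Passive voice: the grammatical subject is the semantic object.
        return (m.group(2), "founder-of", m.group(1))
    return None

def paraphrase(record):
    """Regenerate one surface form from the stored semantic record."""
    subject, _, obj = record
    return f"{obj} was founded by {subject}."
```

Because both phrasings collapse into the same record, the knowledge base stores the fact once and can answer or paraphrase it independently of how it was originally expressed.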
Due to the complexity of creating a sufficiently populated knowledge graph that covers all relevant information, it is likely that some Class 3 systems will also integrate a full retrieval mechanism over a large semi-structured corpus of data in addition to the knowledge base. Some insufficiencies in the coverage of the knowledge base, such as missing links, can be actively addressed at inference time through methods of reasoning over incomplete information. Other cases may require the system to access additional sources of information; we refer to such systems as Class 3+, illustrated in Figure IV. In Class 3+ systems, there are three levels of information available to the NN:
· The most immediate information and knowledge reside in the NN parametric memory
· A significantly larger volume of information, as well as deeply structured knowledge, is available in the knowledge base
· For cases in which the knowledge base lacks sufficient coverage, a retrieval mechanism pulls information from the largest available semi-structured information repository
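These three tiers can be sketched as a simple cascade (hypothetical interfaces; a real system would score and merge evidence across tiers rather than take the first hit):

```python
def answer(key, parametric_memory, knowledge_base, retrieve_from_corpus):
    """Class 3+ lookup cascade over the three levels of information."""
    if key in parametric_memory:          # 1) NN parametric memory
        return parametric_memory[key]
    if key in knowledge_base:             # 2) deeply structured knowledge base
        return knowledge_base[key]
    return retrieve_from_corpus(key)      # 3) large semi-structured repository
```

The ordering reflects cost and structure: the parametric memory is immediate, the knowledge base is larger and explicitly structured, and the corpus retrieval is the broadest but least structured fallback.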
What AI System Architecture Is Best for the Task?
No one architecture is best for all purposes. Each class of AI systems has clear advantages and associated challenges. It is important to understand the characteristics of the systems and match them with the profile of the use cases to be performed by the AI system.
The structural relationships between data, information, and knowledge relate to the information-centric classification as shown below:
Class 1 AI systems with fully encapsulated information might be the most prevalent and impactful AI solutions being developed today. End-to-end deep learning systems are exceptionally effective in many domains. They will likely be the best solutions for all types of perception tasks (such as image recognition and segmentation, speech recognition, and many natural language processing functions), sequence-to-sequence capabilities (e.g., language translation), recommendation systems, many question-and-answer applications, and more. In general, any function that can be solved over a continuous space and can be modeled through latent manifolds can be effectively addressed by NN systems.
Some of the current DL limitations will likely be addressed by further work in the field. The question is not only what a Class 1 system can eventually do, but rather what such fully enclosed systems can do well, given considerations of time, cost, variability, information reliability/bias, smaller data domains, and more. For example, an NN can learn to do Boolean logic and some basic arithmetic, but is it an efficient solution for actual use cases? DL systems do not perform high-school-level arithmetic reliably, as they produce approximate values rather than performing discrete algebra. NN systems also have difficulty identifying when they do not have an answer. Another example is capturing source attribution and information provenance. If a DL system is to maintain the provenance of facts it has seen during training as part of the model, it will require a very different NN solution, which is not likely to be viable.
Class 2 AI systems with a semi-structured information repository are most helpful in addressing use cases with a very large data/information space: an AI system tasked with answering questions about Wikipedia articles will be more efficient with a retrieval mechanism that points to the external repository than with all the data incorporated into parametric memory through NN memorization. The ability to modify the information in the repository between training time and test/inference time can be important for out-of-domain challenges, even when the relevant information was not present during training. Links to the original information provide a partial but valuable contribution to provenance challenges and improve interpretability and explainability.
Class 3 AI systems with deeply structured knowledge can make a central contribution to increasing machine comprehension of the world by creating a multi-faceted reflection of the outside world within the AI. This kind of visibility is expected to enable improved cognitive functions and higher machine intelligence. Class 3 systems can facilitate other advantages as well, such as the overlay of context; attribution of source for knowledge; provenance for relations in a knowledge structure; reduced brittleness of models by utilizing concepts and ontologies; and the addition of values and priorities to support goal-based decision making. Class 3 also provides a strong foundation as AI systems transition from performing a function (like question answering) to becoming persistent agents with a set of goals and behaviors. The knowledge base can be seen as a continually evolving active memory for an intelligent agent.
An example to consider is a healthcare-assist AI that evaluates the likelihood of infection in a patient. A Class 1 AI with fully encapsulated information can be trained to analyze radiology images and use image recognition to identify potential patterns indicating infection. A Class 2 system with access to an information repository may retrieve relevant medical literature for the identified patterns, as well as support some level of informational QA based on the repository. A Class 3 system with deeply structured knowledge may eventually be able to address several modalities of information (including radiology, medical records, and the latest research results), provide reasoned analysis including likely causes, and explain its sources of information and its path to a conclusion.
Despite their considerable strengths, Class 3 systems entail a higher level of complexity, as they necessitate creating and updating the knowledge base. They also reshape the learning process: because knowledge is now split between the NN and the KB, new techniques will be required to integrate gradient-descent statistical methods with symbolic representations and learning.
Selecting the Optimal Information-Centric Architecture Class per AI Goals
Endowing AI with the ability to understand and operate at a higher level of intelligence seems to be necessarily associated with deeply structured knowledge. While active research attempts to create such complex knowledge constructs and models within the medium of parametric memory and the structures of the NN latent space, this approach faces significant challenges due to the stochastic nature of its learning and knowledge representation. Deeply structured knowledge AI architectures that augment neural networks with a knowledge base offer the promise of combining the best of both worlds.
The formalization and elaboration of the types of systems based on their information and knowledge approach are at an early stage. However, a deeper understanding of the attributes of each class will better equip the field in making AI architecture choices based on the target goals and use cases.
Singer, Gadi. “Seat of Knowledge: Information-Centric Classification in AI, Class 1 — Fully Encapsulated Information”. LinkedIn, February 16, 2021. https://www.linkedin.com/pulse/seat-knowledge-information-centric-classification-ai-gadi-singer/
Kautz, Henry. “The Third AI Summer”. AAAI Robert S. Englemore Memorial Lecture, Director’s Cut. https://www.cs.rochester.edu/u/kautz/talks/Kautz%20Engelmore%20Lecture%20Directors%20Cut.pdf (accessed on 06/09/2021)
Singer, Gadi. “Seat of Knowledge: Information-Centric Classification in AI, Class 2 — Semi-Structured Information Repository”. LinkedIn, March 23, 2021. https://www.linkedin.com/pulse/seat-knowledge-information-centric-classification-ai-gadi-singer-1c/
Lewis, P. et al. “Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks”. ArXiv, April 12, 2021. https://arxiv.org/pdf/2005.11401.pdf
Singer, Gadi. “Understanding of and by Deep Knowledge — How knowledge constructs can transform AI from surface correlation to comprehension of the world”. Towards Data Science, May 6, 2021. https://towardsdatascience.com/understanding-of-and-by-deep-knowledge-aac5ede75169
Chakraborty, N. et al. “Introduction to Neural Network-based Approaches for Question Answering over Knowledge Graphs”. ArXiv, July 22, 2019. https://arxiv.org/pdf/1907.09361.pdf
Kapanipathi, P. et al. “Leveraging Abstract Meaning Representation for Knowledge Base Question Answering”. ArXiv, December 3, 2020. https://arxiv.org/abs/2012.01707
Singer, Gadi. “The Rise of Cognitive AI”. Towards Data Science, April 6, 2021. https://towardsdatascience.com/the-rise-of-cognitive-ai-a29d2b724ccc