The paper is very long.

This page is created for easier reading and referencing in papers.

Have a quick look and see just how long it is via A community-driven vision for a new knowledge resource for AI


The Complete Paper is analysed much further down.

Ref numbers added. The short and long summaries are via Google Gemini:

Here is the short synthesized version of the paper, maintaining all original headings and the core community vision.

1 ABSTRACT

2 INTRODUCTION

3 ENVISIONING A KNOWLEDGE RESOURCE

4 FORMALIZING FOUNDATIONAL KNOWLEDGE

5 AUTOMATED REASONING

6 KNOWLEDGE CURATION USING MANUAL METHODS, MACHINE LEARNING AND LARGE LANGUAGE MODELS

7 MODERN EDUCATION ON KNOWLEDGE REPRESENTATION

8 EVALUATING A KNOWLEDGE RESOURCE

9 NEXT STEPS TOWARD CREATING A KNOWLEDGE RESOURCE

10 CONCLUSION

1 ABSTRACT

The goal of creating a comprehensive, multi-purpose knowledge resource—reminiscent of the 1984 Cyc project—remains a critical frontier in AI. Despite the success of specialized resources like WordNet and commercial knowledge graphs, AI infrastructure still lacks verifiable, widely available world knowledge.
Large language models (LLMs) struggle with knowledge gaps, and robotic planning lacks essential commonsense. Following a 2025 AAAI workshop involving over 50 researchers, this paper outlines a community-driven vision for a new knowledge infrastructure: an open engineering framework designed to exploit "knowledge modules" within practical applications through shared conventions and social structures.

2 INTRODUCTION

The Cyc project, launched in 1984, aimed to provide an ontology for human-like reasoning. Thirty years after Cyc's founder, Douglas B. Lenat, posed the question in 1995, the AI community has still not settled how far systems can get without such commonsense knowledge.
While neural networks and knowledge graphs have revolutionized information retrieval and translation, commercial graphs often sidestep everyday commonsense, and LLMs rely on implicit knowledge without formal guarantees. The lack of structured knowledge hinders robotics and the detection of misinformation.
This paper synthesizes the findings of the TIKA-2025 workshop, advocating for an open infrastructure that leverages 40 years of progress in knowledge representation (KR) to create modular, reusable, and verifiable knowledge.

3 ENVISIONING A KNOWLEDGE RESOURCE

We define a "knowledge resource" as curated knowledge, formalized via ontologies, rules, or causal models, that is human-verifiable.

Problems Solvable: Curated knowledge excels where rules are available (arithmetic, qualitative spatial reasoning) or where high accuracy and transparency are vital (taxation, education). In spatial benchmarks, GPT-4’s accuracy drops from 0.55 to 0.15 as scene complexity increases, whereas axiomatic reasoners remain correct.
Current Limits: Existing resources like Wikidata are often restricted to simple triples. In fields like biomedicine, reasoning must handle varying levels of certainty and complex relationships beyond "established facts." Furthermore, proprietary barriers (Cyc [see 2.1 above], Wolfram|Alpha) hinder open-source community growth.
A New Approach: A modern resource should function as a public utility, pairing formal representation with provenance in multiple modalities (text, images, video). It must utilize expressive languages like Answer Set Programming (ASP) and theorem provers like Lean to provide formal guarantees.
Beneficiaries: It will serve AI engineers seeking "off-the-shelf" knowledge modules, researchers needing structured data for machine learning, and scientists seeking "superhuman" knowledge aggregation across complex domains.


4 FORMALIZING FOUNDATIONAL KNOWLEDGE

Foundational knowledge covers abstract concepts like time, space, and causality.

Methodology: Scaling requires moving from "ontological prodigies" to community-curated, repeatable engineering practices. This includes identifying concepts through corpus analysis and "successive formalization"—iteratively refining axioms into competent logical theories.
Evaluation: Knowledge must be tested both intrinsically (consistency, completeness) and extrinsically (improved performance on specific tasks).
Challenges: Key hurdles include handling multi-modal data, bridging the gap between logic and the vagueness of natural language, and creating interoperability standards to allow knowledge to be translated across different formalisms.

5 AUTOMATED REASONING

Reasoning encompasses a spectrum from deduction and abduction to induction and analogy.

Discovery: While modern practice uses LLMs for initial insights, reliable decision-making requires specialized modules for space, time, and motion.
Modeling Humans: AI should augment human reasoning, helping users arrive at sounder conclusions while avoiding psychological flaws. This requires a taxonomy of reasoning types and the ability to resolve ambiguities or ignore irrelevant "noise" during processing.
Scale and Context: Reasoning must scale to the breadth of "all science," identifying gaps in existing research. Crucially, knowledge must be organized into context-specific "microtheories" that allow for different scientific viewpoints (e.g., viewing Earth's shape differently in astronomy vs. topography).

6 KNOWLEDGE CURATION USING MANUAL METHODS, MACHINE LEARNING AND LARGE LANGUAGE MODELS

Curation is the lifecycle of gathering and maintaining knowledge.
Human Oversight: Experts are indispensable for defining intent, designing schemas for "durable semantics" (e.g., tracking a movie from novel to release), and validating automated outputs to ensure they align with domain requirements.
Role of LLMs: LLMs can automate labor-intensive tasks by acting as a source of explicit knowledge, aiding in expert elicitation through natural language dialogue, and serving as mediators for casual users interacting with complex formal systems.
The Hybrid Paradigm: The optimal approach combines human expertise with automation in a feedback loop: humans provide initial artifacts and evaluation, while automation handles complex transformations and identifies inconsistencies through rigorous validation.

7 MODERN EDUCATION ON KNOWLEDGE REPRESENTATION

There is a decline in academic focus on KR. To revitalize the field, we propose:

Curriculum Reform: Standard textbooks often portray KR as outdated. We must shift the focus toward "conceptual modeling"—the skill of identifying implicit rules and world-workings before writing code.
Practical Tools: The community should provide "plug-and-play" modules (slides, auto-graders, online sandboxes) for diverse computer science courses.
Broadening Scope: KR should be integrated into databases, software engineering, and high school "AI literacy" programs. Teaching logic through games and puzzles can alleviate the "fear of logic" in higher education.

8 EVALUATING A KNOWLEDGE RESOURCE

AI evaluation currently suffers from "proxy failure," where benchmarks become targets rather than measures of true capability.

Expert Interviews: Qualitative assessment by human experts can be more insightful than quantitative metrics.
Reasoning Steps: Benchmarks should test the individual pieces of knowledge and reasoning steps used, not just the final output.

9 NEXT STEPS TOWARD CREATING A KNOWLEDGE RESOURCE

The TIKA initiative aims to foster a "Hugging Face" for knowledge modules.


Recommendations: The resource must be open-source, permissive, and managed by a non-profit. It should focus on specific tasks, providing a "pip install" experience for knowledge.
Use Cases: Future workshops will explore Education (skill graphs), Supply Chain (semantic modeling), and Computational Law (codifying regulations).
Proof of Concept: Partnering with Bosch, the initiative is testing three areas: filling inference gaps in robotic planning, enabling generalizable robotic skill learning, and improving root cause analysis in manufacturing through structured causal models.

10 CONCLUSION

In an era of inscrutable generative AI, curated knowledge offers transparency and formal guarantees. While neural networks model perception, symbolic KR captures structured understanding.
By adopting the open-source engineering practices of the deep learning community—and demonstrating value through industrial use cases—the KR community can provide the interpretability and robustness necessary for the next generation of trustworthy AI.

source A community-driven vision for a new knowledge resource for AI

-----------------------------------------------

The Complete Paper

Includes pagetops - an advantage! Long paragraphs, as above, are broken down and given refs.

Main ordinals commence with + to distinguish them from those above.

Many readers won't want the complete paper. However, their searches could well highlight items within.

+1 Abstract

The long-standing goal of creating a comprehensive, multi-purpose knowledge resource, reminiscent of the 1984 Cyc project, still persists in AI. Despite the success of knowledge resources like WordNet, ConceptNet, Wolfram|Alpha and other commercial knowledge graphs, verifiable, general-purpose, widely available sources of knowledge remain a critical deficiency in AI infrastructure.
Large language models struggle due to knowledge gaps; robotic planning lacks necessary world knowledge; and the detection of factually false information relies heavily on human expertise. What kind of knowledge resource is most needed in AI today? How can modern technology shape its development and evaluation? A recent AAAI workshop gathered over 50 researchers to explore these questions.
This paper synthesizes our findings and outlines a community-driven vision for a new knowledge infrastructure. In addition to leveraging contemporary advances in knowledge representation and reasoning, one promising idea is to build an open engineering framework to exploit knowledge modules effectively within the context of practical applications. Such a framework should include sets of conventions and social structures that are adopted by contributors.

+2 INTRODUCTION

The Cyc project, started in 1984, created the first large-scale database of commonsense knowledge. The initiative continues to this day with its aim to provide a comprehensive ontology and knowledge base of commonsense knowledge to enable human-like reasoning for artificial intelligence (AI) systems. In the concluding paragraph of his 1995 Communications of the Association for Computing Machinery (CACM) article A Large-Scale Investment in Knowledge Infrastructure (Lenat 1995), Cyc's founder Douglas B. Lenat wrote:
“Is Cyc necessary? How far would a user get with something simpler than Cyc but that lacks everyday commonsense knowledge? Nobody knows; the question will be settled empirically. Our guess is most of these applications will eventually tap the synergy in a suite of sources (including neural nets and decision theory), one of which will be Cyc.”
Although 30 years have passed since the above article was written, the AI research community has not conclusively settled (Brachman and Levesque 2022) the question “How far would a user get with something simpler than Cyc but that lacks everyday commonsense knowledge?”
However, it is clear that significant strides have been made in addressing many of the tasks that were original Cyc use cases, including information retrieval, semi-automatically linking multiple heterogeneous external information sources, spelling and grammar correction, machine translation, natural language understanding, and speech understanding.
Much of this progress has been facilitated by rapid, profound, and synergistic advancements in neural networks, automated agents, and knowledge graphs (Chaudhri et al. 2022). The increasing importance of knowledge graphs is in line with Lenat's early vision for Cyc, as he presented in 1985 (Lenat et al. 1985).
Knowledge graphs that have had commercial impact, for example, Google's knowledge graph (Singhal 2012), have primarily focused on capturing relationships among real-world entities and have sidestepped the challenge of capturing everyday commonsense knowledge.
Likewise, even though large language models (LLMs) perform very well on tasks that require knowledge about the world, that knowledge is most often implicit. Furthermore, it has been difficult to make formal guarantees about how and when LLMs leverage this implicit knowledge (Rossi 2025).
The lack of large-scale commonsense knowledge bases hinders practical AI applications. For example, robotic task and motion planning would significantly benefit from world knowledge and commonsense task descriptions to be performed by a robot (Jiang et al. 2019).
At present, such knowledge is custom-built for each project. Pressing problems such as identifying factually false information (National Academies of Sciences, Engineering, and Medicine 2023) require trusted knowledge; without it, systems must resort to guesswork, and are forced into over-reliance on human expertise to guide them. To ground the narrative in this paper, the examples of “robotics” and “identifying factually false information” are used to illustrate how many applications suffer from a lack of knowledge.
This raises important questions about AI infrastructure. What kind of knowledge resource is most needed in the modern context? What categories of knowledge should be included in such a resource? How should they be represented? How should they be made accessible to external users and applications? How can recent technological advances be harnessed to create such a knowledge resource? How should such a resource be evaluated?
A workshop at the 2025 AAAI Conference on Artificial Intelligence gathered over 50 researchers, including the authors of this paper, to explore these questions (TIKA-2025: AAAI 2025 Workshop on A Translational Institute for Knowledge Axiomatization 2025).
This paper is an attempt to synthesize our discussions and present a community view on the requirements and approach for creating a new knowledge infrastructure. We concluded that building an open engineering infrastructure for disseminating and using knowledge modules in practical applications, by leveraging steady advances in knowledge representation and reasoning over the last 40 years, is a promising path forward. This includes establishing sets of conventions and social structures that should be adopted by contributors.
We begin by envisioning the type of knowledge resource required in the current context of AI and then exploring aspects of creating it, including foundational knowledge, domain-specific knowledge, automated reasoning, and evaluation. We also review the current state of education in knowledge representation and reasoning. We conclude by presenting a community view on productive steps toward the creation of this much-needed knowledge resource.

+3 ENVISIONING A KNOWLEDGE RESOURCE

We use the term knowledge resource to refer to a body of curated knowledge that can be examined and verified by humans. This knowledge could be formalized in any computational framework, including ontologies (Noy 2001), rules (Genesereth 2022), a constraint network (Dechter 2003), a probabilistic causal model (Pearl 1989), and even in unambiguous natural language (Kowalski et al. 2023).
We envision the knowledge resource needed for AI by exploring four points: the type of problems that could best be solved by it, the limits of current practice, what should be done differently, and its likely beneficiaries.
Problems solvable by a knowledge resource
Data-driven learning is central to modern AI. But in some cases, curated knowledge can be better. For example, in basic arithmetic, people easily outcompete chatbots such as ChatGPT because they learn how to add and multiply numbers using rules rather than by looking at a large number of examples (Cheng and Yu 2023).
Curated knowledge works better than data-driven learning in these scenarios: rules needed to solve the problem are readily available, such as in systems of axioms for qualitative spatial reasoning (Cohn and Renz 2008; Forbus 2019); in applications that require high levels of accuracy, transparency, and reliability, such as income tax calculations; and in contexts in which clarity and precision are crucial, such as in education.
The qualitative spatial reasoning benchmark Room Space 100 shows that GPT-4's accuracy generally declines as the number of objects in a scene (n) increases from 3 to 6: for n = 3, GPT-4 has an accuracy of 0.55, and for n = 6, an accuracy of 0.15 (Li et al. 2024).
It has also been found that GPT-4 could solve some action-based complex puzzles—performing better when generating Python code—but an axiomatic reasoner using a proper action description language solved all puzzles correctly (Ishay and Lee 2025).
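To make the contrast concrete, the following is a minimal sketch of why an axiomatic reasoner stays correct as the number of objects grows. It is not the Room Space 100 benchmark, nor the reasoner used in the cited work; the relation name left_of, the object names, and the helper functions are illustrative assumptions. The only axioms encoded are that left_of is transitive and irreflexive.

```python
from itertools import product


def transitive_closure(facts):
    """Return all left_of pairs derivable by the transitivity axiom."""
    closure = set(facts)
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so the set can grow inside the loop.
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure


def entails(facts, query):
    """True if the query pair follows from the facts by transitivity."""
    return query in transitive_closure(facts)


def is_consistent(facts):
    """left_of is irreflexive: no object may end up left of itself."""
    return all(a != b for a, b in transitive_closure(facts))


# Hypothetical scene: six objects described by five pairwise facts.
scene = [("lamp", "sofa"), ("sofa", "table"), ("table", "chair"),
         ("chair", "shelf"), ("shelf", "door")]

print(entails(scene, ("lamp", "door")))            # needs a chain of derivations
print(entails(scene, ("door", "lamp")))            # not derivable from the facts
print(is_consistent(scene + [("door", "lamp")]))   # a cycle breaks irreflexivity
```

Because every answer is derived from explicit axioms, adding objects lengthens the derivation but does not change its correctness, and each conclusion can be traced back to the facts that produced it.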
A set of concrete examples where a knowledge resource can supplement the knowledge in large language models has been put out by Wolfram|Alpha (Stephen Wolfram 2023). Examples include “distance between Chicago and Tokyo,” “3 to the power 73,” “circumference of an ellipse with axes 4 and 12”, etc.
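Two of these examples can be sketched in a few lines to show what rule-based computation offers here. This is only an illustration, not how Wolfram|Alpha computes its answers. It assumes that "axes 4 and 12" means full axis lengths, so the semi-axes are 2 and 6; the function names are hypothetical. The power is exact because Python integers have arbitrary precision, and the circumference is obtained by numerically integrating the arc length, with Ramanujan's first approximation as a cross-check.

```python
import math


def ellipse_circumference(a, b, steps=100_000):
    """Numerically integrate the arc length of an ellipse with semi-axes a, b."""
    # Arc length element: sqrt(a^2 sin^2 t + b^2 cos^2 t) dt over [0, 2*pi].
    h = 2 * math.pi / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h  # midpoint rule
        total += math.sqrt((a * math.sin(t)) ** 2 + (b * math.cos(t)) ** 2)
    return total * h


def ramanujan_circumference(a, b):
    """Ramanujan's first closed-form approximation, used as a cross-check."""
    return math.pi * (3 * (a + b) - math.sqrt((3 * a + b) * (a + 3 * b)))


power = 3 ** 73  # exact: Python integers are arbitrary precision
print(power)

# Assumption: axes 4 and 12 are full lengths, so the semi-axes are 2 and 6.
print(ellipse_circumference(2, 6))
print(ramanujan_circumference(2, 6))
```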
A knowledge resource is necessary for us to develop automated reasoning and intelligence that is more human-like, capable of participating in formulation of formal models from everyday inputs and interpreting their results in real-world terms (model formulation and model interpretation) (Forbus 2019).
This requirement is supported by a recent survey of the AI community in which 61.8% of survey participants estimated the minimal percentage of symbolic AI techniques required for reaching human-level reasoning to be at least 50% (Rossi 2025).
Limits of the current practice
Current knowledge resources such as Wikidata (Vrandečić and Krötzsch 2014) and Google's knowledge graph (Singhal 2012) capture relationships between real-world entities such as people, places, organizations, and so forth. ConceptNet captures relationships between concepts (for example, a knife is used for cutting) (Speer et al. 2017), but the relationships are limited to triples that are awkward or insufficiently expressive for capturing general-purpose knowledge. LLMs encompass enormous amounts of knowledge implicitly. This is effective for many applications, but their output lacks formal guarantees (Rossi 2025).
Applications such as biomedicine require a much higher level of expressivity. For example, a drug may up-regulate or down-regulate a gene depending on the context (Unni et al. 2022). The knowledge in biomedicine is often below the level of certainty of “established facts.” In clinical practice, a great deal of reasoning has to rely on assertions, statistical associations, and observations. Existing reasoning engines are unable to accommodate varying levels of certainty at the discretion of the practitioner.
Knowledge resources such as Cyc (Lenat 1995), Wolfram|Alpha (Wolfram Research 2024), Component Library (Barker et al. 2001) or its variant CoreALMLib (Inclezan 2016) attempt to bridge this gap. Cyc, for example, targets formalization of complex knowledge patterns across the full spectrum of the human world.
Because of the restricted intellectual property of Cyc and Wolfram|Alpha, limited access to them is not conducive to fostering an open-source community. OpenCyc, which is the open-source version of Cyc, is useful for limited purposes, as it provides access to only a small fraction of the full knowledge base (Opencyc 4.0. 2014).
Demonstrations of inference gaps in LLMs for spatial reasoning and action reasoning cited in the previous section have been limited to academic settings. Establishing that such inference gaps are critical for real-world applications is an open research problem.
What should be different about a modern knowledge resource?
A modern knowledge resource should provide a public-utility-like infrastructure, becoming a go-to place for trusted and verified knowledge and reasoning methods across a variety of domains. Toward that end, it should provide knowledge in such a way that its formal representation is paired with its provenance in multiple modalities, including text, images, video, or other formats such as graphs. As most pieces of knowledge are not universally true, emphasis must be placed on the applicable context of that knowledge.
The knowledge resource should advance the state-of-the-art in how knowledge representation and reasoning are leveraged in engineered systems. As there are ongoing initiatives assembling knowledge in data graphs (e.g., Wikidata uses RDF) and ontologies (e.g., Stanford's BioPortal uses Web Ontology Language or OWL), we are envisioning that the proposed resource will use a language more expressive than these existing efforts.
Examples of such languages include answer set programming (Lifschitz 2019) and its query-driven variants such as s(CASP) (Arias et al. 2018), Rulelog/Ergo (Grosof et al. 2023) and other extended well-founded logic programs, theorem proving systems such as Lean (Moura and Ullrich 2021), constraint representation (Dechter 2003), methods to incorporate uncertainty (Koller and Friedman 2009) and causal information (Pearl 1989, 2009).
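As an illustration of the extra expressivity that answer set programming adds over plain triples, here is a minimal sketch of its stable-model semantics for small ground programs. It is a brute-force teaching aid, not how production ASP solvers or s(CASP) work. The rule encoding as (head, positive body, negative body) tuples and the bird example are assumptions made for this sketch. A candidate set of atoms is a stable model if it equals the least model of its Gelfond-Lifschitz reduct.

```python
from itertools import combinations

# A ground normal rule: head :- pos_1, ..., pos_m, not neg_1, ..., not neg_n.
# Each rule is encoded as a tuple (head, positive_body, negative_body).


def least_model(definite_rules):
    """Least model of a negation-free program, by forward chaining."""
    model = set()
    changed = True
    while changed:
        changed = False
        for head, pos in definite_rules:
            if head not in model and all(p in model for p in pos):
                model.add(head)
                changed = True
    return model


def is_stable(rules, candidate):
    """Gelfond-Lifschitz test: candidate equals the least model of its reduct."""
    # Reduct: drop rules blocked by the candidate, then drop negative bodies.
    reduct = [(head, pos) for head, pos, neg in rules
              if not any(n in candidate for n in neg)]
    return least_model(reduct) == candidate


def stable_models(rules):
    """Enumerate all stable models by brute force over subsets of atoms."""
    atoms = sorted({r[0] for r in rules}
                   | {a for r in rules for a in r[1]}
                   | {a for r in rules for a in r[2]})
    found = []
    for size in range(len(atoms) + 1):
        for subset in combinations(atoms, size):
            if is_stable(rules, set(subset)):
                found.append(set(subset))
    return found


# Hypothetical default rule: birds fly unless known to be abnormal.
birds = [
    ("bird", (), ()),
    ("flies", ("bird",), ("abnormal",)),
    ("abnormal", ("penguin",), ()),
]
print(stable_models(birds))
print(stable_models(birds + [("penguin", (), ())]))  # the default is withdrawn
```

Adding the single fact penguin retracts the earlier conclusion flies. This kind of non-monotonic default is what a fixed set of triples cannot express directly.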
The knowledge resource should foster a distributed community, document use cases, and enable collaboration. Instead of wasting time debating ideal knowledge representations, the community's focus should be on developing shared infrastructure and interoperability and on maintaining consistency across representations whenever possible.
The resource should enable unanticipated linkages across diverse domains. For example, environmental knowledge could be cross-linked to health knowledge, population knowledge could be linked to economic factors, and so forth.
Beneficiaries of a knowledge resource
The knowledge resource must exist in the modern ecosystem of societal needs and challenges.
The knowledge resource will benefit AI engineers creating novel AI applications by weaving together different off-the-shelf components. If knowledge modules are made available in a way that engineers can use them as easily as installing a new Python library, a range of AI applications will benefit, including those incorporating agentic workflows and generative AI models.
The knowledge resource will also be an important new form of data for training machine learning systems. Most existing machine learning data sets are limited to facts.
The knowledge resource will benefit applications that require world models. Examples include robotic planning, which must bridge between sensor-level data and higher-level cognitive world models, and fact-checking systems that must detect factually false information.
With the growth in biomedical literature, it is not humanly possible for any one person to absorb and synthesize everything that is being published. A knowledge resource will enable superhuman knowledge aggregation and consistency maximization over complex domains (such as molecular biology, medicine, or social policy) (Witbrock et al. 2015).
A knowledge resource will benefit knowledge representation and reasoning researchers by catalyzing research and application in this area. UC Irvine's library of machine learning tasks and data sets (UCI Machine Learning Repository 2025), and the Hugging Face repository of transformer models (Hugging Face 2025) have catalyzed the research and adoption of machine learning and natural language processing. A similar open-source resource will benefit knowledge representation and reasoning researchers.

+4 FORMALIZING FOUNDATIONAL KNOWLEDGE

Foundational knowledge comprises abstract knowledge, such as knowledge about time, space, actions, and causality, and mid-level knowledge, such as the working of physical devices, tenets of social psychology, qualitative physics, etc. (Davis and Marcus 2015). To achieve maximum reusability and applicability, the knowledge resource must leverage foundational knowledge.
We organize our discussion along four dimensions: easily available foundational knowledge, methodology for collecting and using foundational knowledge, evaluating its impact on the system behavior, and some long-term challenges.
Easily available foundational knowledge
As there has been much work in representing foundational knowledge, a knowledge resource can easily bootstrap from this prior work. This includes qualitative representations of time and space (Cohn and Renz 2008; Walega et al. 2015; Forbus 2019), representations of events and actions (Gelfond 1998), formalizations of psychology (Gordon 2017), and representations to capture constraints (Dechter 2003) and probabilistic knowledge (Pearl 1989, 2009; Koller and Friedman 2009; Darwiche 2009). OpenCyc's ontology with integrated representations for vision, space, and language is also available (Forbus 2025).
Methodology for collecting foundational knowledge
The success of some past foundational knowledge representation efforts can be attributed to the rare contributions of ontological prodigies, for example, Pat Hayes, Jerry Hobbs, and Ernest Davis. In contrast, a scalable approach requires community-curated resources that leverage repeatable processes and well-defined engineering best practices for building foundational knowledge infrastructure. To support this, a structured, multi-step approach is desirable.
First, start with a broad theoretical understanding by identifying key concepts in a domain. This process can be aided with corpus analysis, for example, by identifying sample texts in a domain for the occurrence of certain concepts (Chaudhri et al. 2014).
Second, the dual goals of broad coverage and inferential competency in foundational theories can be pursued via the approach of successive formalization (Gordon 2017). In this method, first-draft axioms are authored to support inference across the entire set of domain concepts previously identified in the first step, then incrementally formalized into competent logical theories through elaborations and refinements.
The process of representation is iterative in that the initial choices for predicates and functions will likely evolve as the work progresses. It is crucial to recognize both obvious and subtle issues and to balance theoretical perspectives with practical problem-solving. Through this iterative process, one can develop a more refined understanding.
Finally, large language models can potentially contribute to the creation of foundational knowledge. This is, however, an under-explored area of research (Hitzler et al. 2024). An initial discussion on the use of LLMs appears in the section Role of Large Language Models in Knowledge Curation.
Evaluating foundational knowledge
We can evaluate foundational knowledge using both intrinsic and extrinsic methods. Intrinsic evaluation involves a theoretical analysis based on criteria such as completeness, consistency, elaboration tolerance, redundancy, etc. Extrinsic evaluation involves showing that the use of foundational knowledge improves performance on a suite of tasks. Extrinsic evaluation is more easily understood by the stakeholders because of its direct connection to specific problems.
Foundational knowledge can also be evaluated on the basis of whether it can inform the creation of nuanced and efficient test datasets. For example, instead of creating a broad test data set in which all examples are similar, we can use the foundational knowledge to identify the corner cases and then design the test data to cover those corner cases, thus yielding a more compact test set. Finally, an indirect measure of the usefulness of foundational knowledge is in its adoption and reuse beyond its original creators.
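Two of the intrinsic criteria named above, consistency and redundancy, can be sketched for a very small propositional theory. This is only an illustration under strong assumptions: real foundational theories are first-order or richer and need a theorem prover or model finder rather than a truth table. The proposition names and the tiny temporal theory are hypothetical.

```python
from itertools import product


def models(axioms, symbols):
    """Yield every truth assignment over symbols that satisfies all axioms."""
    for values in product([False, True], repeat=len(symbols)):
        world = dict(zip(symbols, values))
        if all(axiom(world) for axiom in axioms):
            yield world


def is_consistent(axioms, symbols):
    """A theory is consistent if at least one assignment satisfies it."""
    return next(models(axioms, symbols), None) is not None


def redundant_axioms(axioms, symbols):
    """Names of axioms that are already entailed by all the other axioms."""
    redundant = []
    for name, axiom in axioms.items():
        others = [a for n, a in axioms.items() if n != name]
        if all(axiom(world) for world in models(others, symbols)):
            redundant.append(name)
    return redundant


# Hypothetical theory about three events a, b, c and a "before" relation.
symbols = ["before_ab", "before_bc", "before_ac"]
base = {
    "transitivity": lambda w: not (w["before_ab"] and w["before_bc"]) or w["before_ac"],
    "fact_ab": lambda w: w["before_ab"],
    "fact_bc": lambda w: w["before_bc"],
}
# Adding a weaker statement that fact_ab already implies makes it redundant.
extended = dict(base, ab_or_bc=lambda w: w["before_ab"] or w["before_bc"])
# Denying the consequence of transitivity makes the theory inconsistent.
broken = dict(base, not_ac=lambda w: not w["before_ac"])

print(is_consistent(base.values(), symbols))
print(redundant_axioms(extended, symbols))
print(is_consistent(broken.values(), symbols))
```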
Long-term challenges in formalizing foundational knowledge
There is no dearth of problems when it comes to formalizing foundational knowledge, but a few of them stand out.
First, real-world problems tend to be multi-modal in nature. For example, foundational knowledge will need to handle text combined with images, audio, and video. The foundational knowledge must address conceptual, temporal, and spatial aspects taken together.
Second, there needs to be a better bridge between the foundational knowledge and the abstract and vague aspects of natural language. Different logical forms of verbs and different senses of words can be typically understood by humans through surrounding context, but the current foundational theories offer no similar mechanisms. For example, consider the sentences “the bottle contains wine” and “the wine contains alcohol.” Foundational theories must provide a way to disambiguate between such different uses of “contains.”
Finally, there is still no easy way to translate across different representations of knowledge. For storing pictures, for example, there are multiple formats such as JPEG, PNG, etc.
Translation tools exist for going across them, although some of the translations can be lossy. In contrast, knowledge represented in one particular formalism remains locked into that formalism, and it is not straightforward to exploit it in a system different from the one it was originally developed in.
Creating a standard for interoperability between formalisms, such as was done for the less expressive language OWL, is a possible approach.

+5 AUTOMATED REASONING

The word “reasoning” has been used to refer to a variety of computational processes (Rossi 2025). On one hand, we have deductive or probabilistic reasoning in which typically there is a formal proof that relates a question to its answer. On the other hand, we have inductive reasoning and analogical reasoning in which there may not always exist a formal proof that relates a question to an answer.
In contrast to classical forms of deductive and inductive reasoning, a major emphasis in logic and knowledge representation and reasoning (KR) research has also been on abductive reasoning, which is a cornerstone of hypothesis formation and belief revision. Computer scientists strive to associate formal properties and guarantees for all of these reasoning processes. Examples of such formal properties include soundness, completeness, and tractability.
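The difference between deduction and abduction can be sketched over the same small rule set. This is a minimal propositional illustration, not a full account of either form of reasoning. The rules, the atom names, and the choice of which atoms count as abducible hypotheses are assumptions made for the example. Deduction runs the rules forward from known facts; abduction searches for minimal sets of hypotheses from which the observation can be deduced.

```python
from itertools import combinations


def deduce(facts, rules):
    """Deduction: forward-chain definite rules to a fixed point."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            if head not in known and all(b in known for b in body):
                known.add(head)
                changed = True
    return known


def abduce(observation, abducibles, rules):
    """Abduction: minimal hypothesis sets from which the observation follows."""
    explanations = []
    for size in range(len(abducibles) + 1):
        for hypothesis in combinations(sorted(abducibles), size):
            candidate = set(hypothesis)
            if any(found <= candidate for found in explanations):
                continue  # a smaller explanation already covers this one
            if observation in deduce(candidate, rules):
                explanations.append(candidate)
    return explanations


# Hypothetical rules, each written as (head, body).
rules = [
    ("wet_grass", ("rain",)),
    ("wet_grass", ("sprinkler",)),
    ("slippery", ("wet_grass",)),
]

print(deduce({"rain"}, rules))                           # from cause to effects
print(abduce("slippery", {"rain", "sprinkler"}, rules))  # from effect to causes
```

Deduction returns one set of consequences, while abduction returns several competing explanations. Choosing among them is where hypothesis formation and belief revision come in.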
For the purpose of the present paper, the term reasoning refers to the full spectrum of computational processes that have been assigned this label in the literature. Given this broad notion, we will organize our discussion along the following dimensions: discovering axioms automatically, modeling human reasoning, reasoning at scale, and incorporating context into reasoning.
Discovering axioms
The problem of automatically discovering axioms has been traditionally studied under the topic of inductive logic programming (Muggleton 1991). Related efforts exist in qualitative reasoning to learn new patterns based on analogical generalization (McLure et al. 2010) or as default rules (Wang and Gupta 2024). Modern practice, however, relies on a combination of manual and automatic approaches.
For example, while pursuing Bayesian causal reasoning, it is expected that partial knowledge about the world is provided by an external source. Likewise, inductively acquiring knowledge about everyday embodied human interactions (e.g., from multimodal data) requires support for specialized domains such as space, time, and motion (Suchan et al. 2016). To provide reliable and predictable decision support, large language models are coupled with an external module that contains human-verifiable and explicit knowledge.
Open challenges in modeling human reasoning
Human reasoning is often prone to error. For instance, people frequently favor conclusions that are psychologically appealing over those that are mathematically sound (Tversky and Kahneman 1974). This raises concerns about designing AI systems that directly mimic human reasoning, as they risk inheriting these flaws. A more pragmatic objective is to develop AI systems that assist and augment human reasoning, helping individuals arrive at more accurate and reliable conclusions. In the following section, we outline the key challenges in pursuing this goal.
First, it will be helpful to develop a taxonomy of distinct kinds of human reasoning. Such a taxonomy will enable better communication about which aspect of human reasoning is being modeled in a computational system. A partial taxonomy is available in the existing literature for reasoning tasks such as query answering, planning, projection, diagnostics, etc.
Second, the human understanding of the world has a symbolic structure, and AI programs must exploit this to reason correctly, even though they might rely on raw data for some of the processing. For example, when humans engage in diagrammatic reasoning, they are able to supplement their purely logical reasoning with diagrams and sketches by exploiting the symbolic structure of the world.
Third, just as humans can resolve ambiguities and conflicts during a conversation, reasoning tools must be able to do the same. Humans often make judgment calls because of these ambiguities and conflicts. Reasoning tools should be able to represent such judgment calls.
Fourth, the reasoning systems must be good at ignoring irrelevant details. For example, in traditional procedural programming a local variable has effect only in a certain scope. Similarly, these reasoning processes should be able to keep only a subset of facts in scope that are relevant for the current problem.
Fifth, the reasoning tools should be such that complex reasoning mechanisms that humans use, such as deduction, abduction, induction, counterfactual reasoning, etc., can be elegantly captured.
Finally, graphical models, be they deterministic or probabilistic, enable causal and counterfactual reasoning, both of which are central to human reasoning. A primary challenge is to acquire the causal model, even partially (namely the causal graph). Once a partial causal model is available, computing causal effects and performing counterfactual reasoning become easier and should be explored further.
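To make the last challenge concrete, the following is a minimal sketch of how a known causal graph can turn observational data into a causal effect through backdoor adjustment. The three-variable graph (Z influences both X and Y, and X influences Y) and all probabilities are hypothetical values chosen for illustration; they do not come from the paper.

```python
from itertools import product

# Hypothetical structural parameters for the causal graph
# Z -> X, Z -> Y, X -> Y (Z confounds the effect of X on Y).
P_Z = {0: 0.6, 1: 0.4}
P_X1_GIVEN_Z = {0: 0.2, 1: 0.7}                # P(X=1 | Z=z)
P_Y1_GIVEN_XZ = {(0, 0): 0.1, (0, 1): 0.5,
                 (1, 0): 0.3, (1, 1): 0.8}     # P(Y=1 | X=x, Z=z)

NAMES = ("z", "x", "y")


def joint():
    """Build the observational joint distribution P(z, x, y)."""
    table = {}
    for z, x, y in product((0, 1), repeat=3):
        px = P_X1_GIVEN_Z[z] if x == 1 else 1 - P_X1_GIVEN_Z[z]
        py1 = P_Y1_GIVEN_XZ[(x, z)]
        py = py1 if y == 1 else 1 - py1
        table[(z, x, y)] = P_Z[z] * px * py
    return table


def prob(table, **fixed):
    """Marginal probability of the assignments given in `fixed`."""
    return sum(p for key, p in table.items()
               if all(key[NAMES.index(n)] == v for n, v in fixed.items()))


def observational(table, x):
    """P(Y=1 | X=x): the purely statistical association."""
    return prob(table, x=x, y=1) / prob(table, x=x)


def interventional(table, x):
    """P(Y=1 | do(X=x)) via backdoor adjustment over the confounder Z."""
    total = 0.0
    for z in (0, 1):
        p_y_given_xz = prob(table, z=z, x=x, y=1) / prob(table, z=z, x=x)
        total += p_y_given_xz * prob(table, z=z)
    return total


table = joint()
causal_effect = interventional(table, 1) - interventional(table, 0)
association = observational(table, 1) - observational(table, 0)
```

In this toy model the observed association between X and Y is larger than the causal effect, because part of it is due to Z. The adjustment is only valid because the graph tells us that Z is the confounder, which is exactly the piece of knowledge that has to be acquired.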
Reasoning at scale
With the growing knowledge in science, it is humanly impossible for any one person to effectively reason with it all at once. Reasoning at the scale of all science is, therefore, a practically useful challenge for AI systems. Reasoning can be particularly effective in processing what is already known about science and pinpointing gaps for further research.
For example, structural causal models and constraint reasoning models can be especially effective to support drug discovery and protein design.
Incorporating context into reasoning
Incorporating real-world constraints into reasoning is necessary for it to work correctly. Cyc's knowledge base achieved this goal by organizing its knowledge into microtheories arranged in a hierarchical structure (Lenat 1995). The same goal can be achieved through other methods.
We must not assume our knowledge to be a single monolithic structure. Different knowledge modules that apply to different contexts should be able to interoperate with each other depending on the problem at hand.
Real-world reasoning scenarios themselves present constraints that the reasoning process should be able to pick up. For example, designing an artifact requires understanding its operating conditions to ascertain what materials are appropriate.
Many of these issues can be addressed by checking inconsistencies with the real-world constraints, and through the development of specialized but domain-independent solvers integrating aspects such as space, motion, actions, events, and dynamics (Walega et al. 2015; Suchan et al. 2016).
As science is a social process, reasoning methods must gracefully integrate with existing workflows and take into account the assumptions that the scientists are making. For example, a recent experiment combining qualitative process theory with a language model led to substantial improvements in performance (Victor 2025).
Finally, as much of reasoning in science adopts a certain point of view, choosing the correct viewpoint is crucial. As an example, the shape of the Earth is viewed differently in topography versus astronomy.
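The ideas of context-specific modules and viewpoints discussed in this subsection can be sketched as a small hierarchy of contexts in which a query is answered from the most specific context that knows the answer. This is only an illustration in the spirit of microtheories, not Cyc's actual mechanism; the class name, context names, and stored values are all hypothetical.

```python
class Context:
    """A named knowledge module that inherits facts from its parents."""

    def __init__(self, name, parents=()):
        self.name = name
        self.parents = list(parents)
        self.facts = {}  # maps a query key to a value

    def tell(self, key, value):
        """Assert a fact that holds in this context."""
        self.facts[key] = value

    def ask(self, key):
        """Return the most specific answer visible from this context."""
        if key in self.facts:
            return self.facts[key]
        for parent in self.parents:
            answer = parent.ask(key)
            if answer is not None:
                return answer
        return None


# Hypothetical contexts and values, chosen only for illustration.
physical_world = Context("PhysicalWorld")
physical_world.tell("earth_has_mass", True)

astronomy = Context("Astronomy", parents=[physical_world])
astronomy.tell("earth_shape", "point mass")

topography = Context("Topography", parents=[physical_world])
topography.tell("earth_shape", "irregular surface with terrain")
```

The same question, `ask("earth_shape")`, yields a different answer in each viewpoint, while both inherit `earth_has_mass` from the shared parent. A real system would also need to handle conflicts between parents, which this sketch resolves simply by parent order.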

6 KNOWLEDGE CURATION USING MANUAL METHODS, MACHINE LEARNING AND LARGE LANGUAGE MODELS

Knowledge curation is the process of gathering, extending, and maintaining knowledge. Knowledge curation encompasses the full life cycle of specifying the requirements, schema design, data cleaning and loading, debugging and troubleshooting, and revising the knowledge.
Current practice on knowledge curation relies on teams of knowledge engineers and domain experts. We believe that this needs to change. Next, we address the role of human oversight and automation in knowledge curation and popularizing knowledge curation among scientific communities.
Role of human oversight in knowledge curation
Human oversight is indispensable in any knowledge curation effort. We will illustrate this using three use cases: knowledge curation to support web search, compliance to emission standards, and development of machine learning solutions.
Accurate results for web searches require durable semantics. For example, a movie can go through several stages—initial publication of the story as a novel, availability of screenplay, filming and production, box office release, Netflix release, release in other languages, and so forth. As this process unfolds over a number of years, the search engine must correctly correlate different versions of the movie. At present, such correlation requires human oversight through a careful schema design.
To test the compliance of emission standards of automobiles, much sensor data is available, but most of it is not relevant. Human oversight is needed to identify the relevant aspects of sensor information that should be used for checking compliance with standards.
Most machine learning approaches require data that must be annotated by humans. Once the model is trained, reinforcement learning from human feedback is a vital part of model fine-tuning. Once a machine learning model is deployed, human oversight is necessary to ensure that the model performance does not drift as the input data evolves.
In summary, human experts can provide initial domain knowledge artifacts and define their intent to guide automated knowledge curation systems. This human-guided initialization helps systems understand specific domain contexts and requirements before any automation begins. Even after the initial automated generation of logical structures, human reviewers must validate and refine generated representations, ensuring accuracy and alignment (Akinfaderin and Diallo 2025). The quality of automatically curated knowledge ultimately depends on human expertise to verify that automated outputs correctly represent the intended domain knowledge.
Role of large language models in knowledge curation
We assume that any step of the knowledge curation process that could be automated should be automated. The question we address is whether previously human labor-intensive tasks can now be automated with the advent of LLMs. LLMs can enable knowledge curation in at least three ways: becoming a source of knowledge, aiding in knowledge elicitation, and serving as knowledge curators.
LLMs capture an immense amount of knowledge implicitly. We can interrogate them to explicitly emit their knowledge on topics of interest for a given application. Such explicit knowledge, either in a natural or a formal language, can be used in multiple ways. It can be directly built into the application under human oversight and used by a reasoning process. It can also be used as a way to gain insights into the domain of interest, which can speed up the downstream design work of curators.
LLMs have enabled the construction of powerful chatbots. This capability could be leveraged toward creating a systematic methodology to facilitate interdisciplinary knowledge acquisition. LLM-based natural language dialogs would need to be designed that support a domain expert in articulating knowledge, which can be further curated by knowledge engineers or by the LLM itself.
LLMs might be used as components in knowledge curation interfaces between humans and large, complex knowledge resources. As knowledge resources get big, they become difficult to understand by casual users. LLMs could provide an ability for a broader class of users to add and contribute their knowledge to a knowledge resource. In this scenario, an AI system using an LLM is a useful mediator between a human and a complex knowledge base.
In practice, frontier knowledge curation systems powered by LLMs have shown promise in analyzing documents, identifying key concepts, translating natural language into formal representations, and combining them into comprehensive knowledge models (Akinfaderin and Diallo 2025). This automation significantly reduces the manual effort traditionally required for knowledge formalization.
These systems can auto-formalize in order to validate claims, applying automated reasoning to detect factual inaccuracies with minimal human intervention and with explainability built-in. When validation fails, advanced systems can generate suggestions showing alternative representations that would resolve inconsistencies, effectively automating parts of the knowledge refinement process that previously required extensive human expertise.
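As a rough illustration of validating a claim against formalized knowledge, the sketch below forward-chains over a handful of Horn rules and reports whether a claim is supported, contradicted, or unverified, together with a short justification. It is a toy stand-in, not the method of the cited systems: the facts, rules, and the convention of writing negation as a "not " prefix are assumptions made for this example.

```python
def forward_chain(facts, rules):
    """Derive all consequences of Horn rules and record why each holds."""
    derived = {fact: "given" for fact in facts}
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in derived and all(b in derived for b in body):
                derived[head] = "from " + " and ".join(body)
                changed = True
    return derived


def validate(claim, facts, rules):
    """Classify a claim as 'supported', 'contradicted', or 'unverified'."""
    derived = forward_chain(facts, rules)
    # Negation is modeled naively as a textual "not " prefix.
    negated = claim[4:] if claim.startswith("not ") else "not " + claim
    if claim in derived:
        return "supported", derived[claim]
    if negated in derived:
        return "contradicted", derived[negated]
    return "unverified", None


# Hypothetical formalized policy, for illustration only.
FACTS = {"full_time"}
RULES = [
    (("full_time",), "eligible_for_leave"),
    (("eligible_for_leave", "tenure_over_one_year"), "eligible_for_sabbatical"),
]

verdict = validate("not eligible_for_leave", FACTS, RULES)
```

Here the claim "not eligible_for_leave" is contradicted, and the returned justification points to the rule that derives the opposite. The "unverified" outcome matters as much as the other two: it marks the cases that should be routed to a human reviewer.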
We posit that an optimal approach to knowledge curation combines human expertise with automated systems in a continuous feedback loop. In this paradigm, humans provide initial knowledge artifacts and domain expertise, while automation handles complex transformations into formal structures that support verification (Akinfaderin and Diallo 2025).
Human-in-the-loop testing and validation exemplify this partnership: humans review and pose test scenarios and evaluate presented outcomes, while automated systems apply rigorous validation against established, formalized knowledge bases. When inconsistencies are detected, these systems provide grounded explanations and suggestions, which humans can then use to refine knowledge representations or improve system responses. This creates a continuous improvement cycle where human oversight guides automation, and automation enhances human capabilities.
Popularizing knowledge curation among scientific communities
The knowledge curation enterprise must be popularized among the scientific computing communities. Database curation efforts already exist across multiple sciences; by one count, there are 27 databases in materials science and chemistry alone (Blaiszik 2025). But few scientific communities are currently creating knowledge bases using expressive knowledge representation languages. Though some work leverages knowledge graphs (Segler and Waller 2017; Mrdjenovich et al. 2020), current approaches have many limitations. The representation of relationships between equations, variables, and broader theories that we see in Wikidata is limited.
For instance, in Wikidata, “Stokes' Theorem” is a “generalization of” “Green's Theorem.” This captures some connections, but “generalization of” carries no deeper meaning about the nature of this generalization. Languages such as Lean provide greater expressiveness for formalizing scientific knowledge (Bobbin et al. 2024; Tooby-Smith 2025), but their complexity makes them challenging for users outside formal logic disciplines.
Nonetheless, projects like PhysLean aim “to create a library of digitalized physics results in the theorem prover Lean 4, in a way which is useful to the broad physics community” (Tooby-Smith 2025). We envision formalized versions of scientific texts like Feynman's Lectures (Feynman et al. 2013), where the scientific content and derivations are structured, functional, and executable, with concepts interlinked across the text. Achieving this requires popularizing knowledge curation among scientific computing communities so that more of these specialists come forward to contribute.

7 MODERN EDUCATION ON KNOWLEDGE REPRESENTATION

As highlighted in a recent report of a Dagstuhl seminar (Delgrande et al. 2023), there has been a consistent decline in the open academic positions in knowledge representation and reasoning, as well as in the number of students and researchers attracted to this field.
There is a concern that after the current faculty members teaching knowledge representation retire, there is no plan to replace them. In this section, we consider in more detail the current practice for teaching knowledge representation, identify what is missing, and outline potential steps for the future.
Current practice for teaching knowledge representation
At most universities, especially in the United States, knowledge representation is taught as part of either an AI course or as a module on logic embedded in a course on discrete mathematics. Some universities provide knowledge representation and reasoning courses as advanced electives, and there are several textbooks to support such courses (Brachman and Levesque 2004; Reiter 2001; Hendler et al. 2024). Currently used standard AI textbooks (Russell and Norvig 2021; Poole et al. 2010) have several chapters on knowledge representation.
Several modern textbooks are available on logic programming (Genesereth 2022; Lifschitz 2019; Gelfond 2014; Gebser et al. 2012; Darwiche 2009). Three of these textbooks focus exclusively on answer set programming (Lifschitz 2019; Gelfond 2014; Gebser et al. 2012).
They are used in courses taught by faculty members associated with that field. Two of these books come with online repositories containing slides and other teaching materials (Kahl and Michael 2025; Potassco 2025). There is also a textbook that teaches an audience without a technical background how to think using computational ideas (Kowalski 2011), which has found a home in some philosophy courses. In addition, computer science departments at many universities offer courses in computational logic that cover knowledge representation and reasoning.
Wright State University has created an educational hub to teach people about knowledge graphs (Kastle Lab 2025). Their curriculum is tailored to different audiences ranging from students to senior executives. They are also developing an industry certification for knowledge graph professionals in cooperation with the Knowledge Graphs Conference.
Northwestern University has a Knowledge Representation and Reasoning course that exposes students to logic, Semantic Web technologies, and Cyc-style knowledge bases. Students get hands-on experience with industrial-scale knowledge bases with a heavy project-based component.
Within the industry, especially at Cyc, teaching materials tailored to their technology have been developed. This teaching material typically assumes familiarity with logic and focuses on teaching practical skills for expressing a given piece of knowledge formally.
Deficiencies in the current education practice
As recent AI research has been dominated by approaches based on machine learning, the coverage of knowledge representation in standard textbooks (Russell and Norvig 2021; Poole et al. 2010) is out of date. There is also a tendency to portray the topic in less than positive terms.
The current teaching practice on knowledge representation does not always succeed in conveying its importance for building reliable computing systems. Most courses, with a few exceptions, are limited to using small theoretical examples, without making adequate connections to the real-world problems and actual impact. There is inadequate integration of logic into other computer science courses. For example, many students do not realize that they could use propositional logic to debug their if-then-else statements.
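The remark about if-then-else statements can be illustrated with a short truth-table check. The sketch below compares a nested conditional with a proposed one-line refactoring by enumerating every truth assignment and listing the cases where they disagree. The discount scenario and all function names are hypothetical.

```python
from itertools import product


def original(is_member, has_coupon, is_holiday):
    """Nested conditionals as a student might first write them."""
    if is_member:
        if has_coupon:
            return True
        else:
            return not is_holiday
    else:
        if has_coupon and not is_holiday:
            return True
        return False


def refactored(is_member, has_coupon, is_holiday):
    """A tempting but incorrect simplification."""
    return (is_member and has_coupon) or not is_holiday


def corrected(is_member, has_coupon, is_holiday):
    """An equivalent formula: at least two of the three conditions hold."""
    return ((is_member and has_coupon)
            or (is_member and not is_holiday)
            or (has_coupon and not is_holiday))


def counterexamples(f, g, arity):
    """List every truth assignment on which f and g disagree."""
    return [values for values in product((False, True), repeat=arity)
            if f(*values) != g(*values)]


bugs = counterexamples(original, refactored, 3)
remaining = counterexamples(original, corrected, 3)
```

The check finds a single disagreement for the first refactoring, the case where all three inputs are false, and none for the corrected formula. With three Boolean inputs there are only eight cases, so exhaustive enumeration is a complete proof of equivalence.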
Another major missing piece in the current teaching practice on knowledge representation is in cultivating an ability to identify implicit and explicit knowledge and rules of thumb that capture how the world works. In other words, the current teaching fails to cultivate skills to conceptually model a task domain and answer questions about what that task/domain is.
For example, given a short story such as: “For sale: baby shoes, never worn,” one should be able to infer the implicit possibility of the death of the child and the tragedy of having to sell the shoes. Such skills are cultivated in courses on philosophy and literature, but similar skills are needed for computer science students to be effective at knowledge engineering.
There is also a philosophical deficiency in the framing of computer science curricula, as they primarily focus on “how to build” versus “how to understand.” This encourages students to rush toward coding solutions instead of developing clear specifications. Clear specifications require them to think formally about the requirements and clarify any implicit information before engineering a solution.
Steps to improve the teaching of Knowledge Representation
The knowledge representation and reasoning community should make its teaching materials easily available. Much can be learned from a similar effort undertaken at the University of California at Berkeley for a course on introduction to AI (UC Berkeley 2025).
To make the teaching materials easy to use, the community should develop modules that can be easily picked up and plugged into a variety of computing courses. The modules should touch on various aspects of knowledge representation and be accompanied by slides, worked-out examples, sample exercises, sample projects, and exam questions.
Consistent with these goals, the Prolog Education Group (PEG) was founded in 2022, on the occasion of the 50th anniversary of the programming language Prolog, to “teach logic, programming, sound reasoning, and AI” to people of all ages and to develop and provide relevant educational resources (Dahl 2025). The efforts of PEG need to be expanded beyond teaching programming to include topics of knowledge representation.
Modern platforms for instructors, such as Gradescope (Gradescope 2025) or GitHub Classroom (GitHub 2025), support means for implementing automatic grading utilities for programming exercises and projects. Providing such utilities is indispensable for easy adaptation of teaching materials.
Similarly, providing user-friendly sandbox environments online for various modules will lower the barrier for integrating material at different academic levels. For example, while undergraduate students in a computer science program might be asked to install a specific reasoner or solver to execute sample code, an online sandbox environment supported by that reasoner could allow a high school student to practice the same task.
New materials need to be developed to address “why care” questions that illustrate the practical application of knowledge representation in multiple fields. A repository of real-world examples should be developed.
We should integrate the teaching of logic programming into other computing courses. For example, within a course on databases, logic programming should be taught as the foundation of modern database management systems.
Similarly, a course on programming languages can incorporate material on answer set programming and the algorithmic aspects of systems that support this knowledge representation paradigm. Some universities, for example, the University of Nebraska at Omaha and the University of Texas at Dallas, already integrate logic programming into other courses as suggested here.
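One way a databases course could connect to logic programming is through recursive queries. The sketch below evaluates a two-rule Datalog-style program bottom-up until no new tuples appear, which is the naive fixpoint strategy. The relation names and the part-of data are hypothetical teaching material, not taken from any of the cited courses.

```python
def transitive_closure(edges):
    """Naive bottom-up evaluation of the Datalog-style program:

        reachable(X, Y) :- edge(X, Y).
        reachable(X, Z) :- reachable(X, Y), edge(Y, Z).
    """
    reachable = set(edges)
    while True:
        # Join the current reachable relation with edge on the shared variable.
        new = {(x, z)
               for (x, y) in reachable
               for (y2, z) in edges
               if y == y2}
        if new <= reachable:  # fixpoint: nothing new was derived
            return reachable
        reachable |= new


# Hypothetical "part of" relation for a classroom exercise.
PART_OF = {("wheel", "axle"), ("axle", "chassis"), ("chassis", "car")}
closure = transitive_closure(PART_OF)
```

The inner set comprehension is a relational join, so students can see that each rule application corresponds to an ordinary query and that recursion is repeated querying until the result stops growing.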
The community should develop a knowledge representation body of knowledge similar to the software engineering body of knowledge (Bourque et al. 2014), against which certifications could be granted. Teaching materials should also underscore the importance of interoperability, demonstrating how different knowledge representations and reasoning systems can be integrated. A key component of the teaching materials should be clear instructions and examples of how to integrate the knowledge resources into different applications.
Much of R&D in knowledge representation over the last 2–3 decades has focused on extending the expressiveness of declarative logic programs to go beyond that of knowledge graphs and relational databases. AI education should include coverage of those key expressive features, including answer set semantics versus well-founded semantics, higher-order syntax, constraints, etc.
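To give a flavor of answer set semantics, the following sketch enumerates the stable models of a tiny propositional program by brute force: for each candidate set of atoms it builds the Gelfond-Lifschitz reduct and checks whether the least model of the reduct equals the candidate. The rule encoding and function names are assumptions made for this example, and the approach is exponential, so it is suitable only for classroom-sized programs.

```python
from itertools import combinations


def least_model(definite_rules):
    """Least model of negation-free rules given as (head, positive_body)."""
    model = set()
    changed = True
    while changed:
        changed = False
        for head, pos in definite_rules:
            if head not in model and all(p in model for p in pos):
                model.add(head)
                changed = True
    return model


def stable_models(atoms, rules):
    """Enumerate stable models of rules (head, positive_body, negative_body)."""
    found = []
    ordered = sorted(atoms)
    for size in range(len(ordered) + 1):
        for chosen in combinations(ordered, size):
            candidate = set(chosen)
            # Reduct: drop rules blocked by the candidate, then drop negation.
            reduct = [(head, pos) for head, pos, neg in rules
                      if not (set(neg) & candidate)]
            if least_model(reduct) == candidate:
                found.append(candidate)
    return found


# p :- not q.    q :- not p.
CHOICE = [("p", (), ("q",)), ("q", (), ("p",))]
# p :- not p.
PARADOX = [("p", (), ("p",))]

choice_models = stable_models({"p", "q"}, CHOICE)
paradox_models = stable_models({"p"}, PARADOX)
```

The first program has two stable models, one containing only p and one containing only q, whereas the well-founded semantics assigns a single three-valued model in which both atoms are undefined. The second program has no stable model at all, which shows students that answer set programs can be inconsistent.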
AI literacy courses are being developed across many institutions to teach people about new developments. The community should engage with such initiatives to ensure that the role and value of knowledge representation are adequately covered. It is especially important to include an adequate history of different developments to avoid reinvention of already-known concepts.
It would be valuable to introduce set theory and logic in high schools (Chaudhri 2024). These students should be taught different forms of reasoning, including deduction, abduction, and induction. The importance of logic should be highlighted at an early age through games, puzzles, and debates. This will not only make students critical thinkers but also alleviate any fear of logic courses in college. The Prolog Education Group has initiated efforts in this direction (Gupta et al. 2024).
Most courses in computer science, except perhaps software engineering, can be reframed as knowledge representation courses by focusing on objects, their relationships, and the types of questions or tasks associated with them. Even in software engineering, precise reasoning is critical for developing requirement specifications. The key distinction between an introductory programming course and a knowledge representation course lies in the type of knowledge being represented, its intended audience, and its impact.
By embedding knowledge representation throughout the curriculum, students could be trained as both computer scientists and implicit knowledge representation experts—without them explicitly realizing it. This approach reframes computer science not just as an engineering discipline but also as a natural science.
More broadly, universities can be encouraged to develop an explicit categorization of courses relevant to knowledge representation (perhaps in computer science, philosophy, library science, etc.). Universities could be encouraged to offer a minor concentration in knowledge representation. A consolidated resource that gathers different courses and their relevance to knowledge representation would make it easier for universities to offer this minor.

8 EVALUATING A KNOWLEDGE RESOURCE

We can think of evaluating a knowledge resource in at least three different ways. First, we can evaluate individual modules, for example, evaluating foundational knowledge or evaluating how effectively existing knowledge sets are hosted and disseminated. Second, we can evaluate it in terms of how effectively it fosters a community that helps create it. Finally, we can evaluate a knowledge resource as an enabler of AI and its role in the current frontier of AI developments.
This section situates the knowledge resource evaluation in the context of current developments in AI. From that perspective, we will first consider the limitations of the current practice of evaluations in AI and then consider a few alternative ways to perform better evaluations through expert interviews, virtual environments, and examination of the working of the system.
Limitations of the current evaluation practice
Current evaluation practice in AI suffers from proxy failure—when a measure becomes a target, it ceases to be a good measure (John et al. 2024). With human testing, there is an imperfect correlation between a benchmark and the underlying capability that the benchmark is assumed to measure; with AI evaluation, this gap is much wider. For example, a program doing well on a multi-state bar exam is hardly suitable to practice law. Furthermore, when benchmarks are the basis for evaluating and comparing AI systems, those benchmarks are subject to corruption pressures, such as training for the test.
The Winograd Schema Challenge (Kocijan et al. 2023) was designed to evaluate the ability to use commonsense knowledge to disambiguate pronoun references. The research community has developed programs that do well on the test without incorporating the explicit common sense that the test was originally meant to test.
We next discuss different approaches to address proxy failure.
Evaluation through expert interviews
Evaluation of an AI system by an expert user interacting with it over just a few hours can be much more insightful than the quantitative metrics reported by benchmarks (Cohn and Hernandez-Orallo 2023). Unlike the famed Turing test, the evaluator must know that the subject of evaluation is a computer program and should be given information about its architecture. If the task requires domain expertise, the evaluation panel can include both an AI expert and a human expert.
Evaluation in virtual environments
Virtual environments, simulated worlds, and games can be effective in evaluating targeted aspects of reasoning. For example, the Angry Birds competition (Renz et al. 2015) has been effective at exploring reasoning about actions and qualitative reasoning. Game environments and cognitive robotics tasks can also constrain the allowed moves in ways that explicitly force any successful program to reason. The recently proposed Gardner test is an example: it situates the rich tradition of General Game Playing competitions in the context of modern generative AI systems (Chaudhri 2025).
Evaluation of the functioning of the system
Benchmarks should test not just the output of the system but the reasoning steps and individual pieces of knowledge that were used in producing the result. Even though it may be expensive to produce such test sets from scratch, existing resources such as the Cyc knowledge base could be leveraged to generate such tests. Such a resource would significantly enhance the existing collection of problems assembled by the common-sense reasoning community.


9 NEXT STEPS TOWARD CREATING A KNOWLEDGE RESOURCE

The AAAI workshop (TIKA-2025: AAAI 2025 Workshop on A Translational Institute for Knowledge Axiomatization 2025) that is the basis of the present article is part of a larger initiative (Shimizu and Chaudhri 2025) to formulate and execute a research program to develop a new knowledge resource for AI. The present workshop is the first step toward finding the requirements and building a community. We have already planned three additional workshops that are focused on use cases for the envisioned knowledge resource.
After the AAAI workshop, a subset of the participants formulated a project to develop a proof of concept for the utility of the knowledge resource. With the wider community feedback and the results from the proof-of-concept project, we will be in a better place to design the engineering framework and formulate the associated research problems that must be solved.
For the rest of the section, we discuss the recommendations from the workshop participants, plan for the follow-up workshops, and describe the proof-of-concept feasibility study.
Recommendations from the workshop participants
The workshop participants agreed that a meticulously curated knowledge resource, along with the necessary tools and methodologies for its effective use, is sorely needed.
For creating such a knowledge resource, much can be learned from the Hugging Face repository (Hugging Face 2025). A similar dynamic platform promoting interoperability among various knowledge representations and reasoning systems should be created. The portal should be organized around specific tasks that can be performed using the knowledge.
It should provide a sandbox for trying out different reasoning capabilities without requiring any licenses or difficult installations. The portal should inspire contributions from students taking knowledge representation courses worldwide. The problems addressed by the portal should be of immediate relevance to a cross section of the industry. It should be straightforward to make use of any given package by simply doing “pip install ”.
While some argued for the intrinsic value of a knowledge resource on its own, standalone knowledge resources already exist, for example, the Common Logic Ontology Repository (Grüninger 2012), the Suggested Upper Merged Ontology (Niles and Pease 2001), and the Basic Formal Ontology (Arp et al. 2015). These existing knowledge resources differ from the Hugging Face model and the new knowledge resource envisioned here in that they do not target any specific end-user task. In addition to focusing on specific tasks, the success of Hugging Face can be attributed to providing interoperability between a few prevalent deep learning models, creating a unified API that simplifies training across different model architectures and tasks, and providing a hub for model training and discovery.
The knowledge resource should be positioned as a source of trusted and verifiable knowledge. LLMs should be explored as an initial use case needing a trusted knowledge resource. In addition to LLMs, a few other use cases must be identified. These use cases must span the breadth of knowledge levels/certainty, as well as mission criticality.
To avoid some of the barriers that have prevented Cyc from being widely incorporated in contemporary AI systems across academia and industry, there was an overwhelming consensus that the knowledge resource needs to be open-source and released under a permissive license.
The community also agreed on the need for effective teaching materials in the form of modules that can be easily adapted by others. More details have been discussed in the section on Steps to Improve the Teaching of Knowledge Representation.
Acknowledging the importance of evaluation and validation, our discussion highlighted the need for test sets and benchmarks, potentially developed in collaboration with existing initiatives such as Cyc and MLCommons (MLCommons 2025). This would ensure the reliability and accuracy of the knowledge resources while also mitigating the risk of redundant or inaccurate data proliferation. The creation of public test sets, useful for evaluating the performance of different systems, would be a valuable teaching tool.
A strong consensus emerged that the resource should be created and/or managed by a nonprofit foundation, allowing for membership from both academic and commercial entities. This structure would ensure sustainability and broad participation. To kickstart the initiative, a virtual institute was suggested as an initial phase.
Follow-on workshops
We are planning three follow-on workshops that are focused on specific use cases that can benefit from a knowledge resource: education, supply chain, and computational law. These topics were chosen because there exists prior work to justify further exploration. We briefly describe each of these workshops.
The workshop on education is aimed at identifying problems that cannot be solved using LLMs alone and require knowledge representation to be created. Examples of such problems include skill graphs, precision knowledge tracing, and grounding AI in verifiable knowledge. The goal of this workshop is to formulate a large community-driven knowledge graph construction project that would benefit education.
The workshop on supply chain is aimed at addressing critical vulnerabilities in the global supply chain exposed by the pandemic and infrastructure failures. We will explore several potential supply chain domains (for example, minerals, plastics, water supply, etc.) with an eye toward identifying a domain where the data is easily available and where a global view of the supply chain could be enabled by creating a rich semantic model.
The workshop on computational law is aimed at creating a community of users for a national library of laws that are represented as computer code. Preliminary work exists in codifying local building codes, suggesting that scaling it to a national level will address significant inefficiencies in the regulatory compliance landscape.
Proof of concept for a knowledge resource
To adapt the Hugging Face model to knowledge modules, the major roadblock is not designing the portal itself but identifying concrete problems and use cases where a knowledge resource makes a significant difference. Dr. Alessandro Oltramari, the president of the Carnegie Bosch Institute, who attended the workshop, came forward with a set of use cases from Bosch that could be used to establish the value of a knowledge resource. Consequently, Dr. Chaudhri and Prof. Shimizu worked with Bosch to define three use cases: inference gaps in LLMs, robotic skill learning, and root cause analysis. We briefly explain each of these use cases.
Through a qualitative reasoning benchmark, Room Space 100, it has been shown that the accuracy of GPT-4 declines from 0.55 to 0.15 when the number of objects in a scene increases from 3 to 6 (Li et al. 2024). In contrast, a qualitative spatial reasoner for the same task is complete and correct regardless of the number of objects.
We will evaluate this claim in the context of a robotic planning use case provided by Bosch. Bosch currently uses LLMs for both high-level and low-level planning (Saxena et al. 2024). In our experiment, we will replace the LLM used by the high-level planner with a symbolic planner that uses a spatial reasoner and compare their performance.
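To make the contrast with an LLM concrete, the following is a minimal sketch of one kind of qualitative spatial inference: closing a single "left of" relation under transitivity. It is not the reasoner evaluated on the benchmark or the one planned for the Bosch experiment; the object names, the scene, and the function names are assumptions made purely for illustration. The point it illustrates is that, for this one relation, every entailed pair is derived no matter how many objects the scene contains.

```python
from itertools import product


def transitive_closure(facts):
    """Return every (a, b) pair entailed by transitivity of a relation."""
    closure = set(facts)
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so the set can grow inside the loop.
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure


def is_consistent(closure):
    """A strict 'left of' order may not place an object left of itself."""
    return all(a != b for a, b in closure)


# Hypothetical scene: six objects, each stated to be left of the next one.
objects = ["lamp", "chair", "table", "sofa", "shelf", "door"]
facts = {(objects[i], objects[i + 1]) for i in range(len(objects) - 1)}

left_of = transitive_closure(facts)
print(("lamp", "door") in left_of)  # derived, never stated directly
print(is_consistent(left_of))
```

A full qualitative spatial calculus would handle several relations and their composition tables; the sketch covers only the simplest case.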
Robotic skill learning can benefit from leveraging a knowledge resource. Envision a scenario in which a robot learns to Pick up a hard object, Orient it for insertion, and finally Insert it. In the current approach, a robot would learn this process as one unit.
By using a knowledge resource, it could map the process to individual steps such as Pick, Orient, and so forth, which are more generalizable. For example, if a robot has learned to pick up a hard object while doing spark plug insertion, it could use the same skill while moving a hard object from one place to another. Starting from the skill library provided by Bosch (Saxena et al. 2024), we will evaluate how many of these could be mapped to existing resources to enable generalizable robotic skill learning.
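The decomposition described above can be sketched as a lookup from composite tasks to primitive steps. This is only an illustration under stated assumptions: the dictionary below is a hypothetical stand-in, not the Bosch skill library, and the "Place" step and the task names are invented for the example.

```python
# Hypothetical skill library mapping composite tasks to primitive steps.
# "Pick", "Orient", and "Insert" come from the text; "Place" is assumed.
SKILL_LIBRARY = {
    "spark_plug_insertion": ["Pick", "Orient", "Insert"],
    "move_hard_object": ["Pick", "Place"],
}


def reusable_primitives(learned_task, new_task, library=SKILL_LIBRARY):
    """Steps of new_task that were already learned as part of learned_task."""
    learned = set(library[learned_task])
    return [step for step in library[new_task] if step in learned]


def missing_primitives(learned_task, new_task, library=SKILL_LIBRARY):
    """Steps of new_task that still have to be learned."""
    learned = set(library[learned_task])
    return [step for step in library[new_task] if step not in learned]


print(reusable_primitives("spark_plug_insertion", "move_hard_object"))
print(missing_primitives("spark_plug_insertion", "move_hard_object"))
```

Representing a task as a sequence of named primitives is what allows a skill learned in one task to be recognized as reusable in another.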
Root cause analysis in manufacturing involves identifying the cause of a defect. For example, a temperature spike could be caused by machine vibration, which could in turn be caused by an alignment error. In the current approach, root cause analysis relies primarily on a machine learning algorithm that processes sensor data. If some of the causes are modeled in structured knowledge (Jaimini et al. 2023), root cause analysis need not rely solely on sensor data. We will evaluate how the use of a knowledge resource improves root cause prediction accuracy.
If the results from any of the above evaluations are positive, they will serve as a template for replicating similar applications of knowledge resources in other use cases. Such use cases can also be the starting point for the envisioned open engineering framework.
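As an illustration of how structured causal knowledge could complement sensor data, the sketch below encodes the temperature spike example as a small "caused by" graph and walks it back to the nodes that have no recorded cause. The graph contents beyond that example, the node names, and the function name are assumptions; this is not the model of Jaimini et al. or the method to be evaluated.

```python
# Hypothetical causal knowledge: each effect maps to its known direct causes.
CAUSES = {
    "temperature_spike": ["machine_vibration"],
    "machine_vibration": ["alignment_error"],
}


def root_causes(symptom, causes=CAUSES):
    """Follow 'caused by' edges from a symptom to causes with no known cause."""
    roots = []
    stack = [symptom]
    seen = set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # guards against cycles in the knowledge
        seen.add(node)
        parents = causes.get(node, [])
        if not parents and node != symptom:
            roots.append(node)
        stack.extend(parents)
    return roots


print(root_causes("temperature_spike"))
```

In practice such a traversal would produce candidate causes that a data-driven model could then rank or confirm using sensor evidence.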

+10 CONCLUSION

It is natural to question the value of a curated knowledge resource in the context of modern AI. The world is captivated by ever-larger generative AI models that produce fluent text, lifelike images, and powerful predictions. These systems dominate the headlines, attract billions in investment, and fuel the race for AI supremacy.
No matter how impressive, generative AI systems have dangerous flaws: they reflect the data they were trained on, lack formal guarantees, and will always have inference gaps. They offer performance without understanding, and fluency without truth. And yet, the incentives of today's AI ecosystem (benchmarks, funding, and hype) reward more scale, not more sense.
That is why the kinds of knowledge resources pioneered in classical AI must return. Knowledge resources built on logic, rules, and meaning offer exactly what generative AI lacks: transparency, justification, and formal guarantees. They let us encode shared human knowledge and values, instead of outsourcing everything to inscrutable models. Alone, symbolic AI once faltered. But in partnership with deep learning, it can give us systems that are both powerful and trustworthy.
The research culture of deep learning has much to teach us. Abundant tools, open-source frameworks, and shared data sets that are the staple of deep learning research make it easy for newcomers to experiment. Symbolic AI knowledge resources lack comparable support and require painstaking modeling and domain expertise. This creates barriers to their adoption by AI engineers.
Our hope, through the present workshop and the follow-on activities, is to re-energize the knowledge representation and reasoning community in creating knowledge resources that parallel the deep learning models by adapting the engineering practices exemplified by Hugging Face. We must, however, start small and demonstrate the value and effectiveness of knowledge resources within the modern context of generative AI systems.
We have embarked on that journey by identifying industry use cases where LLMs have inference gaps, machine learning is done on very specific skills, and causal reasoning is drowning in data. We will evaluate the usefulness of existing knowledge resources in these three contexts and prototype how the knowledge artifacts can be disseminated for widespread use.
We hope that this model of working from use cases to delivering easily reusable engineering products will inspire others in the knowledge representation and reasoning community to undertake similar efforts in their own spheres.
Much work needs to be done to define an actionable research program. Based on the results of the proof-of-concept study mentioned in the previous section, we hope to define research projects in all aspects of creating the knowledge resource: methodologies for creating both foundational and domain-specific knowledge, effective reasoning techniques that scale and take context into account, and translational techniques that can automatically translate a reasoning task framed in one formalism into another formalism.
Enabling seamless distributed development, especially by domain experts, is essential for fostering an effective community. Last but not least, we must consider how to involve contributors from diverse geographic, linguistic, cultural, and institutional backgrounds, including underrepresented or low-resource communities.
AI research has always moved in cycles, with certain paradigms rising and falling in prominence. The remarkable progress we now see in deep learning would not have been possible without the persistence of researchers like Geoffrey Hinton, who championed neural networks long before they were in vogue. Just as neural networks have proven powerful for modeling aspects of human perception and cognition, curated knowledge remains essential for capturing structured, declarative understanding—the kind that underpins reasoning, learning, and communication.
Although knowledge curation has receded from the forefront of mainstream AI, its value has not diminished. By investing sustained effort into reintegrating curated knowledge into the AI toolkit, we can unlock new forms of robustness, interpretability, and societal impact—and, once again, broaden the horizons of what AI can achieve.

+11 ACKNOWLEDGMENTS

This work has been supported by grant number 2514820 from the United States National Science Foundation. Anthony G Cohn gratefully acknowledges financial support from the Fundamental Research priority area of The Alan Turing Institute and from the Special Funds of Tongji University, Shanghai, for the Sino-German Cooperation 2.0 Strategy. We are grateful to Prof. Ernest Davis for leading the panel discussion on foundational knowledge at the TIKA workshop.

CONFLICT OF INTEREST STATEMENT

The authors declare no conflicts of interest.

+12 Biographies

Vinay K Chaudhri is a principal scientist at Knowledge Systems Research LLC.
Chaitan Baru is a senior advisor in the Directorate for Technology, Innovation and Partnership at the National Science Foundation.
Brandon Bennett is the director of the postgraduate Research Studies in the School of Computer Science at the University of Leeds.
Mehul Bhatt is a professor of Computer Science at Örebro University, Sweden.
Darion Cassel is a senior applied scientist in the Amazon Web Services Automated Reasoning Checks Team.
Anthony G Cohn is a Professor of Automated Reasoning in the School of Computer Science at the University of Leeds, and the Foundations Model Lead at the Alan Turing Institute.
Rina Dechter is a professor of Computer Science at the University of California, Irvine.
Esra Erdem is a professor of Computer Science in the Faculty of Engineering and Natural Sciences at Sabanci University, Turkey.
Dave Ferrucci is the managing director at the Institute for Advanced Enterprise AI.
Ken Forbus is a professor of Computer Science at Northwestern University.
Gregory Gelfond is a research scientist at the University of Dayton Research Institute.
Michael Genesereth is a professor in the Department of Computer Science at Stanford University.
Andrew S. Gordon is a research associate professor of Computer Science and the director of interactive narrative research at the Institute for Creative Technologies at the University of Southern California.
Benjamin Grosof is a program manager in the Defense Advanced Research Projects Agency.
Gopal Gupta is a professor of Computer Science and a Co-director of the Center for Applied AI and Machine Learning at the University of Texas at Dallas.
Jim Hendler is the director of the Future of Computing Institute and the Tetherless World Professor of Computer, Web and Cognitive Sciences at RPI, and is also the director of the RPI-IBM Artificial Intelligence Research Collaboration.
Sharat Israni is the Chief Technology Officer of the Bakar Computational Health Sciences Institute at the University of California at San Francisco. He is also an affiliate faculty at UCSF and Berkeley.
Tyler R. Josephson is an assistant professor in the Department of Chemical, Biochemical and Environmental Engineering at the University of Maryland, Baltimore County.
Patrick Kyllonen is a distinguished presidential appointee at the Educational Testing Service.
Yuliya Lierler is a professor in the Department of Computer Science at the University of Nebraska, Omaha.
Vladimir Lifschitz is a professor emeritus of Computer Science at the University of Texas at Austin.
Clifton McFate is a senior AI engineer at Cynch.AI.
Hande Küçük McGinty is an assistant professor in the Department of Computer Science at Kansas State University.
Leora Morgenstern is a principal scientist at SRI International.
Alessandro Oltramari is the President, Carnegie Bosch Institute, Carnegie Mellon University & Senior Manager, Bosch Research and Technology Center (Pittsburgh, USA).
Praveen Paritosh is the founder and the CEO of The Third Ear.
Dan Roth is the chief AI Scientist at Oracle, and a professor of Computer Science at the University of Pennsylvania.
Blake Shepard is a senior ontologist at Cycorp.
Cogan Shimizu is an assistant professor at Wright State University.
Denny Vrandečić is the Head of Special Projects at the Wikimedia Foundation and a Visiting Professor at King's College London.
Mark Whiting is CTO at Pareto and a research fellow at the University of Pennsylvania.
Michael Witbrock is a professor of Computer Science at the University of Auckland, New Zealand.

source A community-driven vision for a new knowledge resource for AI