Towards Knowledgeable Foundation Models
@ ACL 2025 Workshop
Aug 1, 2025 in Vienna, Austria
Knowledge has been an important prerequisite for a variety of AI applications, and is typically sourced from either structured knowledge sources such as knowledge bases and dictionaries or unstructured knowledge sources such as Wikipedia documents.
More recently, researchers have discovered that language models already acquire a significant amount of knowledge through pre-training: LLMs can be used to generate commonsense knowledge and factual knowledge as context for question answering. While the results are encouraging, many questions remain open.
This workshop examines the lifecycle of knowledge within language models.
This is the third edition of the Knowledgeable Foundation Models workshop. Previous editions were held as KnowFM@AAAI2025 and KnowLM@ACL2024.
Knowledge has been an important prerequisite for various NLP applications and is typically derived from either structured knowledge sources such as knowledge bases and dictionaries or unstructured knowledge sources such as Wikipedia documents and news articles.
It is known that language models already possess a significant amount of knowledge through pre-training: LLMs can be used to generate commonsense knowledge and factual knowledge when prompted to do so. However, beyond the surface, there are still many lingering questions, such as: “where does the knowledge come from”, “how do we quantify the amount of knowledge”, “is the knowledge reliable (and do LMs themselves know)”, “how can we augment LMs with domain-specific knowledge”, “how can we revise knowledge without hurting the reasoning abilities of LMs”, and “how can we leverage knowledge to assist the self-correction of LMs”.
In this workshop, we want to bring together researchers who focus on different stages and different aspects (structured knowledge, unstructured knowledge, and knowledge acquired from LMs themselves) of the knowledge lifecycle to discuss the role of knowledge in the era of large language models.
Submission Topics
We welcome submissions on all topics related to knowledgeable LMs.
We will also announce a Best Paper Award at our workshop.
Submission Instructions
We welcome two types of papers: regular workshop papers and non-archival submissions. Only regular workshop papers will be included in the workshop proceedings. The review process will be double-blind. All submissions should be in PDF format following the ACL template and made through the OpenReview submission portal (https://openreview.net/group?id=aclweb.org/ACL/2025/Workshop/KnowFM).
All deadlines are 23:59 UTC-12 (“Anywhere on Earth”, AoE).
Submission Deadline | Jun 6, 2025 (23:59 AoE) |
---|---|
Decision Notifications | Jun 18, 2025 (23:59 AoE) |
Camera-Ready Deadline | Jun 25, 2025 (23:59 AoE) |
Workshop Date | Aug 1, 2025 |
Time | Program |
---|---|
09:00-09:10 | Opening Remarks |
09:10-09:50 | Keynote Speech Preslav Nakov: Towards Truly Open, Language-Specific, Safe, Factual, and Specialized Large Language Models First, we will argue for the need for fully transparent open-source large language models (LLMs), and we will describe the efforts of MBZUAI's Institute on Foundation Models (IFM) towards that goal based on the LLM360 initiative. Second, we will argue for the need for language-specific LLMs, and we will share our experience from building Jais, the world's leading open Arabic-centric foundation and instruction-tuned large language model, Nanda, our open-weights Hindi LLM, Sherkala, our open-weights Kazakh LLM, and some other models. Third, we will argue for the need for safe LLMs, and we will present Do-Not-Answer, a dataset for evaluating the guardrails of LLMs, which is at the core of the safety mechanisms of our LLMs. Fourth, we will argue for the need for factual LLMs, and we will discuss the factuality challenges that LLMs pose. We will then present some recent relevant tools for addressing these challenges developed at MBZUAI: (i) OpenFactCheck, a framework for fact-checking LLM output, for building customized fact-checking systems, and for benchmarking LLMs for factuality; (ii) LM-Polygraph, a tool for predicting an LLM's uncertainty in its output using cheap and fast uncertainty quantification techniques; and (iii) LLM-DetectAIve, a tool for machine-generated text detection. Finally, we will argue for the need for specialized models, and we will present the zoo of LLMs currently being developed at MBZUAI's IFM. |
09:50-10:30 | Keynote Speech Yunyao Li: Declarative to Generative: Building and Querying Enterprise Knowledge Bases Over the last 25 years, innovations in search, knowledge graphs, and even large language models have been adopted by consumers well before enterprises. The delay in enterprise adoption of such technologies is largely due to two factors. First, enterprise knowledge bases vary widely across industry verticals, and even within a vertical, by organization-specific terminology and vocabulary. Second, querying such knowledge bases needs to account for the very low tolerance enterprise users have for mistakes and hallucination. In this talk, I will describe tools to build, maintain, and query such knowledge bases, and the evolution of these tools over two decades from declarative to generative systems. |
10:30-11:00 | Coffee Break |
11:00-11:15 | Oral Presentation: SIS-Fact: Towards Systematic, Interpretable and Scalable Factuality Evaluation for LLM |
11:15-11:30 | Oral Presentation: Atomic Calibration of LLMs in Long-Form Generations |
11:30-11:45 | Oral Presentation: Teaching Large Language Models to Maintain Contextual Faithfulness via Synthetic Tasks and Reinforcement Learning |
11:45-12:00 | Oral Presentation: Understanding the Interplay between Parametric and Contextual Knowledge for Large Language Models |
12:00-12:15 | Oral Presentation: The Mirage of Model Editing: Revisiting Evaluation in the Wild |
12:15-12:25 | Best Paper Award Announcement |
12:25-14:10 | Lunch Break |
14:10-14:50 | Keynote Speech Chengxiang Zhai: From Knowledgeable Foundation Models to Knowledgeable Agents: A Neurosymbolic Perspective on Knowledge Representation Foundation models acquire massive amounts of useful knowledge from both pre-training and fine-tuning, but the knowledge they encode in their parameter space is neither interpretable nor verifiable, and their behavior in applying the knowledge at inference time is unpredictable. These limitations cause concerns about their trustworthiness when they are directly used in real-world applications. While much work has attempted to address those limitations by improving a foundation model itself, we argue that they are better addressed by building an agent that can augment the foundation model with a memory mechanism, regulate its behavior using a symbolic representation module, and improve itself over time. In this talk, we will discuss how compression of deep neural networks enables foundation models to acquire generalizable knowledge in both pre-training and fine-tuning, why the behaviors of foundation models are inherently unpredictable, and why it is necessary to build a knowledgeable agent on top of a knowledgeable foundation model and use a neurosymbolic knowledge representation to enable both trustworthiness and lifelong learning of the agent. We will conclude with some promising directions for future research. |
14:50-15:30 | Panel Discussion: Ed Hovy, Chengxiang Zhai, Yunyao Li |
15:30-16:00 | Coffee Break |
16:00-17:30 | Poster Session |
Environment Free Coding Benchmarks: Evaluating Language Model Coding Capabilities without a Dedicated Environment [PDF]
Laurence Liang
How Many Parameters for Multi-Hop? An Information-Theoretic Capacity Law for Knowledge Retrieval in Large Language Models [PDF]
Thomas Chen
GeoEdit: Geometric Knowledge Editing for Large Language Models [PDF]
Yujie Feng, Li-Ming Zhan, ZEXIN LU, Yongxin Xu, Xu Chu, Yasha Wang, Jiannong Cao, Philip S. Yu, Xiao-Ming Wu
Superfluous Instruction: Vulnerabilities Stemming from Task-Specific Superficial Expressions in Instruction Templates [PDF] [Poster]
Toma Suzuki, Yusuke Sakai, Justin Vasselli, Hidetaka Kamigaito, Taro Watanabe
DEAL: Disentangling Transformer Head Activations for LLM Steering [PDF]
Li-Ming Zhan, Bo LIU, ZEXIN LU, Yujie Feng, Chengqiang Xie, Jiannong Cao, Xiao-Ming Wu
Reasoning or Memorization? Investigating LLMs’ Capability in Restoring Chinese Internet Homophones [PDF] [Poster]
Jianfei Ma, Zhaoxin Feng, Huacheng Song, Emmanuele Chersoni, Zheng Chen
Knowledge Mechanisms in Large Language Models: A Survey and Perspective [PDF] [Poster]
Mengru Wang, Yunzhi Yao, Shuofei Qiao, Shumin Deng, Jia-Chen Gu, Fei Huang, Huajun Chen, Ningyu Zhang
Structure-Aware Hyperbolic Representation for Coarse-to-Fine Emotion Classification in Lyrics [PDF]
Yutong Hu, Menglin Yang, Reza Mohammadi
Theorem-of-Thought: A Multi-Agent Framework for Abductive, Deductive, and Inductive Reasoning in Language Models [PDF] [Poster]
Samir Abdaljalil, HASAN KURBAN, Khalid Qaraqe, Erchin Serpedin
IPAD: Inverse Prompt for AI Detection - A Robust and Interpretable LLM-Generated Text Detector [PDF] [Poster]
Samir Abdaljalil, HASAN KURBAN, Khalid Qaraqe, Erchin Serpedin
Context-Efficient Retrieval with Factual Decomposition [PDF]
Yanhong Li, David Yunis, David McAllester, Jiawei Zhou
Meetalk: Retrieval-Augmented and Adaptively Personalized Meeting Summarization with Knowledge Learning from User Corrections [PDF]
Zheng CHEN, JIANG FUTIAN, Yue Deng, Changyang He, Bo Li
Can LLMs Recognize Their Own Analogical Hallucinations? Evaluating Uncertainty Estimation for Analogical Reasoning [PDF]
Zheng CHEN, Zhaoxin Feng, Jianfei Ma, Jiexi Xu, Bo Li
Democratizing LLM Benchmarking via Automated Dynamic Knowledge Evaluation [PDF]
Yanhong Li, Tianyang Xu, Kenan Tang, Karen Livescu, David McAllester, Jiawei Zhou
A Progressive Learning Strategy for Medical Natural Language Understanding [PDF]
ZHE YANG, Yi Huang, Mengfei Guo, Yaqin Chen, Xiaoting Wu, Junlan Feng, Chao Deng
Exploring Personalization Shifts in Representation Space of LLMs [PDF]
Jiahong Liu, Wenhao Yu, Quanyu Dai, Zhongyang Li, Jieming Zhu, Menglin Yang, Tat-Seng Chua, Irwin King
Semantics-Preserving Adversarial Attacks on Event-Driven Stock Prediction Models [PDF] [Poster]
Aofan Liu, haoxuan li, Hongjian Xing, Yuguo Yin, Zijun Li, Yiyan Qi
Beyond Function-Level Search: Repository-Aware Dual-Encoder Code Retrieval with Adversarial Verification [PDF] [Poster]
Aofan Liu, Shiyuan SONG, haoxuan li, Cehao Yang, Yiyan Qi
MD3R: Minimizing Data Distribution Discrepancies to Tackle Inconsistencies in Multilingual Query-Code Retrieval [PDF] [Poster]
Aofan Liu, Yuguo Yin, Hongjian Xing, Zhen Li, Yiyan Qi
ATEB: Rethinking Advanced NLP Tasks in an Information Retrieval Setting [PDF]
Simeng Han, Frank Palma Gomez, Tu Vu, Zefei Li, Daniel Cer, Hansi Zeng, Chris Tar, Arman Cohan, Gustavo Hernandez Abrego
When to Trust Context: Self-Reflective Debates for Context Reliability [PDF]
Zeqi Zhou, Fang Wu, Shayan Talaei, Haokai Zhao, Cheng Meixin, Tinson Xu, Amin Saberi, Yejin Choi
Truth Neurons [PDF] [Poster]
Haohang Li, Yupeng Cao, Yangyang Yu, Jordan W. Suchow, Zining Zhu
FG-PRM: Fine-grained Hallucination Detection and Mitigation in Language Model Mathematical Reasoning [PDF]
Ziming Luo, Ruosen Li, Xinya Du
ToolReAGt: Tool Retrieval for LLM-based Complex Task Solution via Retrieval Augmented Generation [PDF]
Norbert Braunschweiler, Rama Doddipatla, TUDOR-CATALIN ZORILA
COSMIC: Generalized Refusal Direction Identification in LLM Activations [PDF]
Vincent Siu, Nicholas Crispino, Zihao Yu, Sam Pan, Zhun Wang, Yang Liu, Dawn Song, Chenguang Wang
Predicting Task Performance with Context-aware Scaling Laws [PDF]
Kyle Montgomery, David Park, Jianhong Tu, Michael Bendersky, Beliz Gunel, Dawn Song, Chenguang Wang
MLAN: Language-Based Instruction Tuning Preserves and Transfers Knowledge in Multimodal Language Models [PDF]
Jianhong Tu, Zhuohao Ni, Nicholas Crispino, Zihao Yu, Michael Bendersky, Beliz Gunel, Ruoxi Jia, Xin Liu, Lingjuan Lyu, Dawn Song, Chenguang Wang
Stress-Testing Multimodal Foundation Models for Crystallographic Reasoning [PDF] [Poster]
Can Polat, HASAN KURBAN, Erchin Serpedin, Mustafa Kurban
Understanding the Interplay between Parametric and Contextual Knowledge for Large Language Models [PDF]
Sitao Cheng, Liangming Pan, Xunjian Yin, Xinyi Wang, William Yang Wang
Evaluating RAG Robustness to Symbolic Perturbations [PDF]
Xinyun Zhou, Xinfeng Li, Kun Wang, Xuanwang Zhang, Ming Xu, Yinan Peng, Miao Yu, Yidong Wang, Xiaojun Jia, Qingsong Wen, XiaoFeng Wang, Wei Dong
Teaching Large Language Models to Maintain Contextual Faithfulness via Synthetic Tasks and Reinforcement Learning [PDF]
Shuzheng Si, Haozhe Zhao, Cheng Gao, Yuzhuo Bai, Zhitong Wang, Bofei Gao, Kangyang Luo, Wenhao Li, Yufei Huang, Gang Chen, Fanchao Qi, Minjia Zhang, Baobao Chang, Maosong Sun
Knowledge-Grounded Detection of Cryptocurrency Scams with Retrieval-Augmented LMs [PDF]
Zichao Li
FIFA: Unified Faithfulness Evaluation Framework for Text-to-Video and Video-to-Text Generation [PDF]
Liqiang Jing, Viet Dac Lai, Seunghyun Yoon, Trung Bui, Xinya Du
A Comprehensive Analysis for Visual Object Hallucination in Large Vision-Language Models [PDF]
Liqiang Jing, Hardy Chen, Ehsan Aghazadeh, Xin Eric Wang, Xinya Du
Latent Knowledge Scalpel: Precise and Massive Knowledge Editing for Large Language Models [PDF]
Xin Liu, Qiyang Song, Shaowen Xu, Kerou Zhou, Wenbo Jiang, Xiaoqi Jia, Weijuan Zhang, Heqing Huang, Yakai Li
What makes Reasoning Models Different? Follow the Reasoning Leader for Efficient Decoding [PDF]
Ming Li, Zhengyuan Yang, Xiyao Wang, Dianqi Li, Linjie Li, Kevin Lin, Tianyi Zhou, Lijuan Wang
CaKE: Circuit-aware Editing Enables Generalizable Knowledge Learners [PDF]
Yunzhi Yao, Jizhan Fang, Jia-Chen Gu, Ningyu Zhang, Shumin Deng, Huajun Chen, Nanyun Peng
SIS-Fact: Towards Systematic, Interpretable and Scalable Factuality Evaluation for LLM [PDF]
Yuzhuo Bai, Kangyang Luo, Wenhao Li, Shuzheng Si, Gang Chen, Fanchao Qi, Maosong Sun
Shallow Focus, Deep Fixes: Enhancing Shallow Layers Vision Attention Sinks to Alleviate Hallucination in LVLMs [PDF]
Xiaofeng Zhang, Yihao Quan, Chen Shen, Chaochen Gu, Xiaosong Yuan, Shaotian Yan, Jiawei Cao, Hao Cheng, Kaijie Wu, Jieping Ye
How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training [PDF] [Poster]
Yixin Ou, Yunzhi Yao, Ningyu Zhang, Hui Jin, Jiacheng Sun, Shumin Deng, Zhenguo Li, Huajun Chen
Teaching Large Language Models to Express Knowledge Boundary from Their Own Signals [PDF]
Lida Chen, Zujie Liang, Xintao Wang, Jiaqing Liang, Yanghua Xiao, Feng Wei, Jinglei Chen, ZHENGHONG HAO, Bing Han, Wei Wang
AttentionRAG: Attention-Guided Context Pruning in Retrieval-Augmented Generation [PDF] [Poster]
Yixiong Fang, Tianran Sun, Yuling Shi, Xiaodong Gu
Atomic Calibration of LLMs in Long-Form Generations [PDF]
Caiqi Zhang, Ruihan Yang, Zhisong Zhang, Xinting Huang, Sen Yang, Dong Yu, Nigel Collier
Transparent and Coherent Procedural Mistake Detection [PDF] [Poster]
Shane Storks, Itamar Bar-Yossef, Yayuan Li, Zheyuan Zhang, Jason J Corso, Joyce Chai
CoRE: Condition-based Reasoning for Identifying Outcome Variance in Complex Events [PDF] [Poster]
Sai P Vallurupalli, Francis Ferraro
EdTec-ItemGen: Enhancing Retrieval-Augmented Item Generation Through Key Point Extraction [PDF]
Alonso Palomino, David Buschhüter, Roland Roller, Niels Pinkwart, Benjamin Paassen
ReSCORE: Label-free Iterative Retriever Training for Multi-hop Question Answering with Relevance-Consistency Supervision [PDF] [Poster]
Dosung Lee, Wonjun Oh, Boyoung Kim, Minyoung Kim, Joonsuk Park, Paul Hongsuck Seo
Temporal Information Retrieval via Time-Specifier Model Merging [PDF]
SeungYoon Han, Taeho Hwang, Sukmin Cho, Soyeong Jeong, Hoyun Song, Huije Lee, Jong C. Park
The Mirage of Model Editing: Revisiting Evaluation in the Wild [PDF]
Wanli Yang, Fei Sun, Jiajun Tan, Xinyu Ma, Qi Cao, Dawei Yin, Huawei Shen, Xueqi Cheng
MT2ST: Adaptive Multi-Task to Single-Task Learning [PDF]
Dong Liu, Yanxuan Yu