MS Seminar


Name of the Speaker: Mr. HITHESH SANKARARAMAN M (EE22S077)
Guide: Dr. Umesh S
Co-Guide: Dr. Ashok Jhunjhunwala
Venue: ESB-210B (Conference Hall)
Online meeting link: http://meet.google.com/hyg-kvpj-sfx
Date/Time: 3rd December 2025 (Wednesday), 9.00 PM
Title: Provenance: A Light-weight Fact-checker for Retrieval Augmented LLM Generation Output

Abstract:

LLMs can produce non-factual or hallucinated outputs, and manually fact-checking those outputs is time-consuming. We therefore develop a lightweight solution, built on open-source models, that verifies factual entailment in LLM outputs without relying on LLMs for the fact-checking itself. Our main objectives are to avoid LLM-based checking, to keep latency and run-time cost low, and to integrate seamlessly with existing retrieval-augmented generation (RAG) systems.

In the first part of this thesis, we present a lightweight approach named Provenance for detecting non-factual RAG outputs. Given a context and a putative output, we compute a factuality score that can be thresholded to yield a binary decision for checking the results of LLM-based question-answering, summarization, or other systems. Unlike factuality checkers that themselves rely on LLMs, we use compact, open-source natural language inference (NLI) models, yielding a freely accessible solution with low latency and low run-time cost, and no need for LLM fine-tuning. The approach also enables downstream mitigation and correction of hallucinations by tracing them back to specific context chunks. Our experiments show a high Area Under the Receiver Operating Characteristic (ROC) Curve (AUROC) across a wide range of relevant open-source datasets, indicating the effectiveness of our method for fact-checking RAG output.

In the second part of this thesis, we enhance the Provenance framework by addressing two key challenges: context chunking and retriever uncertainty. We implement autonomous chunking (i) to handle larger real-world RAG data that exceeds the token capacity of open-source models and (ii) to integrate seamlessly with existing RAG systems. We also incorporate retriever uncertainty into the factuality score to better distinguish hallucinations from re-ranker errors. The improved framework achieves a higher average AUROC (a 5% improvement over Provenance version 1) across 21 different datasets.

Provenance balances performance and resource efficiency, addressing the limitations of traditional LLM-based fact-checkers in both cost and computational requirements. Our framework is production-ready and improves the reliability and trustworthiness of LLM-based applications. The study also identifies areas for future research, including fine-tuning the NLI-based fact-checker on RAG-specific datasets to enhance its effectiveness, extending the framework to identify specific hallucinated spans within the generated output, and incorporating hallucination-mitigation mechanisms within the framework.
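
For concreteness, the following is a minimal Python sketch of the NLI-based scoring the abstract describes: each retrieved context chunk is scored for whether it entails the generated answer using a compact open-source cross-encoder, per-chunk scores are weighted by retriever confidence, and the aggregate is thresholded into a binary decision. The checkpoint name, multiplicative weighting, and max-aggregation here are illustrative assumptions, not the exact formulation used in the thesis.

    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    # Assumed NLI checkpoint; any compact open-source NLI model can stand in.
    MODEL = "microsoft/deberta-large-mnli"
    tokenizer = AutoTokenizer.from_pretrained(MODEL)
    nli = AutoModelForSequenceClassification.from_pretrained(MODEL).eval()

    def entailment_prob(chunk: str, claim: str) -> float:
        # Probability that the context chunk entails the generated claim.
        enc = tokenizer(chunk, claim, return_tensors="pt",
                        truncation=True, max_length=512)
        with torch.no_grad():
            logits = nli(**enc).logits
        # Label order for this checkpoint: 0=contradiction, 1=neutral, 2=entailment.
        return torch.softmax(logits, dim=-1)[0, 2].item()

    def factuality_score(chunks, retriever_scores, answer):
        # Weight each chunk's entailment by retriever confidence (illustrative)
        # and take the max: one strongly supporting chunk suffices.
        return max(w * entailment_prob(c, answer)
                   for c, w in zip(chunks, retriever_scores))

    chunks = ["The Eiffel Tower, completed in 1889, is about 330 metres tall."]
    answer = "The Eiffel Tower is roughly 330 metres tall."
    score = factuality_score(chunks, [1.0], answer)
    print("supported" if score >= 0.5 else "possible hallucination", round(score, 3))

Sweeping the decision threshold over labeled examples and computing AUROC (for instance with sklearn.metrics.roc_auc_score) mirrors the evaluation protocol described in the abstract.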