FSE 2025
Mon 23 - Fri 27 June 2025 Trondheim, Norway
Wed 25 Jun 2025 14:00 - 14:20 at Cosmos 3D - Testing 4 Chair(s): Antonio Mastropaolo

Question answering (QA) is a fundamental task for a large language model (LLM): the model must automatically answer human-posed questions in natural language. However, LLMs are known to distort facts and make non-factual statements (hallucinations) on QA tasks, which may hinder their deployment in real-life situations. In this work, we present DrHall, a method for detecting factual errors in black-box LLMs, inspired by metamorphic testing from software testing. Our premise is that a model's hallucinated answers are unstable: when a metamorphic relation (MR) makes the model take a different execution path on re-execution, a hallucinated answer is more likely to change. We empirically evaluate DrHall on three datasets covering natural-language and code data, finding that it outperforms existing methods and baselines, often by a large gap. In addition, by transforming DrHall using diverse path sampling, we obtain error correction methods with higher success rates. Our results demonstrate the potential of using MRs to mitigate LLM hallucination.
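
The instability idea in the abstract can be illustrated with a minimal Python sketch. This is not DrHall's implementation: the function names, the two prompt-rewriting relations, the 0.5 threshold, and the toy model are all assumptions made for illustration. The sketch re-asks a question through several meaning-preserving rewrites and flags the original answer when too many follow-up answers disagree with it.

```python
from typing import Callable, List

Model = Callable[[str], str]      # black-box model: prompt in, answer out
Relation = Callable[[str], str]   # rewrite that should preserve the question's meaning


def normalize(answer: str) -> str:
    """Lowercase and drop punctuation so trivially different answers compare equal."""
    kept = "".join(ch for ch in answer.lower() if ch.isalnum() or ch.isspace())
    return kept.strip()


def inconsistency_rate(question: str, ask: Model, relations: List[Relation]) -> float:
    """Fraction of follow-up answers that differ from the original answer."""
    if not relations:
        return 0.0
    original = normalize(ask(question))
    follow_ups = [normalize(ask(mr(question))) for mr in relations]
    return sum(1 for a in follow_ups if a != original) / len(follow_ups)


def is_suspected_hallucination(question: str, ask: Model,
                               relations: List[Relation],
                               threshold: float = 0.5) -> bool:
    """Flag the answer when the disagreement rate reaches the (assumed) threshold."""
    return inconsistency_rate(question, ask, relations) >= threshold


# Hypothetical relations: each rewrites the prompt without changing what is asked.
RELATIONS: List[Relation] = [
    lambda q: "Answer briefly: " + q,
    lambda q: q + " Reply with one short phrase.",
]


def toy_model(prompt: str) -> str:
    """Stand-in for an LLM: stable on one known fact, wording-dependent otherwise."""
    if "capital of France" in prompt:
        return "Paris"
    return ["1912", "1915", "1921"][len(prompt) % 3]


KNOWN = "What is the capital of France?"
UNKNOWN = "When was the fictional town of Zubrowka founded?"

if __name__ == "__main__":
    for q in (KNOWN, UNKNOWN):
        print(q, "->", is_suspected_hallucination(q, toy_model, RELATIONS))
```

With the toy model, the known question yields the same answer under every rewrite and is not flagged, while the unknown question yields a different answer under each rewrite and is flagged. A real setup would replace toy_model with calls to an actual LLM and would need a more robust answer comparison than exact string matching.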

Wed 25 Jun

Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

14:00 - 15:20
Testing 4: Industry Papers / Research Papers / Demonstrations at Cosmos 3D
Chair(s): Antonio Mastropaolo William and Mary, USA
14:00
20m
Talk
Detecting and Reducing the Factual Hallucinations of Large Language Models with Metamorphic Testing
Research Papers
Weibin Wu Sun Yat-sen University, Yuhang Cao Sun Yat-sen University, Ning Yi Sun Yat-sen University, Rongyi Ou Sun Yat-sen University, Zibin Zheng Sun Yat-sen University
14:20
10m
Talk
A Tool for Generating Exceptional Behavior Tests With Large Language Models
Demonstrations
Linghan Zhong University of Texas Austin, Samuel Yuan The University of Texas at Austin, Jiyang Zhang University of Texas at Austin, Yu Liu Meta, Pengyu Nie University of Waterloo, Junyi Jessy Li University of Texas at Austin, USA, Milos Gligoric The University of Texas at Austin
14:30
20m
Talk
Using Large Language Models to Support the Workflow of Differential Testing
Industry Papers
Arun Krishna Vajjala George Mason University, Ajay Krishna Vajjala George Mason University, Carmen Badea Microsoft Research, Christian Bird Microsoft Research, Jade D'Souza Microsoft, Robert DeLine Microsoft Research, Mikhail Demyanyuk Microsoft, Jason Entenmann Microsoft Research, Nicole Forsgren Microsoft Research, Aliaksandr Hramadski Microsoft, Haris Mohammad Microsoft, Sandeepan Sanyal Microsoft, Oleg Surmachev Microsoft, Thomas Zimmermann University of California, Irvine
14:50
20m
Talk
Adaptive Random Testing with Qgrams: the Illusion Comes True
Research Papers
Matteo Biagiola Università della Svizzera italiana, Robert Feldt Chalmers | University of Gothenburg, Paolo Tonella USI Lugano
15:10
10m
Talk
Dynamic Application Security Testing for Kubernetes Deployment: An Experience Report from Industry
Industry Papers
Shazibul Islam Shamim Kennesaw State University, Hanyang Hu Company A, Akond Rahman Auburn University

Information for Participants
Wed 25 Jun 2025 14:00 - 15:20 at Cosmos 3D - Testing 4 Chair(s): Antonio Mastropaolo
Info for room Cosmos 3D:

Cosmos 3D is the fourth room in the Cosmos 3 wing.

When facing the main Cosmos Hall, access to the Cosmos 3 wing is on the left, close to the stairs. The area is accessed through a large door with the number “3”, which will stay open during the event.
