Despite their outstanding performance in vision-language reasoning, Large Vision-Language Models (LVLMs) may generate hallucinated content that does not exist in the given images. Most existing LVLM hallucination benchmarks are constrained to object-related hallucinations. However, the potential hallucination of relations between two objects, i.e., relation hallucination, remains under-investigated. To remedy this, we design a unified framework that measures object and relation hallucination in LVLMs simultaneously. The core idea is to evaluate hallucination on the (object, relation, object) triplets extracted from LVLMs' responses. Building on this framework, we introduce Tri-HE, a novel Triplet-level Hallucination Evaluation benchmark that can be used to study both object and relation hallucination at the same time. Our comprehensive evaluations on Tri-HE show that relation hallucination is an even more serious issue than object hallucination among existing LVLMs, highlighting a previously neglected obstacle to reliable LVLMs. Moreover, based on our findings, we design a simple yet effective training-free approach to mitigate hallucinations in LVLMs, with which we outperform all open-source counterparts on Tri-HE and achieve performance comparable to the powerful GPT-4V.
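To make the triplet-level idea concrete, below is a minimal Python sketch (not the released Tri-HE code; the function name and the exact string-matching judge are illustrative assumptions, whereas the benchmark itself judges extracted triplets against the image more flexibly). It scores (object, relation, object) triplets extracted from an LVLM response against an image's ground-truth scene-graph triplets and reports separate object and relation hallucination rates.

```python
from typing import List, Tuple

Triplet = Tuple[str, str, str]  # (object, relation, object)

def hallucination_rates(
    response_triplets: List[Triplet],
    gt_triplets: List[Triplet],
) -> Tuple[float, float]:
    """Return (object_hallucination_rate, relation_hallucination_rate)."""
    # Entities and triplets supported by the image's scene graph.
    gt_objects = {o for s, _, t in gt_triplets for o in (s, t)}
    gt_set = set(gt_triplets)

    obj_halluc = rel_halluc = 0
    for subj, rel, obj in response_triplets:
        if subj not in gt_objects or obj not in gt_objects:
            # Object hallucination: a mentioned entity is absent from the image.
            obj_halluc += 1
        elif (subj, rel, obj) not in gt_set:
            # Relation hallucination: both objects exist, but the stated
            # relation between them is not supported by the scene graph.
            rel_halluc += 1

    n = max(len(response_triplets), 1)
    return obj_halluc / n, rel_halluc / n


if __name__ == "__main__":
    gt = [("man", "riding", "horse"), ("horse", "on", "grass")]
    pred = [
        ("man", "riding", "horse"),          # faithful triplet
        ("man", "holding", "umbrella"),      # object hallucination
        ("horse", "jumping over", "grass"),  # relation hallucination
    ]
    print(hallucination_rates(pred, gt))  # -> (0.333..., 0.333...)
```

Separating the two error types at the triplet level is what lets a single extraction pass measure object and relation hallucination jointly, rather than requiring two different benchmarks.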
@article{wu2024unified,
title={Unified Triplet-Level Hallucination Evaluation for Large Vision-Language Models},
author={Wu, Junjie and Chung, Tsz Ting and Chen, Kai and Yeung, Dit-Yan},
journal={arXiv preprint arXiv:2410.23114},
year={2024}
}