Abstract: Vision-language models (VLMs) offer flexible object detection through natural language prompts, but their performance varies with how the prompt is phrased. In this paper, we ...