Published: May 20, 2025
We are delighted to announce that we will be at ACL 2025 with two long papers in the Main and Findings tracks! Catch us in Vienna, Austria, to learn more about humor QA and biomedical NER.
"What do you call a dog that is incontrovertibly true? Dogma": Testing LLM Generalization through Humor
by A. Cocchieri, L. Ragazzi, P. Italiani, G. Tagliavini, and G. Moro
Humor, requiring creativity and contextual understanding, is a hallmark of human intelligence, showcasing adaptability across linguistic scenarios. While recent advances in large language models (LLMs) demonstrate strong reasoning on various benchmarks, it remains unclear whether they truly adapt to new tasks like humans (i.e., generalize) or merely replicate memorized content. To explore this, we introduce Phunny, a new humor-based question-answering benchmark designed to assess LLMs' reasoning through carefully crafted puns. Our dataset is manually curated to ensure novelty and minimize data contamination, providing a robust evaluation of LLMs' linguistic comprehension. Experiments on pun comprehension, resolution, and generation reveal that most LLMs struggle with generalization, even on simple tasks, consistently underperforming the human baseline. Additionally, our detailed error analysis provides valuable insights to guide future research. The data is available at https://anonymous.4open.science/r/phunny/.
-
The paper will be available soon!
ZeroNER: Fueling Zero-Shot Named Entity Recognition via Entity Type Descriptions
by A. Cocchieri, M. M. Galindo, G. Frisoni, G. Moro, C. Sartori, and G. Tagliavini
In real-world Named Entity Recognition (NER), annotation scarcity and the challenge of unseen entity types make zero-shot learning essential. While Large Language Models (LLMs) possess vast parametric knowledge, they fall short in cost-effectiveness compared to specialized encoders. Current zero-shot methods often rely solely on entity type names, overlooking both the critical role of descriptions in resolving definition ambiguities and the issue of type leakage during pretraining. In this work, we introduce ZeroNER, a description-driven framework that enhances zero-shot NER in low-resource settings. By leveraging entity type descriptions through cross-attention, ZeroNER enables a BERT-based student model to identify any entity type without additional training. Evaluated on three real-world zero-shot benchmarks under a rigorous hard zero-shot setting, ZeroNER consistently outperforms several LLMs by up to 15% in F1 score and surpasses alternative lightweight methods that rely solely on type names. Furthermore, our findings reveal that many LLMs significantly benefit from using type descriptions, underscoring their potential in zero-shot NER.
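The core idea of conditioning token representations on entity-type descriptions via cross-attention can be illustrated with a minimal sketch. This is not ZeroNER's implementation, just a toy single-head cross-attention in numpy, where token vectors act as queries over description vectors (all shapes and names here are illustrative assumptions):

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(tokens, descs):
    """Single-head cross-attention: each token vector (query) attends over
    entity-type description vectors (keys/values), yielding
    description-conditioned token representations."""
    d_k = descs.shape[-1]
    scores = tokens @ descs.T / np.sqrt(d_k)   # (n_tokens, n_types)
    weights = softmax(scores, axis=-1)          # attention over type descriptions
    return weights @ descs                      # (n_tokens, dim)

# toy example: 4 token vectors, 3 type-description vectors, dim 8
rng = np.random.default_rng(0)
H = rng.normal(size=(4, 8))   # stand-in for BERT token embeddings
D = rng.normal(size=(3, 8))   # stand-in for encoded type descriptions
out = cross_attend(H, D)
print(out.shape)
```

In the actual framework a BERT-based student produces the token representations and the descriptions are encoded text, but the attention pattern is the same: tokens read from type descriptions rather than from type names alone.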
-
The paper will be available soon!