Published: Jul 16, 2025

We are proud to announce that our group will be at ECAI 2025 with three long papers: two in the Main track and one in the Demonstrations track! Catch us in Bologna, Italy, to learn more about automatic prompt learning, hierarchical extreme multi-label text classification in the food domain, and protein language models.

Magic Mirror on the Wall, Which is the Fairest Prompt of All? A Survey on Automatic Prompt Learning

by S. Fantazzini, G. Frisoni, G. Moro, L. Ragazzi, M. Ciccioni, and C. Sartori

Prompts direct the behavior of a model by conditioning its outputs on carefully designed instructions and examples, similar to setting the trajectory of an arrow before release. More broadly, prompt learning is the research area that aims to solve downstream tasks by directly leveraging the knowledge acquired by language models at pre-training time, removing the need for expensive fine-tuning stages with potentially different objective functions. While manual prompt engineering has enabled both small and large language models to achieve superhuman performance on numerous benchmarks, it remains a labor-intensive and suboptimal process. Recently, the field has shifted towards automating the search for prompts that effectively elicit the desired model responses. This survey presents the first systematic review of prompt learning for pre-trained language models operating on text inputs, with a particular focus on automatic methods. We critically analyze existing publications and organize them into a novel taxonomy, describing key aspects for practical usage. We finally discuss promising directions for future research. Our curated repository of annotated papers, continuously updated, is available at https://anonymous.4open.science/r/awesome-prompt-learning.

The paper will be available soon!

FEAST: Retrieval-Augmented Multi-Hierarchical Food Classification for the FoodEx2 System

by L. Molfetta, A. Cocchieri, S. Fantazzini, G. Frisoni, L. Ragazzi, and G. Moro

Hierarchical text classification (HTC) and extreme multi-label classification (XML) tasks face compounded challenges from complex label interdependencies, data sparsity, and extreme output dimensions. These challenges are exemplified in the European Food Safety Authority's FoodEx2 system, a standardized food classification framework essential for food consumption monitoring and contaminant exposure assessment across Europe. FoodEx2 coding transforms natural language food descriptions into a set of codes from multiple standardized hierarchies, but faces implementation barriers due to its complex structure. Given a food description (e.g., "organic yogurt"), the system identifies its base term ("yogurt"), all the applicable facet categories (e.g., "production method"), and then every relevant facet descriptor for each category (e.g., "organic production"). While existing approaches perform adequately on well-balanced and semantically dense hierarchies, no prior work has addressed the practical constraints imposed by the FoodEx2 system. The limited literature addressing such real-world scenarios further compounds these challenges. We propose FEAST (Food Embedding And Semantic Taxonomy), a novel retrieval-augmented framework that decomposes the FoodEx2 classification challenge into a three-stage approach: (1) base term identification, (2) multi-label facet prediction, and (3) facet descriptor assignment. By leveraging the system's hierarchical structure to guide training and performing deep metric learning, FEAST learns discriminative embeddings that mitigate data sparsity and improve generalization on rare and fine-grained labels. Evaluated on the multilingual FoodEx2 benchmark, FEAST outperforms the prior European CNN baseline, improving F1 scores by 12-38% on rare classes. Code and models are released to support reproducibility at https://anonymous.4open.science/r/foodex2-coding-6741/
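To give a feel for the three-stage decomposition described in the abstract, here is a minimal Python sketch. It is not the FEAST implementation: a toy bag-of-words similarity stands in for the learned embeddings, and the tiny taxonomy, the keyword strings, the `classify` function, and the `threshold` value are all hypothetical and chosen only for illustration.

```python
from math import sqrt

def embed(text):
    """Toy bag-of-words vector; a stand-in for a learned text encoder."""
    counts = {}
    for token in text.lower().split():
        counts[token] = counts.get(token, 0) + 1
    return counts

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical miniature taxonomy, NOT the real FoodEx2 catalogue.
BASE_TERMS = ["yogurt", "bread", "apple juice"]
FACET_CATEGORIES = {
    # category name -> (illustrative keyword text, candidate descriptors)
    "production method": ("production method organic conventional",
                          ["organic production", "conventional production"]),
    "packaging": ("packaging glass plastic bottle container",
                  ["glass bottle", "plastic container"]),
}

def classify(description, threshold=0.1):
    """Return (base term, {facet category: [descriptors]}) for a description."""
    query = embed(description)
    # Stage 1: retrieve the single most similar base term.
    base_term = max(BASE_TERMS, key=lambda t: cosine(query, embed(t)))
    facets = {}
    for name, (keywords, descriptors) in FACET_CATEGORIES.items():
        # Stage 2: keep every facet category above the threshold (multi-label).
        if cosine(query, embed(keywords)) > threshold:
            # Stage 3: keep every descriptor of that category above the threshold.
            facets[name] = [d for d in descriptors
                            if cosine(query, embed(d)) > threshold]
    return base_term, facets

print(classify("organic yogurt"))
# ('yogurt', {'production method': ['organic production']})
```

In this toy version, each stage is a nearest-neighbour lookup over candidate labels, which loosely mirrors the retrieval-style framing; the actual system relies on embeddings trained with deep metric learning rather than word overlap.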

The paper will be available soon!

Predicting Protein Functions with Ensemble Deep Learning and Protein Language Models

by G. Frisoni, M. Fuschi, and G. Moro

Understanding protein functions enables deciphering cellular mechanisms and improving healthcare outcomes, from disease diagnosis to targeted therapy. We present GOMix, an ensemble learning method for predicting the functions of newly discovered proteins, packaged within an easy-to-use web application. By combining seven complementary base predictors, including sequence homology and protein language models, GOMix achieves competitive or state-of-the-art performance in the CAFA-3 challenge. Unlike existing solutions, GOMix is entirely open-source, modular, and computationally lightweight. The code is publicly available at https://github.com/disi-unibo-nlp/gomix (MIT License).
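As a rough illustration of what combining base predictors can look like, the sketch below averages per-term confidence scores from several predictors with fixed weights. This is one common ensembling scheme and an assumption on our part, not necessarily how GOMix combines its seven predictors; the `combine_predictions` function, the two example predictors, their scores, and the weights are all made up for the example.

```python
def combine_predictions(predictions, weights):
    """Weighted average of per-term scores from several base predictors.

    predictions: list of dicts mapping a function term to a score in [0, 1].
    weights: one non-negative weight per predictor.
    A term that a predictor does not mention counts as a score of 0 for it.
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per predictor")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    combined = {}
    for scores, weight in zip(predictions, weights):
        for term, score in scores.items():
            combined[term] = combined.get(term, 0.0) + weight * score
    return {term: value / total_weight for term, value in combined.items()}

# Illustrative scores only; these are not outputs of any real predictor.
homology_scores = {"GO:0003677": 0.9, "GO:0005515": 0.2}
language_model_scores = {"GO:0003677": 0.7, "GO:0016301": 0.6}

combined = combine_predictions(
    [homology_scores, language_model_scores], weights=[1.0, 1.0]
)
# Print terms from most to least confident.
for term, score in sorted(combined.items(), key=lambda item: -item[1]):
    print(f"{term}\t{score:.2f}")
```

A term supported by both predictors ends up ranked above terms proposed by only one of them, which is the basic intuition behind ensembling complementary predictors.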

The paper will be available soon!