EMNLP 2022

Published: Dec 7, 2022

We are proud to announce that our group will be at EMNLP 2022 with an accepted long paper in the main track (22% acceptance rate). We will attend in person and present BioReader, the first retrieval-enhanced transformer for biomedical literature.

BioReader: a Retrieval-Enhanced Text-to-Text Transformer for Biomedical Literature

by G. Frisoni, M. Mizutani, G. Moro and L. Valgimigli

The latest batch of research has equipped language models with the ability to attend over relevant and factual information from non-parametric external sources, drawing a complementary path to architectural scaling. Besides mastering language, exploiting and contextualizing the latent world knowledge is crucial in complex domains like biomedicine. However, most works in the field rely on general-purpose models supported by databases like Wikipedia and Books. We introduce BioReader, the first retrieval-enhanced text-to-text model for biomedical natural language processing. Our domain-specific T5-based solution augments the input prompt by fetching and assembling relevant scientific literature chunks from a neural database with ≈60 million tokens centered on PubMed. We fine-tune and evaluate BioReader on a broad array of downstream tasks, significantly outperforming several state-of-the-art methods despite using up to 3x fewer parameters. In tandem with extensive ablation studies, we show that domain knowledge can be easily altered or supplemented to make the model generate correct predictions bypassing the retraining step and thus addressing the literature overload issue.
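
To make the idea of augmenting the input prompt with retrieved literature chunks more concrete, here is a minimal sketch in Python. It is not BioReader's implementation: the bag-of-words cosine retriever stands in for the neural database, the three sentences in CHUNKS are toy stand-ins for PubMed passages, and the function names (tokenize, cosine, retrieve, build_prompt) and the "question:" / "context:" prompt layout are assumptions made for illustration only.

```python
import math
import re
from collections import Counter

# Toy stand-ins for literature chunks (hypothetical; not the paper's database).
CHUNKS = [
    "Aspirin irreversibly inhibits cyclooxygenase enzymes.",
    "Metformin lowers hepatic glucose production in type 2 diabetes.",
    "The mitochondrion is the main site of ATP synthesis.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query.

    Assumption: a real system would use learned dense embeddings here,
    not word counts.
    """
    q = Counter(tokenize(query))
    ranked = sorted(
        chunks,
        key=lambda c: cosine(q, Counter(tokenize(c))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, chunks, k=2):
    """Assemble a text-to-text input: the query followed by retrieved chunks."""
    context = " ".join(f"context: {c}" for c in retrieve(query, chunks, k))
    return f"question: {query} {context}"


prompt = build_prompt("How does metformin affect glucose?", CHUNKS, k=1)
print(prompt)
```

In this sketch, changing what the model sees only requires editing CHUNKS, which mirrors the abstract's point that domain knowledge can be altered or supplemented without retraining. The resulting string would then be passed to a text-to-text model such as T5.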