Ramses

Retrieve-and-Marginalize End-to-End Summarization of Biomedical Studies

Gianluca Moro, Luca Ragazzi, Lorenzo Valgimigli, Lorenzo Molfetta

SISAP-23

Description

A demanding biomedical task is condensing evidence from multiple interrelated studies, given a context as input, to generate reviews or provide answers autonomously. We name this task Context-Aware Multi-Document Summarization (CA-MDS). Existing state-of-the-art (SOTA) solutions must truncate the input because of high memory demands, which discards meaningful content. To address this issue, we propose RAMSES, a novel approach that uses a retrieve-and-marginalize technique for end-to-end summarization. The model learns to (i) index each document by modeling its semantic features, (ii) retrieve the most relevant documents, and (iii) generate a summary via token probability marginalization. To facilitate evaluation, we introduce a new dataset, FAQsumC19, in which multiple supporting papers must be synthesized to answer questions related to Covid-19. Our experiments show that RAMSES achieves notably higher ROUGE scores than SOTA methods, including a new SOTA for generating systematic literature reviews on MS2. Human evaluation indicates that our model produces more informative answers than the previous leading approaches.

Keywords: Biomedical Multi-Document Summarization, Neural Semantic Representation, End-to-End Neural Retriever, NLP
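The retrieve-and-marginalize idea in steps (ii) and (iii) can be illustrated with a toy sketch. This is not the RAMSES implementation. It assumes one common reading of token probability marginalization, in which the next-token distribution is a mixture of per-document distributions weighted by normalized retrieval scores: p(y_t | x) = sum_d p(d | x) * p(y_t | x, d, y_<t). The function names (`retrieve_top_k`, `marginalize`), the dot-product scoring, and all vectors and probabilities below are hypothetical values chosen only for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def retrieve_top_k(query_vec, doc_vecs, k):
    """Score documents by dot product with the query (an assumed scoring
    function) and return the indices and softmax weights of the top k."""
    scores = [sum(q * d for q, d in zip(query_vec, vec)) for vec in doc_vecs]
    ranked = sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)[:k]
    weights = softmax([scores[i] for i in ranked])
    return ranked, weights


def marginalize(doc_weights, token_dists):
    """Mix per-document next-token distributions using the retrieval weights."""
    vocab_size = len(token_dists[0])
    return [
        sum(w * dist[v] for w, dist in zip(doc_weights, token_dists))
        for v in range(vocab_size)
    ]


# Toy data: a 2-d query embedding and three document embeddings.
query = [1.0, 0.0]
docs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

ranked, weights = retrieve_top_k(query, docs, k=2)  # ranked is [0, 2]

# Hypothetical next-token distributions over a 3-token vocabulary,
# one per retrieved document (in a real system a decoder produces these).
per_doc_token_dists = [
    [0.7, 0.2, 0.1],  # conditioned on document 0
    [0.1, 0.8, 0.1],  # conditioned on document 2
]

marginal = marginalize(weights, per_doc_token_dists)
next_token = max(range(len(marginal)), key=lambda v: marginal[v])
```

Because the mixture weights come from the retrieval scores, a loss on the generated summary would also push gradients into the retriever in a differentiable version of this sketch, which is one way such a pipeline can be trained end to end.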