Springer - Artificial Intelligence and Law

Published: Jan 17, 2025

UniboNLP has one long paper published in Artificial Intelligence and Law (2025). Read on to learn more about knowledge distillation and data generation from LLMs for real-world legal QA.


Enhancing Legal Question Answering with Data Generation and Knowledge Distillation from Large Language Models

by P. Italiani, L. Ragazzi, and G. Moro

Legal question answering (LQA) relies on supervised methods to automatically handle law-related queries. These solutions require a significant amount of carefully annotated data for training, which makes the process very costly. Although large language models (LLMs) show promise in zero-shot QA, their computational demands limit their practical use, making specialized small language models (SLMs) more favorable. Furthermore, interest in synthetic data generation has recently surged, spurred by the impressive generation capabilities of LLMs. This paper presents Ace-Attorney, an LLM distillation approach devised to develop LQA data and supervised models without human annotation. Given a textual prompt, a frozen LLM generates artificial examples that are used as knowledge to train a student SLM with an order of magnitude fewer parameters. Taking into account a realistic retrieval-based scenario to fetch the correct document for answer generation, we propose the Selective Generative Paradigm, a novel approach designed to improve retrieval efficacy. Extensive experiments demonstrate the effectiveness and efficiency of distilled models on Syn-LeQA, our human-free synthetic dataset, and a public expert-annotated corpus. Notably, by using only a few dozen training samples, our best SLM achieves LLM-comparable performance with ≈1200% less CO2 emissions.
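
For readers curious about the general recipe, below is a minimal, illustrative sketch of how LLM-based data generation plus knowledge distillation for LQA could look with Hugging Face transformers. This is not the authors' implementation: the model names, prompt template, toy passage, and training settings are assumptions chosen only to make the idea concrete.

```python
# Illustrative sketch only: a frozen teacher LLM writes synthetic QA pairs from
# legal passages, and a much smaller student model is fine-tuned on them.
# All checkpoint names and the prompt are hypothetical placeholders.
import torch
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForCausalLM, AutoModelForSeq2SeqLM,
                          Seq2SeqTrainer, Seq2SeqTrainingArguments, DataCollatorForSeq2Seq)

# 1) Frozen teacher LLM generates artificial (question, answer) examples.
teacher_name = "meta-llama/Llama-2-7b-chat-hf"   # hypothetical teacher checkpoint
teacher_tok = AutoTokenizer.from_pretrained(teacher_name)
teacher = AutoModelForCausalLM.from_pretrained(teacher_name)
teacher.eval()  # frozen: no gradient updates to the teacher

def generate_qa(passage: str) -> str:
    prompt = (f"Legal passage:\n{passage}\n\n"
              "Write one question a citizen might ask about this passage, "
              "then answer it.\nQuestion:")
    inputs = teacher_tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = teacher.generate(**inputs, max_new_tokens=128, do_sample=True, top_p=0.9)
    # Keep only the generated continuation, not the prompt.
    return teacher_tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

passages = ["Article 1414 of the Civil Code states that a simulated contract ..."]  # toy example
synthetic = []
for p in passages:
    text = generate_qa(p)                     # e.g. "<question>\nAnswer: <answer>"
    q, _, a = text.partition("Answer:")
    synthetic.append({"question": q.strip(), "context": p, "answer": a.strip()})

# 2) The synthetic pairs supervise a student SLM with far fewer parameters.
student_name = "google/flan-t5-small"         # hypothetical, much smaller student
student_tok = AutoTokenizer.from_pretrained(student_name)
student = AutoModelForSeq2SeqLM.from_pretrained(student_name)

def preprocess(ex):
    model_in = student_tok(f"question: {ex['question']} context: {ex['context']}",
                           truncation=True, max_length=512)
    model_in["labels"] = student_tok(ex["answer"], truncation=True, max_length=128)["input_ids"]
    return model_in

train_ds = Dataset.from_list(synthetic).map(
    preprocess, remove_columns=["question", "context", "answer"])

trainer = Seq2SeqTrainer(
    model=student,
    args=Seq2SeqTrainingArguments(output_dir="distilled-lqa-student", num_train_epochs=3),
    train_dataset=train_ds,
    data_collator=DataCollatorForSeq2Seq(student_tok, model=student),
)
trainer.train()  # the distilled student now answers legal questions on its own
```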

  • The paper will be available soon!