NLG-Metricverse

An End-to-End Library for Evaluating Natural Language Generation

Giacomo Frisoni, Antonella Carbonaro, Gianluca Moro, Andrea Zammarchi, Marco Avagnano

Proceedings of the 29th International Conference on Computational Linguistics (COLING 2022)

Description

Driven by deep learning breakthroughs, natural language generation (NLG) models have been at the center of steady progress over the last few years, influencing a wide range of tasks. However, since our capacity to assess artificial text lags behind our ability to generate human-indistinguishable output, it is paramount to develop and apply ever better automatic evaluation metrics. To help researchers broadly judge the effectiveness of their models, we introduce NLG-Metricverse, an end-to-end open-source Python library for NLG evaluation. Our framework provides a living collection of NLG metrics in a unified and easy-to-use environment, supplying tools to efficiently apply, analyze, compare, and visualize them. This includes (i) extensive support for heterogeneous automatic metrics with n-arity management, (ii) meta-evaluation of individual metric performance and of metric-metric and metric-human correlations, (iii) graphical interpretations that help humans build intuitions about scores, and (iv) formal categorization and convenient documentation that accelerate understanding of the metrics. NLG-Metricverse aims to increase the comparability and replicability of NLG research, hopefully stimulating new contributions in the area.

Keywords: natural language generation, evaluation metrics, meta-evaluation, language models.
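
Usage

As a quick orientation, the following is a minimal usage sketch in Python. The import path, the NLGMetricverse scorer class, its metrics parameter, and the metric identifiers are assumptions based on typical usage patterns for this kind of library; please consult the repository documentation for the exact API.

# Minimal sketch of scoring predictions against references (assumed API;
# check the repository documentation for the exact names).
from nlgmetricverse import NLGMetricverse  # assumed import path

# Select the metrics to apply; the identifiers below are assumed examples.
scorer = NLGMetricverse(metrics=["bleu", "rouge"])

# One prediction may be paired with one or more references
# (the "n-arity management" described above).
predictions = ["the cat sat on the mat"]
references = [["a cat was sitting on the mat", "the cat sat on a mat"]]

# Compute all selected metrics in a single call and inspect the scores.
scores = scorer(predictions=predictions, references=references)
print(scores)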

Citing

If you use NLG-Metricverse in your research, please cite the COLING 2022 paper "NLG-Metricverse: An End-to-End Library for Evaluating Natural Language Generation":

@inproceedings{frisoni-etal-2022-nlg,
    title = "{NLG}-Metricverse: An End-to-End Library for Evaluating Natural Language Generation",
    author = "Frisoni, Giacomo  and
      Carbonaro, Antonella  and
      Moro, Gianluca  and
      Zammarchi, Andrea  and
      Avagnano, Marco",
    booktitle = "Proceedings of the 29th International Conference on Computational Linguistics",
    month = oct,
    year = "2022",
    address = "Gyeongju, Republic of Korea",
    publisher = "International Committee on Computational Linguistics",
    url = "https://aclanthology.org/2022.coling-1.306",
    pages = "3465--3479",
    abstract = "Driven by deep learning breakthroughs, natural language generation (NLG) models have been at the center of steady progress in the last few years, with a ubiquitous task influence. However, since our ability to generate human-indistinguishable artificial text lags behind our capacity to assess it, it is paramount to develop and apply even better automatic evaluation metrics. To facilitate researchers to judge the effectiveness of their models broadly, we introduce NLG-Metricverse{---}an end-to-end open-source library for NLG evaluation based on Python. Our framework provides a living collection of NLG metrics in a unified and easy-to-use environment, supplying tools to efficiently apply, analyze, compare, and visualize them. This includes (i) the extensive support to heterogeneous automatic metrics with n-arity management, (ii) the meta-evaluation upon individual performance, metric-metric and metric-human correlations, (iii) graphical interpretations for helping humans better gain score intuitions, (iv) formal categorization and convenient documentation to accelerate metrics understanding. NLG-Metricverse aims to increase the comparability and replicability of NLG research, hopefully stimulating new contributions in the area.",
}