Conference paper. Year: 2022

A Large-Scale Dataset for Biomedical Keyphrase Generation

Abstract

Keyphrase generation is the task of generating a set of words or phrases that highlight the main topics of a document. Few datasets exist for keyphrase generation in the biomedical domain, and those that do are too small to train generative models. In this paper, we introduce kp-biomed, the first large-scale biomedical keyphrase generation dataset, with more than 5M documents collected from PubMed abstracts. We train and release several generative models and conduct a series of experiments showing that using large-scale datasets significantly improves performance for both present and absent keyphrase generation. The dataset is available under the CC-BY-NC v4.0 license at https://huggingface.co/datasets/taln-ls2n/kpbiomed.
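The abstract distinguishes present from absent keyphrases without defining them. In the keyphrase generation literature, a present keyphrase is one that appears in the source text and an absent keyphrase is one that does not. The sketch below illustrates that split under a simplifying assumption: matching is exact on lowercased tokens. The matching procedure used in the paper (for example, stemming before comparison) is not described on this page and may differ. The function names and the sample text are hypothetical and are not taken from the dataset.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (no stemming)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def is_present(keyphrase, document):
    """Return True if the keyphrase tokens occur contiguously in the document."""
    kp, doc = tokenize(keyphrase), tokenize(document)
    if not kp:
        return False
    # Slide a window of len(kp) tokens over the document tokens.
    return any(doc[i:i + len(kp)] == kp for i in range(len(doc) - len(kp) + 1))


def split_keyphrases(keyphrases, document):
    """Partition keyphrases into (present, absent) lists, preserving order."""
    present = [k for k in keyphrases if is_present(k, document)]
    absent = [k for k in keyphrases if not is_present(k, document)]
    return present, absent


# Made-up sample, for illustration only.
sample_text = "We study keyphrase generation on PubMed abstracts."
sample_keyphrases = ["keyphrase generation", "PubMed", "natural language processing"]
present, absent = split_keyphrases(sample_keyphrases, sample_text)
```

Here "natural language processing" falls in the absent list because none of its tokens form a contiguous match in the sample text, which is the case a generative model has to handle beyond simple extraction.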
Main file
15_Paper-2.pdf (174.78 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03959383, version 1 (27-01-2023)

Cite

Mael Houbre, Florian Boudin, Beatrice Daille. A Large-Scale Dataset for Biomedical Keyphrase Generation. 13th International Workshop on Health Text Mining and Information Analysis (LOUHI 2022), Dec 2022, Abu-Dhabi, United Arab Emirates. ⟨hal-03959383⟩