Cristin result ID: 1734681
Last modified: 2 March 2020, 10:05
NVI reporting year: 2019
Result
Academic chapter/article/conference paper
2019

To Lemmatize or Not to Lemmatize: How Word Normalisation Affects ELMo Performance in Word Sense Disambiguation

Contributors:
  • Andrei Kutuzov and
  • Elizaveta Kuzmenko

Book

Proceedings of the First NLPL Workshop on Deep Learning for Natural Language Processing
ISBN:
  • 978-91-7929-999-6

Publisher

Linköping University Electronic Press
NVI level 1

About the result

Academic chapter/article/conference paper
Year of publication: 2019
Pages: 22-28
ISBN:
  • 978-91-7929-999-6

Classification

Field (NPI)

Field: Linguistics
- Discipline: Humanities

Description

Title

To Lemmatize or Not to Lemmatize: How Word Normalisation Affects ELMo Performance in Word Sense Disambiguation

Abstract

In this paper, we critically evaluate the widespread assumption that deep learning NLP models do not require lemmatized input. To test this, we trained versions of contextualised word embedding ELMo models on raw tokenized corpora and on the corpora with word tokens replaced by their lemmas. Then, these models were evaluated on the word sense disambiguation task. This was done for the English and Russian languages. The experiments showed that while lemmatization is indeed not necessary for English, the situation is different for Russian. It seems that for rich-morphology languages, using lemmatized training and testing data yields small but consistent improvements, at least for word sense disambiguation. This means that the decisions about text pre-processing before training ELMo should consider the linguistic nature of the language in question.

Contributors

Andrei Kutuzov

  • Affiliation:
    Author
    at Forskningsgruppen for språkteknologi at Universitetet i Oslo

Elizaveta Kuzmenko

  • Affiliation:
    Author
    at Università degli Studi di Trento

The result is part of

Proceedings of the First NLPL Workshop on Deep Learning for Natural Language Processing.

Nivre, Joakim; Derczynski, Leon; Ginter, Filip; Lindi, Bjørn; Oepen, Stephan; Søgaard, Anders; Tidemann, Jorg. 2019, Linköping University Electronic Press. UIO. Academic anthology/Conference series