Cristin result ID: 1924811
Last modified: 29 March 2022, 12:46
NVI reporting year: 2021
Result
Academic chapter/article/conference paper
2021

The interplay between language similarity and script on a novel multi-layer Algerian dialect corpus

Contributors:
  • Samia Touileb and
  • Jeremy Barnes

Book

Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021
ISBN:
  • 978-1-954085-54-1

Publisher

Association for Computational Linguistics
NVI level 1

About the result

Academic chapter/article/conference paper
Publication year: 2021
Pages: 3700 - 3712
ISBN:
  • 978-1-954085-54-1

Classification

Field (NPI)

Field: Informatics and computer engineering
- Subject area: Natural sciences and technology

Description

Title

The interplay between language similarity and script on a novel multi-layer Algerian dialect corpus

Abstract

Recent years have seen a rise in interest for cross-lingual transfer between languages with similar typology, and between languages of various scripts. However, the interplay between language similarity and difference in script on cross-lingual transfer is a less studied problem. We explore this interplay on cross-lingual transfer for two supervised tasks, namely part-of-speech tagging and sentiment analysis. We introduce a newly annotated corpus of Algerian user-generated comments comprising parallel annotations of Algerian written in Latin, Arabic, and code-switched scripts, as well as annotations for sentiment and topic categories. We perform baseline experiments by fine-tuning multi-lingual language models. We further explore the effect of script vs. language similarity in cross-lingual transfer by fine-tuning multi-lingual models on languages which are a) typologically distinct, but use the same script, b) typologically similar, but use a distinct script, or c) are typologically similar and use the same script. We find there is a delicate relationship between script and typology for part-of-speech, while sentiment analysis is less sensitive.

Contributors

Samia Touileb

  • Affiliation:
    Author
    at ML Maskinlæring, University of Oslo

Jeremy Claude Barnes

The contributor's name appears on this result as Jeremy Barnes
  • Affiliation:
    Author
    at Språkteknologigruppen (Language Technology Group), University of Oslo

The result is part of

Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021.

Zong, Chengqing; Xia, Fei; Li, Wenjie; Navigli, Roberto. 2021, Association for Computational Linguistics. Academic anthology/Conference series