Cristin result ID: 1935485
Last modified: 17 September 2021, 14:55
Result
Master's thesis
2021

Applying active learning techniques in machine learning to minimize labeling effort

Contributors:
  • Martin Haug

Publisher/series

Publisher

Faculty of Information Technology and Electrical Engineering, Department of Engineering Cybernetics

About the result

Master's thesis
Year of publication: 2021
Number of pages: 122

Classification

Scientific disciplines

Technology

Keywords

Deep learning • Active learning • Machine learning • Semi-supervised deep learning

Field (NPI)

Field: Electronics and cybernetics
- Subject area: Natural sciences and technology

Description

Title

Applying active learning techniques in machine learning to minimize labeling effort

Abstract

The most prominent machine learning (ML) methods for classification rely heavily on massive amounts of labeled data to create and train neural network classifiers that perform their tasks accurately. Given the complex structure of planktonic species and the immense amount of data captured by autonomous underwater vehicles (AUVs), labeling plankton taxa places a large burden on domain experts. Active learning (AL) is an ML paradigm that reduces this manual effort by proposing algorithms that support the construction of training datasets, enlarging the sets while minimizing human involvement. To build the training set, AL methods apply heuristics to select a subset of images, i.e., samples, from the entire dataset. The applied AL algorithm should select samples that capture the common statistical patterns of the feature space and are likely to include all the information needed for training and learning. In addition, the algorithm should prioritize samples that are likely to belong to multiple classes, i.e., samples that lie close to inter-class boundaries and might confuse the model. Many current AL approaches fail to incorporate both types of samples: those that represent the statistical pattern and those about which the particular machine learning model is uncertain. Motivated by these limitations, this thesis presents a novel framework that combines the two types of sampling to utilize the full data distribution, prevent redundant sampling from correlated queries, and fine-tune the inter-class decision boundary. The results from extensive experiments on the proposed framework and on methods from the AL literature show that several of the methods lack robustness to different experimental conditions. The proposed hybrid framework, however, proves to be robust and accurate on complex active learning tasks and competitive with other active learning strategies under various experimental conditions.
The thesis further shows that employing a data augmentation module improves overall classification performance and, in particular, can benefit the sampling strategy in an AL framework.
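
The idea of combining uncertainty-based and representativeness-based sampling can be illustrated with a minimal sketch. This is not the framework proposed in the thesis, whose details the abstract does not give. It is a generic illustration that assumes each unlabeled sample has a feature vector and a vector of predicted class probabilities. The names predictive_entropy, hybrid_select, and candidate_factor are hypothetical, and the random data stands in for real image features.

```python
import numpy as np


def predictive_entropy(probs: np.ndarray) -> np.ndarray:
    """Entropy of each row of class probabilities; higher means more uncertain."""
    clipped = np.clip(probs, 1e-12, 1.0)
    return -np.sum(clipped * np.log(clipped), axis=1)


def hybrid_select(features, probs, budget, candidate_factor=3):
    """Illustrative hybrid query: uncertain samples first, then spread them out.

    Assumed inputs (not from the thesis): `features` is (n, d), `probs` is
    (n, classes), `budget` is the number of samples to send for labeling.
    """
    n = len(probs)
    if budget <= 0 or n == 0:
        return []
    entropy = predictive_entropy(probs)
    n_candidates = min(n, budget * candidate_factor)
    # Step 1 (uncertainty): keep the most uncertain samples as candidates.
    candidates = np.argsort(-entropy, kind="stable")[:n_candidates]
    cand_feats = features[candidates]
    # Step 2 (diversity): farthest-first traversal over the candidates,
    # starting from the single most uncertain one, to avoid redundant queries.
    selected = [int(candidates[0])]
    min_dist = np.linalg.norm(cand_feats - cand_feats[0], axis=1)
    min_dist[0] = -np.inf  # never pick the same candidate twice
    while len(selected) < min(budget, n_candidates):
        nxt = int(np.argmax(min_dist))
        selected.append(int(candidates[nxt]))
        dist = np.linalg.norm(cand_feats - cand_feats[nxt], axis=1)
        min_dist = np.minimum(min_dist, dist)
        min_dist[nxt] = -np.inf
    return selected


# Usage with synthetic data standing in for an unlabeled image pool.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 8))
logits = rng.normal(size=(200, 4))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
picked = hybrid_select(features, probs, budget=10)
print(picked)
```

In this sketch the uncertainty filter targets samples near inter-class boundaries, while the farthest-first step keeps the queried batch from clustering in one region of feature space. Other combinations (for example, clustering first and ranking by uncertainty within clusters) would serve the same purpose.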

Contributors

Martin Haug

  • Affiliation:
    Author
    at the Department of Engineering Cybernetics, Norwegian University of Science and Technology

Annette Stahl

  • Affiliation:
    Supervisor
    at the Department of Engineering Cybernetics, Norwegian University of Science and Technology

Aya Saad

  • Affiliation:
    Supervisor
    at the Department of Engineering Cybernetics, Norwegian University of Science and Technology