Cristin result ID: 1318323

Last modified: 18 October 2016, 11:19

Result

Doctoral thesis

2015

Nearest Neighbor Frame Classification for Articulatory Speech Recognition

Arild Brandrud Næss

Publisher/series

Publisher

Norges teknisk-naturvitenskapelige universitet

NVI level 0

Series

Doktoravhandlinger ved NTNU

ISSN 1503-8181

NVI level 0

About the result

Doctoral thesis

Year of publication: 2015

Issue: 24

Number of pages: 173

ISBN: 978-82-326-0719-8

Links

ORIA

Search ORIA for 978-82-326-0719-8

Classification

Scientific disciplines

Information and communication technology

Keywords

Language technology • Speech recognition • Machine learning • Applied statistics

Description

English

Title

Nearest Neighbor Frame Classification for Articulatory Speech Recognition

Abstract

The paradigm of phone-based hidden Markov models has dominated automatic speech recognition since the early 1980s, and continuous improvements of this approach combined with the exponential increase in computational power have led to impressive improvements in the performance of such systems in the past 30 years. Of late, however, these gains have seemed to level off, and there is a growing interest in exploring alternative paradigms. This thesis concerns itself with two of these newer approaches: articulatory speech recognition and exemplar-based methods.

Articulatory speech recognition considers speech not as a sequence of phones, but as an interplay between our articulators: the lips, the tongue, the glottis and the velum. This explicit modeling of the pronunciation process in the statistical framework of the speech recognizer allows for a better model of the pronunciation variation that occurs, particularly in spontaneous speech. "Exemplar-based methods" is a common name for all ways of using the training data directly rather than fitting a global statistical model to it. Most of these methods are based on finding nearest neighbors among the observation vectors.

The main focus of this thesis is on the frame classification of articulatory features by nearest neighbors, and on using this classification to produce input feature vectors for two transcription systems. We consider nearest neighbor-based frame-level classification of a multi-valued set of articulatory features (AFs) inspired by the vocal tract variables of articulatory phonology. This entails that, for each frame of the audio signal, we try to determine the value of each of our eight AFs at the corresponding point in time. Partly for comparison purposes, we do a frame classification of phones in the same way. We explore a variety of linear and nonlinear transformations of the observation vectors, and use the k nearest neighbors in the resulting vector space to do the classification. Our best results compare favorably to a multilayer perceptron (MLP) baseline.

Based on our k-nearest neighbor (k-NN) frame classification, we make posterior-like feature vectors, which we incorporate into two systems for automatic transcription. The first of these is a conditional random field (CRF) for forced transcription of our set of AFs. The performance of our k-NN-based features in the CRF system is better than that of MLP-based features for most of the AFs, and on par with it for the rest of them. The second transcription system is a standard tandem hidden Markov model for phone recognition, where the k-NN-based features do not do as well as the MLP-based ones. Nevertheless, we argue that the flexibility and transparency of k-NN classification make it a very promising approach for articulatory speech recognition.
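
As an illustration of the idea described in the abstract, the following is a minimal sketch of k-NN frame classification that yields a posterior-like vector for one articulatory feature. It is not taken from the thesis: the function names, the squared Euclidean distance, the two-dimensional toy frames, and the class labels "closed" and "open" are all assumptions made for the example, and the thesis's transformations of the observation vectors are not reproduced here.

```python
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two frames (an assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_posteriors(train_frames, train_labels, query_frame, k, classes):
    """Return, for each class, the fraction of the k nearest training
    frames that carry that class label (a posterior-like vector)."""
    # Rank training frames by distance to the query and keep the k closest.
    order = sorted(
        range(len(train_frames)),
        key=lambda i: squared_distance(train_frames[i], query_frame),
    )
    votes = Counter(train_labels[i] for i in order[:k])
    return [votes[c] / k for c in classes]


def classify(posteriors, classes):
    """Pick the class with the largest posterior-like score."""
    best = max(range(len(classes)), key=lambda i: posteriors[i])
    return classes[best]


# Hypothetical toy data: six labeled frames for a single two-valued feature.
classes = ["closed", "open"]
train_frames = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_labels = ["closed", "closed", "closed", "open", "open", "open"]

p_near = knn_posteriors(train_frames, train_labels, [0.2, 0.1], 3, classes)
p_mixed = knn_posteriors(train_frames, train_labels, [4.0, 4.0], 5, classes)
print(p_near, classify(p_near, classes))
print(p_mixed, classify(p_mixed, classes))
```

In this sketch the vector of neighbor fractions plays the role of the posterior-like feature vector; in a full system one such vector per frame and per feature would be passed on to the transcription model.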

Contributors

Arild Brandrud Næss

Author
at NTNU Handelshøyskolen, Norges teknisk-naturvitenskapelige universitet

Torbjørn Karl Svendsen

Supervisor
at Institutt for elektroniske systemer, Norges teknisk-naturvitenskapelige universitet
Supervisor
at IE fakultetsadministrasjon, Norges teknisk-naturvitenskapelige universitet

Karen Livescu

Supervisor
at University of Chicago
