
Black-box optimization of deep convolutional neural network acoustic models

Start: Spring 2017
To apply: send a CV, a cover letter, and your BSc/MSc transcripts to Romain Serizel and Odile Mella.

Automatic speech recognition relies on an acoustic model that relates the speech signal at a given time to the phoneme being pronounced. State-of-the-art acoustic models are based on deep neural networks (DNNs) [1]. As with other machine learning techniques, one difficulty in designing a DNN is setting its hyper-parameters: the choice of input features (waveform, log-mel), the number of layers, the number of neurons per layer, the initial learning rate, etc. A DNN typically has 10 or more hyper-parameters to set.
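To make the notion of a hyper-parameter configuration concrete, here is a minimal Python sketch of how such settings could be encoded as a real-valued vector, which is the form that continuous optimizers expect. The class name `HyperParams`, the function `decode`, and all value ranges are hypothetical choices made for illustration; they are not taken from the internship description or the cited papers.

```python
from dataclasses import dataclass

# Hypothetical search space covering four of the hyper-parameters named above.
FEATURES = ("waveform", "logmel")


@dataclass
class HyperParams:
    input_features: str
    num_layers: int
    units_per_layer: int
    initial_learning_rate: float


def decode(x):
    """Map a vector of four reals to a configuration (values are clipped to [0, 1])."""
    c = [min(1.0, max(0.0, v)) for v in x]
    # Categorical choice: split [0, 1] into equal-width bins, one per feature type.
    feature = FEATURES[min(len(FEATURES) - 1, int(c[0] * len(FEATURES)))]
    num_layers = 2 + round(c[1] * 8)      # assumed range: 2 to 10 layers
    units = 2 ** (7 + round(c[2] * 4))    # assumed range: 128 to 2048 units
    learning_rate = 10 ** (-5 + 4 * c[3])  # assumed range: 1e-5 to 1e-1, log scale
    return HyperParams(feature, num_layers, units, learning_rate)
```

Searching the learning rate on a logarithmic scale and the layer width in powers of two is a common convention, but it is only one possible design choice.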

Black-box optimization techniques such as the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) [2] have been successfully used to optimize the hyper-parameters of multilayer perceptron acoustic models [3]. The goal of this internship is to extend this approach to state-of-the-art acoustic models based on deep convolutional neural networks (CNNs), including highway networks [4] and wide residual networks [5]. Sequence discriminative training [6] and speaker adaptation will be taken into account. The results will be evaluated on challenging speech recognition benchmarks such as CHiME [7] or MGB [8].
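The black-box setting can be illustrated with a short Python sketch. Note that this is a deliberately simplified evolution strategy (isotropic Gaussian sampling with a fixed step-size decay), not CMA-ES itself, which additionally adapts a full covariance matrix and the step size. The function `toy_error` is a hypothetical stand-in for the expensive step of training an acoustic model and measuring its error rate; all names and default values are assumptions made for illustration.

```python
import random


def toy_error(x):
    """Hypothetical stand-in for 'train a model with settings x, return its error'."""
    target = [0.3, 0.7, 0.5]
    return sum((xi - ti) ** 2 for xi, ti in zip(x, target))


def evolution_strategy(objective, dim, generations=40, population=12,
                       parents=4, sigma=0.3, seed=0):
    """Simplified (mu, lambda) evolution strategy; NOT full CMA-ES."""
    rng = random.Random(seed)
    mean = [0.5] * dim
    best_x, best_f = list(mean), objective(mean)
    for _ in range(generations):
        # Sample candidates around the current mean.
        candidates = [[m + sigma * rng.gauss(0.0, 1.0) for m in mean]
                      for _ in range(population)]
        # The optimizer only sees objective values, never gradients.
        scored = sorted((objective(c), c) for c in candidates)
        elite = [c for _, c in scored[:parents]]
        # Move the mean to the average of the best candidates.
        mean = [sum(c[i] for c in elite) / parents for i in range(dim)]
        sigma *= 0.9  # fixed decay; CMA-ES adapts this from the search history
        if scored[0][0] < best_f:
            best_f, best_x = scored[0]
    return best_x, best_f


best_x, best_f = evolution_strategy(toy_error, dim=3)
```

In a real setting each call to the objective would train and evaluate a network, so the number of generations and the population size would be chosen according to the available compute budget.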

[1] L. Deng and D. Yu, Deep Learning: Methods and Applications, NOW Publishers, 2014.

[2] N. Hansen, S.D. Müller, and P. Koumoutsakos, “Reducing the time complexity of the derandomized evolution strategy with covariance matrix adaptation (CMA-ES)”, Evolutionary Computation, 11(1):1–18, 2003.

[3] T. Moriya, T. Tanaka, T. Shinozaki, S. Watanabe, and K. Duh, “Automation of system building for state-of-the-art large vocabulary speech recognition using evolution strategy”, in Proc. ASRU, pp. 610–616, 2015.

[4] R.K. Srivastava, K. Greff, and J. Schmidhuber, “Highway networks”, arXiv:1505.00387, 2015.

[5] S. Zagoruyko and N. Komodakis, “Wide residual networks”, arXiv:1605.07146, 2016.

[6] L. Lu, “Sequence training and adaptation of highway deep neural networks”, arXiv:1607.01963, 2016.

[7] http://spandh.dcs.shef.ac.uk/chime_challenge/

[8] http://www.mgb-challenge.org/


Institution
Inria Nancy - Grand Est
54600 Villers-lès-Nancy
Research team
Multispeech
Website
https://team.inria.fr/multispeech/internship-on-black-box-optimization-of-deep-convolutive-neural-network-acoustic-models/
Required languages
English
Level
Bac+4, Bac+5, or Bac+8 (four, five, or eight years of higher education)
Prerequisites

BSc in computer science, machine learning, or a related field. Currently enrolled in an MSc or PhD program. Programming experience in Python.
Experience with deep learning toolkits (Theano, TensorFlow, Keras, Chainer…) and Kaldi is a plus.

Duration
4 to 6 months
Compensation
Internship stipend
Contact information

romain.serizel@loria.fr
odile.mella@loria.fr