
Black-box optimization of deep convolutional neural network acoustic models

Start: Spring 2017
To apply: send a CV, a letter of motivation, and your BSc/MSc transcripts to Romain Serizel and Odile Mella

Automatic speech recognition relies on an acoustic model that relates the speech signal at a given time to the phoneme pronounced. State-of-the-art acoustic models are based on deep neural networks (DNNs) [1]. As with other machine learning techniques, one of the difficulties in designing a DNN is setting the values of its hyper-parameters, including the choice of input features (waveform, log-mel), the number of layers, the number of neurons per layer, the initial learning rate, etc. DNNs typically involve 10 or more hyper-parameters to be set.

Black-box optimization techniques such as the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) [2] have been successfully used to optimize the hyper-parameters of multilayer perceptron acoustic models [3]. The goal of this internship is to extend this approach to state-of-the-art acoustic models based on deep convolutional neural networks (CNNs), including highway networks [4] and wide residual networks [5]. Sequence discriminative training [6] and speaker adaptation will be taken into account. The results will be evaluated on challenging speech recognition benchmarks, such as CHiME [7] or MGB [8].
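To give a rough feel for the black-box setting, the sketch below tunes two continuous hyper-parameters of a toy surrogate objective with a simplified (μ/μ, λ) evolution strategy. This is a deliberately stripped-down stand-in for CMA-ES (it adapts no covariance matrix, only a decaying step size); in the actual project each objective evaluation would train and score an acoustic model, and a full CMA-ES implementation would be used. All function names and values here are illustrative.

```python
import numpy as np

# Toy surrogate for validation error as a function of two hyper-parameters:
# x[0] = log10(learning rate), x[1] = neurons per layer (in hundreds).
# In the real setting, evaluating this would mean training a full model.
def validation_error(x):
    return (x[0] + 3.0) ** 2 + 0.5 * (x[1] - 5.0) ** 2

# Minimal (mu/mu, lambda) evolution strategy with isotropic mutations --
# a simplified stand-in for CMA-ES, without covariance matrix adaptation.
def simple_es(objective, x0, sigma=1.0, pop=8, parents=4, iters=80, seed=0):
    rng = np.random.default_rng(seed)
    mean = np.asarray(x0, dtype=float)
    for _ in range(iters):
        # Sample a population around the current mean.
        samples = mean + sigma * rng.standard_normal((pop, mean.size))
        scores = np.array([objective(s) for s in samples])
        # Recombine the best candidates into the new mean.
        elite = samples[np.argsort(scores)[:parents]]
        mean = elite.mean(axis=0)
        sigma *= 0.97  # simple step-size decay
    return mean

best = simple_es(validation_error, x0=[0.0, 0.0])
```

A real CMA-ES additionally adapts the full covariance matrix of the sampling distribution, which is what makes it effective on ill-conditioned, non-separable objectives like hyper-parameter landscapes.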

[1] L. Deng and D. Yu, Deep Learning: Methods and Applications, NOW Publishers, 2014.

[2] N. Hansen, S.D. Müller, and P. Koumoutsakos, “Reducing the time complexity of the derandomized evolution strategy with covariance matrix adaptation (CMA-ES)”, Evolutionary Computation, 11(1):1–18, 2003.

[3] T. Moriya, T. Tanaka, T. Shinozaki, S. Watanabe, and K. Duh, “Automation of system building for state-of-the-art large vocabulary speech recognition using evolution strategy”, in Proc. ASRU, pp. 610–616, 2015.

[4] R.K. Srivastava, K. Greff, and J. Schmidhuber, “Highway networks”, arXiv:1505.00387, 2015.

[5] S. Zagoruyko and N. Komodakis, “Wide residual networks”, arXiv:1605.07146, 2016.

[6] L. Lu, “Sequence training and adaptation of highway deep neural networks”, arXiv:1607.01963, 2016.



Inria Nancy - Grand Est
54600 Villers-lès-Nancy  
Level: Bac +4; Bac +5; Bac +8

BSc in computer science, machine learning, or a related field. MSc/PhD ongoing. Programming experience in Python.
Experience with deep learning toolkits (Theano, TensorFlow, Keras, Chainer…) and Kaldi is a plus.

Duration: 4 to 6 months