Automatic speech recognition relies on an acoustic model that relates the speech signal at a given time to the phoneme being pronounced. State-of-the-art acoustic models are based on deep neural networks (DNNs) [1]. As with other machine learning techniques, one of the difficulties in designing a DNN is setting the values of its hyper-parameters, including the choice of input features (waveform, log-mel), the number of layers, the number of neurons per layer, the initial learning rate, etc. DNNs typically involve 10 or more hyper-parameters to be set.
Black-box optimization techniques such as the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) [2] have been successfully used to optimize the hyper-parameters of multilayer perceptron acoustic models [3]. The goal of this internship is to extend this approach to state-of-the-art acoustic models based on deep convolutional neural networks (CNNs), including highway networks [4] and wide residual networks [5]. Sequence-discriminative training [6] and speaker adaptation will also be taken into account. The results will be evaluated on challenging speech recognition benchmarks such as CHiME or MGB.
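To illustrate the black-box setting, the sketch below runs a simplified (mu, lambda) evolution strategy over two hyper-parameters (log learning rate and network depth). It is a minimal stand-in for CMA-ES: unlike full CMA-ES, it does not adapt a covariance matrix or step size online, and the validation-loss objective is a hypothetical surrogate, not a real acoustic-model training run.

```python
import random

def mock_validation_loss(log_lr, n_layers):
    # Hypothetical stand-in for dev-set word error rate, with its
    # minimum near log_lr = -3 (i.e. lr ~ 1e-3) and n_layers = 6.
    return (log_lr + 3.0) ** 2 + 0.1 * (n_layers - 6.0) ** 2

def simple_es(objective, x0, sigma, n_generations=50, lam=10, mu=3, seed=0):
    """(mu, lambda) evolution strategy with isotropic Gaussian mutation.

    Simplified relative to CMA-ES: no covariance matrix adaptation,
    only a crude geometric decay of a single global step size.
    """
    rng = random.Random(seed)
    mean = list(x0)
    for _ in range(n_generations):
        # Sample lambda candidate hyper-parameter vectors around the mean.
        pop = []
        for _ in range(lam):
            cand = [m + sigma * rng.gauss(0, 1) for m in mean]
            pop.append((objective(*cand), cand))
        # Rank by objective, keep the mu best, recombine by averaging.
        pop.sort(key=lambda t: t[0])
        elite = [c for _, c in pop[:mu]]
        mean = [sum(xs) / mu for xs in zip(*elite)]
        sigma *= 0.95  # CMA-ES instead adapts this from the search path
    return mean, objective(*mean)

best, loss = simple_es(mock_validation_loss, x0=[0.0, 10.0], sigma=2.0)
```

In a real setting, each objective evaluation would train an acoustic model with the candidate hyper-parameters and return its dev-set error, which is why sample-efficient strategies such as CMA-ES matter.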
[1] L. Deng and D. Yu, Deep Learning: Methods and Applications, NOW Publishers, 2014.
[2] N. Hansen, S.D. Müller, and P. Koumoutsakos, “Reducing the time complexity of the derandomized evolution strategy with covariance matrix adaptation (CMA-ES)”, Evolutionary Computation, 11(1):1–18, 2003.
[3] T. Moriya, T. Tanaka, T. Shinozaki, S. Watanabe, and K. Duh, “Automation of system building for state-of-the-art large vocabulary speech recognition using evolution strategy”, in Proc. ASRU, pp. 610–616, 2015.
[4] R.K. Srivastava, K. Greff, and J. Schmidhuber, “Highway networks”, arXiv:1505.00387, 2015.
[5] S. Zagoruyko and N. Komodakis, “Wide residual networks”, arXiv:1605.07146, 2016.
[6] L. Lu, “Sequence training and adaptation of highway deep neural networks”, arXiv:1607.01963, 2016.