Motivations and context

Semantic and thematic spaces are vector spaces used to represent words, sentences or textual documents. The corresponding models and methods have a long history in computational linguistics and natural language processing. Almost all models rely on the hypothesis of statistical semantics, which states that the statistical patterns of word occurrence (the contexts in which a word appears) can be used to describe its underlying semantics. The most widely used method to learn these representations is to predict a word from the context in which it appears [Mikolov et al., 2013b, Pennington et al., 2014], which can be done with neural networks. These representations have proved effective for a range of natural language processing tasks [Baroni et al., 2014]. In particular, the Skip-gram and CBOW models of Mikolov et al. [Mikolov et al., 2013b, Mikolov et al., 2013a] have become very popular because they can process large amounts of unstructured text data at reduced computing cost. The efficiency and the semantic properties of these representations motivate us to explore them for our speech recognition system.
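As a rough illustration of what "predicting a word from its context" means in practice, the sketch below builds the (context, target) training pairs that a CBOW-style model would consume. The function name `cbow_pairs`, the window size and the example sentence are assumptions made for this illustration only; they are not taken from the cited papers or from any toolkit.

```python
from typing import List, Tuple


def cbow_pairs(tokens: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """Return (context words, target word) pairs for a CBOW-style model.

    For each position, the context is made of up to `window` words on the
    left and up to `window` words on the right of the target word.
    """
    pairs = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        context = left + right
        if context:  # skip degenerate one-word sentences
            pairs.append((context, target))
    return pairs


# Hypothetical example sentence, already tokenized and lower-cased.
sentence = "the prince of milan gave a speech".split()
pairs = cbow_pairs(sentence, window=2)
for context, target in pairs:
    print(context, "->", target)
```

A Skip-gram-style model would use the same windows in the opposite direction, predicting each context word from the target word.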

Robust automatic speech recognition (ASR) remains a very ambitious goal. Despite constant efforts and some dramatic advances, the ability of a machine to recognize speech is still far from equaling that of a human being. The performance of current ASR systems decreases significantly when the conditions under which they were trained differ from those in which they are used. The causes of variability may be related to the acoustic environment, the sound capture equipment, a change of microphone, etc.


The speech recognition (ASR) stage will be supplemented by a semantic analysis that detects the words of the processed sentence that may have been misrecognized and finds words that have a similar pronunciation and better match the context. For example, the sentence « Silvio Berlusconi, prince de Milan » can be recognized by the speech recognition system as « Silvio Berlusconi, prince de mille ans ». A good semantic representation of the sentence context could help to find and correct this error.
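One possible way to picture this correction step is sketched below: among candidates with a similar pronunciation, keep the one whose embedding is closest (by cosine similarity) to the average embedding of the surrounding words. This is only a minimal sketch under strong assumptions. The three-dimensional vectors are hand-made toy values, not learned embeddings; the multi-word hypothesis "mille ans" is treated as a single token `mille_ans`; and the candidate list is assumed to be given by some phonetic matching step that is not shown. The function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if one of them is null)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_vector(vectors: List[Sequence[float]]) -> List[float]:
    """Component-wise average of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def best_candidate(context_words: List[str],
                   candidates: List[str],
                   embeddings: Dict[str, Sequence[float]]) -> str:
    """Pick the candidate whose vector is closest to the mean context vector."""
    context_vectors = [embeddings[w] for w in context_words if w in embeddings]
    if not context_vectors:
        raise ValueError("no context word has an embedding")
    context = mean_vector(context_vectors)
    return max(candidates, key=lambda c: cosine(embeddings[c], context))


# Toy, hand-made vectors (assumption: real ones would be learned from text).
toy_embeddings = {
    "silvio": [0.9, 0.1, 0.0],
    "berlusconi": [0.8, 0.2, 0.1],
    "prince": [0.6, 0.3, 0.2],
    "milan": [0.85, 0.15, 0.05],
    "mille_ans": [0.05, 0.1, 0.9],
}

context = ["silvio", "berlusconi", "prince"]
choice = best_candidate(context, ["milan", "mille_ans"], toy_embeddings)
print(choice)
```

With these toy values the context vector points in roughly the same direction as the vector for "milan", so that candidate is selected. A real system would also have to weigh this semantic score against the acoustic and language model scores of the recognizer.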

The Master internship will be devoted to an innovative study of how to take semantics into account through predictive representations that capture the semantic features of words and their context. Research will be conducted on combining semantic information with information from denoising to improve speech recognition. Because deep neural networks (DNNs) can model complex functions and achieve outstanding performance, they will be used in all our modeling.


[Deng, 2014] Deng, L. Deep learning: Methods and applications. Foundations and Trends in Signal Processing, 7(3-4), 197–387, 2014.

[Goodfellow et al., 2016] Goodfellow, I., Bengio, Y., and Courville, A. Deep Learning. MIT Press, 2016.

[Mikolov et al., 2013a] Mikolov, T., Chen, K., Corrado, G., and Dean, J. Efficient estimation of word representations in vector space, CoRR, vol. abs/1301.3781, 2013.

[Mikolov et al., 2013b] Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., and Dean, J. Distributed representations of words and phrases and their compositionality, in Advances in Neural Information Processing Systems, 2013, pp. 3111–3119.

[Pennington et al., 2014] Pennington, J., Socher, R., and Manning, C. GloVe: Global vectors for word representation, in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014, pp. 1532–1543.

[Povey et al., 2011] Povey, D., Ghoshal, A., Boulianne, G., Burget, L., Glembek, O., Goel, N., Hannemann, M., Motlíček, P., Qian, Y., Schwarz, P., Silovský, J., Stemmer, G., and Veselý, K. The Kaldi Speech Recognition Toolkit, Proc. ASRU, 2011.

[Sheikh, 2016] Sheikh, I. Exploitation du contexte sémantique pour améliorer la reconnaissance des noms propres dans les documents audio diachroniques, PhD thesis in Computer Science, Université de Lorraine, 2016.

[Sheikh et al., 2016] Sheikh, I., Illina, I., Fohr, D., and Linares, G. Learning word importance with the neural bag-of-words model, in Proc. ACL Representation Learning for NLP (Repl4NLP) Workshop, Aug 2016.


Deep Learning, Natural Language Processing, Vision, Semantic Relations
Inria Nancy - Grand Est
54600 Villers-lès-Nancy  
Bac+4 (four years of higher education)

Required skills: background in statistics and natural language processing, and programming skills (Perl, Python). Candidates should email a detailed CV with diploma

4-6 months
570 euros
Contact information

Irina Illina and Dominique Fohr,