Research Engineer in Text mining/NLP

JOB DESCRIPTION

A major challenge for ecologists is to understand and predict the ecological consequences of climate change, land-use change and disturbances. Meeting this challenge requires accounting not only for the effects of environmental change on species performance and ranges, but also for its effects on species interactions. Indeed, species interactions such as prey-predator or plant-pollinator interactions underpin ecosystem functioning. It has been argued that determining the direction and magnitude of global change impacts on species interactions remains one of the greatest challenges for forecasting community and ecosystem dynamics. New molecular techniques based on environmental DNA make it possible to collect huge amounts of data on species occurrence, but not on species interactions. One possible way to fill this gap is to mine scientific resources in order to extract and organize the available knowledge.

Hence, the objective of the present work is to design and implement a workflow that extracts targeted information on biological interactions from unstructured scientific texts.

To achieve this, the candidate will collaborate with biodiversity experts and modellers to define an initial vocabulary, to be used as a glossary of keywords for the automated retrieval of scientific texts with web-crawling tools. This glossary may later be refined over the course of the work using reinforcement learning. Articles of interest will then be selected using a content-based document classification approach applied to publicly available abstracts. Because this classification step requires training machine learning models on large corpora, the candidate may rely on existing software or on transfer learning from pre-trained models. This part of the work will produce a corpus of documents describing biological interactions while laying the foundations of a configurable methodological workflow. The collected documents will then be fed into information extraction pipelines, which the candidate will adapt to extract information on living organisms and their interactions.

This work is part of the GlobNets project (https://anr.fr/Projet-ANR-16-CE02-0009), coordinated by W. Thuiller (UMR LECA, Grenoble). The main objective of GlobNets is to decipher multi-trophic assemblages at biogeographic scales and to understand their responses to spatial segregation, environmental gradients and/or human activities. To do so, GlobNets builds on environmental DNA metabarcoding and on new developments in mathematics and computer science. The project is collecting an unprecedented multi-trophic dataset on soil biodiversity from multiple forest plots distributed along gradients of climate and land-use pressure in thirteen distinct forest sites around the globe (tropical, temperate and boreal forests).

SELECTION CRITERIA

Candidates should hold a PhD in computer science or a related field and have a very good working knowledge of text mining and machine learning. Experience with web-based information retrieval and reinforcement learning would be highly appreciated, as would an interest in ecological applications and interdisciplinary research.

TERMS AND TENURE

This 15-month position will be based at the Eco&Sols research unit in Montpellier, France. The net salary will be about 2,200 euros per month, depending on experience. The target start date is September 2019, with some flexibility on the exact date.

HOW TO APPLY

Applicants are requested to submit the following materials:

• A cover letter applying for the position

• Full CV and list of publications

• Academic transcripts (unofficial versions are fine)

The application deadline is July 5, 2019.

Applicants will be interviewed by an ad hoc committee by July 22, 2019.

Applications are accepted by email only. All documents must be sent to mickael.hedde@inra.fr


Keywords
text mining; NLP; Semantic Web
Institution
Institut National de la Recherche Agronomique (INRA)
75338 Paris
Website
https://anr.fr/Project-ANR-16-CE02-0009
Desired start date
September 2, 2019
Required languages
English; French
Contract type
Fixed-term contract (CDD)
Position type
Postdoc
Indicative salary
2,200+ euros per month (net)
Application deadline
July 6, 2019
Contact information

Mickael HEDDE, mickael.hedde@inra.fr