Job offer

CNRS
2 days ago

Role details

Contract type
Temporary contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English

Job location

Tech stack

Artificial Intelligence
Artificial Neural Networks
Bioinformatics
Computer Programming
Linux
Python
Machine Learning
NumPy
Population Genetics
Scientific Computing
PyTorch
Large Language Models
Deep Learning
Scikit Learn

Job description

We are looking for a motivated postdoctoral researcher to join the AI for Genome Interpretation (AI4GI) group at the IGMM (CNRS, Montpellier) for 12 months. The contract can be renewed for an additional 36 months if the results allow the project to access subsequent funding stages.

The project: This project pioneers a new paradigm of General Genome Interpretation (GenGI) models by combining DNA Large Language Models (DLLMs) with Deep Neural Networks to predict human phenotypes directly from Whole Exome Sequencing samples from the UK Biobank. It aims at the wide-spectrum prediction of human phenotypes, unlocking new frontiers in clinical genetics, precision medicine, disease risk prediction, and Explainable AI on genomics data.

The candidate will:

  • Start by becoming familiar with existing research and methods for genome interpretation
  • Become familiar with the sequencing data and its pre-processing
  • Study how DNA LLMs work, and develop solutions to integrate them into the neural network architectures developed by the lab
  • Focus on developing new solutions for scaling neural networks and large language models to whole genome sequencing data
  • Develop algorithms and neural network architectures for the prediction of structured outputs (e.g. trees, graphs)
  • Implement and develop methods for the interpretation of neural network predictions and outputs, including concept-based activation and counterfactual analyses

Requirements

We are looking for a motivated and curious candidate with a strong background in the development of machine learning methods for bioinformatics. The project focuses on the development of new neural network architectures to perform inference on sequencing data. Bioinformatics and Genome Interpretation are multi-disciplinary and rapidly evolving fields. The candidate is therefore expected to 1) be eager to continuously learn new skills, methods and concepts, and 2) enjoy finding new solutions in the face of new and unforeseen difficulties.

The ideal candidate has very good 1) Python programming skills, 2) understanding of the mathematical foundations and principles of Machine Learning and Linear Algebra (vector and matrix operations, optimization), with a particular focus on Neural Networks (PyTorch), 3) problem-solving skills, and 4) familiarity with the GNU/Linux environment. A good understanding of the basic concepts of genetics and biology is not necessary but welcome. The project will consist of developing unorthodox Neural Network models with PyTorch. At least B2-level English is required.

Skills required: We are looking for someone with:

  • Strong background in neural networks, machine learning, linear algebra and an understanding of statistics.
  • Solid programming skills in Python and in scientific computing (PyTorch, scikit-learn, NumPy, etc.).
  • Familiarity with GNU/Linux.
  • Problem-solving skills.
  • Good communication and teamwork skills.
  • Familiarity with GWAS, population genetics, or bioinformatics pipelines is a plus.
  • Experience with the processing of genomic data (whole exome or genome sequencing) is a plus.

About the company

The position is based at the Institute of Molecular Genetics of Montpellier (IGMM UMR5535, CNRS and University of Montpellier; https://www.igmm.cnrs.fr/en/), in a highly international and interdisciplinary research environment. Montpellier is a dynamic Mediterranean city with an exceptional environment, culture and quality of life. It is home to numerous high-quality research institutes, the University of Montpellier, a vibrant population of 70,000 students, and one of the world's oldest medical schools.

The Lab: The work will be carried out in the AI for Genome Interpretation (AI4GI) group, led by Dr. Daniele Raimondi. The group focuses on the development of advanced artificial intelligence and machine learning methods for genome interpretation, with a particular emphasis on modeling the relationship between genetic variation and phenotypic outcomes. AI4GI develops tailor-made neural network architectures, including sparse and biologically informed models, to predict disease risk and complex quantitative traits from large-scale genomic data such as whole-genome and exome sequencing. The group also integrates multi-omics data through nonlinear data fusion approaches and leverages pre-trained DNA language models as unsupervised feature extractors to improve predictive performance and interpretability.
