Job offer
Job description
We are looking for a motivated postdoctoral researcher to join the AI for Genome Interpretation (AI4GI) group at the IGMM (CNRS, Montpellier) for 12 months. The contract can be renewed for an additional 36 months if the results allow access to subsequent funding stages.
The project: this project pioneers a new paradigm of General Genome Interpretation (GenGI) models by combining DNA Large Language Models (DLLMs) with Deep Neural Networks to predict human phenotypes directly from Whole Exome Sequencing samples from the UK Biobank. It aims at the wide-spectrum prediction of human phenotypes, unlocking new frontiers in clinical genetics, precision medicine, disease risk prediction, and Explainable AI on genomics data.
The candidate will:
- Start by becoming familiar with existing research and methods for genome interpretation.
- Become familiar with the sequencing data and its pre-processing.
- Study how DNA LLMs work, and develop solutions to integrate them into the neural network architectures developed by the lab.
- Focus on developing new solutions for the scalability of neural networks and large language models to whole genome sequencing data
- Develop algorithms and neural network architectures for the prediction of structured outputs (e.g. trees, graphs).
- Implement and develop methods for the interpretation of neural network predictions and outputs, including concept-based activation and counterfactual analyses.
Requirements
We are looking for a motivated and curious candidate with a strong background in the development of machine learning methods for bioinformatics. The project focuses on the development of new neural network architectures to perform inference on sequencing data. Bioinformatics and Genome Interpretation are multi-disciplinary and rapidly evolving fields. The candidate is therefore expected to 1) be eager to continuously learn new skills, methods and concepts, and 2) enjoy finding new solutions in the face of new and unforeseen difficulties.
The ideal candidate has very good 1) Python programming skills, 2) understanding of the mathematical foundations and principles of Machine Learning and Linear Algebra (vector and matrix operations, optimization), with a particular focus on Neural Networks (PyTorch), 3) problem-solving skills, and 4) familiarity with the GNU/Linux environment. A good understanding of the basic concepts of genetics and biology is not necessary but welcome. The project will consist of developing unorthodox Neural Network models with PyTorch. At least a B2 level of English is required.
Skills required
We are looking for someone with:
- Strong background in neural networks, machine learning, linear algebra and an understanding of statistics.
- Solid programming skills in Python and in scientific computing (PyTorch, scikit-learn, NumPy, etc.).
- Familiarity with GNU/Linux.
- Problem-solving skills.
- Good communication and teamwork skills.
- Familiarity with GWAS, population genetics, or bioinformatics pipelines is a plus.
- Experience with the processing of genomic biological data (whole exome or genome sequencing) is a plus.
Additional comments