Hybrid Hardware & Software Support Engineer - HPC

Atos

16 days ago

Role details

Contract type

Permanent contract

Employment type

Full-time (> 32 hours)

Working hours

Regular working hours

Languages

English

Experience level

Intermediate

Job location

Remote

Tech stack

Artificial Intelligence

Bash

Configuration Management

Debian Linux

Linux

General Parallel File System (GPFS)

Icinga

InfiniBand

Python

Kernel-based Virtual Machine (KVM)

Routing

OpenStack

Red Hat Enterprise Linux (RHEL)

Ansible

Prometheus

TCP/IP

Virtualization Technology

Ceph

Scripting (Bash/Python/Go/Ruby)

Grafana

Git

Kubernetes

Information Technology

Slurm

Puppet

Software Version Control

Docker

Job description

Location: primarily on-site at a customer facility near Reading, Berkshire, with occasional support for additional HPC installations across Europe.

Bull's High-Performance Computing (HPC), Artificial Intelligence & Quantum Business Unit is seeking a Hybrid Hardware & Software Support Engineer to join our HPC Services team. This is a highly visible, customer-facing operational role supporting advanced HPC infrastructures in the UK. You will work across the compute, storage, and networking layers, ensuring the deployment, stability, and performance of large-scale Linux-based systems. While prior HPC experience is an advantage, it is not mandatory: strong Linux and infrastructure engineers eager to grow into HPC and AI are encouraged to apply.

Deployment & System Bring-Up

Install, configure, and integrate HPC cluster components (compute, storage, networking).
Perform system installation, initial configuration, and operational readiness checks.
Apply patches, updates, and conduct routine maintenance activities.

Hybrid Hardware & Software Support

Provide Level 1 and Level 2 operational support for HPC environments.
Diagnose and resolve issues involving:
Linux operating systems
Enterprise server hardware
High-speed interconnects
Storage subsystems

Conduct root cause analysis and implement corrective actions.

Escalate appropriately within the global support organisation when needed.

Operations & Incident Handling

Monitor system health and respond to incidents proactively.
Perform troubleshooting in secure, mission-critical environments.
Maintain detailed and accurate documentation of incidents and resolutions.

Customer Interface

Act as the primary technical contact on-site.
Communicate effectively regarding incidents, planned maintenance, and system status.
Build trusted relationships with customer technical stakeholders.
Represent Bull professionally in sensitive and high-profile environments.

Requirements

Strong Linux expertise (Red Hat and/or Debian-based environments)

Solid understanding of enterprise server hardware (CPU, memory, storage, diagnostics)
Scripting skills in Bash and/or Python
Strong networking fundamentals (TCP/IP, routing, switching, security basics)
Hands-on experience with infrastructure deployment, configuration, and maintenance
Excellent troubleshooting and analytical abilities
Proactive mindset and ability to work independently

Desirable Skills & Experience

Valuable, but not mandatory:

Experience with HPC clusters
High-speed networking (40/100GbE, InfiniBand)
Virtualisation technologies (KVM, OpenStack)
Storage systems (Ceph, SAN/NAS)
Parallel filesystems (Lustre, GPFS, BeeGFS)
Containers (Docker, Podman, Kubernetes)
Configuration management (Ansible, Puppet)
Monitoring and observability tools (Prometheus, Grafana, Icinga)
Workload managers (Slurm, PBS Pro)
Git version control

The ideal candidate:

Is hands-on, operationally focused, and detail-oriented
Thrives in secure, mission-critical environments
Approaches troubleshooting methodically, even under pressure
Communicates clearly with both technical and non-technical stakeholders
Takes full ownership of incidents through to resolution
Is motivated to learn continuously and expand their technical expertise

Education & Experience

Option 1:

Degree in Computer Science, Engineering, or related field + at least 2 years of relevant experience

Option 2:

5+ years of relevant industry experience

Strong early-career candidates with solid technical foundations will also be considered.

Benefits & conditions

Working on advanced HPC and digital infrastructure projects
Continuous learning and technical skill development
Career growth within a global technology organisation
Participation in internal initiatives and community-focused activities

About the company

Bull is the Atos Group brand for high-performance computing, artificial intelligence and quantum innovations, with 2,500 employees. Built on an open, end-to-end and trusted foundation, Bull designs, deploys and runs hardware and software while providing strategic services that unlock enterprise value, accelerate scientific research and drive society forward. Driven by world-class R&D with 1,500 patents, manufacturing capabilities and data science, Bull enables nations and industries to fully control their AI and data, advancing progress for the benefit of the planet. For more information, please visit our website and follow us on Instagram, LinkedIn, X, and YouTube.

About Atos Group

Atos Group is a global leader in digital transformation with c. 63,000 employees and annual revenue of c. €8 billion, operating in 61 countries under two brands: Atos for services and Eviden for products. The European number one in cybersecurity, cloud and high-performance computing, Atos Group is committed to a secure and decarbonized future and provides tailored AI-powered, end-to-end solutions for all industries. Atos Group is the brand under which Atos SE (Societas Europaea) operates. Atos SE is listed on Euronext Paris.

The purpose of Atos Group is to help design the future of the information space. Its expertise and services support the development of knowledge, education and research in a multicultural approach and contribute to the development of scientific and technological excellence. Across the world, the Group enables its customers, its employees and members of society at large to live, work and develop sustainably, in a safe and secure information space.