AI Infra Architecture

Luxoft
Charing Cross, United Kingdom
31 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

Charing Cross, United Kingdom

Tech stack

Artificial Intelligence
Amazon Web Services (AWS)
Architectural Patterns
Cloud Engineering
Databases
Continuous Integration
Data Governance
DevOps
Disaster Recovery
Identity and Access Management
PostgreSQL
Data Logging
Data Storage Technologies
Autoscaling
System Availability
Large Language Models
Generative AI
AWS Lambda
Infrastructure as Code (IaC)
CloudFormation
Machine Learning Operations
Functional Programming
CloudWatch
Terraform

Job description

Design and implement scalable AWS infrastructure to support Generative AI and LLM workloads, including training, fine-tuning, and inference.

Architect secure, high-performance environments using AWS core services such as Amazon SageMaker, Amazon Bedrock, Amazon EKS, AWS Lambda, and related cloud-native components.

Design GPU-based compute environments (e.g., EC2 P-series, G-series) optimized for distributed training, fine-tuning, and low-latency inference.

Implement secure VPC architectures, private endpoints, IAM policies, encryption (KMS), and enterprise-grade data governance controls.

Build and govern MLOps/LLMOps pipelines using SageMaker Pipelines, CodePipeline, and CI/CD best practices.

Architect RAG infrastructure, including vector databases (OpenSearch, Aurora PostgreSQL with pgvector) and scalable storage solutions (S3).

Establish monitoring and observability using CloudWatch, model monitoring tools, logging frameworks, and performance dashboards.

Optimize infrastructure for latency, autoscaling, high availability, and cost efficiency, leveraging Spot Instances, Savings Plans, and right-sizing strategies.

Define disaster recovery (DR) and backup strategies across multi-AZ and multi-region AWS setups.

Implement Infrastructure as Code (IaC) using Terraform or CloudFormation for consistent, repeatable provisioning of AI environments.

Collaborate with AI/ML teams to support LLM fine-tuning, prompt orchestration, inference endpoints, and model deployment workflows.

Stay current with AWS GenAI advancements, evaluating new services, architectural patterns, and best practices for enterprise adoption.

Requirements

Do you have experience in Terraform?
Do you have a Master's degree?

We are seeking an experienced AI Infrastructure Architect with deep expertise in designing and operating scalable, secure, and high-performance cloud environments for Generative AI and LLM workloads. This role is ideal for someone who combines strong AWS architectural skills with hands-on experience in GPU compute, MLOps/LLMOps, and enterprise-grade AI platform design. You should bring extensive experience building cloud-native AI infrastructure, optimizing large-scale model training and inference environments, and collaborating closely with AI/ML teams to enable advanced GenAI capabilities. You should bring strong experience in designing complex AI systems, creating detailed technical specifications, and collaborating across multidisciplinary teams to ensure seamless implementation.

Must have

Extensive experience (typically 7+ years) in cloud architecture, infrastructure engineering, or platform engineering, with a strong focus on AWS.

Proven expertise designing and operating AI/ML and Generative AI infrastructure at scale.

Deep knowledge of AWS services relevant to AI workloads (SageMaker, Bedrock, EKS, EC2 GPU instances, Lambda, VPC, IAM, KMS, S3).

Hands-on experience with GPU compute, distributed training, and high-performance inference environments.

Strong understanding of MLOps/LLMOps practices, CI/CD pipelines, and model deployment workflows.

Experience architecting secure, compliant, and highly available cloud environments.

Proficiency with Infrastructure as Code (Terraform or CloudFormation).

Familiarity with vector databases, RAG architectures, and scalable data storage patterns.

Strong collaboration skills and the ability to work closely with AI/ML, DevOps, and engineering teams.

Excellent documentation and communication skills.

About the company

Luxoft, a DXC Technology Company (NYSE: DXC), is a digital strategy and software engineering firm providing bespoke technology solutions that drive business change for customers the world over. Luxoft uses technology to enable business transformation, enhance customer experiences, and boost operational efficiency through its strategy, consulting, and engineering services. Luxoft combines engineering excellence with deep industry expertise, specializing in automotive, financial services, travel and hospitality, healthcare, life sciences, media, and telecommunications.
