Manager - Site Reliability Engineering
Job description
We are looking for a Manager - Site Reliability Engineering to strengthen the Production Management leadership team of Clearing Technology Service. This role demands a proactive and hands-on leader with deep technical expertise and strong critical thinking.
Role summary
You will be responsible for ensuring stability, resilience, and performance of our production systems while driving continuous improvement and SRE best practices across the platform.
What you'll be doing
- Service Ownership
Assume end-to-end accountability for the Clearing production environment, ensuring high availability, optimal performance, and robust resilience of business-critical systems.
- Incident Management & Crisis Leadership
Act as Incident Commander during major incidents, leading resolution efforts, managing stakeholder communications, and driving root cause analysis and remediation.
- Team Leadership & Talent Development
Build and mentor a high-performing SRE team. Promote a culture of accountability, continuous improvement, and blameless postmortems to enhance operational excellence.
- Operational Excellence & SLA Compliance
Ensure consistent adherence to response and resolution SLAs. Oversee efficient ticket management and escalation processes through ServiceNow, removing blockers promptly.
- Stakeholder Engagement & Relationship Management
Develop strong partnerships across LCH and LSEG teams. Ensure timely delivery of business-critical activities and transparent communication of risks and challenges.
- Process Optimisation & Continuous Improvement
Monitor and analyse technical processes to identify improvement opportunities. Implement enhancements to minimise business disruption and improve operational efficiency.
- Risk Management & Compliance
Ensure compliance with regulatory standards and internal governance. Proactively identify and mitigate operational risks.
- Metrics & Observability
Establish and maintain robust observability practices, employing metrics, logging, and tracing to drive data-driven decisions and improve system health.
- Out of hours support / On-call support
- Be available for overnight support of production services to ensure successful completion of processing.
- Respond to overnight calls and resolve issues as they arise.
- Participate in Disaster Recovery exercises.
What you'll bring
Requirements
- Experience in ServiceNow.
- Degree educated or equivalent work experience.
- Number of years in Production Support / SRE roles with at least 3 years in a leadership capacity.
- Deep technical expertise in Oracle database - troubleshooting, scalability, performance tuning and optimization.
- Demonstrated experience implementing SRE frameworks - including SLOs, SLIs, incident management, and chaos engineering.
- Experience leading teams that support systems deployed across mixed infrastructure (Cloud and On-Premise, AWS preferred).
- Solid understanding of change management, risk posture, and production readiness.
- Strong track record of delivering automation at scale, reducing toil, and eliminating manual operational tasks.
- Excellent communication and stakeholder management skills, particularly under pressure.
- Expertise in automation scripting (Python, Shell, PowerShell, etc.).
- Familiarity with observability tools and practices (metrics, logging, tracing).
- Ability to lead capacity planning and scalability strategies to support growth.
- Knowledge of clearing and settlement processes in financial markets.
- Familiarity with regulatory requirements and governance frameworks in financial services.
- Demonstrated ability to build, mentor, and retain high-performing SRE teams.
Person Specification
- Demonstrable experience managing SRE or Production Support teams in a critically important financial services environment.
- Experience managing teams located across multiple locations and time zones.
- Excellent analytical skills, attention to detail, and problem-solving abilities.
- Solid technical background in the core technologies with several years of experience.
- Ability to communicate clearly and concisely with IT teams, business teams, and senior management.
- Ability to break down complex technical issues into an easy-to-digest format.
- Familiarity with financial products and terminology.