Site Reliability Engineer

Bare Metal Infrastructure
Manor Park, United Kingdom
3 days ago

Role details

Contract type: Temporary contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Experience level: Senior
Compensation: £120K

Job location

Remote
Manor Park, United Kingdom

Tech stack

Cloud Computing
Openshift
Reliability Engineering
Virtualization Technology
Performance Monitor
Microservices

Job description

hackajob is collaborating with Rightmove to connect them with exceptional professionals for this role.

Our vision is to give everyone the belief they can make their move. We aim to make moving simpler, by giving everyone the best place to turn to and return to for access to the tools, expertise, trust, and belief to make it happen. We're home to the UK's largest choice of properties and are the go-to destination for millions of people planning their next move, reading the latest industry news, or just browsing what's on the market.

The Role: Platform Engineer (Developer Experience)
Location: London / Hybrid (2 days per week in office)
Reporting to: Platform Engineering Manager

The Role

The Platform and Reliability Engineering Teams are responsible for the services that underpin the Rightmove website and enable all of our product development teams to ship functionality rapidly and safely. We strive to deliver annual availability of at least 99.99% (less than 5 mins downtime a month).

The Site Reliability Engineering Manager's role is to ensure operational excellence, drive observability and reliability at scale, and own the incident management processes and tools. This position blends people leadership, full stack reliability engineering, service management and influencing without authority. The successful candidate brings strong technical experience in reliability engineering, monitoring, alerting and observability for product-led technology companies, combined with strong customer empathy and communication skills.

Monitoring, Alerting, and Observability

Product teams have high reliability confidence; incident detection and resolution are smooth due to proactive monitoring, well-maintained alerts/logs and high levels of observability coverage.
Clear reliability expectations between platform, security, product & business.
Prioritisation based on reliability risk and real data.

Incident Management

Consistency and standardisation of incident management, resulting in fast incident detection and resolution.
Maintaining a culture of accountability, transparency, collaboration & learning.
Good data quality, insights & decision-making with strong feedback loops to all relevant stakeholders.

Reliability Engineering

Clear reliability patterns and standards drive strong reliability and fewer cascading failures, e.g. probes, graceful termination/degradation, timeouts, retries, backoff, jitter, circuit breakers, bulkheads.
Shared understanding of how our system fails and where any weak points are, with prioritised improvement plans in place.

Delivery and Execution

Own and manage the reliability roadmap and metrics, initiatives/projects, and OKR delivery in line with expectations.
Align reliability strategy and delivery plans with business goals, partnering with technical product manager, DX, CF, DBA, security, product and data teams.

People Leadership & Team Development

Support the team with objectives and growth plans to improve skills, confidence, and impact aligned with business objectives.
Guide engineers in designing scalable, secure, resilient platform services.
Create an inclusive, psychologically safe environment; tailor leadership to individual strengths and motivations.

What You'll Bring

Proven experience in site reliability engineering management, overseeing observability, monitoring, reliability, and service delivery in production environments.
Understanding of reliability in distributed software microservices and cloud-based environments.
Experience implementing and running modern SRE tooling, incident management workflows and SRE service management frameworks, e.g. SLOs/SLIs.
Familiarity with platform engineering concepts including developer platforms, reusable platform components, and reducing friction for product teams.
Experience improving operational processes and developing documented procedures (monitoring, DR, incident response, upgrade processes).
Leadership, team management, collaboration and communication skills, aligned with expectations for managing technical/engineering

Requirements

Job Description
Senior Site Reliability Engineer (AWS Python JS) - Lead Role - UK
Location: London (Hybrid)
Salary: £100k - £120k per year
Do you have a software engineering background with a passion for infrastructure, and are you ready to take the lead in building a...

Benefits & conditions

teams,

What we offer

Cash plan for dental, optical and physio treatments.
Private Medical Insurance, Pension and Life Insurance, Employee Assistance Plan.
27 days holiday plus two (paid) volunteering days a year to give back, and holiday buy schemes.
Hybrid working pattern with 2 days in the office.
Contributory stakeholder pension.
Life assurance at 4x your basic salary to a spouse, family member or other nominated person in your life.
Competitive compensation package.
Paid leave for maternity, paternity, adoption & fertility.
Travel Loans, Bike to Work scheme, Rental Deposit Loan.
Charitable contributions through Payroll Giving and donation matching.
Access to deals and discounts on things like travel, electronics, fashion, gym memberships, cinema tickets and more.

As an Equal Opportunity Employer, Rightmove will never discriminate based on age, disability, sex, race, religion or belief, gender reassignment, marriage / civil partnership, pregnancy/maternity or sexual orientation.

At Rightmove, we believe that a diverse and inclusive workforce leads to better innovation, productivity, and overall success. We are committed to creating a welcoming and inclusive environment for all employees, regardless of their background or identity, to develop and promote a diverse culture that reflects the communities we serve.

By applying, you confirm that you are aged 18 or over and that you've read and understood our Privacy Policy, which explains how we handle and protect your personal information during the recruitment process.

Similar jobs

About the company

Job Description We are At Synechron, we believe in the power of digital to transform businesses for the better. Our global consulting firm combines creativity and innovative technology to deliver industry-leading digital solutions. Synechron's progressive technologies and...

Job Title: Site Reliability Engineer (Bare Metal Infrastructure) Client: Most Elite FinTech Firm in London Salary: Up to £150k+ Bonus + Full Package Location: London (Hybrid) One of London's most elite fintech firms is hiring a Site Reliability Engineer to join a...

Halian | Managed Services, Recruitment Agency & Contract Staffing Our client is hiring experienced Senior Site Reliability Engineers in the UK or Germany to join a global engineering team supporting a high-availability, Java-based platform used by customers worldwide. This role is for true SREs - not DevOps engineers. If your passion is...

Overview Join to apply for the Principal Site Reliability Engineer role at Playson. Founded in 2012, Playson is a leading iGaming supplier recognized worldwide. We provide our customers with a high-end micro-service-based platform as a service that aims to process...
