Site Reliability Engineer

AgileEngine
12 days ago

Role details

Contract type: Permanent contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Experience level: Intermediate

Job location

Tech stack

Amazon Web Services (AWS)
Confluence
Jira
Bash
Software as a Service
Continuous Integration
DevOps
GitHub
Python
Reliability Engineering
Prometheus
CircleCI
Scripting (Bash/Python/Go/Ruby)
Grafana
Git
Containerization
Kubernetes
Information Technology
Terraform
Docker
PagerDuty
Jenkins

Job description

We are looking for an SRE Operations Engineer to maintain reliability across a cloud-based SaaS platform. You'll handle live incidents, improve observability, and reduce toil through automation using Kubernetes, Terraform, Grafana, and AWS. The role is hands-on and execution-focused, with real ownership across CI/CD pipelines, GitOps workflows, and on-call rotations.

  • Monitor and support production and staging environments to ensure availability, performance, and stability;
  • Respond to incidents, perform triage and root cause analysis, and contribute to remediation efforts;
  • Participate in on-call rotations with defined SLAs;
  • Handle operational requests from internal teams;
  • Maintain and improve monitoring, alerting, dashboards, logs, and metrics;
  • Support CI/CD pipelines, production releases, and GitOps workflows;
  • Contribute to automation initiatives to reduce operational overhead;
  • Maintain and improve Kubernetes-based infrastructure and containerized workloads;
  • Support Infrastructure as Code practices and environment improvements.

Requirements

If you're looking for a place to grow, make an impact, and work with people who care, we'd love to meet you!

  • 2+ years of experience in Site Reliability Engineering, DevOps, or Production Operations;
  • Experience with AWS supporting production environments;
  • Experience supporting production SaaS applications;
  • Strong understanding of CI/CD systems (GitHub Actions, Jenkins, CircleCI);
  • Experience with GitOps and Git fundamentals;
  • Experience using GitHub, Jira, and Confluence;
  • Experience with Kubernetes (EKS, kOps, or similar);
  • Experience with Docker and containerization;
  • Experience with observability tools (Grafana, Prometheus, Loki, PagerDuty);
  • Proficiency in scripting (Bash, Python, or Go);
  • Experience with Infrastructure as Code (Terraform, Helm);
  • Ability to work within structured operational processes and SLAs;
  • Strong written and verbal English communication skills;
  • Self-driven with a growth mindset.

Nice to have

  • AWS certifications such as Solutions Architect, DevOps Engineer, or SysOps Administrator;
  • Experience with multi-tenant SaaS environments;
  • Experience working in globally distributed teams;
  • Familiarity with ChatOps practices;
  • Experience improving monitoring quality and reducing alert fatigue.

Benefits & conditions

  • Competitive compensation: USD-based pay with education, fitness, and team activity budgets.
  • Exciting projects: Modern solutions with Fortune 500 and top product companies.
  • Flextime: Flexible schedule with remote and office options.

Meet our recruitment process. It includes these main stages: Application, Coding Challenge, Video Interview, and Technical Interview or Interview with the Hiring Manager(s). Each step helps us understand your skills and overall fit. If it's a match, you'll receive an offer.

About the company

AgileEngine is an Inc. 5000 company that creates award-winning software for Fortune 500 brands and trailblazing startups across 17+ industries. We rank among the leaders in areas like application development and AI/ML, and our people-first culture has earned us multiple Best Place to Work awards.