Site Reliability Engineer (SRE) - Cleveland

Company Size

<50

Company Type

Services

Exp Level

Senior

Job Type

Full-Time

Language

English

Visa sponsorship

Requirements

Must:

- 3-5 years of relevant experience in site reliability, infrastructure, or DevOps engineering
- Strong expertise in monitoring and observability tools such as Dynatrace, Grafana, Prometheus, Splunk, or similar
- Experience with incident management and event correlation platforms, including BigPanda, ServiceNow, and Moogsoft
- Proficiency with Linux/Unix systems (RHEL) and Windows Server environments
- Hands-on experience with cloud platforms like AWS, Azure, or OpenShift
- Strong understanding of containerization and orchestration technologies, including Kubernetes, Docker, and OpenShift
- Experience in chaos engineering and fault injection frameworks such as Litmus, Gremlin, AWS FIS, and Azure Chaos Studio
- Solid knowledge of networking, database systems (Oracle, SQL), and distributed architectures
- Familiarity with event streaming platforms like Kafka and service mesh technologies, including Istio
- Awareness of mainframe systems and legacy infrastructure
- Knowledge of infrastructure as code and automation tools
- Understanding of job scheduling systems like CA7 and middleware technologies
- Proficiency in using Jira, Confluence, and ITSM tools
- Previous experience in financial services or other highly regulated industries is preferred
- Relevant certifications, such as AWS/Azure architecture, RHCE, VCP, and Kubernetes (CKA/CKAD), are valued
- Strong analytical thinking, problem-solving abilities, and troubleshooting skills
- Excellent written and verbal communication skills for cross-functional collaboration

Technologies

AWS

Lambda

Azure

Cloud

Confluence

Datadog

Dynatrace

Istio

ITSM

Responsibilities

- Coordinate responses to critical incidents with application support teams and the Site Reliability Center
- Triage and respond to alerts generated through the BigPanda event correlation platform
- Assess cross-domain impacts and engage appropriate support teams or escalate issues as necessary
- Participate in on-call rotations to ensure 24/7 coverage for critical systems
- Conduct blameless post-mortems and root cause analyses to foster continuous improvement
- Design and implement automated monitoring and alerting systems using tools like Dynatrace, Grafana, and others
- Develop robust dashboards and implement SLAs/SLOs through comprehensive monitoring practices
- Analyze metrics from operating systems and applications for performance tuning and fault detection
- Develop and implement chaos engineering practices using tools such as Litmus and Gremlin
- Design fault injection experiments to validate system resilience using AWS Resilience Hub
- Build self-healing capabilities and automated remediation workflows
- Implement health checks and autoscaling solutions utilizing AWS Lambda, Kubernetes, OpenShift, and Istio
- Manage infrastructure across mainframe systems, Windows, RHEL, and various cloud platforms
- Work with containerized environments, event streaming platforms, and various database systems
- Maintain virtualization infrastructure and storage systems
- Leverage tools like ServiceNow for incident management and Jira for issue tracking
- Identify opportunities to improve application stability while advocating for SRE best practices
- Maintain comprehensive knowledge bases and runbooks in Confluence
- Mentor junior team members on resiliency patterns and operational excellence

Description

At Ellofant, we are a modern consulting firm dedicated to transforming how businesses navigate change and complexity through strategic thinking and advanced technology. Our team is looking for an experienced Site Reliability Engineer to enhance our infrastructure resiliency efforts. This role offers a competitive compensation package and benefits, including medical and dental coverage, retirement savings plans, paid time off, and professional development support. Located in Cleveland, a city rich with cultural institutions and a burgeoning tech scene, you'll find an affordable living environment that allows you to thrive both personally and professionally. If you're interested in solving real problems and making meaningful changes, we invite you to consider joining our team.

