Site Reliability Engineer

A7405849

The Site Reliability Engineer is accountable for the availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning of their service(s). The role is responsible for building highly scalable, secure and compliant environments for apps and platforms, whilst keeping run costs optimised.

The SRE ensures that internal and external services exceed reliability and performance expectations, measured as SLIs, SLOs and SLAs determined through independent monitoring. The role provides app and platform support and works with software engineers to address relevant incidents and problem tickets. 

The role splits time almost 50-50 between system support and incident management on one side and development work, including automation, that supports the build of new features on the other. A key objective of the role is to reduce manual toil.

What You’ll Do

  • Take responsibility for overall system operations: reliability, availability, performance, disaster recovery, monitoring, incident response and cost optimisation.

  • Enable segregation of duties by acting as stewards of production systems.

  • Work with relevant tools to build and operate tribe assets at scale.

  • Regularly conduct capacity planning and act on it to keep systems operational as usage grows organically.

  • Build automation scripts and CI/CD pipelines.

  • Deploy and roll out global platforms by working closely with SREs from other alliances.

  • Run system performance, stress and load tests and maintain a performance baseline.

  • Conduct and participate in blameless postmortems and provide effective communications to customers.

  • Respond to application incidents, restore services post failure and proactively address system issues and concerns. Learn from failures.

  • Improve application performance by tuning and optimising compute, networks and data storage technology. Influence architecture and design to attain optimum performance.

  • Run Disaster Recovery and failure (chaos) testing exercises to assure system recoverability and resilience.

  • Drive Mean Time to Restore (MTTR) down and Mean Time Between Service Interruptions (MTBSI) up as measures of improvement

  • Keep focus on reducing planned system outages

  • Set system metrics to be monitored based on business requirements. 

  • Measure and constantly report on error budgets.

  • Build strong relationships and integrate into the DevOps team and its culture

  • Champion automation and eliminate toil wherever possible

  • Ensure security and segregation of non-production and production environments

  • Implement segregation-of-duties capabilities

  • Lead by example and live by the values of Equifax 

  • Deliver on any other responsibility given by the manager

Must Haves

  • BS degree in Technology or related technical field involving coding (e.g., physics or mathematics), or equivalent job experience required

  • 5+ years of experience with GKE/EKS along with Istio/Anthos Service Mesh

  • 5+ years of experience developing and/or administering software in the public cloud, preferably GCP or AWS

  • Experience with languages and runtimes such as Python, Ruby, Bash, PowerShell, Java, Go, Perl, JavaScript and/or Node.js, and with automation tools such as Ansible and Terraform

  • Experience with IaaS and PaaS technologies

  • Expertise in designing, analyzing and supporting large-scale distributed systems

  • Experience managing infrastructure as code via tools such as Terraform or CloudFormation

  • Knowledge of Agile/Scrum methodologies

  • Demonstrable cross-functional knowledge of systems, storage, networking, security and databases

Extra Points For Any Of The Following:

  • Proficiency in system administration across Linux/Windows systems

  • Proficiency in continuous integration and continuous delivery tooling and practices 

  • Strong analytical and troubleshooting skills

  • Passionate about working within DevOps culture, learning new technologies and automation

  • Takes pride in and ownership of system performance, reliability and optimization

We offer a hybrid work setting, comprehensive compensation and healthcare packages, attractive paid time off, and organizational growth potential through our online learning platform with guided career tracks.

Are you ready to power your possible?  Apply today, and get started on a path toward an exciting new career at Equifax, where you can make a difference!

Equifax is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or status as a protected veteran.

Working at Equifax

We believe in a growth mindset. At Equifax, that includes giving our employees opportunities to do their best work and to learn new skills along the way, to inspire and build fulfilling careers.

 
