
Site Reliability Product Engineer at Esri (Redlands, CA)

Esri

Full Time

Site reliability engineers sit at the crossroads of operations and software development. In this role, you'll design, build, and operate the cloud-based backend for applications that are used by millions of people around the world.


You'll directly support Esri's software developers by designing and implementing the processes, tools, automations, and workflows that keep software running reliably and predictably, every time.


Responsibilities



  • Design, document, implement, and maintain the operational components for various applications

  • Manage infrastructure, respond to alerts, and troubleshoot problems to resolution

  • Configure and deploy containerized microservice components

  • Oversee continuous integration builds

  • Work closely with development teams to improve workflows around build, test, and deployment of applications

  • Build and maintain monitoring, alerting, and trending operational tools within a cloud environment

  • Troubleshoot the system and provide root cause analysis


Requirements



  • Experience with one or more of the following languages: Python, Java, JavaScript

  • Knowledge of AWS, specifically VPC, EC2, ECS, CloudFront, Lambda, CloudFormation, CloudWatch, RDS, ELB, Auto Scaling, and WAF

  • Familiarity with provisioning cloud infrastructure using APIs

  • Good understanding of Linux and shell scripts

  • Experience deploying containers (Docker/Ansible/Kubernetes)

  • Understanding of build/automation systems such as Jenkins

  • Strong problem-solving and debugging skills

  • Demonstrable programming or scripting experience

  • Ability to document processes, write post-incident reports, stay calm under pressure, and handle operational escalations

  • Bachelor's or master's degree in computer science, information systems, mathematics, GIS, or a related field


Recommended Qualifications



  • 1+ years in site reliability engineering (SRE) or DevOps

  • Familiarity with monitoring tools such as Prometheus and Grafana and cloud provisioning tools such as Terraform and CloudFormation

  • Experience with CI/CD pipeline tools such as Jenkins

  • Knowledge of Git

  • Previous experience supporting mission-critical distributed systems

  • AWS certifications are a big plus

