ROLE: Site Reliability Engineer II
EXP: 5-8 years
TECH STACK:
Containerisation & Orchestration : Docker, Kubernetes, Rancher, EKS, ECS, GKE, Elastic Beanstalk, Google App Engine
Cloud Platform : AWS, GCP
IaC : Terraform, AWS CloudFormation / GCP Cloud Deployment Manager, Ansible
Infra Monitoring : Prometheus, Datadog, Alertmanager, Thanos, AWS CloudWatch
CI/CD : GitLab CI/CD, Jenkins
Scripting : Python, Golang
VCS : GitLab, Perforce, Subversion
OS : Ubuntu, CentOS, Amazon Linux, Red Hat Linux
Nice to Have : Experience supporting systems orchestrated on AWS OpsWorks
RESPONSIBILITIES:
• Implement, own, maintain, monitor and support the backend servers and microservices infrastructure for the studio's titles, which run on a wide variety of tech stacks
• Implement/maintain various automation tools for development, testing, operations and IT infrastructure
• Be available for on-call duty during production outages as part of 24/7 PagerDuty support
• Work closely with all disciplines and stakeholders, and keep them informed of anything that affects them
• Define and set development, test, release, update and support processes for SRE operations
• Apply excellent troubleshooting skills across systems infrastructure engineering
• Monitor processes throughout their lifecycle for adherence, and update existing processes or create new ones to improve them and minimise workflow times
• Encourage and build automated processes wherever possible
• Identify and deploy cybersecurity measures by continuously performing vulnerability assessments and risk management
• Manage incidents and perform root cause analysis
Jobcode: Reference SBJ-g3e45x-64-137-71-234-42 in your application.