Anzy Global

Site Reliability Engineer 3 (SRE3)

3 Applications

Bangalore

Full-Time

Mid-Level: 4 to 6 years

Posted on Apr 14 2022

About the Job

Job Title: Site Reliability Engineer 3 (SRE3)

Experience Required: 4 to 6 years

Location: Bangalore

Role:

Site Reliability Engineers at Flipkart are developers with an excellent operations mindset. As a Site Reliability Engineer, you will build solutions to scale our platforms and applications reliably for high availability and make sure Service Level Objectives (SLOs) are met. You will own all the SLOs of various Flipkart services across tiers. You will work directly with our Software Development teams to reduce the toil of developing, deploying, and maintaining our software by adopting engineered solutions and reliability engineering best practices. You will be responsible for solving greenfield problems in reliability engineering and benchmarking at scale.

Responsibilities:

Help our engineers adopt the Flipkart Reliability Engineering playbook by abstracting away the context and complexities of a hybrid cloud.
Build, coach, and mentor teams of Site Reliability Engineers.
Ensure that availability, reliability, security, and similar considerations are built in, reviewed, and adhered to at every stage of product development.
Monitor and resolve issues in all environments. Ensure SLOs are met. Alert appropriately, build self-healing capabilities into the platforms, involve people when needed, and log tickets. Participate in a 24x7 on-call rotation.
Run periodic resilience (chaos) experiments and continuously verify the state of reliability.
Build and improve configuration and automation tools to remove toil in developing, deploying, and maintaining software.
Own the RCA lifecycle for platform issues, and be answerable to stakeholders (internal and external) on most of the service internals.
Have a viewpoint on the performance of distributed systems, and be able to drive capacity plans and scale requirements.
Identify bottlenecks and tune the affected areas, as long as major code changes are not necessary. For example, if you are working on a Hive benchmark and the MySQL connection pool is not externally configurable and its expansion policy is becoming a problem, you should be able to make the code changes, build it, expose the config, and continue the benchmark.
Partner with the developer and DevOps teams on on-call load sharing, and handle 24/7 platform support.

Qualification:

BTech or MTech in CS or equivalent, with 5+ years working with highly available platforms in web-scale organizations. Demonstrated experience of around 1-2 years as a developer is good to have.
Good troubleshooting skills for always-available, high-scale systems.
Ability to effectively collect all the relevant data points and debugging artefacts/snapshots so that debugging at a later stage can be as effective as possible.
Expert-level knowledge of at least one configuration management system (Ansible, Puppet, etc.).
Understanding of standard networking basics such as HTTP, DNS, TCP/IP, ICMP, the OSI model, subnetting, and load balancing, as well as DB sharding, partitions, etc.
Excellent written and verbal communication skills.
Understanding of CI/CD and the ability to architect the workflow or a deployment plan.
Ability to write software to automate API-driven tasks at scale using Python, Go, etc., and to develop application components wherever required using Scala, Python, C++, and Java.

About the company

Anzy Global

Anzy Global is a leading HR consultancy for the IT and IT-related industry. Our solutions are powered by deep expertise and an in-depth understanding of technology. Our constant endeavour and commitment is to provide the best talent available in the industry to our clients. By virtue of our strong network, not only do we put the best candidates across but also provide insight on their past pe ...

Industry

Human Resources Services

Company Size

51-200 Employees

Headquarters

Bangalore
