Site Reliability Engineer 3 (SRE3)



Mid-Level (4 to 6 years)
Posted on Apr 14 2022

About the Job


Job Title: Site Reliability Engineer 3 (SRE3)

Experience Required: 4 to 6 years of experience

Location: Bangalore


Site Reliability Engineers at Flipkart are developers with an excellent operations mindset. As a Site Reliability Engineer, you will build solutions that scale our platforms and applications reliably for high availability and make sure Service Level Objectives (SLOs) are met. You will own all the SLOs of various Flipkart services across tiers. You will work directly with our Software Development teams to reduce the toil of developing, deploying and maintaining our software by adopting engineered solutions and reliability engineering best practices. You will be responsible for solving greenfield problems in reliability engineering and benchmarking at scale.


  • Help our engineers adopt the Flipkart Reliability Engineering playbook by abstracting away the context and complexities of a hybrid cloud.
  • Build, coach and mentor teams of Site Reliability Engineers.
  • Ensure that availability, reliability, security and similar considerations are built in, reviewed and adhered to at every stage of product development.
  • Monitor and resolve issues in all environments. Ensure SLOs are met. Alert appropriately, build self-healing capabilities in the platforms, involve people when needed, and log tickets. Participate in a 24x7 on-call rotation.
  • Run periodic resilience (chaos) experiments and continuously verify the state of reliability.
  • Build and improve configuration and automation tools to remove toil in developing, deploying and maintaining software.
  • Own the RCA lifecycle for platform issues, and be answerable to stakeholders (internal and external) on most of the service internals.
  • Have a viewpoint on distributed systems' performance, and be able to drive capacity plans and scale requirements.
  • Identify bottlenecks and tuning areas as long as major code changes are not necessary. For example, if you are working on a Hive benchmark and the MySQL connection pool is not externally configurable and its expansion policy is becoming a problem, you should be able to make the code changes, build it, expose the config and continue the benchmark.
  • Partner with the developer and DevOps teams in on-call load sharing, and handle 24/7 platform support.


  • BTech or MTech in CS or equivalent, with 5+ years of working with highly available platforms in web-scale organizations. Demonstrated experience of around 1-2 years as a developer is good to have.
  • Good troubleshooting skills for always-available, high-scale systems.
  • Ability to effectively collect all the relevant data points and debugging artefacts/snapshots so that debugging at a later stage can be as effective as possible.
  • Expert-level knowledge of at least one configuration management system (Ansible, Puppet, etc.).
  • Understanding of standard networking basics such as HTTP, DNS, TCP/IP, ICMP, the OSI model, subnetting and load balancing, as well as DB sharding, partitions, etc.
  • Excellent written and verbal communication skills.
  • Understanding of CI/CD and the ability to architect a workflow or a deployment plan.
  • Write software to automate API-driven tasks at scale using Python, Go, etc., and develop application components wherever required using Scala, Python, C++ and Java.

About the company

Anzy Global is a leading HR consultancy for the IT and IT-related industries. Our solutions are powered by deep expertise and an in-depth understanding of technology. Our constant endeavour and commitment is to provide the best talent available in the industry to our clients. By virtue of our strong network, not only do we put the best candidates across but also provide insight on their past pe ...


Human Resources Services

Company Size

51-200 Employees





Expertia AI Technologies Pvt. Ltd, Sector 1, HSR Layout,
Bangalore 560101

© 2024 Expertia AI. Copyright and rights reserved