How WorkIndia moved closer to their vision of Zero-Touch operations

"One time I was woken up at 3am by a pager that escalated. I instantly asked DrDroid to investigate it and in a few minutes, I was able to close the issue directly from Slack."

— Moiz Arsiwala, CTO

WorkIndia is one of India's largest job marketplaces with 28M+ active users. With a large footprint of infrastructure and applications, incidents can directly hurt their customers. Their on-call process needed to scale without scaling headcount.

01 Problem Context

WorkIndia had set up on-call processes and alerting to handle issues, but multiple challenges were slowing their team down.

Manual investigation overhead

Frequent alerts required 15-20 minutes of manual investigation each, jumping across k8s, ElasticAPM, Grafana dashboards, Loki logs, and code.

Escalation bottleneck

With tooling and context spread across many systems, escalations during on-call were frequent and often held up identifying and fixing issues.

Engineers pulled off-rotation

Engineers who were not on-call were frequently involved in production issues, breaking focus and disrupting feature work.

Knowledge gaps

On-call engineers would get stuck without deep know-how of a specific component (e.g. k8s) or without understanding correlation across the full stack.

02 The Vision

WorkIndia's CTO and tech team were working towards Zero Touch Production. They were hands-on with AI, actively using and building agents in their product, and wanted an agentic solution for on-call that would reduce the burden on engineers to investigate and debug production issues.

03 Trying DrDroid

One of their engineers came across DrDroid and, after checking the demo, the team decided to try it. Their evaluation criteria:

Relevant integrations

ElasticAPM, Grafana, k8s, PagerDuty, Loki, Jenkins, GitHub, Jira.

Slack-first workflows

Everything needed to work through Slack, where their on-call lived.

VPC integration support

Their infrastructure runs behind a VPC, so self-managed integration was a hard requirement.

Access management and security

Well-defined RBAC and audit capabilities.

04 What WorkIndia Achieved

Using DrDroid, the WorkIndia team has achieved the following:

Junior engineers own on-call end-to-end

New and junior engineers can investigate any production alert in minutes without escalating, because DrDroid surfaces the context they need.

Automated runbook execution

Prompt-based runbooks automatically take action on domain-specific alerts and resolve them.

Continuous retrospectives

The team runs daily on-call retrospectives through DrDroid to improve alert actionability.

Going forward
  • Further improve their autonomous detection stack to catch failures in deployment pipelines before alerts fire.
  • Further enhance operational efficiency by automating actions on more alert classes.

WHAT THE TEAM SAYS

"DrDroid works amazingly for initial investigation. It gives exact alerting traces that help me understand what's happening quickly. With the time I save on debugging, I can actually focus on implementing long-term fixes instead of just firefighting all the time."

Mayur Shinde · Software Engineer, WorkIndia