How Stanza Living is driving AI adoption in production environments

"DrDroid fundamentally changed how we look at production operations. What earlier took hours of context-switching across dashboards, logs, and code is now a single prompt away."

— Prateek, Head of Technology

Stanza Living is India's largest tech-enabled managed accommodation platform with 50,000+ beds across 24+ cities, and a lean team of 20 engineers managing 100+ services. Engineers were spending unplanned hours in production operations, eroding productivity.

01 Problem Context

With a lean team handling 100+ services, every hour lost to production noise was an hour not spent building. Three distinct problems were compounding.

Alert fatigue

Alerts from their APM were noisy, leading to fatigue and continuous disruptions for engineers in focus mode.

Cross-functional overhead

Engineers without specialisation in infrastructure or security were spending hours every week on cost- or security-related work.

Tech debt accumulation

Long-term tech debt and optimisations were becoming increasingly urgent as they scaled, but there was no time to address them proactively.

02 The Vision

The Stanza team wanted to make AI-assisted operations part of the culture across the engineering team. Leadership wanted engineers to leverage agentic AI to reduce time spent on false-positive alerts and drive proactive cost, performance, and security improvements.

03 Trying DrDroid

They initially explored an MCP server for SigNoz, which is where they discovered DrDroid. Their evaluation criteria:

Integration requirements

SigNoz (self-managed), Grafana (self-managed), AWS, Kubernetes, Jenkins, GitHub, Slack.

Complex alert handling

They needed application alert handling with trace analysis and log correlation, not just simple threshold alerts.

Slack-native with human-in-the-loop

A Slack-native agent that supports human involvement at critical decision points.

04 What the Stanza Team Achieved

Independent junior engineers

Junior engineers can independently perform end-to-end alert triage, correlating traces, logs, metrics, and relevant code paths without escalating.

Proactive investigations

Engineers and leaders can run prompt-driven cost and security investigations and receive actionable optimisation recommendations.

Accelerated performance analysis

Senior engineers accelerate deep performance analysis by using AI to surface bottlenecks across traces and logs.

Real cost savings

DrDroid helped achieve a 20-30% infrastructure cost reduction, not by claiming a number upfront but by training the agent on their cost parameters and delivering actionable, detailed reports.

Going forward
  • Use DrDroid to investigate self-managed infrastructure components wherever needed.
  • Build a culture of blameless learning with weekly retrospectives and AI-summarised insights.
  • Use AI to improve observability posture and move to 100% proactive incident detection and resolution.

WHAT THE TEAM SAYS

"DrDroid fundamentally changed how we look at production operations. What earlier took hours of context-switching across dashboards, logs, and code is now a single prompt away. More importantly, it helped us shift from reactive firefighting to proactive optimisation across reliability, cost, and security."

Prateek · Head of Technology, Stanza Living

"DrDroid excels at debugging our microservices architecture. It provides proper logs and traces of upstream and downstream services, making it easy to understand complex interactions across our 100+ services. It helped us in escalating and creating issues for our different production alerts while escalating to L1 and L2, and helps us create quick PRs for logging."

Anmol Chhabra · Senior Software Engineer, Stanza Living