How Grab Built a Real-Time Metrics Platform for Marketplace Observability

Few digital platforms operate with the complexity and regional intensity of Grab. The Southeast Asian super app offers ride-hailing, deliveries, financial services, and more. To support hyper-localized operations across 300+ cities in 8 countries, Grab needed to rethink how it handled real-time data and actionable insights. The result is a dual-system solution: MarketWatch for observability and MIDAS for scalable, real-time metrics.

This case study examines how Grab addressed real-time observability at scale, why its architecture matters for the broader industry, and how it aligns with the macro trends identified in theCUBE Research’s enterprise data analytics and AI observability research.

Real-Time Decision Support in a Fragmented Landscape

Grab’s core challenge was operational observability across a fragmented, high-volume environment. According to Grab’s Data Engineering Manager, each of the company’s regional marketplaces needed timely, contextual, and reliable metrics, from food delivery efficiency in Ho Chi Minh City to driver availability in Jakarta.

However, metric sprawl and inconsistent definitions across decentralized data sources created delays, manual overhead, and decision blind spots. Teams had to piece insights together manually or rely on inconsistent SQL queries, a familiar pain point for many enterprises running digital operations.

In our 2024 “State of Observability and Real-Time Analytics” study, 72% of enterprises cited inconsistent metric definitions across tools as a top barrier to operational intelligence. Grab’s initiative addresses this barrier directly with an engineering-led, metadata-driven standardization model.

Grab’s Solution: A Two-Tier Architecture for Observability and Metrics

Grab tackled the issue by decoupling observability (MarketWatch) from metric definition and serving (MIDAS), so that the observability front end can evolve quickly while metric definitions stay consistent.

MarketWatch: Real-Time Portal for Actionable Insights

MarketWatch acts as the operational front end for Grab’s city teams. It helps users watch, detect, and diagnose anomalies in real time. Crucial to this portal are two key capabilities:

  • Automation: Minimizes manual triage by triggering adaptive responses to anomalies.
  • Explainability: Uses large language models (LLMs) and human-in-the-loop insights to explain what’s happening and why—a must for fast-paced teams.
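
To make the "detect" step concrete, here is a minimal sketch of one common approach, a trailing-window z-score check. The source does not describe how MarketWatch detects anomalies, so the function name, window size, threshold, and sample data below are all illustrative assumptions, not Grab's method.

```python
from statistics import mean, stdev


def detect_anomalies(values, window=5, threshold=3.0):
    """Return indices whose value deviates from the trailing-window mean
    by more than `threshold` standard deviations."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: treat any change at all as a deviation.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical per-minute counts of available drivers in one city.
available_drivers = [120, 118, 122, 119, 121, 120, 60, 119, 121]
print(detect_anomalies(available_drivers))  # [6]
```

Only the sudden drop at index 6 is flagged. The readings that follow it are compared against a window that now contains the outlier, which inflates the standard deviation; a production detector would typically use a more robust baseline.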

MIDAS: Unified, API-Driven Metrics Platform

MIDAS is the powerhouse behind the scenes. It ingests real-time signals via Apache Kafka, pre-processes them with Apache Flink, and serves metrics using Apache Pinot. MIDAS ensures:

  • Consistent metric definitions across observability, experimentation, and reporting.
  • Declarative metric requests via API, eliminating the need to write complex SQL.
  • Flexible aggregation on the fly to support use cases from LLM-based analytics to performance diagnostics.
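
The idea of a declarative metric request can be sketched as follows. This is an illustration under stated assumptions, not the MIDAS API: the registry, the `MetricRequest` fields, the table and column names, and `compile_to_sql` are all hypothetical. The point is that callers name a metric and its dimensions, while the shared definition supplies the SQL.

```python
from dataclasses import dataclass, field

# Hypothetical registry: each metric is defined once and reused by every consumer.
METRIC_DEFINITIONS = {
    "completed_orders": {
        "table": "orders",
        "expression": "COUNT(*)",
        "filter": "status = 'COMPLETED'",
        "dimensions": {"country", "city", "area"},
    },
}


@dataclass
class MetricRequest:
    """What a caller asks for: a metric name, grouping dimensions, and filters."""
    metric: str
    group_by: list = field(default_factory=list)
    filters: dict = field(default_factory=dict)


def compile_to_sql(request: MetricRequest) -> str:
    """Translate a declarative request into SQL using the shared definition."""
    definition = METRIC_DEFINITIONS[request.metric]
    for column in list(request.group_by) + list(request.filters):
        if column not in definition["dimensions"]:
            raise ValueError(f"unknown dimension: {column}")

    select_columns = list(request.group_by)
    select_columns.append(f"{definition['expression']} AS {request.metric}")

    conditions = [definition["filter"]]
    for column, value in sorted(request.filters.items()):
        escaped = str(value).replace("'", "''")  # minimal quoting for the sketch
        conditions.append(f"{column} = '{escaped}'")

    sql = f"SELECT {', '.join(select_columns)} FROM {definition['table']}"
    sql += " WHERE " + " AND ".join(conditions)
    if request.group_by:
        sql += " GROUP BY " + ", ".join(request.group_by)
    return sql


request = MetricRequest("completed_orders", group_by=["city"], filters={"country": "ID"})
print(compile_to_sql(request))
```

Because the filter and expression live in the registry, two teams requesting `completed_orders` cannot silently disagree on what "completed" means, which is the consistency benefit the list above describes.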

This decoupled, metadata-first approach aligns with the “Metrics Mesh” trend. Like a service mesh, it abstracts complexity and enforces governance, critical in real-time data environments. Our research finds that platforms with this model improve observability time-to-insight by up to 60%.

Scaling to Roughly 10M Metric Requests per Month at ~1s p95 Latency

Grab’s metrics platform currently handles over 350,000 metric requests per day (roughly 10.5 million per month), with 95th-percentile latency of about 1 second. This performance is notable given the platform’s flexibility: each request is answered with a query computed on demand rather than from pre-aggregated datasets.
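
The arithmetic behind these figures, and what a 95th-percentile latency means, can be checked with a short sketch. The daily request count comes from the article; the 30-day month, the latency samples, and the nearest-rank percentile method are assumptions chosen for illustration.

```python
REQUESTS_PER_DAY = 350_000  # figure quoted in the article
SECONDS_PER_DAY = 86_400

requests_per_month = REQUESTS_PER_DAY * 30          # assumes a 30-day month
average_rps = REQUESTS_PER_DAY / SECONDS_PER_DAY    # average, not peak, load


def percentile_nearest_rank(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    ordered = sorted(samples)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceil(pct * n / 100)
    return ordered[rank - 1]


# Hypothetical request latencies in milliseconds (not measured data).
latencies_ms = [180, 210, 220, 240, 250, 260, 270, 290, 300, 310,
                320, 340, 360, 380, 410, 450, 520, 640, 980, 2400]

print(requests_per_month)                         # 10500000
print(round(average_rps, 2))                      # 4.05
print(percentile_nearest_rank(latencies_ms, 95))  # 980
```

In this made-up sample the p95 is just under one second even though the slowest request took 2.4 seconds, which is why a percentile is a more useful service-level figure than a maximum or a mean.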

MIDAS uses Pinot for on-demand aggregation, so the same metric can be sliced by different dimensions (e.g., by area or city) without duplicating pipelines.
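
As a rough illustration of why on-demand aggregation avoids duplicate pipelines, the sketch below computes the same measure over one set of rows at two different granularities. It is an in-memory stand-in for what an OLAP store does at query time; the rows, field names, and `aggregate` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical pre-processed events; in the article's architecture these
# would be served from the OLAP store rather than a Python list.
events = [
    {"country": "ID", "city": "Jakarta", "area": "Kemang", "orders": 12},
    {"country": "ID", "city": "Jakarta", "area": "Menteng", "orders": 8},
    {"country": "VN", "city": "Ho Chi Minh City", "area": "District 1", "orders": 7},
]


def aggregate(rows, dimensions, measure):
    """Sum `measure` grouped by whichever dimensions the caller asks for."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)


# One source of rows, two granularities, no second pipeline.
print(aggregate(events, ["city"], "orders"))
print(aggregate(events, ["city", "area"], "orders"))
```

The trade-off is that grouping happens at request time, so latency depends on how much data each query scans rather than on a precomputed rollup.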

Companies like LinkedIn and Uber have adopted similar real-time architectures using Pinot or Druid, but few apply them as systematically across such a wide range of operational roles. Grab’s focus on reusability across LLMs, BI tools, and experiments is a marker of maturity.

From Observability to Intelligence

Grab is now pushing MIDAS beyond real-time observability into predictive and adaptive intelligence:

  • Metric Forecasting & RCA APIs: Moving these upstream into MIDAS enables broader reuse beyond MarketWatch, aligning with AIOps and self-healing systems.
  • Batch APIs for ML Training: Acknowledging the growing demand for time-series features in model development, Grab plans to support batch access to historical metrics.
  • Conversational Interfaces via LLMs: In line with broader industry trends, Grab is exploring LLM-native interfaces to democratize access to metrics.
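
To give a sense of what a root-cause analysis (RCA) API might compute, here is a minimal sketch that ranks segments by their contribution to a metric's change between two periods. The source does not describe Grab's RCA logic; the function name, the ranking rule, and the figures are assumptions for illustration only.

```python
def contribution_by_segment(before, after):
    """Rank segments by the absolute size of their change between two
    periods; ties are broken alphabetically by segment name."""
    segments = set(before) | set(after)
    deltas = {s: after.get(s, 0) - before.get(s, 0) for s in segments}
    return sorted(deltas.items(), key=lambda item: (-abs(item[1]), item[0]))


# Hypothetical completed-order counts per city for two consecutive hours.
previous_hour = {"Jakarta": 1000, "Bandung": 400, "Surabaya": 300}
current_hour = {"Jakarta": 700, "Bandung": 410, "Surabaya": 295}

print(contribution_by_segment(previous_hour, current_hour))
# [('Jakarta', -300), ('Bandung', 10), ('Surabaya', -5)]
```

Exposing a calculation like this from the metrics layer, rather than from a single dashboard, is what would let other consumers reuse it.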

This direction reflects a move from monitoring to metric intelligence. In our conversations with digital-native enterprises, there’s increasing convergence between metrics platforms and AI/ML platforms, especially for enabling real-time decision automation.

Grab Sets a Blueprint for Real-Time Metric Infrastructure

Grab’s observability stack reflects an advanced approach to real-time operational analytics, especially in multi-tenant, geographically dispersed platforms. Several key principles stand out:

  • Separation of concerns between metric definition (MIDAS) and consumption (MarketWatch).
  • Declarative metric access as a foundation for consistency and reuse.
  • Real-time, flexible aggregation that supports both interactive dashboards and programmatic access.
  • Explainability-first automation, signaling a shift from rule-based alerts to contextual insight.

For organizations building toward AI-powered operations, Grab’s strategy offers a repeatable architecture: combine a strong metrics backend with a consumable observability layer, and invest early in automation and explainability.

Final Thoughts – From Metrics to Market Intelligence

Real-time metrics aren’t just about dashboards; they’re about empowering on-the-ground teams to act faster, with context. Grab’s metrics platform, MIDAS, is more than a back end for observability. It’s a centralized source of truth for operational intelligence that scales with the pace of the business.

theCUBE Research will continue tracking this architecture pattern as it becomes foundational to the next generation of AI-native, digital-first operations.


Sources:

  • Grab Engineering
  • theCUBE Research 2024 “State of Observability and Real-Time Analytics”
  • Interviews with data engineering leads across digital platforms (2023–2025)

Author

  • Paul Nashawaty

    Paul Nashawaty, Practice Leader and Lead Principal Analyst, specializes in application modernization across build, release and operations. With a wealth of expertise in digital transformation initiatives spanning front-end and back-end systems, he also possesses comprehensive knowledge of the underlying infrastructure ecosystem crucial for supporting modernization endeavors. With over 25 years of experience, Paul has a proven track record in implementing effective go-to-market strategies, including the identification of new market channels, the growth and cultivation of partner ecosystems, and the successful execution of strategic plans resulting in positive business outcomes for his clients.
