Campaign Monitoring Tool: System-Level and Algorithmic Observability Beyond Business KPIs
When I joined the project to develop the Campaign Monitoring Tool (CMT), the core challenge was clear: provide deep observability into ad flight performance at multiple layers—algorithm, interaction, and bid—within an oRTB environment. While most stakeholders focused on business metrics (e.g., CPA, ROAS, delivery), there was limited visibility into the algorithms and models that actually drive those outcomes.
At the same time, analytics teams were hesitant to engage directly with bid-level telemetry because of its scale and complexity: tens of millions of bid requests per second at baseline, and up to billions of transactions per second at peak. Traditional dashboarding and offline analytics workflows were not designed for this level of throughput or granularity.
Our goal with CMT was to build a monitoring suite that:
- Exposes system and algorithm-level behavior, not just aggregate business outcomes.
- Allows different scientists and engineers to onboard their models and pacing algorithms with minimal friction.
- Scales to large-volume bid logs while remaining usable with basic programming and visualization skills.
We chose a pragmatic path: define a set of core metrics and build a Minimum Viable Product (MVP) that favored iteration, extensibility, and collaboration over a perfect “big bang” solution.
Architecture and Design Principles
The Campaign Monitoring Tool was built around a few key design principles:
- Self-service for scientists and engineers: The tool needed to be easily extendable by practitioners who understand the algorithms but may not be front-end engineers. This shaped our choices in both stack and abstractions.
- Query efficiency at scale: Since the input is high-cardinality bid-level data, we had to be extremely careful about how we query and aggregate logs (think: predicate pushdown, partition pruning, and minimal data movement).
- Tight loop between metrics and algorithm logic: Observability needed to map directly to algorithmic and control-theoretic concepts such as tracking error, saturation, convergence, oscillations, and environment interactions.
Our tech stack:
- Python as the orchestration and application layer.
- SQL + Trino for distributed, federated querying of large log tables.
- Streamlit for rapidly building interactive dashboards without a dedicated UI team.
This combination allowed us to treat CMT as a thin, composable layer on top of existing data infrastructure while keeping the contribution barrier low for new features.
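To illustrate what that thin, composable layer could look like in practice, here is a minimal sketch; the registry, function names, and output schema are illustrative assumptions rather than the tool's actual code. Each monitoring view is a plain Python function that returns a dataframe, registered under a name so that onboarding a new algorithm is a one-entry pull request.

```python
# A hypothetical sketch of the "thin, composable layer": each monitoring view is
# a plain Python function that returns a dataframe, registered under a name so
# onboarding a new algorithm is a small, self-contained change. Names are illustrative.
from typing import Callable, Dict

import pandas as pd

# Registry mapping a page name to the function that builds its dataframe.
PAGES: Dict[str, Callable[..., pd.DataFrame]] = {}


def register_page(name: str):
    """Register a metric builder so the dashboard can discover it by name."""
    def decorator(fn: Callable[..., pd.DataFrame]) -> Callable[..., pd.DataFrame]:
        PAGES[name] = fn
        return fn
    return decorator


@register_page("budget_pacing")
def budget_pacing(campaign_id: str, start: str, end: str) -> pd.DataFrame:
    # Would delegate to the shared SQL/Trino query layer; only the expected
    # output schema is shown here.
    return pd.DataFrame(columns=["ts", "target_spend", "actual_spend", "pacing_pressure"])
```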
Interface and Usability: Streamlit as a “UI Compiler” for Python
Given the lack of a dedicated front-end engineer and my familiarity with Python, we selected Streamlit as the UI framework. This decision had several practical advantages:
- Python-first development: Scientists could express data transformations, metrics, and plots directly in Python without switching languages or stacks.
- Rapid iteration: New algorithms, pacing types, or monitoring views could be onboarded through small pull requests that added new pages or visualizations.
- Type-safe, minimal input surface: We intentionally limited user input to a constrained set of parameters (e.g., campaign ID, time window, pacing type) and wired them directly to the query and aggregation logic.
From a systems perspective, we treated Streamlit as a “UI compiler” on top of our Python + SQL logic, which let us evolve quickly while still being rigorous about how we accessed data.
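As a concrete sketch of that constrained input surface, a minimal Streamlit page might look like the following; the widget choices, pacing types, and the fetch function are illustrative stand-ins, not the production page.

```python
# A minimal sketch of a constrained-input monitoring page in Streamlit
# (widget choices, pacing types, and the fetch function are illustrative).
import datetime as dt

import pandas as pd
import streamlit as st


def fetch_spend_timeseries(campaign_id: str, pacing_type: str,
                           start: dt.date, end: dt.date) -> pd.DataFrame:
    """Stand-in for the shared SQL/Trino query layer; returns a dummy hourly series."""
    ts = pd.date_range(start, end, freq="h")
    return pd.DataFrame({"ts": ts, "target_spend": 1.0, "actual_spend": 0.9})


st.title("Pacing monitor")

# Deliberately narrow input surface, wired directly into the query logic.
campaign_id = st.text_input("Campaign ID")
pacing_type = st.selectbox("Pacing type", ["even", "asap", "frontloaded"])
window = st.date_input(
    "Time window",
    value=(dt.date.today() - dt.timedelta(days=7), dt.date.today()),
)

if campaign_id and len(window) == 2:
    df = fetch_spend_timeseries(campaign_id, pacing_type, *window)
    st.line_chart(df.set_index("ts")[["target_spend", "actual_spend"]])
```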
Data Access and Scaling: Predicate Pushdown on Bid Logs
The biggest engineering challenge was safely and efficiently querying large-scale bid-level logs without building a separate heavyweight data pipeline.
We approached this by:
- Applying predicate pushdown in Trino, pushing filters such as:
- Time range
- Campaign or controller identifiers
- Pacing configuration
- Environment (e.g., region, experiment cohort)
- Restricting queries to pre-partitioned tables and leveraging existing partitioning keys (e.g., by date, region, or exchange).
- Performing aggregation as close to the data as possible (using SQL and Trino) before moving results into Python for visualization; a sketch of such a query follows this list.
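For illustration, a pushed-down, pre-aggregated query might look like the following; the catalog, table, column, and partition names are hypothetical stand-ins for the real bid-log schema.

```python
# Hypothetical example of pushing filters down to Trino and aggregating before
# anything reaches Python; catalog, table, and column names are made up.
import trino

SQL = """
SELECT
    date_trunc('hour', bid_ts) AS ts,
    sum(target_spend)          AS target_spend,
    sum(actual_spend)          AS actual_spend,
    avg(pacing_pressure)       AS pacing_pressure
FROM hive.bidlogs.bid_events
WHERE dt BETWEEN DATE '2024-05-01' AND DATE '2024-05-07'  -- partition pruning on dt
  AND campaign_id = ?                                     -- predicate pushdown
  AND pacing_type = ?
  AND region = ?
GROUP BY 1
ORDER BY 1
"""

conn = trino.dbapi.connect(host="trino.example.internal", port=8080, user="cmt")
cur = conn.cursor()
cur.execute(SQL, ("campaign-123", "even", "us-east-1"))
rows = cur.fetchall()  # small, pre-aggregated result set moves into Python for plotting
```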
This design avoided the need for a new bespoke analytics system for algorithm-level monitoring, while still enabling us to explore:
- Distribution of target vs actual spend over time.
- Tracking error and its temporal structure (lags, overshoot, oscillations).
- Controller saturation signals (e.g., maxed-out bids, pacing pressure ceilings).
- Bid response rates conditioned on controller state.
By pushing complexity into the query layer and constraining the UI, we achieved responsive, interactive monitoring without the heavy infrastructure investment that some teams initially assumed was required.
Multi-Layer Observability: From Algorithm to Bid
CMT was designed to reveal behavior along three layers:
- Algorithm-level metrics:
  - Reference signals (e.g., budget grants/trajectory).
  - Controller outputs (e.g., pacing pressure, bid multipliers).
  - Internal states (when available), such as accumulated error, integral terms, or mode switches.
- Interaction-level metrics:
  - How the algorithm interacts with other components.
  - Impact of configuration flags, safeties, or fallback logic.
- Bid-level metrics:
  - Bid price distributions over time.
  - Win rates and clearing prices, conditioned on controller state.
  - Outlier detection (e.g., extreme bids, abnormal win patterns).
Bringing these layers together in a single tool allowed us to move beyond “this campaign is underdelivering” toward explainable behavior: why the system was behaving a certain way, and how it interacted with the rest of the system.
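To make the algorithm-level signals concrete, here is a minimal sketch of how tracking error and a simple saturation flag could be derived from a pre-aggregated spend series; the column names match the hypothetical schema used above and are not the tool's actual fields.

```python
# Sketch of controller-aware metrics over a pre-aggregated spend series
# (column names are illustrative).
import pandas as pd


def controller_metrics(df: pd.DataFrame, pressure_ceiling: float = 1.0) -> pd.DataFrame:
    out = df.copy()
    # Tracking error: how far actual spend deviates from the reference trajectory.
    out["tracking_error"] = out["actual_spend"] - out["target_spend"]
    # Cumulative error reveals persistent over/under-delivery rather than noise.
    out["cum_tracking_error"] = out["tracking_error"].cumsum()
    # Saturation: intervals where the controller is pinned at its ceiling.
    out["saturated"] = out["pacing_pressure"] >= pressure_ceiling
    return out


# Example usage on a toy hourly series.
toy = pd.DataFrame({
    "ts": pd.date_range("2024-05-01", periods=4, freq="h"),
    "target_spend": [100, 100, 100, 100],
    "actual_spend": [90, 95, 110, 130],
    "pacing_pressure": [0.7, 0.9, 1.0, 1.0],
})
metrics = controller_metrics(toy)
print(metrics[["ts", "tracking_error", "cum_tracking_error", "saturated"]])
```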
Impact and Adoption
As CMT matured, it became a reference tool for bid-level monitoring, and it influenced the design of other internal tools, including supply-side monitoring.
Key outcomes:
- Broad internal adoption across multiple organizations (DSP, SSP, publisher-facing teams), all using the same monitoring patterns and abstractions.
- Standardization of observability: CMT helped define what “good” monitoring looks like for budget pacing and bidding algorithms.
- Faster incident diagnosis and experimentation: Teams could reason about algorithmic behavior during anomalies, A/B tests, and new feature rollouts, using quantitative, controller-aware metrics instead of relying solely on business KPIs.
Lessons Learned
From building the Campaign Monitoring Tool, a few key lessons stand out:
- Business metrics are not enough: For algorithm-heavy systems like oRTB, you need system-level and algorithm-level observability to understand and improve behavior.
- Leverage existing strengths: Choosing a Python + Streamlit + SQL/Trino stack allowed scientists and engineers to contribute directly without waiting for specialized front-end or platform support.
- Constrain and compose: By carefully limiting user inputs and pushing computation down into the query engine, we made the tool both scalable and safe to operate at bid-log scale.
- Iterate with real users: Early engagement with scientists and engineers shaped the metric set and UI flows, making the tool practical rather than theoretical.
CMT started as a way to answer basic questions about algorithmic behavior, but it evolved into a foundational observability layer for algorithmic control in oRTB systems—bridging the gap between business outcomes and the underlying loops that generate them.