Synthetic Monitoring vs. Real User Monitoring: When You Need Both
May 6, 2026 · 8 min read
Every web performance conversation eventually arrives at the same fork in the road: should you monitor from the outside in, running scripted checks from controlled locations, or from the inside out, collecting telemetry from real users on real devices? The answer, in nearly every production environment that takes reliability seriously, is both. But understanding what each approach actually tells you, and where each one has blind spots, is the difference between a monitoring stack that catches problems and one that generates dashboards nobody acts on.
What Synthetic Monitoring Actually Is
Synthetic monitoring runs scripted browser interactions against your application at fixed intervals from known locations. A synthetic check might load your homepage from a data center in Frankfurt every five minutes, navigate to the pricing page, and measure Time to First Byte, Largest Contentful Paint, and Total Blocking Time. If any metric exceeds a threshold, it fires an alert.
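The alerting step of such a check can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical CheckResult shape and example budgets (800ms TTFB, 2.5s LCP, 200ms TBT). It is not any vendor's actual schema or default configuration.

```typescript
// Hypothetical shape of one synthetic check run; field names are illustrative only.
interface CheckResult {
  location: string;
  ttfbMs: number;
  lcpMs: number;
  tbtMs: number;
}

type Thresholds = { ttfbMs: number; lcpMs: number; tbtMs: number };

// Example budgets, not recommendations for any particular site.
const homepageThresholds: Thresholds = { ttfbMs: 800, lcpMs: 2500, tbtMs: 200 };

// Returns one message per metric over its threshold; an empty array means the run passed.
function evaluateCheck(result: CheckResult, limits: Thresholds): string[] {
  const messages: string[] = [];
  const keys: (keyof Thresholds)[] = ["ttfbMs", "lcpMs", "tbtMs"];
  for (const key of keys) {
    if (result[key] > limits[key]) {
      messages.push(`${result.location}: ${key} ${result[key]}ms exceeds ${limits[key]}ms`);
    }
  }
  return messages;
}

const frankfurtRun: CheckResult = { location: "frankfurt", ttfbMs: 180, lcpMs: 3100, tbtMs: 90 };
const frankfurtBreaches = evaluateCheck(frankfurtRun, homepageThresholds);
if (frankfurtBreaches.length > 0) {
  // A real check would notify an on-call channel here instead of logging.
  console.log(`ALERT ${frankfurtBreaches.join("; ")}`);
}
```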
The defining characteristic of synthetic monitoring is control. You control the device (a consistent VM or container), the network (a known bandwidth and latency profile), the browser (a specific Chromium version), and the script (a deterministic sequence of actions). This control makes synthetic checks reproducible. When a metric degrades, you can be reasonably confident the change came from your application or infrastructure, not from a user switching from Wi-Fi to cellular mid-session.
Synthetic checks are implemented with browser automation tools. Playwright and Puppeteer are the most common engines. The check script is typically 20-50 lines of code that navigates to a URL, waits for a specific element, extracts performance metrics from the PerformanceObserver API, and reports them to a time-series database. Services like Datadog Synthetic Monitoring, Checkly, and Uptrends abstract away the infrastructure and provide managed runtimes in dozens of global locations.
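Once a check script has its PerformanceObserver entries, turning them into LCP and TBT is mostly arithmetic. The sketch below assumes the script has already collected "largest-contentful-paint" and "longtask" entries (for example through page.evaluate in Playwright) and returned their startTime and duration fields as plain objects. The TimedEntry type is hypothetical. The TBT function simplifies the lab definition, which counts only long tasks between First Contentful Paint and Time to Interactive.

```typescript
// Minimal structural type for the fields read from PerformanceObserver entries.
interface TimedEntry {
  startTime: number; // ms since navigation start
  duration: number; // ms; always 0 for largest-contentful-paint entries
}

const LONG_TASK_THRESHOLD_MS = 50;

// LCP is the startTime (render time, or load time as a fallback) of the last LCP entry reported.
function largestContentfulPaint(lcpEntries: TimedEntry[]): number | undefined {
  return lcpEntries.length > 0 ? lcpEntries[lcpEntries.length - 1].startTime : undefined;
}

// Simplified TBT: for each long task starting in [fcp, windowEnd), count the time beyond 50 ms.
// In the lab definition, windowEnd would be Time to Interactive.
function totalBlockingTime(longTasks: TimedEntry[], fcp: number, windowEnd: number): number {
  return longTasks
    .filter((t) => t.startTime >= fcp && t.startTime < windowEnd)
    .reduce((sum, t) => sum + Math.max(0, t.duration - LONG_TASK_THRESHOLD_MS), 0);
}

const lcpEntries: TimedEntry[] = [
  { startTime: 900, duration: 0 },
  { startTime: 1650, duration: 0 }, // a larger element painted later, so this one wins
];
const longTasks: TimedEntry[] = [
  { startTime: 400, duration: 120 }, // before FCP: not counted
  { startTime: 1200, duration: 130 }, // contributes 80 ms
  { startTime: 2000, duration: 60 }, // contributes 10 ms
];

console.log(largestContentfulPaint(lcpEntries)); // 1650
console.log(totalBlockingTime(longTasks, 1000, 5000)); // 90
```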
What Real User Monitoring Actually Is
Real User Monitoring (RUM) collects performance telemetry from actual user sessions in production. A JavaScript snippet embedded in your application captures navigation timing, Core Web Vitals, JavaScript errors, and user interactions, then beacons that data to a collection endpoint. Every page load by every user on every device contributes a data point.
The defining characteristic of RUM is breadth. You see performance across the full distribution of real conditions: a user on a 2019 Android phone on a 3G connection in rural Indonesia, a user on an M3 MacBook Pro on gigabit fiber in San Francisco, and everything in between. This distribution is impossible to replicate synthetically because you cannot anticipate every combination of device, browser, network, and geography that your users will bring.
RUM implementations fall into two categories. First-party collection uses the web-vitals library from Google (or the raw PerformanceObserver API) to capture metrics and send them to your own analytics pipeline. This gives full control over data and avoids third-party scripts, but requires building the collection, aggregation, and visualization infrastructure.
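The client side of first-party collection can be very small. The sketch below assumes web-vitals v3 or later, which exports onLCP, onINP, and onCLS and passes each callback a metric carrying name, value, rating, and id. It also assumes a hypothetical /rum/collect endpoint. The transport is injected so the logic runs outside a browser. In production you would pass a function that calls navigator.sendBeacon.

```typescript
// The subset of the web-vitals metric object this sketch forwards.
interface VitalMetric {
  name: string; // e.g. "LCP", "INP", "CLS"
  value: number;
  rating: "good" | "needs-improvement" | "poor";
  id: string; // unique per metric instance on a page load
}

// Injected transport so the logic can run outside a browser.
type Transport = (url: string, body: string) => void;

// Hypothetical collector path; point this at your own endpoint.
const COLLECT_URL = "/rum/collect";

// Returns a callback suitable for passing to web-vitals' onLCP/onINP/onCLS.
function createReporter(send: Transport, pagePath: string): (metric: VitalMetric) => void {
  return (metric) => {
    const body = JSON.stringify({
      page: pagePath,
      metric: metric.name,
      value: metric.value,
      rating: metric.rating,
      id: metric.id,
    });
    send(COLLECT_URL, body);
  };
}

// Browser wiring, assuming web-vitals v3 or later:
//   import { onCLS, onINP, onLCP } from "web-vitals";
//   const report = createReporter((url, body) => { navigator.sendBeacon(url, body); }, location.pathname);
//   onLCP(report);
//   onINP(report);
//   onCLS(report);
```

The collector on the other end only needs to parse these JSON bodies and write them to storage. Aggregation into percentiles can happen at query time.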
Third-party RUM services like Datadog RUM, New Relic Browser, SpeedCurve, and Sentry Performance provide a drop-in script tag, pre-built dashboards, and alerting. The tradeoff is cost (typically priced by session volume) and an additional third-party script on your page.
What Each Catches That the Other Misses
The blind spots of the two approaches are complementary, which is exactly why you need both.
Synthetic catches, RUM misses:
- Outages during low-traffic hours. If your API goes down at 3 AM on a Tuesday, RUM has no data because no users are visiting. Synthetic checks are still running every five minutes and will fire an alert within the check interval. For B2B SaaS products with low overnight traffic, this is a critical gap.
- Performance degradation before users notice. Synthetic checks with tight thresholds detect a 200ms regression in TTFB on their next run. RUM data requires enough sessions to make the regression statistically significant, which can take hours or days on lower-traffic pages.
- Pre-deployment validation. You can point synthetic checks at a staging environment or a canary deployment before promoting to production. RUM, by definition, only reports on production traffic.
- Geographic coverage gaps. If you have no users in South America but want to verify CDN performance there, only synthetic checks from São Paulo will tell you.
RUM catches, synthetic misses:
- Real device diversity. Your synthetic checks run on a modern Chromium instance on well-provisioned server hardware. Your users include people on entry-level phones like the Samsung Galaxy A14, with a fraction of that memory and CPU, and 50 open tabs. RUM reveals the performance that the bottom quartile of your users actually experiences.
- Long-tail page performance. Synthetic checks typically cover your 10-20 most important pages. RUM covers every page, including the blog post that went viral on social media and is loading a 14 MB hero image that nobody on the team knew about.
- Third-party script impact. Your synthetic checks might not load the same ad network, analytics, and chat widget scripts that real users encounter. RUM measures the page as users actually receive it, including every third-party resource and its effect on loading and interactivity.
- Business impact correlation. RUM data can be joined with conversion data. You can answer "what is the revenue impact of our LCP being above 2.5 seconds?" Synthetic data cannot answer this question because it has no connection to user behavior or business outcomes.
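Once RUM sessions are joined to conversion events, the LCP question from the business-correlation point becomes a simple grouping. This sketch assumes a hypothetical SessionRecord that already carries each session's LCP and a converted flag. The join itself depends on your analytics pipeline. The comparison shows correlation only: slow sessions also skew toward weaker devices and networks, and those can affect conversion independently of page speed.

```typescript
// Hypothetical record produced by joining RUM sessions with conversion events.
interface SessionRecord {
  lcpMs: number;
  converted: boolean;
}

// The Core Web Vitals "good" threshold for LCP.
const LCP_GOOD_MS = 2500;

function conversionRate(sessions: SessionRecord[]): number {
  if (sessions.length === 0) return 0;
  return sessions.filter((s) => s.converted).length / sessions.length;
}

// Compares conversion for sessions at or under the LCP threshold against those above it.
function conversionByLcp(sessions: SessionRecord[]) {
  const fast = sessions.filter((s) => s.lcpMs <= LCP_GOOD_MS);
  const slow = sessions.filter((s) => s.lcpMs > LCP_GOOD_MS);
  return {
    fastSessions: fast.length,
    slowSessions: slow.length,
    fastRate: conversionRate(fast),
    slowRate: conversionRate(slow),
  };
}

// Naive estimate: revenue gained if slow sessions converted at the fast-session rate.
// It treats the correlation as causal, so read it as an optimistic estimate, not a forecast.
function naiveRevenueGap(sessions: SessionRecord[], averageOrderValue: number): number {
  const split = conversionByLcp(sessions);
  return split.slowSessions * Math.max(0, split.fastRate - split.slowRate) * averageOrderValue;
}
```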
| Capability | Synthetic | RUM |
|---|---|---|
| 24/7 availability monitoring | Yes | Only during active traffic |
| Real device performance | No (controlled environment) | Yes |
| Pre-deploy validation | Yes | No |
| Long-tail page coverage | Limited to scripted pages | Every page visited |
| Third-party script impact | Partial | Full |
| Business metric correlation | No | Yes |
| SLA compliance tracking | Yes (deterministic) | Informational only |
Synthetic for SLA Tracking and Contractual Obligations
If your organization has SLA commitments, whether to customers, partners, or internal stakeholders, synthetic monitoring is the only defensible measurement method. SLAs require consistent, reproducible measurement conditions. You cannot base an uptime SLA on RUM data because RUM depends on user traffic patterns, which are inherently variable and outside your control.
A typical SLA monitoring setup uses synthetic checks from three to five geographic regions, running at one- to five-minute intervals. The check must be simple and deterministic: load the page, verify a key element renders, record the response time. Uptime is calculated as the percentage of checks that return a successful response within the threshold. This gives you a clean, auditable metric that you can share with stakeholders.
The critical detail that many teams miss: your SLA check must be a dedicated, minimal script, not your full visual regression suite. A complex multi-step flow that occasionally fails due to a flaky selector will corrupt your uptime metric. Keep SLA checks simple: one URL, one assertion, one metric.
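The uptime calculation described above translates directly into code. This is a sketch, assuming a hypothetical SlaCheckRun record per check execution. How you count a failed assertion or a timeout should follow whatever your SLA contract actually defines.

```typescript
// One run of a minimal SLA check: one URL, one assertion, one metric.
interface SlaCheckRun {
  region: string;
  passed: boolean; // the single assertion, e.g. a key element rendered
  responseMs: number; // the single recorded metric
}

// Uptime = share of runs that passed their assertion within the response-time threshold.
function uptimePercent(runs: SlaCheckRun[], thresholdMs: number): number {
  if (runs.length === 0) throw new Error("no check runs in period");
  const good = runs.filter((r) => r.passed && r.responseMs <= thresholdMs).length;
  return (good / runs.length) * 100;
}

// Per-region uptime, useful for spotting a single region dragging the total down.
function uptimeByRegion(runs: SlaCheckRun[], thresholdMs: number): Record<string, number> {
  const grouped: Record<string, SlaCheckRun[]> = {};
  for (const run of runs) {
    if (!grouped[run.region]) grouped[run.region] = [];
    grouped[run.region].push(run);
  }
  const result: Record<string, number> = {};
  for (const region of Object.keys(grouped)) {
    result[region] = uptimePercent(grouped[region], thresholdMs);
  }
  return result;
}

// Downtime an SLA target allows over a period; 99.9% over 30 days is about 43.2 minutes.
function allowedDowntimeMinutes(targetPercent: number, periodDays: number): number {
  return periodDays * 24 * 60 * (1 - targetPercent / 100);
}
```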
Cost Comparison
Cost is often the deciding factor in how teams split their monitoring investment, so it is worth being concrete about the numbers.
Synthetic monitoring costs scale with the number of checks, check frequency, and number of locations. Checkly charges roughly $1.20 per 10,000 check runs. If you monitor 20 pages from 5 locations at 5-minute intervals, that is approximately 864,000 check runs per month, costing around $100/month. Datadog Synthetic Monitoring is billed by test runs, with browser tests priced well above simple API tests. Self-hosted solutions using Playwright in a container are effectively free beyond compute costs, but require engineering time to maintain.
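Check volume is just pages × locations × runs per day × days, so scripting the estimate shows how much the interval drives cost. This is a sketch, assuming a 30-day month and the approximate Checkly rate quoted above. Substitute your vendor's current pricing.

```typescript
// Total check runs per month for a fixed set of pages, locations, and interval.
function monthlyCheckRuns(pages: number, locations: number, intervalMinutes: number, days = 30): number {
  const runsPerDay = (24 * 60) / intervalMinutes;
  return pages * locations * runsPerDay * days;
}

// Cost at a per-10,000-runs price (approximate, vendor-specific, subject to change).
function syntheticMonthlyCost(runs: number, pricePer10kRuns: number): number {
  return (runs / 10000) * pricePer10kRuns;
}

const fiveMinuteRuns = monthlyCheckRuns(20, 5, 5);
console.log(fiveMinuteRuns, syntheticMonthlyCost(fiveMinuteRuns, 1.2).toFixed(2)); // 864000 103.68

// Tightening the interval to one minute multiplies both volume and cost by five.
const oneMinuteRuns = monthlyCheckRuns(20, 5, 1);
console.log(oneMinuteRuns, syntheticMonthlyCost(oneMinuteRuns, 1.2).toFixed(2));
```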
RUM costs scale with traffic. Datadog RUM is approximately $1.50 per 1,000 sessions. A site with 500,000 monthly sessions would pay around $750/month for RUM alone. New Relic Browser and SpeedCurve have similar per-session pricing. The web-vitals library with a self-hosted collection endpoint (sending to BigQuery, ClickHouse, or your own Postgres) is free beyond infrastructure costs, but you lose the pre-built dashboards and alerting.
For most mid-size applications (100k-1M monthly sessions), expect to spend $100-300/month on synthetic monitoring and $200-1,500/month on RUM, depending on vendor and traffic volume. The combined cost is meaningful, which is why many teams start with synthetic only and add RUM as traffic and revenue justify the investment.
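The RUM side of the budget is a single multiplication. Pairing it with the synthetic estimate makes the combined monthly spend concrete across the mid-size range. This sketch assumes the approximate $1.50 per 1,000 sessions rate quoted above and reuses the roughly $104/month synthetic figure from the Checkly example. Both are illustrative inputs, not quotes.

```typescript
// RUM cost at a per-1,000-sessions price (the default mirrors the approximate rate above).
function rumMonthlyCost(sessions: number, pricePer1kSessions = 1.5): number {
  return (sessions / 1000) * pricePer1kSessions;
}

// Assumed synthetic spend from the 20-page, 5-location, 5-minute example.
const SYNTHETIC_MONTHLY_USD = 104;

for (const sessions of [100000, 500000, 1000000]) {
  const rumUsd = rumMonthlyCost(sessions);
  console.log(`${sessions} sessions: RUM $${rumUsd.toFixed(0)}, combined $${(rumUsd + SYNTHETIC_MONTHLY_USD).toFixed(0)}`);
}
// 100000 sessions: RUM $150, combined $254
// 500000 sessions: RUM $750, combined $854
// 1000000 sessions: RUM $1500, combined $1604
```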
The Combined Approach: Synthetic as Baseline, RUM for Validation
The monitoring strategy that works best in practice uses synthetic checks as the controlled baseline and RUM data as the real-world validation layer.
Synthetic establishes the performance contract. Your synthetic checks define what performance should be under ideal conditions. If your homepage TTFB is 180ms from your synthetic check in us-east-1, that is your performance floor. Any regression in synthetic metrics indicates a change in your application or infrastructure, not in user conditions.
RUM reveals whether the contract holds in reality. If your synthetic TTFB is 180ms but the p75 RUM TTFB is 900ms, the gap tells you that real users are experiencing something your synthetic checks do not capture: CDN cache misses, third-party script contention, slow DNS resolution from specific ISPs, or server performance issues under production load and on request paths that a handful of scripted checks never exercise.
The operational workflow looks like this:
- Deploy a change to production.
- Synthetic checks run immediately and validate that no hard regression occurred under controlled conditions. This takes minutes.
- Over the next 1-2 hours, RUM data accumulates from real users on the new version.
- Compare the RUM distribution (p50, p75, p95) against the previous deployment. If the RUM distribution shifts negatively while synthetic metrics are unchanged, the regression is likely device-, network-, or third-party-dependent.
- If synthetic metrics degrade but RUM appears stable, the regression may be masked by CDN caching or geographic distribution. Investigate the synthetic failure rather than dismissing it: unless the check itself is at fault, the regression is real and will likely surface in RUM.
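The comparison step and its decision rules can be sketched as follows. The sketch rests on a few assumptions: nearest-rank percentiles, a hypothetical 10% tolerance for what counts as a regression, and a verdict for the case where both signals regress. The workflow above does not spell that case out, but it follows from synthetic checks isolating application and infrastructure changes.

```typescript
// Nearest-rank percentile of a sample (values in ms).
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error("empty sample");
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

interface Distribution {
  p50: number;
  p75: number;
  p95: number;
}

function summarize(values: number[]): Distribution {
  return { p50: percentile(values, 50), p75: percentile(values, 75), p95: percentile(values, 95) };
}

// True if any tracked percentile grew by more than `tolerance` (0.1 means 10%).
function regressed(before: Distribution, after: Distribution, tolerance = 0.1): boolean {
  const keys: (keyof Distribution)[] = ["p50", "p75", "p95"];
  return keys.some((k) => after[k] > before[k] * (1 + tolerance));
}

// Maps the two signals onto the interpretations described in the workflow.
function diagnose(syntheticRegressed: boolean, rumRegressed: boolean): string {
  if (syntheticRegressed && rumRegressed) return "regression in the application or infrastructure";
  if (rumRegressed) return "likely device-, network-, or third-party-dependent";
  if (syntheticRegressed) return "investigate the synthetic failure; RUM may be masking it";
  return "no regression detected";
}

// Example: LCP samples (ms) from the previous and current deploys, with synthetic checks unchanged.
const previousDeploy = [100, 120, 140, 160, 180, 200, 220, 240, 260, 280];
const currentDeploy = previousDeploy.map((v) => Math.round(v * 1.3));
const rumShifted = regressed(summarize(previousDeploy), summarize(currentDeploy));
console.log(diagnose(false, rumShifted)); // likely device-, network-, or third-party-dependent
```

In practice you would want far more than ten samples per deploy before trusting a p95 comparison.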
Common Pitfalls
Teams that rely on only one approach consistently get burned in predictable ways.
Synthetic-only teams miss mobile performance issues. Your synthetic checks run on a server with 8 GB of RAM and a wired connection. Meanwhile, 60% of your traffic may come from mobile devices with constrained memory and variable network conditions. A JavaScript bundle that takes 200ms to parse on your synthetic check's CPU can take 1.8 seconds on a mid-range Android device. Without RUM, you have no visibility into this gap. Teams in this situation often discover their mobile performance problem from app store reviews or social media complaints rather than their monitoring system.
RUM-only teams miss outages during low-traffic hours. If your database connection pool is exhausted at 2 AM, RUM has nothing to report because no users are generating sessions. The outage can persist for hours until morning traffic resumes and RUM data starts showing errors. A synthetic check would have caught it within five minutes.
Both approaches suffer from alert fatigue if thresholds are not calibrated. A common mistake is setting alerts on absolute thresholds (LCP must be under 2.5s) without accounting for baseline variance. Better practice is to alert on deviations from the rolling baseline: "LCP p75 is 30% higher than the trailing 7-day average." This catches regressions without firing on normal variance.
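A rolling-baseline rule like this one fits in a few lines. This sketch assumes you already compute one p75 value per day for the metric. The 7-day window and 30% ratio are the example values from the text, not universal defaults.

```typescript
interface BaselineResult {
  baseline: number; // mean of the trailing daily p75 values
  alert: boolean;
}

// Alert when today's p75 exceeds the trailing average by more than `ratio` (0.3 means 30%).
function baselineAlert(dailyP75: number[], todayP75: number, windowDays = 7, ratio = 0.3): BaselineResult {
  const trailing = dailyP75.slice(-windowDays);
  if (trailing.length < windowDays) {
    throw new Error(`need ${windowDays} days of history, got ${trailing.length}`);
  }
  const baseline = trailing.reduce((sum, v) => sum + v, 0) / trailing.length;
  return { baseline, alert: todayP75 > baseline * (1 + ratio) };
}

// A page whose LCP p75 normally sits near 2.0 s.
const lastWeek = [2000, 2100, 1900, 2050, 1950, 2000, 2000];
console.log(baselineAlert(lastWeek, 2500).alert); // false: an absolute "under 2.5s" rule would fire here
console.log(baselineAlert(lastWeek, 2700).alert); // true: more than 30% above the 2000ms baseline
```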
Teams using either approach often ignore geographic distribution. Both synthetic and RUM data should be segmented by region. A global average TTFB of 300ms might hide the fact that users in Asia are experiencing 1.2 second TTFB because your CDN has no point of presence there. Synthetic checks from multiple regions, combined with RUM data segmented by country, reveal these geographic performance gaps.
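Segmenting RUM data by region is mostly a group-by followed by a percentile. This sketch assumes hypothetical RumSample records tagged with a country code, plus an assumed 800ms TTFB budget. It shows how a healthy global p75 can coexist with one region that is far over budget.

```typescript
interface RumSample {
  country: string; // country code attached to the beacon
  ttfbMs: number;
}

// Nearest-rank p75 of a non-empty sample.
function p75(values: number[]): number {
  if (values.length === 0) throw new Error("empty sample");
  const sorted = values.slice().sort((a, b) => a - b);
  return sorted[Math.max(0, Math.ceil(0.75 * sorted.length) - 1)];
}

// Groups samples by country and flags countries whose p75 exceeds the budget.
function segmentByCountry(samples: RumSample[], budgetMs: number) {
  const grouped: Record<string, number[]> = {};
  for (const s of samples) {
    if (!grouped[s.country]) grouped[s.country] = [];
    grouped[s.country].push(s.ttfbMs);
  }
  const perCountry: Record<string, number> = {};
  const overBudget: string[] = [];
  for (const country of Object.keys(grouped)) {
    perCountry[country] = p75(grouped[country]);
    if (perCountry[country] > budgetMs) overBudget.push(country);
  }
  return { globalP75: p75(samples.map((s) => s.ttfbMs)), perCountry, overBudget };
}

const toSamples = (country: string, values: number[]): RumSample[] =>
  values.map((ttfbMs) => ({ country, ttfbMs }));

const ttfbSamples: RumSample[] = [
  ...toSamples("US", [100, 110, 120, 130, 140, 150, 160, 170]),
  ...toSamples("DE", [150, 160, 170, 180, 190, 200, 210, 220]),
  ...toSamples("SG", [1000, 1100, 1200, 1300]),
];
const geoReport = segmentByCountry(ttfbSamples, 800);
console.log(geoReport.globalP75, geoReport.overBudget.join(",")); // 210 SG
```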
The fundamental principle is straightforward: synthetic monitoring tells you what your application can do under controlled conditions, and RUM tells you what it actually does for real people. Neither tells the full story alone. Together, they give you a monitoring system that both catches regressions quickly and reflects the experience your users actually have.