Performance Budgets in CI: Failing a PR on a 200ms LCP Regression
APR 14, 2026 - Written by Yves Soete, Blacksight LLC — performance scanning + CI budgets in one tool at kuality.io
Akamai's classic study showed that a 100ms delay in page load cuts conversion by 7%. Amazon put it at $1.6B in lost sales for every second. Google uses Core Web Vitals as a direct ranking signal. None of this is new information, and yet the single most common QA gap in 2026 is that performance regressions ship silently — caught weeks later by RUM dashboards and traced back to a PR from someone who has since forgotten why they added that particular npm package. The fix is performance budgets in CI: thresholds encoded as build assertions that fail the PR the instant they're exceeded. This post walks through how to set them up, which metrics actually matter, and the measurement traps that will bite you.
The three metrics worth budgeting
Google's Core Web Vitals narrowed the performance conversation to three numbers: LCP, CLS, and INP. All three are measurable in the field via the web-vitals library; LCP and CLS are also measurable in the lab via Lighthouse, while INP requires real user interactions, so lab tooling tracks its proxy, Total Blocking Time. All three have published "good / needs improvement / poor" thresholds that align with Google search ranking impact.
LCP (Largest Contentful Paint) is the single best proxy for perceived load speed. Threshold: < 2.5s good, > 4s poor. This is the hero image or above-the-fold text finishing render. If LCP regresses, everything downstream gets worse — bounce rate, conversion, SEO.
CLS (Cumulative Layout Shift) measures visual stability. Threshold: < 0.1 good, > 0.25 poor. Ads loading late, web fonts swapping in, images without dimensions — all contribute to CLS. Low-CLS sites feel stable. High-CLS sites feel cheap.
INP (Interaction to Next Paint) replaced FID in 2024 and measures how responsive your app feels after a tap or click. Threshold: < 200ms good, > 500ms poor. This is where heavy JavaScript bundles, un-deferred third-party scripts, and main-thread blocking work surface.
These are the three to budget. Ignore noise metrics like "First Meaningful Paint" and "First CPU Idle," both long since removed from Lighthouse. TTFB is useful but infrastructure-owned; it rarely changes PR-over-PR.
Lab vs. field: why both matter in CI
Lab measurements (Lighthouse in CI, WebPageTest synthetic runs) are reproducible but artificial. Field measurements (RUM, CrUX data from real users) are realistic but lagging. CI pipelines can only use lab measurements — you cannot block a merge on data from a user who hasn't visited yet. That's fine. Use lab for the gate, field for the audit.
The trap is running your lab measurements on a fiber-connected laptop and shipping to users on 3G. Every performance regression happens on slow networks first. Always configure your CI Lighthouse runs with throttling (network: "Slow 4G", CPU slowdown: 4x). Without throttling, you're measuring how fast your PR loads on your laptop — information of zero value.
A related trap: measuring once. Performance metrics have 10-20% natural variance between runs. One Lighthouse score doesn't tell you anything. Run five, take the median, compare the median against the budget. Lighthouse CI has this built in via its `numberOfRuns` config.
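Both fixes live in the `collect` section of `.lighthouserc.json`. A minimal sketch — the throttling values shown are Lighthouse's simulated Slow 4G defaults, so adjust them only if your audience's network profile differs:

```json
{
  "ci": {
    "collect": {
      "numberOfRuns": 5,
      "settings": {
        "formFactor": "mobile",
        "throttlingMethod": "simulate",
        "throttling": {
          "rttMs": 150,
          "throughputKbps": 1638.4,
          "cpuSlowdownMultiplier": 4
        }
      }
    }
  }
}
```

With `numberOfRuns: 5`, Lighthouse CI aggregates across runs before asserting, which absorbs most of the run-to-run variance.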
The Lighthouse CI wiring
Lighthouse CI is the Google-maintained wrapper that runs Lighthouse in CI, stores historical data, and compares against budgets. Setup is three files: a `.lighthouserc.json` with your URLs and thresholds, a CI Action that runs `lhci autorun`, and an LHCI server (free, self-hosted, or the public Google one) that stores historical data.
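The CI Action is the simplest of the three files. A minimal GitHub Actions job — the build commands and Node version are assumptions about your project, not requirements:

```yaml
name: perf-budget
on: pull_request

jobs:
  lighthouse:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci && npm run build   # assumes a standard npm build
      - run: npm install -g @lhci/cli
      - run: lhci autorun              # reads .lighthouserc.json
        env:
          LHCI_GITHUB_APP_TOKEN: ${{ secrets.LHCI_GITHUB_APP_TOKEN }}
```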
A basic .lighthouserc.json for a PR gate might include 5 key URLs (home, pricing, product, login, checkout), 5 runs per URL, Slow 4G throttling, and thresholds like LCP under 2500ms, CLS under 0.1, Total Blocking Time under 200ms (the lab stand-in for INP), and total JS bundle under 300KB gzipped. The PR fails if any one URL breaks its budget.
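A sketch of what that gate could look like — the URLs and byte limits are illustrative, and since Lighthouse's navigation runs don't produce INP directly, the assertion uses total-blocking-time as the lab proxy:

```json
{
  "ci": {
    "collect": {
      "numberOfRuns": 5,
      "url": [
        "https://preview.example.com/",
        "https://preview.example.com/pricing",
        "https://preview.example.com/product",
        "https://preview.example.com/login",
        "https://preview.example.com/checkout"
      ]
    },
    "assert": {
      "assertions": {
        "largest-contentful-paint": ["error", { "maxNumericValue": 2500 }],
        "cumulative-layout-shift": ["error", { "maxNumericValue": 0.1 }],
        "total-blocking-time": ["error", { "maxNumericValue": 200 }],
        "resource-summary:script:size": ["error", { "maxNumericValue": 307200 }]
      }
    }
  }
}
```

The `resource-summary:script:size` limit is in bytes of transfer size (i.e. compressed), so 307200 ≈ 300KB gzipped.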
Ship this configured against your production URL for the baseline. Then in CI, run it against your preview deploy. The delta between preview and baseline is your PR's impact. If LCP went from 2.1s to 2.4s, you're still under the threshold, but you now know this PR added 300ms to perceived load. Some teams block on delta (e.g. "no single PR can regress LCP more than 200ms") even if the absolute number is still green.
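A minimal sketch of such a delta gate, assuming you've saved Lighthouse JSON reports for the baseline and the preview — the file names, the sample numbers, and the 200ms ceiling are all illustrative:

```javascript
// Hypothetical delta gate: flag the PR when preview LCP regresses more than
// 200ms against the production baseline. The report shape matches Lighthouse's
// JSON output: audits["largest-contentful-paint"].numericValue, in ms.
const MAX_LCP_REGRESSION_MS = 200;

function lcpMs(report) {
  return report.audits["largest-contentful-paint"].numericValue;
}

// In CI you would read the real files, e.g.:
//   const baseline = JSON.parse(fs.readFileSync("baseline.json", "utf8"));
// Inlined sample values here so the sketch is self-contained:
const baseline = { audits: { "largest-contentful-paint": { numericValue: 2100 } } };
const preview = { audits: { "largest-contentful-paint": { numericValue: 2400 } } };

const delta = lcpMs(preview) - lcpMs(baseline);
const withinBudget = delta <= MAX_LCP_REGRESSION_MS;
console.log(`LCP delta: ${delta}ms, within budget: ${withinBudget}`);
// In a real gate: if (!withinBudget) process.exit(1);
```

Here the absolute LCP (2.4s) is still green, but the 300ms delta breaks the per-PR regression budget — exactly the case an absolute-only threshold misses.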
Budgets that matter beyond the Vitals
Alongside the three Vitals, the highest-ROI secondary budgets:
Total JavaScript bundle size. Thresholds that hold up: < 170KB gzipped for a content site, < 300KB for a mid-complexity SaaS, < 500KB for a heavy dashboard. Enforce this with webpack-bundle-analyzer + a size-limit check. This is the single most effective anti-bloat lever; one fat npm package can tank LCP and nothing else catches it as early.
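One way to wire the bundle check is size-limit configured in package.json — the 300KB limit and the dist path are illustrative, and size-limit measures compressed size by default:

```json
{
  "scripts": {
    "test:size": "size-limit"
  },
  "size-limit": [
    { "path": "dist/app.*.js", "limit": "300 KB" }
  ],
  "devDependencies": {
    "size-limit": "^11.0.0",
    "@size-limit/file": "^11.0.0"
  }
}
```

Run `npm run test:size` as a CI step; it exits non-zero when the bundle exceeds the limit.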
Number of render-blocking resources. Each additional render-blocking stylesheet or synchronous script adds one round trip to first paint. Budget: no more than 2 (your primary CSS + maybe your critical fonts). Everything else should be deferred, async, or lazy.
Third-party script count and weight. Third parties — analytics, marketing pixels, customer support widgets — are the biggest source of un-tracked regressions. Budget: total third-party script weight under 100KB gzipped, executed async. Marketing teams will push back. The budget is the negotiation.
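All three secondary budgets can live in the same LHCI assert block as the Vitals. A sketch — the byte limits mirror the numbers above (transfer size, i.e. compressed), and `maxLength` caps how many items the render-blocking-resources audit is allowed to list:

```json
{
  "ci": {
    "assert": {
      "assertions": {
        "resource-summary:script:size": ["error", { "maxNumericValue": 307200 }],
        "render-blocking-resources": ["error", { "maxLength": 2 }],
        "resource-summary:third-party:size": ["error", { "maxNumericValue": 102400 }]
      }
    }
  }
}
```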
Wiring PR comments for visibility
A budget that fails silently in a CI log gets ignored. A budget that posts a comment on the PR gets fixed. Lighthouse CI has a built-in GitHub app that surfaces results as status checks, and there's also `treosh/lighthouse-ci-action`, which runs the audits in your workflow and uploads shareable reports; pair it with a comment step that posts a table showing the metric, the threshold, the PR value, and the delta. That visibility turns "the build failed" into an explicit decision: if the team ships anyway, someone has consciously owned the regression.
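A sketch of the treosh action step — the preview URLs and version tag are placeholders; `temporaryPublicStorage` gives you shareable report links to include in the PR comment:

```yaml
- uses: treosh/lighthouse-ci-action@v12
  with:
    urls: |
      https://preview.example.com/
      https://preview.example.com/pricing
    configPath: ./.lighthouserc.json
    uploadArtifacts: true            # keep HTML reports as build artifacts
    temporaryPublicStorage: true     # public report links for the PR comment
```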
Make the comment actionable. Include the exact URL that regressed, the metric that broke, and a Lighthouse report link so the author can immediately see what's heavy. Without those, engineers will just force-merge.
When to break the budget anyway
Every budget will be broken eventually, usually for a legitimate reason — a new marketing pixel, a real feature that adds weight, a hero video. Your process should have an escape hatch: a PR label like `perf-waiver` that, when applied, documents the reason in a comment and merges despite the failure. This is healthier than lowering the budget under pressure or deleting the check entirely. Audit waivers monthly. If the same person is waiving every week, the budget is the wrong budget. If waivers happen once a quarter, the process is working.
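One way to implement the escape hatch in GitHub Actions is to skip the gate when the PR carries the label — the label name and job shape here are illustrative:

```yaml
jobs:
  lighthouse:
    # perf-waiver label skips the gate; the waiver reason goes in a PR comment
    if: ${{ !contains(github.event.pull_request.labels.*.name, 'perf-waiver') }}
    runs-on: ubuntu-latest
    steps:
      - run: lhci autorun
```

Because waivers are labels, auditing them monthly is a single issue search away.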
What this is not
Performance budgets don't make your app fast. They stop your app from getting slower. Those are different problems. If your baseline LCP is already 5 seconds, budgets just lock in 5 seconds. You still need a separate project to optimize the baseline — image compression, code splitting, critical CSS inlining, server-side rendering. Budgets prevent regressions once you've done that work. They're the preservation pass, not the optimization pass.
The payoff compounds: every PR held to a budget is a PR that didn't become next quarter's performance incident. Over 12 months that's the difference between a web app that feels fast and one that mysteriously got slow — the kind of drift that's impossible to untangle without a git bisect spanning 2,000 commits.
If you'd like to see your current Vitals baseline across every URL on your site in one report, run a free scan at kuality.io — we measure LCP, CLS, and INP for every discovered page and flag which ones are already above threshold.