Accessibility Testing in CI: Why axe-core Belongs in Your PR Pipeline

Apr 16, 2026 · 7 min read · Written by Yves Soete, Blacksight LLC — QA scanning + CI/CD gating in one tool at kuality.io

Nearly every accessibility lawsuit settled in 2024 and 2025 had the same root cause: a regression shipped months earlier that nobody caught. The fix was always cheap in isolation — a missing aria-label, a contrast ratio one point under threshold, a keyboard trap in a modal nobody tested without a mouse. The cost multiplier was time. By the time the complaint arrived, the team had shipped 40 more deploys on top of it, and the regression was entangled with a dozen unrelated changes. This post is about how to stop writing that bill. The tool is axe-core. The right place for it is your PR pipeline. And the cost of running it on every commit is measured in seconds.



Why manual audits lose the race



The most common accessibility workflow I see at mid-size companies is "audit once a year." An agency does a WCAG 2.2 AA review, delivers a 70-page PDF, the team files the findings into Jira, and within three sprints the engineering priorities have shifted and half the tickets are stale. By the time the next audit arrives, the ground has moved under it. New components have been built on top of the regressions that were never fixed, and the new findings contaminate the old ones.

Manual audits are valuable — they catch things automation can't, like whether a screen reader actually makes sense of your checkout flow. But they're the wrong primary defense. The primary defense is machines running axe-core on every PR, failing the build when a new violation appears, and flagging the specific DOM node and WCAG criterion so the PR author fixes it before merge. That flow catches 30-50% of WCAG violations automatically, with near-zero ongoing cost, and it prevents regression compounding. What's left for human reviewers is the interesting remainder — the stuff that actually requires judgment.



What axe-core catches (and what it doesn't)



axe-core, maintained by Deque, is the most widely-used accessibility testing engine on the web. It's the engine behind the browser devtools accessibility tabs in Chrome, Firefox, and Edge. It's also what backs Cypress-axe, Playwright's a11y assertions, jest-axe, and the GitHub Actions a11y linters. You're already using it transitively — the question is whether you're enforcing it.

Out of the box, axe-core evaluates roughly 90 rules grouped into WCAG 2.0, 2.1, and 2.2 levels A, AA, and (optionally) AAA. Typical catches: missing form labels, insufficient color contrast (below 4.5:1 for normal text, 3:1 for large text), a missing lang attribute on html, invalid ARIA attributes, images without alt text, ambiguous link text ("click here"), and heading-hierarchy violations.
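Those contrast thresholds fall out of a published formula. As a sketch of the WCAG 2.x relative-luminance math that axe's color-contrast rule implements (function names are mine, and hex parsing is simplified to 6-digit #rrggbb values):

```typescript
// Linearize one 8-bit sRGB channel per the WCAG relative-luminance definition.
function channel(c8: number): number {
  const c = c8 / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance of a 6-digit hex color, 0 (black) to 1 (white).
export function luminance(hex: string): number {
  const [r, g, b] = [1, 3, 5].map((i) => parseInt(hex.slice(i, i + 2), 16));
  return 0.2126 * channel(r) + 0.7152 * channel(g) + 0.0722 * channel(b);
}

// Contrast ratio: (lighter + 0.05) / (darker + 0.05), from 1:1 up to 21:1.
export function contrastRatio(fg: string, bg: string): number {
  const [l1, l2] = [luminance(fg), luminance(bg)].sort((a, b) => b - a);
  return (l1 + 0.05) / (l2 + 0.05);
}
```

Black on white is the maximum, 21:1; the familiar "minimum AA gray" #767676 on white lands just above the 4.5:1 threshold.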

What it can't catch (and the reason manual testing still matters): whether a modal correctly traps focus, whether your error messages make sense to a screen reader, whether your "skip to content" link actually works, whether your drag-and-drop has a keyboard alternative, whether the reading order of a visually complex layout matches the DOM order. These are human-judgment tests. But the automated catches clear the deck so human testing can focus on them.



The GitHub Action wiring



The simplest pattern is axe-core running against a preview deployment of your PR. Concretely:

1. PR opens → your deploy pipeline spins up a preview URL (Vercel, Netlify, Render, or a self-hosted preview environment).

2. A GitHub Action runs Playwright against the preview URL, navigates to each route on a small curated list (homepage, login, pricing, checkout, account settings), and runs axe.run() at each stop.

3. Results are compared against a baseline. New violations fail the build. Existing violations are tracked but don't block the PR — otherwise legacy debt paralyzes new work.

The @axe-core/playwright package is the straightforward wrapper. In a Playwright test file you instantiate an AxeBuilder against the current page, call analyze, and then assert that the violations array is empty or matches your stored baseline. The whole Action typically takes 30-90 seconds on a modest repo.



Which violations should actually block the merge



Do not configure axe to fail the build on every single violation. You'll get a revolt from your engineering team within a week. Instead, classify by severity:

Block: critical and serious violations newly introduced by the PR. These are things like missing form labels, inaccessible interactive elements, and contrast failures on primary CTAs. These are fast to fix and have real legal exposure.

Warn but don't block: moderate and minor violations. Things like heading-level skips, empty link text, or ambiguous button labels. These should generate a PR comment, and the author is expected to acknowledge, but the merge isn't held.

Track, don't flag: pre-existing violations. You maintain a baseline snapshot (say, `a11y-baseline.json` committed to the repo) and only fail the PR if its set of violations grew relative to that baseline. This lets you make progress without legacy debt rejecting every new PR.
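The severity-plus-baseline policy above reduces to a small diff. A sketch, where the baseline file name comes from this article but the key format (rule id plus CSS target) is my own convention:

```typescript
// Baseline gating: fail only on newly introduced critical/serious violations.
type Violation = {
  id: string;
  impact?: string | null;
  nodes: { target: unknown[] }[];
};

// One stable key per violating node: "rule-id @ selector".
export function violationKeys(violations: Violation[]): string[] {
  return violations.flatMap((v) =>
    v.nodes.map((n) => `${v.id} @ ${n.target.join(" ")}`)
  );
}

// Keys present now but absent from the committed baseline, restricted to
// the impact levels that should block the merge.
export function newBlockingViolations(
  current: Violation[],
  baseline: string[],
  blocking: string[] = ["critical", "serious"]
): string[] {
  const known = new Set(baseline);
  const blockingNow = current.filter((v) => blocking.includes(v.impact ?? ""));
  return violationKeys(blockingNow).filter((k) => !known.has(k));
}
```

In CI you would parse `a11y-baseline.json`, pass its keys as the baseline, and exit non-zero only when the returned list is non-empty; moderate and minor findings feed the PR comment instead.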



False positives and how to suppress them cleanly



About 5-10% of axe-core hits are context-dependent and not actually a11y bugs. Common patterns: third-party embeds (a YouTube iframe without a caption track), explicitly decorative images on a carousel that don't need alt text, contrast violations on disabled button states. The right fix is not to disable the rule globally — it's to disable it per-selector or per-URL.

axe-core lets you scope suppressions narrowly: with @axe-core/playwright you can exclude a selector like .btn-disabled from the scan with exclude(), or disable a rule such as color-contrast for a run with disableRules(), rather than turning the rule off globally. Apply narrowly. If you find yourself writing more than 10-15 suppressions, you probably have real issues hiding, not a noisy rule. Either way, audit your suppressions list quarterly as part of QA hygiene.
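One way to keep that audit honest is to make suppressions a reviewed data structure rather than scattered builder calls. A sketch under my own conventions — note that exclude() removes the selector from the scan entirely, so the rules and reason fields here exist purely as documentation for the quarterly review:

```typescript
// A suppression entry: what is silenced, which rules it was silencing,
// and why — so the quarterly audit has something to read.
interface Suppression {
  selector: string;
  rules: string[];
  reason: string;
}

// Hypothetical list; in practice this lives in a reviewed file in the repo.
export const SUPPRESSIONS: Suppression[] = [
  { selector: ".btn-disabled", rules: ["color-contrast"], reason: "disabled state" },
  { selector: "iframe[src*='youtube']", rules: ["*"], reason: "third-party embed" },
];

// Apply suppressions to anything with AxeBuilder's exclude() shape,
// which keeps the logic testable without a browser.
export function applySuppressions<T extends { exclude(sel: string): T }>(
  builder: T,
  suppressions: Suppression[]
): T {
  return suppressions.reduce((b, s) => b.exclude(s.selector), builder);
}
```

In the Playwright test this becomes `applySuppressions(new AxeBuilder({ page }), SUPPRESSIONS)` before calling analyze().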



Where this fits in the bigger QA picture



Accessibility CI is one station in a larger PR gating pipeline. In practice, the teams with the healthiest QA processes run four automated checks on every PR: unit + integration tests, Lighthouse performance budgets, axe accessibility baseline, and cross-browser smoke tests against Chromium + Firefox + WebKit. Each runs in parallel, each contributes to a single status check on the PR, and each has clear fix ownership.

If you're setting up from scratch, the order I'd recommend is: unit/integration first (obviously), then Lighthouse (the highest ratio of user impact to setup effort), then axe (legal exposure + low false-positive rate), then cross-browser (most complex to set up, pays off as the app matures). Most teams can get all four in place in a week.



What this is not



Running axe-core in CI does not make your app accessible. It makes your app free of a specific class of common, high-frequency, mechanically-detectable accessibility bugs. A screen reader user might still have a miserable experience on a site that passes axe 100%. That's where human testing, user research with actual disabled users, and manual keyboard-only walkthroughs come in. What axe gives you is the floor — below which you shouldn't ship. Above that floor, the interesting work begins.

The practical goal is to automate the cheap, mechanical stuff so your design review and QA cycles have bandwidth for the judgment stuff. That's the story for every automated QA tool worth deploying.

If you'd like to run a one-click accessibility audit across your entire site right now, try kuality.io — we run axe-core across all discovered URLs as part of our QA scan and deliver a WCAG 2.2 AA compliance report you can hand to a legal team.
