Security 9 min read

Headless Browser Detection: How to Spot Automation Behind a Real Browser

AntiProxies Team

Modern bots don't look like bots. They arrive in a real Chromium instance, execute JavaScript, handle cookies, and render pages exactly the way a human browser would. The difference between a legitimate user and an automated script has never been harder to see - but the signals are still there if you know where to look.

What headless browsers actually are

A headless browser is a fully functional web browser running without a graphical user interface. It loads pages, executes JavaScript, handles redirects, stores cookies, and behaves like a normal browser - just without a visible window. Tools like Puppeteer (Chrome DevTools Protocol over Chromium), Playwright (multi-browser, Microsoft-maintained), and the older Selenium WebDriver are the most widely used.

These tools are legitimate and widely used for:

  • End-to-end testing: Most CI pipelines run headless Chrome to validate UI flows.
  • PDF generation and screenshots: Rendering HTML to image or document formats.
  • Web scraping: Extracting data from JavaScript-heavy pages that basic HTTP clients can't handle.
  • Automated form submission: Legitimate automation of repetitive workflows.

The problem is that the same tools power account creation fraud, credential stuffing, price scraping, ad fraud, and every other form of high-volume automated abuse. The underlying technology is identical - only the intent differs.

The old detection methods and why they failed

Early headless browser detection relied on a handful of obvious signals that the first generation of automation tools left behind. Some of these checks still appear in detection scripts, but skilled operators have patched around all of them.

navigator.webdriver - The WebDriver spec requires browsers under automation to expose navigator.webdriver = true. This was the canonical detection signal for years. It's trivially bypassed: launching Chromium with the --disable-blink-features=AutomationControlled switch stops the browser from setting the flag, and stealth libraries like puppeteer-extra-plugin-stealth patch the property so it no longer reads true before any page scripts run.

Missing browser plugins - Early headless Chrome had an empty navigator.plugins array. Real browsers have PDF viewers and media plugins installed. Stealth patches now inject realistic plugin entries, and Chrome's newer headless mode reports the same built-in entries as a headed browser, so this check mostly catches unpatched legacy setups.

User agent strings - Some tools sent obviously bot-like user agents ("HeadlessChrome/"). Modern setups copy user agents directly from real browser releases and rotate them. Checking user agent strings alone is security theater.

Canvas and WebGL fingerprinting discrepancies - Headless environments historically produced different canvas renders because they used software rendering instead of a GPU. Cloud environments running headless browsers now often have GPU passthrough, or use SwiftShader in ways that closely match expected rendering output. The signal still exists but requires careful baseline comparison.
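To make the first three legacy checks concrete, here is a minimal server-side sketch in Python. It assumes a hypothetical client script has already reported navigator.webdriver, the length of navigator.plugins, and the user agent as a small dictionary; the field names (webdriver, plugin_count, user_agent) are illustrative, not part of any real API.

```python
from typing import List, Mapping


def legacy_headless_flags(signals: Mapping[str, object]) -> List[str]:
    """Return the names of the legacy checks that fire for one client report.

    `signals` is a hypothetical payload collected by a client-side script.
    """
    flags = []
    # navigator.webdriver reads true under WebDriver-style automation unless patched.
    if signals.get("webdriver") is True:
        flags.append("webdriver")
    # Early headless Chrome reported an empty navigator.plugins array.
    if signals.get("plugin_count") == 0:
        flags.append("no_plugins")
    # Unmodified headless Chrome user agents contain the "HeadlessChrome" token.
    if "HeadlessChrome" in str(signals.get("user_agent", "")):
        flags.append("headless_user_agent")
    return flags
```

A clean result here means very little, since every one of these values can be patched. A hit is still worth logging because it identifies unsophisticated automation cheaply.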

Signals that still work

Despite the arms race, real browsers and automated browsers still behave differently in ways that are harder to spoof at scale:

Hardware concurrency and memory

navigator.hardwareConcurrency reports the number of logical CPU cores. Real user devices cluster around 4, 6, 8, or 16 cores. Cloud servers running hundreds of headless instances often show values like 2 or 96 (matching the VM spec), which is statistically rare in real user populations. Combined with navigator.deviceMemory (available in Chromium-based browsers and reported only in coarse buckets), you get a profile that can be anomalous in combination even when each value is individually plausible.
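As a rough illustration, the two values can be folded into a small anomaly score. The sketch below takes the "common" core counts from the paragraph above. The memory rule (many cores but very little reported memory) and the 0.5 increments are assumptions for demonstration; real thresholds should come from your own traffic.

```python
from typing import Optional

# Core counts the article describes as typical for real user devices.
COMMON_CORE_COUNTS = {4, 6, 8, 16}


def hardware_profile_score(cores: int, device_memory_gb: Optional[float]) -> float:
    """Return a score in [0, 1]; higher means a less typical hardware profile."""
    score = 0.0
    # Uncommon core counts (for example 2 or 96) often match a VM spec.
    if cores not in COMMON_CORE_COUNTS:
        score += 0.5
    # Assumed rule: many cores paired with very little memory is an odd combination.
    if device_memory_gb is not None and cores >= 8 and device_memory_gb <= 2:
        score += 0.5
    return score
```

For example, hardware_profile_score(8, 8) returns 0.0, while hardware_profile_score(96, 2) returns 1.0. Treat the output as one input to a wider risk score, not as a verdict.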

Chrome runtime object inconsistencies

A real Chrome browser populates window.chrome with an extensive object tree including chrome.runtime, chrome.loadTimes(), and extension APIs. Headless Chrome exposes a minimal or empty window.chrome by default. Stealth libraries inject this, but the injection has to happen before page scripts run - and it has to perfectly replicate a structure that changes with every Chrome release. Small discrepancies in method signatures, missing properties, or wrong return types are detectable.
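One way to check for these discrepancies is to have a client script report the typeof result for a set of property paths, then compare that snapshot against a baseline recorded from real browsers of the same version. The sketch below assumes such a snapshot exists. The two baseline entries are illustrative only, and a usable baseline would need to be captured per Chrome release.

```python
from typing import Dict, List

# Illustrative baseline only: property path -> expected `typeof` result.
EXPECTED_CHROME_SHAPE: Dict[str, str] = {
    "chrome.runtime": "object",
    "chrome.loadTimes": "function",
}


def chrome_object_discrepancies(
    reported: Dict[str, str],
    expected: Dict[str, str] = EXPECTED_CHROME_SHAPE,
) -> List[str]:
    """List missing properties and type mismatches, sorted by property path."""
    issues = []
    for path, expected_type in sorted(expected.items()):
        actual = reported.get(path)
        if actual is None:
            issues.append(f"missing:{path}")
        elif actual != expected_type:
            issues.append(f"type:{path}")
    return issues
```

An empty list means the reported shape matches the baseline. Any entry is a hint that window.chrome was absent or injected imperfectly.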

Event timing and interaction patterns

Human mouse movements are noisy and curved. Even fast humans don't move in perfectly straight lines, and they don't click at pixel-perfect coordinates. Automated scripts that simulate mouse events to trigger form validation often produce movement trajectories that are geometrically too clean. More subtly, the timing distributions of keyboard events - the gap between keydown and keyup, the inter-key intervals - follow human patterns that random sleeps don't replicate well.

This matters particularly for credential stuffing attacks that need to trigger JavaScript validation on login forms. The interaction pattern is a signal even when the browser fingerprint looks clean.
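Two simple metrics capture the idea, sketched here under the assumption that the page records pointer coordinates and key event timestamps and sends them to the server. The first measures how straight a mouse path is; the second measures how regular the gaps between key events are. Both function names are hypothetical, and the second only catches fixed delays: randomized sleeps need a comparison against the full shape of human timing distributions.

```python
import math
from statistics import mean, pstdev
from typing import Sequence, Tuple


def path_straightness(points: Sequence[Tuple[float, float]]) -> float:
    """Straight-line distance divided by travelled distance (1.0 = perfectly straight)."""
    travelled = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    if travelled == 0:
        return 1.0  # no recorded movement is treated like a perfectly clean path
    return math.dist(points[0], points[-1]) / travelled


def interval_variation(timestamps_ms: Sequence[float]) -> float:
    """Coefficient of variation of the gaps between consecutive key events."""
    gaps = [b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])]
    if len(gaps) < 2:
        raise ValueError("need at least three timestamps")
    average = mean(gaps)
    return pstdev(gaps) / average if average else 0.0
```

A straightness value at or very near 1.0 across many sessions, or an interval variation near 0.0, suggests scripted input. Human traces usually land well away from both extremes.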

TLS fingerprinting (JA3/JA4)

At the network layer, TLS client hellos have fingerprints based on cipher suite order, extension types, and other parameters. A Chromium instance controlled by Playwright can send a TLS fingerprint that differs from the one a user-launched Chrome would produce, for example when the bundled Chromium build does not match the Chrome version claimed in the user agent, or when traffic passes through a proxy or HTTP client with its own TLS stack. JA3 and JA4 fingerprinting at the edge can flag these mismatches before any JavaScript runs on the page.
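For reference, a JA3 fingerprint is the MD5 hash of five comma-separated fields from the client hello: TLS version, cipher suites, extensions, elliptic curves, and point formats, with the values in each list joined by hyphens and GREASE placeholder values dropped. The sketch below assumes the client hello has already been parsed into integer lists by your edge or packet-capture tooling; parsing is out of scope here.

```python
import hashlib
from typing import Iterable


def is_grease(value: int) -> bool:
    """GREASE placeholders have the form 0x0A0A, 0x1A1A, ... 0xFAFA."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def ja3_string(
    version: int,
    ciphers: Iterable[int],
    extensions: Iterable[int],
    curves: Iterable[int],
    point_formats: Iterable[int],
) -> str:
    """Build the JA3 input string from already-parsed client hello fields."""
    fields = [str(version)]
    for values in (ciphers, extensions, curves, point_formats):
        fields.append("-".join(str(v) for v in values if not is_grease(v)))
    return ",".join(fields)


def ja3_hash(version, ciphers, extensions, curves, point_formats) -> str:
    """MD5 hex digest of the JA3 string."""
    text = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(text.encode("ascii")).hexdigest()
```

Because JA3 preserves the order of each list, clients that shuffle their extension order produce unstable hashes. JA4 sorts those lists, which is one reason to collect both where your edge supports it.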

Inconsistent API surfaces

Real browsers accumulate years of installed state: fonts, audio output devices, connected peripherals. The AudioContext fingerprint, the list of available fonts detected via CSS, and the set of supported media codecs vary slightly between real machines. Headless instances running in clean containers have minimal variation - they cluster tightly around default values, which looks suspicious in aggregate even if any single instance looks normal.
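The aggregate view can be sketched as a frequency count. The example assumes each session has already been reduced to a single fingerprint string (for instance a hash of font list, audio fingerprint, and codec support); the 5% share threshold and the minimum sample size are placeholders, not recommended values.

```python
from collections import Counter
from typing import Dict, Sequence


def overrepresented_fingerprints(
    fingerprints: Sequence[str],
    max_share: float = 0.05,   # assumed threshold for "too common"
    min_sessions: int = 100,   # avoid drawing conclusions from tiny samples
) -> Dict[str, float]:
    """Return fingerprints whose share of all sessions exceeds `max_share`."""
    total = len(fingerprints)
    if total < min_sessions:
        return {}
    counts = Counter(fingerprints)
    return {fp: n / total for fp, n in counts.items() if n / total > max_share}
```

A fingerprint that accounts for a fifth of traffic while every other value appears a handful of times is the tight clustering described above. Some popular device models will also cluster, so the flagged values need to be reviewed against known-good hardware before acting on them.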

The stealth toolkit ecosystem

There's an entire open-source ecosystem dedicated to bypassing headless detection:

  • puppeteer-extra-plugin-stealth: Patches 10+ detection vectors including navigator.webdriver, plugin arrays, Chrome runtime, and more.
  • undetected-chromedriver: A patched version of ChromeDriver designed specifically to evade bot detection services.
  • Playwright with custom launch args: Extensive community documentation on which flags to pass to minimize detection.
  • Residential proxy integration: The stealth browser is paired with residential proxies to give each session a clean IP that doesn't appear in blocklists.

This ecosystem is actively maintained and tested against major detection services. What works today may be detected tomorrow - and vice versa. It's an ongoing arms race, not a solved problem.

Why headless detection can't stand alone

No single headless browser detection check is reliable in isolation. A determined adversary with time and resources will patch any single signal you depend on. Effective detection requires combining multiple layers:

  • IP intelligence first: Most headless browser operations route through proxy networks. If the connecting IP is a known datacenter, VPN, or residential proxy, that's a strong prior signal before any browser analysis happens. This is the layer where VPN and proxy detection pays off most directly - it raises the baseline risk score before you even evaluate browser signals.
  • Behavioral signals: Interaction patterns, scroll behavior, and page dwell time add signals that are expensive to fake at scale.
  • Browser fingerprinting: Not as a pass/fail gate, but as a risk score contributor. Fingerprints that look too clean, too consistent, or too standard are suspicious.
  • Session-level patterns: A session that navigates directly to a checkout or login URL with no prior browsing, arriving from a fresh install with no cookies, is statistically unusual.
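One simple way to combine the four layers above is a weighted sum, sketched below. The layer names and weights are hypothetical; the only idea taken from the list is that IP intelligence acts as the strongest prior, so it carries the largest weight here. Each layer is assumed to produce its own score between 0 and 1.

```python
from typing import Mapping

# Hypothetical weights that sum to 1.0; tune them against labelled traffic.
LAYER_WEIGHTS = {
    "ip_intelligence": 0.40,
    "behavior": 0.25,
    "fingerprint": 0.20,
    "session": 0.15,
}


def combined_risk(layer_scores: Mapping[str, float]) -> float:
    """Weighted sum of per-layer scores; missing layers count as 0."""
    total = 0.0
    for layer, weight in LAYER_WEIGHTS.items():
        # Clamp each layer to [0, 1] so one noisy signal cannot dominate.
        score = min(max(layer_scores.get(layer, 0.0), 0.0), 1.0)
        total += weight * score
    return total
```

With these weights, a known proxy IP alone yields 0.4, and the score only approaches 1.0 when several layers agree. That is the point of layering: no single check decides the outcome.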

Layered detection is covered in more depth in our guide to building a fraud prevention stack.

The legitimate use case problem

One reason headless browser detection is genuinely hard is that you'll always have false positives. Security researchers, automated testing systems, internal scraping tools, and CI pipelines all look like bots because they are bots - legitimate ones. An overly aggressive detection policy that blocks all headless traffic will also block your own QA team and legitimate enterprise integrations.

The practical approach is step-up friction rather than hard blocks. When headless signals are present, add a challenge or require re-authentication rather than returning a hard 403. Real people behind automation (test engineers, users scraping their own data) can complete challenges; fully automated pipelines attacking you can only do so at scale with significant human involvement, which raises the cost of the attack.
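In code, step-up friction is a mapping from a risk score to an escalating response, with no outright block. This is a minimal sketch; the function name and both thresholds are placeholders to be tuned against your own false-positive tolerance.

```python
def choose_friction(
    risk: float,
    challenge_at: float = 0.4,  # assumed threshold
    reauth_at: float = 0.8,     # assumed threshold
) -> str:
    """Map a risk score in [0, 1] to a step-up action instead of a hard 403."""
    if risk >= reauth_at:
        return "reauthenticate"
    if risk >= challenge_at:
        return "challenge"
    return "allow"
```

A session scoring 0.5 gets a challenge and one scoring 0.9 is asked to re-authenticate, so a QA engineer caught by the heuristics is slowed down rather than locked out.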

Where IP intelligence fits in

Headless browser setups almost always route through proxy infrastructure. The automation tooling handles the browser side; the residential proxy or VPN handles making the IP look clean. Detecting the proxy layer doesn't require any browser analysis - it happens at the network level before the page loads.

This is why combining IP intelligence with browser-level signals is so effective: the IP signal catches proxy-routed automation at the network layer; the browser signals catch automation that tries to connect from clean IPs. Together they close most of the gap. For a comprehensive view of how the proxy layer enables attacks, see our post on why residential proxies are the hardest threat to detect. For the broader detection stack, see how device fingerprinting works and where it fails.
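The network-layer check itself is a range lookup. The sketch below uses Python's standard ipaddress module with a tiny in-memory table; the CIDR blocks are reserved documentation ranges standing in for real data, and the table layout is an assumption, not the format of any particular database. A linear scan is fine for illustration, but a production lookup over many ranges would want an interval tree or a sorted index.

```python
import ipaddress
from typing import Dict, List, Optional

# Placeholder data: reserved documentation ranges, not real proxy or VPN ranges.
FLAGGED_RANGES: Dict[str, List[str]] = {
    "datacenter": ["192.0.2.0/24"],
    "vpn": ["198.51.100.0/24"],
    "tor": ["2001:db8::/32"],
}


def classify_ip(address: str, ranges: Dict[str, List[str]] = FLAGGED_RANGES) -> Optional[str]:
    """Return the first matching label for `address`, or None if no range matches."""
    ip = ipaddress.ip_address(address)
    for label, cidrs in ranges.items():
        for cidr in cidrs:
            network = ipaddress.ip_network(cidr)
            # Only compare addresses and networks of the same IP version.
            if ip.version == network.version and ip in network:
                return label
    return None
```

Because this runs before any page is served, a match can raise the starting risk for the session and decide how much browser-level scrutiny to apply afterwards.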

AntiProxies provides the IP intelligence layer: a downloadable database of VPN, proxy, Tor, and datacenter IP ranges that runs on your infrastructure with no API latency. Combined with your own behavioral and browser analysis, it gives you a solid foundation for detecting headless automation without depending on any single signal. Explore our bot detection capabilities or VPN and proxy detection for implementation details.
