
Bot Detection: How to Block Bad Bots in 2026

Piero Bassa

Founder & CEO


In 2024, bots crossed a threshold: for the first time in a decade, automated traffic surpassed human traffic on the internet. Imperva reported that bots generated 51% of all web requests, and the majority were not friendly search engine crawlers. They were bad bots: scripts designed to scrape, steal, spam, and exploit. Heading into 2026, the problem is only getting worse as AI-powered automation tools make it cheaper and easier than ever to deploy sophisticated bots at scale.

If you run a website, these bots are already hitting it. They test stolen credentials against your login page. They extract your pricing and content. They flood your forms with junk. They pollute your analytics so badly that you cannot tell what your real users are doing.

Most of this happens silently. Your servers respond. Your bandwidth gets consumed. And unless you are specifically looking for it, you might never notice until the damage shows up in your chargeback rates, your support queue, or your monthly infrastructure bill.

Bot detection is the practice of identifying this automated traffic and separating it from real human visitors. Done well, it is invisible to your actual users and devastating to the bots.

Not all bots are bad

Before we go further, an important distinction. Not every bot is your enemy.

Good bots serve legitimate purposes:

  • Search engine crawlers (Googlebot, Bingbot) index your pages so people can find you
  • Monitoring tools check uptime and performance
  • Feed readers pull RSS and content updates
  • SEO auditors help you find and fix technical issues

Good bots typically identify themselves in their user agent string, respect your robots.txt rules, and crawl at reasonable rates. You want these on your site.

Bad bots are the problem. They disguise themselves, ignore your rules, and operate at a scale designed to exploit your systems. The rest of this article is about them.

What bad bots actually do

The damage depends on your business, but here are the most common attack patterns.

Credential stuffing

An attacker takes a list of username/password combinations leaked from another breach and tests them against your login page. Automated tools can try thousands of combinations per minute. When they find a match, they drain accounts, steal stored payment methods, or resell access.

This is the most financially destructive bot attack for most businesses. By some industry estimates, credential stuffing losses exceed $6 billion annually, and the figure continues to grow as more breached credentials circulate online.

Web scraping

Bots systematically extract content, pricing, inventory levels, or product data from your site. Competitors use this to undercut your prices in real time. Aggregators republish your content without permission. In regulated industries like travel or financial services, unauthorized scraping can create compliance issues.

Inventory hoarding and scalping

Bots add high-demand items to shopping carts faster than any human can click. They hold inventory to create artificial scarcity, then resell at inflated prices. Sneaker drops, concert tickets, and GPU launches are the textbook examples, but it happens in any market with limited supply and high demand.

Form spam and fake accounts

Automated scripts flood sign-up forms, comment sections, and contact forms with junk data. They create fake accounts at scale to abuse free trials, exploit referral programs, or launder fraud through your platform.

Ad fraud and analytics pollution

Click bots inflate ad metrics, draining your ad budget without generating real engagement. Even if you do not run ads, bot traffic skews your analytics, making it harder to understand actual user behavior and make informed business decisions.

DDoS and application-layer attacks

Volumetric attacks aim to overwhelm your infrastructure. But the more subtle (and increasingly common) application-layer attacks target specific endpoints like search, checkout, or API routes that are expensive to process. A relatively small number of requests can bring a service down if they hit the right targets.

Signs your site has a bot problem

You do not always need sophisticated tooling to spot the first signs of bot traffic. These red flags in your analytics and server logs can point to automated visitors:

  • Sudden traffic spikes from unusual sources. A surge in visits from data center IPs, cloud hosting providers, or geographic regions you do not normally serve often signals a botnet. Real user traffic rarely jumps 10x overnight from a single region.
  • High bounce rates with near-zero session duration. If large portions of your traffic land on a page and leave within milliseconds, something is crawling your pages without reading them. Humans do not behave this way.
  • Strange conversion patterns. Seeing hundreds of newsletter signups, account creations, or form submissions with little matching engagement on the rest of the site? Bots fill forms programmatically, often with repetitive or nonsensical data.
  • Impossible metrics. Page views in the billions, sessions from browser versions that do not exist yet, or a surge of traffic from operating systems that represent 0.001% of the market. These implausible data points signal bots pretending to be real users.
  • Your content appearing elsewhere. If you find your product descriptions, articles, or pricing data copied verbatim on competitor or aggregator sites, scraping bots are harvesting your pages.
  • Login failure spikes. A sudden increase in failed authentication attempts, especially at odd hours or from distributed IPs, is the signature of credential stuffing.

If any of these look familiar, you have bot traffic. The question is how much, and what it is doing.

Why traditional defenses fail

Most teams start with one of these approaches. None of them holds up against modern bots on its own.

IP blocking

Blocking known bad IPs sounds straightforward, but today’s bots rotate through millions of residential proxy IPs. Block one, and the next request comes from a completely different address. Residential proxies are particularly insidious because the IP belongs to a real household, making it nearly impossible to block without catching legitimate users.

Rate limiting

Setting request thresholds helps with brute-force attacks, but sophisticated bots throttle themselves to stay under your limits. They spread traffic across thousands of IPs and mimic human pacing. By the time a single IP triggers a rate limit, the bot has already accomplished what it came to do.
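To make the limitation concrete, here is a minimal sketch of a per-IP sliding-window limiter. The names (`createRateLimiter`, `limit`, `windowMs`) are illustrative and not taken from any particular library, and the in-memory `Map` stands in for whatever shared store a real deployment would use.

```javascript
// Minimal per-key sliding-window rate limiter (illustrative sketch).
// `limit` and `windowMs` are hypothetical tuning parameters.
function createRateLimiter({ limit, windowMs }) {
  const hits = new Map(); // key -> array of request timestamps (ms)

  return function allow(key, now = Date.now()) {
    // Keep only the timestamps that still fall inside the window.
    const recent = (hits.get(key) || []).filter((t) => now - t < windowMs);
    if (recent.length >= limit) {
      hits.set(key, recent);
      return false; // over the limit: reject this request
    }
    recent.push(now);
    hits.set(key, recent);
    return true;
  };
}

const allow = createRateLimiter({ limit: 3, windowMs: 60_000 });

// One IP sending 5 requests in quick succession trips the limit of 3...
const singleIp = [0, 1, 2, 3, 4].map((i) => allow('203.0.113.7', 1000 + i));

// ...but the same 5 requests spread across 5 IPs all pass.
const distributed = [0, 1, 2, 3, 4].map((i) => allow(`198.51.100.${i}`, 1000 + i));

console.log(singleIp);    // [ true, true, true, false, false ]
console.log(distributed); // [ true, true, true, true, true ]
```

The second run is the whole problem in miniature: the limiter works exactly as designed, and the distributed traffic never comes close to triggering it.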

User agent filtering

Blocking known bot user agents catches only the laziest scripts. Any serious bot spoofs its user agent to match Chrome, Safari, or Firefox. This takes one line of code to do and defeats user agent checks entirely.

CAPTCHAs

CAPTCHAs were designed to be hard for machines and easy for humans. In 2026, neither is true. AI-powered solving services break most CAPTCHAs in seconds for fractions of a cent. Human CAPTCHA farms offer even higher solve rates. Large language models and vision models have made automated solving faster and cheaper than ever. Meanwhile, CAPTCHAs add friction that frustrates real users, hurts conversion rates, and creates accessibility barriers.

CAPTCHAs still raise the cost of an attack slightly, but they are no longer a reliable detection mechanism on their own.

How modern bot detection works

Effective bot detection does not rely on any single signal. It layers multiple detection techniques so that bypassing one layer still leaves several others in place. Here are the core techniques.

Device fingerprinting

Every browser exposes dozens of configuration details through standard web APIs: screen resolution, GPU renderer, installed fonts, audio processing characteristics, canvas rendering output, and more. Collected together, these signals form a fingerprint that is highly distinctive, and often unique, to each device.

This matters for bot detection because automated tools leave fingerprints that look fundamentally different from real browsers. A headless Chrome instance running in a data center has a distinct canvas output, a missing or inconsistent set of fonts, and hardware characteristics that do not match what its user agent claims.

Device fingerprinting also enables persistent identification. Even if a bot clears cookies, rotates IPs, or switches user agents between requests, its device fingerprint stays the same. You can link all of those “different” requests back to the same source.
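As a rough illustration of the linking step, the sketch below hashes a set of collected signals into a stable identifier. The signal names and values are made up for the example, and `fingerprint` is a hypothetical helper. Production systems generally use fuzzier matching than an exact hash, since a single changed signal would otherwise produce a new ID.

```javascript
const { createHash } = require('node:crypto');

// Derive a stable identifier from collected browser signals (sketch).
// The signal names used below are hypothetical examples, not a real schema.
function fingerprint(signals) {
  // Sort the keys so the same signals always serialize identically.
  const canonical = Object.keys(signals)
    .sort()
    .map((key) => `${key}=${JSON.stringify(signals[key])}`)
    .join('|');
  return createHash('sha256').update(canonical).digest('hex');
}

const visitA = fingerprint({
  screen: '1920x1080',
  gpu: 'ANGLE (NVIDIA GeForce RTX 3060)',
  fonts: ['Arial', 'Helvetica'],
  canvasHash: 'a91f03',
});

// Same device with a new IP and cleared cookies: the signals are unchanged,
// so the fingerprint is too, whatever order they were collected in.
const visitB = fingerprint({
  canvasHash: 'a91f03',
  fonts: ['Arial', 'Helvetica'],
  gpu: 'ANGLE (NVIDIA GeForce RTX 3060)',
  screen: '1920x1080',
});

console.log(visitA === visitB); // true
```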

TLS fingerprinting

When a browser connects to your server over HTTPS, the TLS handshake reveals which cipher suites, extensions, and protocol versions the client supports, and in what order. Real browsers have well-known TLS fingerprints (often called JA3 or JA4 fingerprints) that are consistent across millions of users.

Bots using HTTP libraries like Python's requests, curl, or custom Go clients produce TLS fingerprints that look nothing like a real browser's. Even bots running headless Chrome inside frameworks like Puppeteer or Playwright sometimes leak non-standard TLS behavior depending on how they are configured.

This is one of the hardest signals for bot operators to fake, because changing TLS behavior requires modifying the networking stack at a low level.
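For a sense of what a JA3 fingerprint actually is, here is a small sketch of the published JA3 recipe: five ClientHello fields written as decimal numbers, joined with dashes inside each field and commas between fields, then hashed with MD5, with GREASE placeholder values dropped. It assumes the ClientHello has already been parsed into the hypothetical `hello` object shape shown; parsing the handshake itself is out of scope, and the example values are illustrative rather than captured from a real browser.

```javascript
const { createHash } = require('node:crypto');

// GREASE values (RFC 8701) are 0x0a0a, 0x1a1a, ... 0xfafa: both bytes are
// equal and the low nibble of each byte is 0xa. JA3 ignores them.
const isGrease = (v) => (v & 0x0f0f) === 0x0a0a && (v >> 8) === (v & 0xff);

// `hello` is a hypothetical, already-parsed ClientHello.
function ja3(hello) {
  const join = (values) => values.filter((v) => !isGrease(v)).join('-');
  const ja3String = [
    hello.version,            // TLS version as a decimal, e.g. 771 for 0x0303
    join(hello.ciphers),      // cipher suites, in the order offered
    join(hello.extensions),   // extension IDs, in the order sent
    join(hello.curves),       // supported groups (elliptic curves)
    join(hello.pointFormats), // EC point formats
  ].join(',');
  const hash = createHash('md5').update(ja3String).digest('hex');
  return { ja3String, hash };
}

const result = ja3({
  version: 771,
  ciphers: [0x0a0a, 4865, 4866, 4867],
  extensions: [0, 23, 65281, 10, 11],
  curves: [0x0a0a, 29, 23, 24],
  pointFormats: [0],
});

console.log(result.ja3String);
// 771,4865-4866-4867,0-23-65281-10-11,29-23-24,0
```

Because the string encodes the order in which the client offers ciphers and extensions, two clients that support the same features can still produce different hashes, which is what makes the signal useful.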

JavaScript environment analysis

A real browser has a rich, consistent JavaScript environment: standard APIs, expected object prototypes, specific error behaviors, and thousands of subtle implementation details. When a bot runs in a modified or emulated environment, inconsistencies emerge.

For example:

  • Headless browsers may lack certain navigator properties or have them set to contradictory values
  • Automation tools expose telltale properties (navigator.webdriver, document.__selenium_unwrapped, window._phantom) that detection scripts can check
  • Overridden built-ins (like Date.now() or Math.random()) give themselves away: a native function stringifies to [native code], while a tampered one usually does not
  • The chrome object exists in real Chrome but may be missing or incomplete in headless variants

Detection scripts probe hundreds of these environmental signals to build a confidence score for whether the runtime environment is genuine.
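A handful of those probes can be sketched as follows. The function takes a window-like object so it can run in a browser (pass `window`) or against a plain object in tests. `automationHints` and the hint labels are hypothetical names, and a real detection script checks far more than this.

```javascript
// Collect automation hints from a window-like object (illustrative sketch).
function automationHints(win) {
  const hints = [];
  const nav = win.navigator || {};
  const ua = nav.userAgent || '';

  // WebDriver-controlled browsers report navigator.webdriver === true.
  if (nav.webdriver === true) hints.push('navigator.webdriver');

  // Properties left behind by older automation tools.
  if ('_phantom' in win || 'callPhantom' in win) hints.push('phantomjs-global');
  if (win.document && '__selenium_unwrapped' in win.document) {
    hints.push('selenium-document-property');
  }

  // Headless Chrome has historically advertised itself in the user agent.
  if (/HeadlessChrome/.test(ua)) hints.push('headless-user-agent');

  // A Chrome user agent without a window.chrome object is a contradiction.
  if (/Chrome\//.test(ua) && typeof win.chrome === 'undefined') {
    hints.push('missing-window.chrome');
  }

  // Native functions stringify to "[native code]"; an override usually does not.
  const dateNow = (win.Date || Date).now;
  if (!Function.prototype.toString.call(dateNow).includes('[native code]')) {
    hints.push('patched-Date.now');
  }

  return hints;
}

const realBrowser = {
  navigator: { userAgent: 'Mozilla/5.0 Chrome/120.0.0.0 Safari/537.36', webdriver: false },
  chrome: {},
  document: {},
  Date,
};

const scriptedBrowser = {
  navigator: { userAgent: 'Mozilla/5.0 HeadlessChrome/120.0.0.0 Safari/537.36', webdriver: true },
  document: { __selenium_unwrapped: true },
  Date: { now: () => 0 },
};

console.log(automationHints(realBrowser).length);     // 0
console.log(automationHints(scriptedBrowser).length); // 5
```

None of these checks is conclusive alone, since each can be patched by a determined operator. Their value comes from being combined with the other layers.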

Behavioral analysis

Humans and bots interact with web pages differently. Humans move the mouse in curved, imprecise paths. They scroll at variable speeds. They pause to read. They make typos. Bots, even sophisticated ones, struggle to replicate this organic randomness at every interaction point.

Behavioral analysis tracks:

  • Mouse movements: Trajectory curves, acceleration, micro-corrections
  • Scrolling: Variable speed, natural pauses, direction changes
  • Keystrokes: Timing rhythm, realistic errors and corrections
  • Page interaction timing: Time between load and first click, dwell time per section
  • Form completion: Humans skip optional fields, make typos, and tab between inputs at inconsistent speeds. Bots fill every field instantly with data that often follows predictable patterns.
  • Navigation flow: The sequence and timing of pages visited across a session

A human might spend 30 seconds reading a product page before clicking “Add to Cart.” A bot clicks in 200 milliseconds. Even when bots add artificial delays, the statistical distribution of their timing never quite matches human randomness.
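One simple way to quantify that difference is the coefficient of variation (standard deviation divided by mean) of the gaps between events. The sketch below is a single illustrative feature rather than a complete behavioral model, and `timingVariation` is a hypothetical helper name.

```javascript
// Coefficient of variation of the gaps between consecutive events (sketch).
// Takes event timestamps in milliseconds, in chronological order.
function timingVariation(timestampsMs) {
  const gaps = [];
  for (let i = 1; i < timestampsMs.length; i++) {
    gaps.push(timestampsMs[i] - timestampsMs[i - 1]);
  }
  if (gaps.length < 2) return null; // not enough data to judge

  const mean = gaps.reduce((sum, g) => sum + g, 0) / gaps.length;
  if (mean === 0) return 0;

  const variance = gaps.reduce((sum, g) => sum + (g - mean) ** 2, 0) / gaps.length;
  return Math.sqrt(variance) / mean;
}

// Keystrokes exactly 100 ms apart: no variation at all.
const scripted = timingVariation([0, 100, 200, 300, 400, 500]);

// Irregular, bursty timing of the kind people tend to produce.
const human = timingVariation([0, 180, 310, 920, 1040, 1700]);

console.log(scripted);      // 0
console.log(human > 0.5);   // true
```

A bot that adds uniform random jitter will score above zero here, which is why real systems look at the shape of the whole distribution across many interaction types instead of one summary number.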

Honeypots

Honeypots are invisible traps embedded in your pages that only bots interact with. A common implementation is a hidden form field: it is present in the HTML but styled so humans never see it. A real visitor will never fill it in. A bot parsing the DOM will.

Other honeypot techniques include:

  • Hidden links that lead to monitoring endpoints (only crawlers follow them)
  • Invisible <input> fields with enticing names like email2 or url that attract bot form-fillers
  • Fake API endpoints listed in your HTML source that legitimate users would never call

Honeypots are simple to implement, add zero friction for real users, and catch a surprising number of automated scripts. They are not enough on their own, but they are an excellent early signal in a layered detection system.
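On the server, the hidden-field check can be as small as the sketch below. It assumes the form body has already been parsed into a plain object, and that your markup contains hidden inputs whose names match `HONEYPOT_FIELDS` (here the `email2` and `url` names mentioned above). `tripsHoneypot` is a hypothetical helper.

```javascript
// Names of inputs that are present in the HTML but hidden from humans.
// Hypothetical: these must match the trap fields in your own form markup.
const HONEYPOT_FIELDS = ['email2', 'url'];

// Returns true when any trap field came back with a non-empty value.
function tripsHoneypot(formBody) {
  return HONEYPOT_FIELDS.some((name) => {
    const value = formBody[name];
    return typeof value === 'string' && value.trim() !== '';
  });
}

const humanSubmission = { email: 'ana@example.com', email2: '', url: '' };
const botSubmission = { email: 'x@example.com', email2: 'x@example.com', url: 'http://spam.example' };

console.log(tripsHoneypot(humanSubmission)); // false
console.log(tripsHoneypot(botSubmission));   // true
```

One practical caution: hide the trap fields in a way that also keeps them away from keyboard navigation, screen readers, and browser autofill, otherwise legitimate users can end up filling them in.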

HTTP header and request analysis

Every HTTP request carries headers that reveal information about the client. Real browsers send a specific, predictable set of headers in a consistent order. Bots often get this wrong in subtle ways:

  • Missing or extra headers that real browsers always (or never) include
  • Header ordering that does not match the claimed browser
  • Inconsistent Accept-Language values that contradict the claimed locale
  • Missing or malformed Sec-CH-UA client hints

The combination of all request-level signals creates another layer that complements client-side detection.
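A few of these consistency checks can be sketched like this. The function expects a plain object with lowercase header names (the shape Node's `http` module exposes as `req.headers`). The rules encode assumptions about current browser behavior: recent Chromium-based browsers send Sec-CH-UA on HTTPS requests, and Firefox does not. Verify both against the browser versions you actually see. Header ordering is not covered here because it requires the raw header list.

```javascript
// Flag request headers that contradict the claimed browser (sketch).
function headerAnomalies(headers) {
  const anomalies = [];
  const ua = headers['user-agent'] || '';

  if (!ua) anomalies.push('missing-user-agent');
  if (!headers['accept-language']) anomalies.push('missing-accept-language');
  if (!headers['accept']) anomalies.push('missing-accept');

  // Assumption: a client claiming to be modern Chrome should send client hints.
  if (/Chrome\/\d+/.test(ua) && !headers['sec-ch-ua']) {
    anomalies.push('missing-sec-ch-ua');
  }

  // Assumption: Firefox does not send Sec-CH-UA, so seeing it is suspicious.
  if (/Firefox\/\d+/.test(ua) && headers['sec-ch-ua']) {
    anomalies.push('unexpected-sec-ch-ua');
  }

  return anomalies;
}

// An HTTP library with a spoofed Chrome user agent and default headers.
const spoofed = headerAnomalies({
  'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0.0.0 Safari/537.36',
  accept: '*/*',
});

console.log(spoofed); // [ 'missing-accept-language', 'missing-sec-ch-ua' ]
```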

Challenge-response verification

When passive detection is not conclusive enough, active challenges provide a much stronger signal. These are not CAPTCHAs. Modern challenge-response systems are invisible:

  • Proof-of-work challenges require the client to solve a computational puzzle, trivial for a single browser but expensive at bot-farm scale
  • JavaScript execution challenges require the client to run specific code and return correct results, verifying a real JS runtime
  • Interaction challenges present micro-tasks (like a subtle animation) that require genuine browser rendering

The key is that legitimate users never see these challenges. They happen silently in the background. Only suspicious traffic gets tested.
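The proof-of-work idea can be sketched in a few lines: the server issues a challenge string, the client searches for a nonce whose SHA-256 hash starts with a given number of zero bits, and the server verifies the answer with a single hash. This is a generic hashcash-style sketch, not a description of any specific product. The `challenge` format and `difficulty` value are assumptions, and a browser client would use the Web Crypto API where this example uses Node's `crypto` module.

```javascript
const { createHash } = require('node:crypto');

// Count the leading zero bits of a Buffer.
function leadingZeroBits(buf) {
  let bits = 0;
  for (const byte of buf) {
    if (byte === 0) {
      bits += 8;
      continue;
    }
    bits += Math.clz32(byte) - 24; // clz32 works on 32 bits; a byte uses the low 8
    break;
  }
  return bits;
}

// Server side: one hash to check an answer.
function verify(challenge, nonce, difficulty) {
  const digest = createHash('sha256').update(`${challenge}:${nonce}`).digest();
  return leadingZeroBits(digest) >= difficulty;
}

// Client side: brute-force search for a valid nonce.
function solve(challenge, difficulty) {
  let nonce = 0;
  while (!verify(challenge, nonce, difficulty)) nonce++;
  return nonce;
}

const challenge = 'session-7f3a'; // hypothetical; random and single-use in practice
const difficulty = 12;            // about 2^12 = 4096 hashes on average
const nonce = solve(challenge, difficulty);

console.log(verify(challenge, nonce, difficulty)); // true
```

The asymmetry is the point: solving costs thousands of hashes, verifying costs one. A single visitor barely notices the work, while an operator sending millions of requests pays for it on every one.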

Building a bot detection strategy

No single technique catches every bot. The most resilient detection systems combine multiple layers and continuously adapt. Here is how to think about building yours.

Start with visibility

You cannot block what you cannot see. Before deploying any blocking rules, instrument your traffic to understand what is hitting your site. Look at:

  • Traffic patterns by time of day and geography
  • Request rates per IP, per session, and per device fingerprint
  • JavaScript execution rates (bots that do not execute JS stand out immediately)
  • Conversion funnels (bot traffic creates huge dropoff anomalies)

This baseline tells you how much of your traffic is automated and where it concentrates.
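As a starting point for that baseline, the sketch below counts requests along different dimensions of a log. The entry fields (`ip`, `fingerprint`) are hypothetical column names; substitute whatever your logging pipeline records.

```javascript
// Count requests per value of a chosen dimension, highest first (sketch).
function countBy(entries, dimension) {
  const counts = new Map();
  for (const entry of entries) {
    const key = entry[dimension] ?? 'unknown';
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

const log = [
  { ip: '198.51.100.1', fingerprint: 'fp-a' },
  { ip: '198.51.100.2', fingerprint: 'fp-a' },
  { ip: '198.51.100.3', fingerprint: 'fp-a' },
  { ip: '203.0.113.9', fingerprint: 'fp-b' },
];

const byIp = countBy(log, 'ip');              // every IP appears once
const byDevice = countBy(log, 'fingerprint'); // fp-a accounts for 3 of 4 requests

console.log(byIp[0]);     // [ '198.51.100.1', 1 ]
console.log(byDevice[0]); // [ 'fp-a', 3 ]
```

Grouped by IP, this traffic looks unremarkable. Grouped by device fingerprint, one source clearly dominates, which is the pattern rotating proxies are meant to hide.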

Protect high-risk entry points first

Not every page on your site needs the same level of protection. Focus detection efforts on the endpoints where bots cause the most damage:

  • Login and authentication pages: Credential stuffing target
  • Checkout and payment flows: Payment fraud and card testing
  • Account registration: Fake account creation and promo abuse
  • Search and pricing pages: Scraping and competitive intelligence
  • APIs: Direct programmatic access bypasses your UI entirely

Prioritizing these high-value targets gives you the most protection for the least implementation effort.

Layer your defenses

Effective detection stacks multiple signals together:

  1. Network layer: TLS fingerprinting + IP reputation + header analysis
  2. Device layer: Browser fingerprinting + JavaScript environment checks
  3. Behavioral layer: Mouse, scroll, keystroke, and form-fill analysis
  4. Trap layer: Honeypots and invisible challenge-response verification

Each layer catches bots that might slip through the others. A bot that perfectly spoofs its user agent still gets caught by TLS analysis. One that nails TLS fingerprinting still fails device fingerprint checks. The more layers you stack, the harder and more expensive it becomes for attackers to get through.
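One simple way to combine the layers is a weighted average of per-layer suspicion scores, sketched below. The layer names mirror the list above, but the weights and the 0-to-1 score scale are assumptions for illustration. In practice you would tune them against labeled traffic, and you might let a strong single signal (such as a honeypot hit) override the average.

```javascript
// Hypothetical weights per layer; tune these against your own labeled traffic.
const LAYER_WEIGHTS = { network: 0.3, device: 0.3, behavior: 0.25, trap: 0.15 };

// Each layer reports a suspicion score in [0, 1], or is omitted if it has no data.
function combinedScore(layerScores) {
  let weighted = 0;
  let totalWeight = 0;
  for (const [layer, weight] of Object.entries(LAYER_WEIGHTS)) {
    const score = layerScores[layer];
    if (typeof score !== 'number') continue; // layer produced no signal
    weighted += weight * score;
    totalWeight += weight;
  }
  return totalWeight === 0 ? 0 : weighted / totalWeight;
}

// A bot with a convincing device fingerprint that still fails the TLS and
// behavioral checks. The trap layer has no data for this request.
const score = combinedScore({ network: 0.9, device: 0.1, behavior: 0.8 });

console.log(score.toFixed(2)); // 0.59
```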

Respond proportionally

Not every bot deserves the same treatment. Your response should match the threat:

  • Known good bots (Googlebot, verified monitoring): Allow through, possibly with rate limits
  • Suspicious but uncertain: Serve an invisible challenge, monitor closely
  • Likely bots: Throttle, serve cached/alternate content, or soft-block
  • Confirmed malicious: Block immediately and feed data back into your detection models

Hard-blocking everything that looks automated risks catching edge cases like assistive technology, corporate proxies, or unusual browser configurations. Graduated responses reduce false positives while still stopping real threats.
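The tiers above translate naturally into a small decision function. The thresholds here are placeholders rather than recommended values, and `verifiedGoodBot` is assumed to come from a separate verification step for crawlers you trust.

```javascript
// Map a suspicion score in [0, 1] to a graduated response (sketch).
// Thresholds are hypothetical; pick yours from observed false-positive rates.
function chooseAction({ score, verifiedGoodBot = false }) {
  if (verifiedGoodBot) return 'allow'; // known good bot, verified elsewhere
  if (score >= 0.9) return 'block';     // confirmed malicious
  if (score >= 0.7) return 'throttle';  // likely bot
  if (score >= 0.4) return 'challenge'; // suspicious but uncertain
  return 'allow';
}

console.log(chooseAction({ score: 0.95 }));                        // block
console.log(chooseAction({ score: 0.75 }));                        // throttle
console.log(chooseAction({ score: 0.5 }));                         // challenge
console.log(chooseAction({ score: 0.1 }));                         // allow
console.log(chooseAction({ score: 0.95, verifiedGoodBot: true })); // allow
```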

Log everything, analyze constantly

Every bot interaction is data you can learn from. Comprehensive logging and reporting on bot traffic lets you:

  • Spot new attack patterns before they scale
  • Fine-tune detection thresholds to reduce false positives
  • Measure the effectiveness of your blocking rules over time
  • Build evidence for compliance audits in regulated industries

Teams that treat bot detection as a “set and forget” system always fall behind. The ones that actively monitor and iterate stay ahead.

Keep adapting

Bot detection is an arms race, and 2026 is shaping up to be the most challenging year yet. AI-powered automation has lowered the barrier to entry dramatically. Bot operators no longer just use headless browsers, which are easier to spot. They use full browsers with automation tools, residential proxies that route requests through real people’s internet connections, and AI-generated behavioral patterns that mimic natural mouse movements and typing. Large language models even help bots fill forms with realistic, contextually appropriate data instead of the obvious gibberish that older bots produced.

Your detection needs to evolve at the same pace:

  • Monitor bypass attempts and adjust detection thresholds
  • Track new bot frameworks and automation tools as they emerge
  • Analyze blocked traffic for patterns that indicate evolving techniques
  • Use machine learning models that retrain on fresh data continuously

Static rule sets degrade over time. The detection systems that stay effective are the ones that learn.

Detecting bots with Guardian

Guardian combines 70+ device and browser signals with server-side machine learning to identify every visitor to your site with 99.5%+ accuracy. Its persistent device fingerprinting recognizes visitors across sessions, incognito mode, and cookie clears, making it extremely difficult for bots to disguise themselves as new users.

Here is how to add bot detection to your site in minutes.

1. Install the JavaScript agent

npm install @guardianstack/guardian-js

2. Identify visitors on the client side

Load the agent and collect browser signals when a visitor accesses a protected page. The agent runs asynchronously and does not block page rendering.

import { loadAgent } from '@guardianstack/guardian-js';

// Initialize the agent with your site key
const guardian = await loadAgent({
  siteKey: 'YOUR_SITE_KEY',
});

// Collect browser signals and get a request ID
const { requestId } = await guardian.get();

// Send the requestId to your backend for server-side verification
await fetch('/api/bot-check', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ requestId }),
});

3. Verify on the server and check Smart Signals

Use the server SDK to retrieve the full event data, including bot detection, VPN, tampering, and other Smart Signals.

import {
  createGuardianClient,
  isBot,
  isTampering,
  isVPN,
} from '@guardianstack/guardianjs-server';

const client = createGuardianClient({
  secret: process.env.GUARDIAN_SECRET_KEY,
});

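// Runs inside your route handler (for example, POST /api/bot-check),
// assuming an Express-style `req`/`res` with a parsed JSON body.
const { requestId } = req.body;
const event = await client.getEvent(requestId);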

// Check bot detection and Smart Signals
if (isBot(event) || isTampering(event)) {
  return res.status(403).json({
    error: 'Automated access detected.',
  });
}

if (isVPN(event)) {
  // Flag for additional verification
  return res.status(200).json({
    action: 'challenge',
  });
}

// Legitimate visitor, proceed normally
return res.status(200).json({ action: 'allow' });

The API response includes detailed signals you can use to build custom rules:

{
  "botDetection": {
    "detected": false,
    "score": 0,
    "automationSignalsPresent": false
  },
  "tampering": {
    "detected": true,
    "anomalyScore": 0.55,
    "antiDetectBrowser": false
  },
  "vpn": {
    "detected": false,
    "confidence": "none"
  },
  "velocity": {
    "5m": 3,
    "1h": 17,
    "24h": 68
  }
}

The velocity field is particularly useful for bot detection. A legitimate user might generate 3 requests in 5 minutes. A bot stuffing credentials could generate hundreds. Combined with the bot detection score and tampering signals, you have everything you need to make accurate, real-time decisions.
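As an example of such a custom rule, the sketch below evaluates an object with the same shape as the sample response above. The thresholds, the reason labels, and the `evaluateSignals` helper are hypothetical and are not part of the Guardian SDK.

```javascript
// Custom rule over a signals object shaped like the sample response (sketch).
// All thresholds are hypothetical starting points, not vendor recommendations.
function evaluateSignals(signals) {
  const reasons = [];
  if (signals.botDetection.detected) reasons.push('bot-detected');
  if (signals.tampering.antiDetectBrowser) reasons.push('anti-detect-browser');
  if (signals.tampering.detected && signals.tampering.anomalyScore >= 0.8) {
    reasons.push('strong-tampering');
  }
  if (signals.velocity['5m'] > 30) reasons.push('high-5m-velocity');
  if (signals.velocity['1h'] > 200) reasons.push('high-1h-velocity');

  if (reasons.length > 0) return { action: 'block', reasons };

  // Weaker evidence: ask for additional verification instead of blocking.
  if (signals.tampering.detected || signals.vpn.detected) {
    return { action: 'challenge', reasons: ['tampering-or-vpn'] };
  }
  return { action: 'allow', reasons };
}

// The sample response from above.
const sample = {
  botDetection: { detected: false, score: 0, automationSignalsPresent: false },
  tampering: { detected: true, anomalyScore: 0.55, antiDetectBrowser: false },
  vpn: { detected: false, confidence: 'none' },
  velocity: { '5m': 3, '1h': 17, '24h': 68 },
};

console.log(evaluateSignals(sample).action); // challenge
```

With the sample values, the moderate tampering score leads to a challenge rather than a block, while the low velocity counts stay well under the limits.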

Start your free trial or talk to our team to see how Guardian protects your site from automated threats.

Frequently asked questions

What is a bad bot?

A bad bot is any automated program that interacts with your website or API in ways that harm your business. This includes scrapers that steal content or pricing data, credential stuffing bots that test stolen passwords, inventory hoarding bots that buy out limited stock, and spam bots that flood forms and comment sections. Bad bots are distinct from good bots like search engine crawlers, which follow your robots.txt rules and serve a legitimate purpose.

How much internet traffic is bots?

According to Imperva's Bad Bot Report, bots accounted for 51% of all internet traffic in 2024, surpassing human traffic for the first time in a decade. Bad bots alone made up roughly 37% of all traffic. By 2026, those numbers are expected to climb further as AI-powered automation tools become more accessible.

Can bots bypass CAPTCHAs?

Yes. Modern bot operators use CAPTCHA-solving services (both AI-powered and human farms) that solve challenges for fractions of a cent in under 10 seconds. Advanced bots can also solve simpler CAPTCHAs natively. CAPTCHAs still add friction for attackers, but they should not be your only line of defense.

Does bot detection slow down my website?

Not when implemented correctly. Client-side detection scripts are lightweight (typically under 20KB) and run asynchronously without blocking page render. Server-side analysis happens in milliseconds. Legitimate users experience zero added latency or friction. Only detected bots encounter challenges or blocks.

What is the difference between bot detection and bot management?

Bot detection identifies whether a request comes from a human or a bot. Bot management is the broader strategy of deciding what to do with that information: block the bot, serve it different content, throttle its requests, or let it through. Detection is the foundation that management is built on.

Written by

Piero Bassa

Founder & CEO

Piero is the founder of Guardian, building privacy-first device intelligence to help businesses stop fraud and recognize trusted users.
