Bot & Datacenter Detection Database
Identify automated traffic from bots, scrapers, and datacenter infrastructure. The downloadable database covers datacenter IP ranges, cloud providers, and known bot signatures, and you query it locally with no network round trip. It pairs with reCAPTCHA and other challenge-response flows: use IP intelligence to pre-filter obvious bot traffic before putting real users through a CAPTCHA.
How bot detection works
Most bots operate from datacenter and cloud infrastructure rather than residential ISPs. Identifying the source network of incoming traffic gives you a strong first signal for separating genuine users from automated scripts.
Datacenter IP Range Mapping
We map IP ranges belonging to major cloud providers (AWS, GCP, Azure, DigitalOcean, Linode, Vultr, OVH, Hetzner) and hundreds of smaller hosting companies. Traffic from these ranges rarely comes from an ordinary end user's own connection, although VPN and corporate proxy users can appear in them too.
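As a rough illustration of how range mapping turns an address into a provider, the sketch below keeps ranges sorted by start address and uses binary search to find the one range that could contain a given IP. The ranges and provider names are hypothetical placeholders built on RFC 5737 documentation networks, the `RangeIndex` class is not part of the product, and the approach assumes IPv4 ranges that do not overlap.

```python
import bisect
import ipaddress

# Hypothetical sample ranges (RFC 5737 documentation networks).
# Real provider ranges come from the database files.
SAMPLE_RANGES = [
    ("192.0.2.0", "192.0.2.255", "ExampleCloud"),
    ("198.51.100.0", "198.51.100.127", "ExampleHost"),
    ("203.0.113.0", "203.0.113.255", "ExampleVPS"),
]


class RangeIndex:
    """In-memory lookup over sorted, non-overlapping IPv4 ranges."""

    def __init__(self, ranges):
        rows = sorted(
            (int(ipaddress.IPv4Address(start)), int(ipaddress.IPv4Address(end)), provider)
            for start, end, provider in ranges
        )
        self._rows = rows
        self._starts = [row[0] for row in rows]

    def find_provider(self, ip):
        """Return the provider whose range contains ip, or None."""
        ip_int = int(ipaddress.IPv4Address(ip))
        # Index of the last range whose start is <= ip_int.
        pos = bisect.bisect_right(self._starts, ip_int) - 1
        if pos >= 0 and self._rows[pos][1] >= ip_int:
            return self._rows[pos][2]
        return None


index = RangeIndex(SAMPLE_RANGES)
print(index.find_provider("198.51.100.42"))   # ExampleHost
print(index.find_provider("198.51.100.200"))  # None
```

Binary search keeps each lookup logarithmic in the number of ranges, which matters once the list grows to many thousands of entries.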
Known Bot Signature Database
We maintain a database of IP ranges used by known bots - both good (Googlebot, Bingbot, Applebot) and bad (SEO scrapers, content thieves, vulnerability scanners). Each entry is classified so you can allow legitimate crawlers while blocking bad actors.
Cloud Provider Identification
When a request comes from an AWS EC2 instance or a Google Cloud VM, that's a strong signal it's automated. Our database identifies the specific cloud provider so you can make granular decisions about which sources to trust.
Good Bot vs. Bad Bot Classification
Not all bots are harmful. Search engine crawlers, uptime monitors, and feed readers are beneficial. Our database flags each bot as "good" or "bad" so you can build nuanced access policies instead of blocking all automated traffic.
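One way to turn these classifications into an access policy is a small decision function that maps a lookup result to an action. This is only a sketch: the `BotRecord` class, the `decide` function, the action names, and the choice to block at `high` and `critical` are all assumptions made for illustration, and the field names follow the database schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BotRecord:
    provider: str
    type: str                # datacenter, cloud, hosting, crawler, known_bot
    bot_name: Optional[str]
    is_good_bot: bool
    risk_level: str          # low, medium, high, critical


def decide(record: Optional[BotRecord], trusted_providers=frozenset()) -> str:
    """Map a lookup result to 'allow', 'challenge', or 'block'.

    The thresholds here are illustrative assumptions, not a recommendation.
    """
    if record is None:
        return "allow"       # not in the database: treat as ordinary traffic
    if record.is_good_bot or record.provider in trusted_providers:
        return "allow"       # legitimate crawler or explicitly trusted source
    if record.risk_level in ("high", "critical"):
        return "block"
    return "challenge"       # e.g. show a CAPTCHA


scraper = BotRecord("ExampleCloud", "cloud", None, False, "critical")
crawler = BotRecord("ExampleSearch", "crawler", "ExampleBot", True, "low")
unknown_vm = BotRecord("ExampleCloud", "cloud", None, False, "medium")

print(decide(scraper))     # block
print(decide(crawler))     # allow
print(decide(unknown_vm))  # challenge
print(decide(unknown_vm, trusted_providers={"ExampleCloud"}))  # allow
```

Keeping the policy in one function makes it easy to adjust per endpoint, for example a stricter rule on login pages than on public content.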
What traffic sources we identify
Major Cloud Providers
AWS, GCP, Azure, Oracle Cloud, IBM Cloud
VPS & Hosting
DigitalOcean, Linode, Vultr, OVH, Hetzner
Search Engine Crawlers
Googlebot, Bingbot, Applebot, YandexBot
SEO Tool Bots
AhrefsBot, SemrushBot, MJ12bot, DotBot
Monitoring Services
UptimeRobot, Pingdom, StatusCake
Content Scrapers
Known scraping infrastructure and IP pools
What's included in the database
The Bot & Datacenter database is delivered as CSV and JSON files. Each record maps an IP range to its hosting provider, bot classification, and risk level.
| Field | Type | Description |
|---|---|---|
| ip_start | string | Start of the IP range (IPv4 or IPv6) |
| ip_end | string | End of the IP range |
| provider | string | Name of the hosting/cloud provider (e.g., AWS, DigitalOcean, OVH) |
| type | enum | Classification: datacenter, cloud, hosting, crawler, known_bot |
| bot_name | string | Name of the known bot if identified (e.g., Googlebot, AhrefsBot, SemrushBot) |
| is_good_bot | boolean | Whether this is a legitimate crawler (search engines, monitoring tools) |
| country | string | Country code of the datacenter or hosting provider (ISO 3166-1 alpha-2) |
| risk_level | enum | Risk classification: low, medium, high, critical |
| last_seen | date | Date the IP range was last confirmed as belonging to a datacenter or bot |
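To make the record layout concrete, here is a minimal sketch that loads CSV rows with these fields into a SQLite table and stores the range bounds as integers so they can be compared numerically. Several details are assumptions: the sample rows are invented placeholders on RFC 5737 documentation networks, the `true`/`false` boolean encoding and ISO date format are guesses about the file format, and the table name `bot_ips` simply mirrors the lookup queries on this page. IPv6 rows are skipped because 128-bit values do not fit SQLite's 64-bit INTEGER type.

```python
import csv
import io
import ipaddress
import sqlite3

# Hypothetical sample in the documented column order (not real database content).
SAMPLE_CSV = """ip_start,ip_end,provider,type,bot_name,is_good_bot,country,risk_level,last_seen
192.0.2.0,192.0.2.255,ExampleCloud,cloud,,false,US,medium,2026-01-15
198.51.100.0,198.51.100.63,ExampleSearch,crawler,ExampleBot,true,DE,low,2026-01-20
"""


def load_ipv4_rows(csv_text, conn):
    """Insert IPv4 rows into bot_ips and return how many were loaded."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS bot_ips (
               ip_start INTEGER NOT NULL,
               ip_end INTEGER NOT NULL,
               provider TEXT, type TEXT, bot_name TEXT,
               is_good_bot INTEGER, country TEXT,
               risk_level TEXT, last_seen TEXT)"""
    )
    loaded = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        start = ipaddress.ip_address(row["ip_start"])
        end = ipaddress.ip_address(row["ip_end"])
        if start.version != 4 or end.version != 4:
            continue  # IPv6 needs a different storage scheme (e.g. BLOB)
        conn.execute(
            "INSERT INTO bot_ips VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
            (
                int(start), int(end), row["provider"], row["type"],
                row["bot_name"] or None,
                1 if row["is_good_bot"].strip().lower() == "true" else 0,
                row["country"], row["risk_level"], row["last_seen"],
            ),
        )
        loaded += 1
    conn.execute("CREATE INDEX IF NOT EXISTS idx_bot_ips_start ON bot_ips (ip_start)")
    conn.commit()
    return loaded


conn = sqlite3.connect(":memory:")
loaded = load_ipv4_rows(SAMPLE_CSV, conn)
print(loaded)  # 2
```

The index on `ip_start` is there so range queries do not have to scan the whole table.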
Use cases for bot detection
By some industry estimates, automated traffic accounts for nearly half of all web traffic. Knowing which requests come from bots lets you protect your content, infrastructure, and revenue.
Content Scraping Defense
Detect and block scrapers running on cloud infrastructure that steal your pricing data, product catalogs, articles, or proprietary content for competitive advantage.
Credential Stuffing Prevention
Identify login attempts originating from datacenter IPs - a strong signal that automated tools are being used to test stolen username/password combinations against your authentication system.
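A datacenter match does not have to mean an outright block on the login page. One possible approach is to give datacenter-origin addresses a tighter attempt budget. The sketch below is a simple in-memory sliding-window throttle. The `LoginThrottle` class, the window length, and both limits are hypothetical values chosen for illustration, and the `from_datacenter` flag is assumed to come from a database lookup done elsewhere.

```python
import time
from collections import defaultdict, deque

# Hypothetical settings: attempts allowed per IP within the window.
WINDOW_SECONDS = 300
LIMITS = {"datacenter": 3, "other": 10}


class LoginThrottle:
    """Sliding-window login limiter with a stricter budget for datacenter IPs."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._attempts = defaultdict(deque)  # ip -> timestamps of recent attempts

    def allow(self, ip: str, from_datacenter: bool) -> bool:
        now = self._clock()
        window = self._attempts[ip]
        # Drop attempts that have aged out of the window.
        while window and now - window[0] > WINDOW_SECONDS:
            window.popleft()
        limit = LIMITS["datacenter" if from_datacenter else "other"]
        if len(window) >= limit:
            return False
        window.append(now)
        return True


throttle = LoginThrottle()
results = [throttle.allow("192.0.2.10", from_datacenter=True) for _ in range(4)]
print(results)  # [True, True, True, False]
```

A production setup would more likely keep these counters in a shared store so that limits hold across multiple application servers.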
Form Spam Filtering
Block automated form submissions from bots that pollute your contact forms, comment sections, and registration flows with spam content and phishing links.
Ad Fraud & Click Fraud Detection
Filter out non-human traffic from your analytics and advertising platforms. Ensure that ad clicks, impressions, and conversion events come from real users, not bots.
Inventory & Checkout Protection
Prevent scalper bots from hoarding limited-edition products, concert tickets, or flash-sale inventory. Detect datacenter-origin traffic before it reaches your checkout flow.
Good Bot Management
Distinguish between legitimate crawlers (Googlebot, Bingbot) and malicious bots. Allow search engines and other good bots through while challenging or blocking the rest of the traffic from datacenter IPs.
Detect bots with a local database lookup
Import the datacenter and bot database into your preferred data store. A single IP lookup tells you whether traffic is from a datacenter, and whether it's a known good or bad bot.
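PHP (PDO with MySQL or MariaDB):

    <?php
    // Assumes ip_start and ip_end were imported as binary values via INET6_ATON().
    function detect_bot(PDO $pdo, string $ip): ?array
    {
        // Two distinct placeholders: with native prepared statements, PDO
        // does not allow the same named parameter to be used twice.
        $stmt = $pdo->prepare(
            "SELECT provider, type, bot_name, is_good_bot, risk_level
             FROM bot_ips
             WHERE ip_start <= INET6_ATON(:ip_lo)
               AND ip_end >= INET6_ATON(:ip_hi)
             LIMIT 1"
        );
        $stmt->execute(['ip_lo' => $ip, 'ip_hi' => $ip]);
        return $stmt->fetch(PDO::FETCH_ASSOC) ?: null;
    }

    // $pdo is an existing PDO connection to the database that holds bot_ips.
    $ip = $_SERVER['REMOTE_ADDR'];
    $bot = detect_bot($pdo, $ip);

    if ($bot) {
        if ($bot['is_good_bot']) {
            // Allow legitimate crawlers (Googlebot, Bingbot, etc.)
            // Optionally serve a simplified page for faster crawling
            header('X-Bot-Detected: good');
        } else {
            // Datacenter or bad bot traffic
            if ($bot['risk_level'] === 'critical') {
                http_response_code(403);
                exit('Automated access is not permitted.');
            }
            // Any other risk level: serve a CAPTCHA challenge.
            // require_captcha() is an application-defined helper (not shown).
            header('X-Bot-Detected: suspicious');
            require_captcha();
        }
    }

Python (FastAPI with SQLite):

    import sqlite3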
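    import ipaddress
    from fastapi import FastAPI, Request
    from fastapi.responses import JSONResponse

    app = FastAPI()
    DB_PATH = "antiproxies_bots.db"


    def check_bot(ip: str) -> dict | None:
        """Look up an IP in the local bot/datacenter database."""
        try:
            addr = ipaddress.ip_address(ip)
        except ValueError:
            return None  # not a valid IP address
        if addr.version != 4:
            # Assumes ip_start/ip_end are stored as integers; 128-bit IPv6
            # values do not fit SQLite's 64-bit INTEGER type.
            return None
        ip_int = int(addr)
        conn = sqlite3.connect(DB_PATH)
        conn.row_factory = sqlite3.Row
        try:
            row = conn.execute(
                """SELECT provider, type, bot_name, is_good_bot, risk_level
                   FROM bot_ips
                   WHERE ip_start <= ? AND ip_end >= ?
                   LIMIT 1""",
                (ip_int, ip_int),
            ).fetchone()
        finally:
            conn.close()
        return dict(row) if row else None


    @app.middleware("http")
    async def bot_detection_middleware(request: Request, call_next):
        # request.client can be None, so guard before reading the host.
        client_ip = request.client.host if request.client else None
        bot = check_bot(client_ip) if client_ip else None
        request.state.bot_detected = False
        if bot and not bot["is_good_bot"]:
            if bot["risk_level"] in ("high", "critical"):
                # Return the response directly: an HTTPException raised in
                # middleware is not routed through FastAPI's exception handlers.
                return JSONResponse(
                    status_code=403,
                    content={"detail": "Automated access blocked."},
                )
            # Flag suspicious traffic for logging
            request.state.bot_detected = True
        response = await call_next(request)
        return response

Related reading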
The Rise of AI-Powered Bots: What's Changed in 2026
How AI-driven bots evade traditional defenses and what still works against them.
Why CAPTCHAs Alone Won't Stop Bots
The economics of CAPTCHA solving and why layered detection works better.
The Hidden Cost of Bot Traffic to Your Business
How bots distort analytics, drain ad spend, and inflate infrastructure costs.
Headless Browser Detection: How to Spot Automation Behind a Real Browser
How Puppeteer and Playwright evade detection - and what signals still work.
Want to see what's in the database?
Download once, query as many times as you need. €99/year for all 22 databases, unlimited servers, and a full year of monthly updates. No usage limits, no per-query fees, no data leaving your servers.