---
title: AI Bot UA Classifier
category: product
entity_type: skill
price: $15
canonical: https://forgehouse.ai/skills/ai-bot-ua-classifier/
lang: en
hreflang_alt: https://forgehouse.ai/tr/skiller/ai-bot-ua-classifier/
last_updated: 2026-06-20
---

# AI Bot UA Classifier

> Real-time bot classification engine for AI crawler traffic.

AI Bot UA Classifier verifies whether traffic claiming to be GPTBot, ClaudeBot or ChatGPT-User actually comes from those bots. It combines four-layer fingerprinting (User-Agent, IP range, ASN cross-check, reverse-DNS) with behavioral anomaly detection and Bayesian reputation scoring, and ships as both a sub-50ms edge classifier and a batch log re-classifier. A wrong call cuts both ways: blocking a legitimate AI bot loses citations, while admitting a spoofer burns bandwidth. Because lost citations are the costlier error, every threshold is deliberately allow-biased.

## Use cases
- Verify that ChatGPT-User referral traffic is real and not a UA spoofer
- Detect aggressive crawlers burning bandwidth with zero citation value
- Close the measurement loop on an existing crawler allowlist policy
- Baseline AI bot traffic before a large multilingual site launch
- Flag behavioral anomalies like admin-path probing and request bursts
- Run monthly re-classification to catch new bots and behavior drift

## Benefits
- Stop trusting User-Agent claims: confirm bot identity with 4 independent signals
- Protect AEO citation revenue by never blocking a legitimate AI crawler
- Cut wasted bandwidth spend by tiering and rate-limiting low-value bots
- Get an auditable, replayable reason for every block, rate-limit or allow decision

## What’s included
- Cloudflare Worker edge classifier with KV-cached IP ranges and behavior buckets
- Python batch re-classifier for nginx and Cloudflare Logpush JSON logs
- Bayesian posterior scoring with per-bot priors and weighted signal fusion
- Tier + confidence + verdict output (ALLOW / RATE_LIMIT / FLAG / BLOCK)
- Cohort drift detection via chi-square test against last month's distribution
- Structured JSON audit schema with signal-by-signal breakdown and retention policy
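
As a rough illustration of the cohort drift check, the sketch below computes a Pearson chi-square statistic for this month's per-bot request counts against last month's distribution. The function names, the add-one smoothing and the 0.05 significance level are assumptions made for this example, not the shipped implementation, and the counts in the usage note are made up.

```python
# Standard chi-square critical values at alpha = 0.05, keyed by degrees of freedom.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def chi_square_stat(last_month: dict[str, int], this_month: dict[str, int]) -> float:
    """Pearson chi-square of this month's counts vs. last month's shares."""
    bots = sorted(set(last_month) | set(this_month))
    total_prev = sum(last_month.values())
    total_now = sum(this_month.values())
    if total_now == 0:
        return 0.0  # nothing observed this month, nothing to compare
    stat = 0.0
    for bot in bots:
        # Add-one smoothing (an assumption of this sketch) so a bot that was
        # absent last month does not produce a zero expected count.
        share = (last_month.get(bot, 0) + 1) / (total_prev + len(bots))
        expected = share * total_now
        observed = this_month.get(bot, 0)
        stat += (observed - expected) ** 2 / expected
    return stat


def has_drifted(last_month: dict[str, int], this_month: dict[str, int]) -> bool:
    """True when the statistic exceeds the 0.05 critical value."""
    df = len(set(last_month) | set(this_month)) - 1
    # The lookup table only covers up to six bot categories in this sketch.
    return chi_square_stat(last_month, this_month) > CRITICAL_05[df]
```

Calling `has_drifted({"GPTBot": 500, "ClaudeBot": 300, "Bytespider": 200}, {"GPTBot": 200, "ClaudeBot": 300, "Bytespider": 500})` flags the shift, while comparing a month against itself does not. For more than a handful of categories, a statistics library such as SciPy would be a more natural fit than a hard-coded table.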

## Who it’s for
Teams running AEO/GEO programs who need to measure and trust their AI bot traffic instead of guessing from raw User-Agent strings.

## How it runs
A User-Agent header is a claim, not an identity. The classifier tests each claim against three independent identity signals and the client's recent behavior before any allow, rate-limit, flag or block verdict lands:
1. Parses each request's User-Agent against the known-bot registry (kept in sync with DarkVisitors) to extract the claimed identity: GPTBot, ClaudeBot, Bytespider and 20+ more.
2. Cross-checks the source IP against the vendor's official IP ranges (openai.com/gptbot.json and equivalents), the ASN expected for that vendor, and a reverse-DNS PTR lookup: three independent identity signals beyond the UA claim.
3. Aggregates behavior from the IP's last 100 requests: did it fetch robots.txt, did it probe /admin, is the request rate stable, what is the 2xx/3xx success ratio.
4. Combines the prior (how credible the UA claim is) with the fingerprint likelihood into a Bayesian posterior, then assigns TIER 1 allow, TIER 2 rate-limit, TIER 3 flag or spoof block. The block threshold is deliberately conservative (confidence above 0.9) because blocking a real AI bot costs you citations.
5. Writes a signal-by-signal audit trace for every verdict, then reruns the whole window as a nightly batch over access logs.
6. Runs a monthly registry diff plus a chi-square cohort drift test, so a new bot or a behavior shift in an existing one raises an alarm instead of silently skewing your AEO numbers.

## FAQ
### Where does this run, and does it need a particular CDN?
The real-time classifier is built to run at the edge, in the request path, and ships as a Cloudflare Worker with KV-cached IP ranges. The Python batch re-classifier works separately on nginx and Cloudflare Logpush JSON logs, so you can also re-score traffic after the fact.

### User-Agents are trivial to fake, so how does this actually prove a bot is who it claims?
It never trusts the UA string alone: it cross-checks IP range, ASN and reverse-DNS on top of the User-Agent. A real GPTBot has to pass all four layers, not just send the right name.
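
Two of those layers, the IP-range check and the reverse-DNS check, could be sketched as follows. This is an illustration only: the CIDR blocks in the usage note are reserved documentation ranges, the hostname suffix is a placeholder, and not every vendor publishes PTR records for its crawlers. The resolver functions are passed in as parameters so the logic can be exercised without network access.

```python
import ipaddress
import socket
from typing import Callable, Iterable, Optional


def ip_in_ranges(ip: str, cidrs: Iterable[str]) -> bool:
    """True if ip falls inside any published CIDR block."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(c, strict=False) for c in cidrs)


def reverse_lookup(ip: str) -> Optional[str]:
    """PTR lookup; returns None when no record exists."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def forward_lookup(host: str) -> set:
    """All addresses the hostname resolves to (empty set on failure)."""
    try:
        return {info[4][0] for info in socket.getaddrinfo(host, None)}
    except OSError:
        return set()


def rdns_confirms(ip: str, allowed_suffixes: Iterable[str],
                  reverse: Callable = reverse_lookup,
                  forward: Callable = forward_lookup) -> bool:
    """Forward-confirmed reverse DNS: PTR must match and resolve back to ip."""
    host = reverse(ip)
    if host is None:
        return False
    host = host.rstrip(".").lower()
    # Suffixes should start with a dot so "evilcrawler.example" cannot match.
    if not any(host.endswith(suffix) for suffix in allowed_suffixes):
        return False
    # A PTR record is controlled by the IP owner, so confirm it forward too.
    return ip in forward(host)
```

For example, `ip_in_ranges("192.0.2.10", ["192.0.2.0/24"])` is true, while `rdns_confirms` fails for any address whose PTR hostname either lacks the expected suffix or does not resolve back to the same IP.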

### Does it block the spoofers, or only identify them?
It classifies and scores traffic, attaching a verdict (ALLOW, RATE_LIMIT, FLAG or BLOCK) and a confidence score to each request; it tells you what is real. Acting on that verdict is a separate layer, the job of your allowlist system.
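
The split between classification and enforcement might look like the sketch below: the classifier emits a JSON audit record, and a separate layer decides what to do with it. The field names and the action names are hypothetical, chosen for the example rather than taken from the shipped audit schema.

```python
import json
from datetime import datetime, timezone


def build_audit_record(ip: str, claimed_bot: str, signals: dict,
                       confidence: float, verdict: str) -> str:
    """Serialize one classification decision with its signal breakdown."""
    return json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "ip": ip,
        "claimed_bot": claimed_bot,
        "signals": signals,          # one entry per fingerprint layer
        "confidence": round(confidence, 4),
        "verdict": verdict,
    })


# Hypothetical mapping owned by the enforcement layer, not the classifier.
ACTIONS = {
    "ALLOW": "pass",
    "RATE_LIMIT": "throttle",
    "FLAG": "pass_and_review",
    "BLOCK": "deny",
}


def enforce(record_json: str) -> str:
    """Translate a verdict into an action; unknown verdicts pass (allow-biased)."""
    record = json.loads(record_json)
    return ACTIONS.get(record.get("verdict"), "pass")
```

Keeping the mapping on the enforcement side means the same audit record can drive a strict policy on one site and a log-only policy on another, without re-running classification.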

## Price
$15, one-time, no subscription. VAT included.

Related guide: [How to automate SEO and AEO with Claude](https://forgehouse.ai/guides/automate-seo-claude/)
