---
title: AI Crawler Allowlist
category: product
entity_type: skill
price: $15
canonical: https://forgehouse.ai/skills/ai-crawler-allowlist/
lang: en
hreflang_alt: https://forgehouse.ai/tr/skiller/ai-crawler-allowlist/
last_updated: 2026-06-20
---

# AI Crawler Allowlist

> AI crawler User-Agent allowlist + robots.txt + ai.txt + Cloudflare WAF / nginx policy +…

A complete User-Agent allowlist and edge-enforcement system covering 24+ AI crawlers across a three-tier reputation matrix. It ships ready-to-use robots.txt, ai.txt, Cloudflare Worker and nginx configs with reverse-DNS verification, so you give the right AI bots access while rate-limiting or blocking the bandwidth-hungry ones. Critically, it treats the User-Agent header as an unverified claim and forces reverse-DNS plus official IP-range checks before granting trust.
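
As a rough illustration of that check, the sketch below trusts a crawler's claim only when three things hold: the source IP falls inside a published range, its reverse-DNS hostname ends in an expected suffix, and that hostname resolves forward to the same IP. The bot name `ExampleBot`, the hostname suffix, the CIDR range and every function name are hypothetical placeholders, not the contents of the shipped script. Real values come from each vendor's published crawler documentation.

```python
import ipaddress
import socket
from typing import Callable, Iterable, List

# Hypothetical values for illustration only (documentation address space).
TRUSTED_SUFFIXES = {"ExampleBot": (".crawler.example.com",)}
TRUSTED_RANGES = {"ExampleBot": ("192.0.2.0/24",)}


def ip_in_ranges(ip: str, cidrs: Iterable[str]) -> bool:
    """True if ip belongs to at least one of the CIDR ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in cidrs)


def system_reverse(ip: str) -> str:
    """PTR lookup through the system resolver."""
    return socket.gethostbyaddr(ip)[0]


def system_forward(host: str) -> List[str]:
    """Forward lookup through the system resolver (IPv4 only)."""
    return socket.gethostbyname_ex(host)[2]


def verify_claim(
    bot: str,
    ip: str,
    reverse: Callable[[str], str] = system_reverse,
    forward: Callable[[str], List[str]] = system_forward,
) -> bool:
    """Treat the User-Agent as a claim; return True only if the IP backs it up."""
    suffixes = TRUSTED_SUFFIXES.get(bot)
    ranges = TRUSTED_RANGES.get(bot)
    if not suffixes or not ranges:
        return False  # unknown bot: nothing to verify against
    if not ip_in_ranges(ip, ranges):
        return False  # outside the published IP ranges
    try:
        host = reverse(ip)
        if not host.lower().endswith(suffixes):
            return False  # PTR record points at someone else's domain
        return ip in forward(host)  # forward-confirm the PTR result
    except OSError:
        return False  # lookup failed: do not trust the header
```

The `reverse` and `forward` parameters exist so the logic can be exercised with stub resolvers; left at their defaults they call the system resolver.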

## Use cases
- Open robots.txt and ai.txt so ChatGPT, Claude and Perplexity can cite you
- Speed up AI recrawl after a major content relaunch
- Block aggressive crawlers that cost bandwidth but deliver no citations
- Stop real-time user-query bots from ever hitting a rate limit
- Diagnose why an AI engine isn't citing you (leftover Disallow rules)
- Enforce GDPR/KVKK AI-training opt-out on paid or user-generated content

## Benefits
- Become citable in AI search by giving verified bots clean, fast access
- Defend against User-Agent spoofing with mandatory reverse-DNS verification
- Keep four signals (robots.txt, ai.txt, WAF, edge) in sync so policy never contradicts itself
- Trim recurring bandwidth cost by tiering bots instead of one-size-fits-all limits

## What’s included
- 24+ bot master list across Tier 1 (high value), Tier 2 (conditional), Tier 3 (block)
- Ready robots.txt and ai.txt with per-bot allow/disallow and license signals
- Cloudflare Worker with tiered token-bucket rate limits and DoH reverse-DNS
- nginx map-block + limit_req config mirroring the same tier matrix
- Reverse-DNS verification shell script with official IP-range cross-check
- Real-time user-query bot bypass so ChatGPT-User/Claude-User never get 429'd
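
To make the tiered token bucket and the user-query bypass concrete, here is a minimal Python sketch. It is not the shipped Worker code. The rates are the per-tier figures quoted under "How it runs"; the class, function and variable names are assumptions, and the in-memory `buckets` dictionary stands in for whatever shared state an edge runtime would use.

```python
import time
from typing import Dict, Optional, Tuple

# Requests per minute by tier, as quoted on this page.
TIER_RPM = {1: 100, 2: 30, 3: 10}
# Real-time user-query agents named on this page; they skip rate limiting.
USER_QUERY_BOTS = {"ChatGPT-User", "Claude-User", "Perplexity-User"}


class TokenBucket:
    """Bucket that holds up to `rpm` tokens and refills at rpm/60 per second."""

    def __init__(self, rpm: int, now: float) -> None:
        self.capacity = float(rpm)
        self.refill_per_sec = rpm / 60.0
        self.tokens = self.capacity
        self.updated = now

    def allow(self, now: float) -> bool:
        # Refill for the time elapsed, capped at capacity, then spend a token.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Hypothetical in-memory store keyed by (bot, tier).
buckets: Dict[Tuple[str, int], TokenBucket] = {}


def check(bot: str, tier: int, now: Optional[float] = None) -> int:
    """Return the HTTP status the edge would answer with: 200 or 429."""
    if bot in USER_QUERY_BOTS:
        return 200  # a waiting user is behind this request: never throttle
    now = time.monotonic() if now is None else now
    key = (bot, tier)
    bucket = buckets.get(key)
    if bucket is None:
        bucket = buckets[key] = TokenBucket(TIER_RPM[tier], now)
    return 200 if bucket.allow(now) else 429
```

Passing `now` explicitly keeps the sketch deterministic for testing; when it is omitted, a monotonic clock is used.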

## Who it’s for
Site owners and SEO teams who want AI search visibility without leaking bandwidth to spoofers or worthless crawlers.

## How it runs
Not every AI bot deserves the same door. The skill sorts 24+ crawlers into reputation tiers, then keeps robots.txt, ai.txt and the edge rules all telling the same story:
1. Sorts every known AI bot into the 3-tier reputation matrix: TIER 1 high value (GPTBot, ClaudeBot, PerplexityBot, Google-Extended and 7 more), TIER 2 conditional (CCBot, Amazonbot, Applebot-Extended), TIER 3 low value or suspect (Bytespider, Diffbot), 24+ bots total.
2. Writes the four enforcement layers in sync: robots.txt allow/disallow blocks, ai.txt training opt-in/opt-out, the edge worker or nginx tier rules, and rate-limit zones (100, 30 and 10 requests per minute by tier). One layer contradicting another is the classic failure it prevents.
3. Verifies every UA claim with a reverse-DNS lookup plus the vendor's official IP range JSON; a failed check downgrades the bot to TIER 3 instead of trusting the header.
4. Exempts real-time user-query bots (ChatGPT-User, Claude-User, Perplexity-User) from rate limits entirely, because a 429 there means the user waiting for an answer never sees your brand cited.
5. Blocks or throttles bandwidth-heavy, zero-citation crawlers like Bytespider, and keeps sensitive paths (/admin, /staging, /customer) closed to every bot.
6. Runs the live verification (a GPTBot UA curl must return 200, a Bytespider curl must return 403) and a monthly DarkVisitors registry diff so new bots get a tier assignment before they hit your origin.
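
The tier lookup and downgrade logic in the steps above can be sketched as follows. This is an illustration under stated assumptions, not the shipped configuration: it lists only some of the bots named on this page, matches the claimed bot by substring, takes the verification result as a ready-made boolean, and answers Tier 3 with a 403 although the skill may throttle instead. All function and constant names are hypothetical.

```python
from enum import IntEnum
from typing import Optional


class Tier(IntEnum):
    HIGH_VALUE = 1
    CONDITIONAL = 2
    BLOCK = 3


# Subset of the bots named on this page; the full 24+ list is not reproduced.
TIER_BY_TOKEN = {
    "GPTBot": Tier.HIGH_VALUE,
    "ClaudeBot": Tier.HIGH_VALUE,
    "PerplexityBot": Tier.HIGH_VALUE,
    "CCBot": Tier.CONDITIONAL,
    "Amazonbot": Tier.CONDITIONAL,
    "Bytespider": Tier.BLOCK,
    "Diffbot": Tier.BLOCK,
}
# Paths the page says stay closed to every bot.
SENSITIVE_PREFIXES = ("/admin", "/staging", "/customer")


def claimed_bot(user_agent: str) -> Optional[str]:
    """Return the bot token the User-Agent claims to be, if any."""
    ua = user_agent.lower()
    for token in TIER_BY_TOKEN:
        if token.lower() in ua:
            return token
    return None


def decide(user_agent: str, path: str, verified: bool) -> int:
    """Return 200 or 403 for a request, given the verification outcome."""
    bot = claimed_bot(user_agent)
    if bot is None:
        return 200  # no AI-crawler claim: ordinary traffic rules apply
    if path.startswith(SENSITIVE_PREFIXES):
        return 403  # sensitive paths are closed to every bot
    # A failed verification downgrades the claim to Tier 3.
    tier = TIER_BY_TOKEN[bot] if verified else Tier.BLOCK
    return 403 if tier is Tier.BLOCK else 200
```

In this sketch a genuine GPTBot request for a public page gets a 200, while the same User-Agent with a failed verification gets a 403, the same answer Bytespider receives.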

## FAQ
### Do I just drop the robots.txt and ai.txt in, or is there more to it?
The files are ready to use, so opening access to ChatGPT, Claude and Perplexity is a drop-in. The edge-enforcement layer is the part that actually acts on aggressive crawlers, and you deploy it in front of your site.

### robots.txt is only a request a crawler can ignore, so how does this stop the bad ones?
That is the gap the edge-enforcement layer closes: it acts on requests rather than politely asking. The robots.txt and ai.txt files signal intent; the edge rules enforce it.
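
A small sketch shows why robots.txt is only a signal: the rules are evaluated by the crawler itself, here using Python's standard `urllib.robotparser`. The sample `ROBOTS_TXT` is a hypothetical policy written for this example, not the file shipped with the skill.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy for illustration only.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /admin
Disallow: /staging
Disallow: /customer
Allow: /

User-agent: Bytespider
Disallow: /

User-agent: *
Disallow: /admin
Disallow: /staging
Disallow: /customer
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# This is what a well-behaved crawler asks itself before each fetch.
for agent, path in [
    ("GPTBot", "/blog/post"),
    ("GPTBot", "/admin/users"),
    ("Bytespider", "/"),
]:
    verdict = "allowed" if parser.can_fetch(agent, path) else "disallowed"
    print(f"{agent} {path}: {verdict}")
```

A crawler that skips this check still reaches the origin, which is why the same policy also has to be enforced at the edge.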

### Does the allowlist tell a real ChatGPT crawler from one spoofing its name?
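Not from the User-Agent alone, which the skill treats as an unverified claim. The reverse-DNS lookup and the official IP-range cross-check are what confirm that a claimed crawler is genuinely that bot; a request that fails the check is downgraded to Tier 3 instead of being trusted on its header.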

## Price
$15, one-time, no subscription. VAT included.

Related guide: [How to automate SEO and AEO with Claude](https://forgehouse.ai/guides/automate-seo-claude/)
