Blocking AI crawlers with Caddy
I was reading the Reddit thread about Claude AI crawlers effectively DDoSing the Linux Mint forums (https://libreddit.lunar.icu/r/linux/comments/1ceco4f/claude_ai_name_and_shame/)
and I wanted to block all AI crawlers from my self-hosted stuff.
I don't trust crawlers to respect robots.txt, but you can get one here: https://darkvisitors.com/
Since I use Caddy as my server, I generated a snippet that blocks them based on their User-Agent header. The content of the regex basically comes from darkvisitors.
Side note: there is also a module for blocking crawlers, but it seemed like overkill for me: https://github.com/Xumeiquer/nobots
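If the list of agents changes, the matcher line can be regenerated rather than edited by hand. Here is a minimal Python sketch of that idea. The helper name `build_matcher_line` is made up for illustration, and the agent list is simply copied from the regex in this post, not fetched from darkvisitors.

```python
import re

# Agent names copied from the regex in this post; update the list by hand
# whenever your source of crawler names changes.
AI_AGENTS = [
    "Bytespider", "CCBot", "Diffbot", "FacebookBot", "Google-Extended",
    "GPTBot", "omgili", "anthropic-ai", "Claude-Web", "ClaudeBot", "cohere-ai",
]


def build_matcher_line(agents):
    """Return a header_regexp line for a Caddy named matcher (hypothetical helper)."""
    for name in agents:
        # Only accept plain tokens (letters, digits, hyphens) so that
        # nothing in the alternation needs regex escaping.
        if not re.fullmatch(r"[A-Za-z0-9-]+", name):
            raise ValueError(f"unexpected characters in agent name: {name!r}")
    alternation = "|".join(agents)
    return f'header_regexp User-Agent "(?i)({alternation})"'


if __name__ == "__main__":
    print(build_matcher_line(AI_AGENTS))
```

The printed line can be pasted into the `@blockAiCrawlers` matcher block. Rejecting odd characters instead of escaping them keeps the output readable and avoids relying on escaping rules that may differ between Python and the regex engine Caddy uses.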
For anybody who is interested, here is the block_ai_crawlers.conf I wrote.
(blockAiCrawlers) {
    @blockAiCrawlers {
        header_regexp User-Agent "(?i)(Bytespider|CCBot|Diffbot|FacebookBot|Google-Extended|GPTBot|omgili|anthropic-ai|Claude-Web|ClaudeBot|cohere-ai)"
    }
    handle @blockAiCrawlers {
        abort
    }
}
# Usage:
# 1. Place this file next to your Caddyfile
# 2. Edit your Caddyfile as in the example below
#
# ```
# import block_ai_crawlers.conf
#
# www.mywebsite.com {
#     import blockAiCrawlers
#     reverse_proxy * localhost:3000
# }
# ```
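Before deploying, it can help to check which User-Agent strings the pattern would catch. The Python sketch below reuses the regex from the snippet above. Note that it runs under Python's `re` module rather than the Go engine Caddy uses, so it is only an approximation, although a simple case-insensitive alternation like this one should behave the same in both. The sample User-Agent strings and the `is_blocked` helper are illustrative, not taken from real logs.

```python
import re

# Same pattern as in block_ai_crawlers.conf. The matcher is unanchored,
# so search() is used to find the token anywhere in the header value.
PATTERN = re.compile(
    r"(?i)(Bytespider|CCBot|Diffbot|FacebookBot|Google-Extended|GPTBot"
    r"|omgili|anthropic-ai|Claude-Web|ClaudeBot|cohere-ai)"
)


def is_blocked(user_agent):
    """Return True if the User-Agent contains one of the listed crawler names."""
    return PATTERN.search(user_agent) is not None


if __name__ == "__main__":
    # Illustrative User-Agent strings (assumptions, not captured traffic).
    samples = [
        "Mozilla/5.0 (compatible; GPTBot/1.0)",
        "claudebot",
        "Mozilla/5.0 (X11; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0",
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    ]
    for ua in samples:
        print("blocked" if is_blocked(ua) else "allowed", "-", ua)
```

The regular Googlebot string is not matched, since only `Google-Extended` is in the list. Against a live server, sending a request with a matching User-Agent (for example with `curl -A "GPTBot"`) should result in the connection being dropped without a response, since that is what `abort` does.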
I got meaner with them :3c
I just want you to know that was an amazing read, was actually thinking "It gets worse? Oh it does. Oh, IT GETS EVEN WORSE?"
lmao that means a lot, thanks <3
The nobots module I've linked bombs them
Suggestion at the end:
<a class="boom" href="https://boom .arielaw.ar">hehe</a>
Wouldn't it destroy GoogleBot (and other search engines), thus getting your site delisted from Search?
In dark mode, the anchor tags are difficult to read. They're dark blue on a dark background. Perhaps consider something with a much higher contrast?
Apart from that, nice idea - I'm going to deploy the zipbomb today!
nice catch, thanks (i use light mode most of the time)
This is one of the best things I've ever read.
I'd love to see a robots.txt do a couple safe listings, then a zip bomb, then a safe listing. It would be fun to see how many log entries from an IP look like get a, get b, get zip bomb.... no more requests.
I really like your site's color scheme, fonts, and overall aesthetics. Very nice!
I agree, it's readable and very cute!
That’s devilishly and deliciously devious.