Ever wanted to mess with people scanning the web for vulnerabilities? I certainly did. This is the story of how I found a way to punish them, then used Rust to improve it, and then killed my web server using a van.
Step 0: Getting Annoyed
Alright, so if you've ever run a website at any scale and happened to look at the access logs, you will soon have found that a lot of the incoming requests have nothing to do with your website. Many of them instead probe paths like /wp-login.php, /.env and /.git/config. Turns out a lot of different people want to either steal your database password or try to log in to your WordPress site. While not surprising, it is a bit annoying when you try to check the stats for your site.
This is of course an automated process (or well, some maniac might do this manually, it's a big internet after all). Updating your /robots.txt (a file describing how bots are allowed to crawl your website) won't help, because no self-respecting password-stealing bot would ever bother to read it. However, big companies like Google do respect this file (with some exceptions). Could we somehow use this to our advantage?
Step 1: Finding the Gates of Hell
While looking into ways to mess with our annoying bot friends I stumbled upon HellPot, an HTTP honeypot designed to crash bots attempting to scrape a website by simply giving them what they asked for. Any HTTP request to HellPot on specified paths (like the aforementioned /wp-login.php) will be met with an eternal stream of data from The Birth of Tragedy (Hellenism and Pessimism) by Friedrich Nietzsche, dressed up to kind of look like a website. We just make sure to put the same paths in our robots.txt to avoid bingbot experiencing Nietzsche at several MB/s.
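To make the idea concrete, here is a minimal sketch of such a tarpit in Rust, using only the standard library. It is not HellPot's code and not necessarily what I ended up with later. The names `TRAP_PATHS`, `FILLER`, `is_trap_path`, `robots_txt`, `stream_forever`, `handle` and `run` are all made up for illustration, and the filler is a placeholder string rather than actual Nietzsche.

```rust
use std::io::{self, BufRead, BufReader, Write};
use std::net::{TcpListener, TcpStream};
use std::thread;

/// Hypothetical trap paths, taken from the examples in the text.
const TRAP_PATHS: [&str; 3] = ["/wp-login.php", "/.env", "/.git/config"];

/// Placeholder filler (an assumption); HellPot streams book text instead.
const FILLER: &[u8] = b"<p>placeholder text standing in for the book</p>\n";

/// True when the request path (query string ignored) is one of the traps.
fn is_trap_path(path: &str) -> bool {
    let path = path.split('?').next().unwrap_or(path);
    TRAP_PATHS.iter().any(|trap| *trap == path)
}

/// robots.txt body asking well-behaved crawlers to stay away from the traps.
fn robots_txt() -> String {
    let mut body = String::from("User-agent: *\n");
    for path in TRAP_PATHS.iter() {
        body.push_str(&format!("Disallow: {}\n", path));
    }
    body
}

/// Writes `chunk` over and over until a write fails (for a socket: the
/// client gave up). Returns the bytes sent in fully written chunks.
/// `chunk` must be non-empty, otherwise no progress is ever made.
fn stream_forever<W: Write>(out: &mut W, chunk: &[u8]) -> u64 {
    let mut sent = 0u64;
    while out.write_all(chunk).is_ok() {
        sent += chunk.len() as u64;
    }
    sent
}

/// Handles one connection: robots.txt, a trap, or a plain 404.
fn handle(stream: TcpStream) -> io::Result<()> {
    // The request line looks like "GET /wp-login.php HTTP/1.1".
    let mut request_line = String::new();
    BufReader::new(&stream).read_line(&mut request_line)?;
    let path = request_line.split_whitespace().nth(1).unwrap_or("/");

    let mut out = &stream;
    if path == "/robots.txt" {
        let body = robots_txt();
        write!(
            out,
            "HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{}",
            body.len(),
            body
        )?;
    } else if is_trap_path(path) {
        // No Content-Length header: the body simply never ends.
        out.write_all(
            b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nConnection: close\r\n\r\n<html><body>\n",
        )?;
        stream_forever(&mut out, FILLER);
    } else {
        out.write_all(b"HTTP/1.1 404 Not Found\r\nContent-Length: 0\r\nConnection: close\r\n\r\n")?;
    }
    Ok(())
}

/// Accepts connections on `addr`, one thread per client (fine for a sketch).
fn run(addr: &str) -> io::Result<()> {
    let listener = TcpListener::bind(addr)?;
    for stream in listener.incoming() {
        if let Ok(stream) = stream {
            thread::spawn(move || {
                let _ = handle(stream);
            });
        }
    }
    Ok(())
}
```

Calling something like `run("127.0.0.1:8080")` from `main` would start it (the address is just an example). The one-thread-per-connection design is the simplest thing that works, but every trapped bot holds a thread for as long as it keeps reading, so anything serious would want an async runtime or a cap on concurrent connections.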