Just got hit by a poorly-behaved bot, apparently something to do with this, which I’ve banned outright. Let’s look at what it did wrong to deserve this:
- It doesn’t have a User-Agent string.
- It doesn’t send a valid Accept header (it was trying to retrieve an RSS file and got HTML).
- It doesn’t recognize <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"> as a document type declaration, instead attempting to parse it like a link.
- It doesn’t attempt to throttle its requests or recognize duplicates or errors: it made over 30 requests in 8 seconds to 6 URLs, only two of which are valid at all or referred to in links from the files it did retrieve.
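For contrast, here is a rough sketch of what a better-behaved fetcher might look like. It is not the bot's code or anyone's real crawler. It is a minimal illustration, in Python, of the points above: identify yourself with a User-Agent, ask for the content type you actually want, wait between requests, and don't re-request a URL you have already fetched or that has already failed. The class name `PoliteFetcher`, the bot name `ExampleBot`, the one-second default interval, and the Accept value are all assumptions made up for the example.

```python
import time
import urllib.request


class PoliteFetcher:
    """Hypothetical helper; the name and defaults are illustrative only."""

    def __init__(self, user_agent, min_interval=1.0, opener=None,
                 clock=time.monotonic, sleep=time.sleep):
        self.user_agent = user_agent      # identifies the bot to site owners
        self.min_interval = min_interval  # minimum seconds between requests
        self._open = opener or self._urlopen
        self._clock = clock
        self._sleep = sleep
        self._last_request = None
        self._seen = {}                   # url -> body, or None if it failed

    def build_headers(self, accept):
        # Always send a User-Agent and an Accept header.
        return {"User-Agent": self.user_agent, "Accept": accept}

    @staticmethod
    def _urlopen(url, headers):
        req = urllib.request.Request(url, headers=headers)
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.read()

    def fetch(self, url,
              accept="application/rss+xml, application/xml;q=0.9"):
        # Never re-request a URL already fetched, successfully or not.
        if url in self._seen:
            return self._seen[url]
        # Throttle: wait until min_interval has passed since the last request.
        if self._last_request is not None:
            wait = self.min_interval - (self._clock() - self._last_request)
            if wait > 0:
                self._sleep(wait)
        self._last_request = self._clock()
        try:
            body = self._open(url, self.build_headers(accept))
        except OSError:
            # urllib's HTTPError and URLError are OSError subclasses;
            # remember the failure instead of hammering the URL again.
            body = None
        self._seen[url] = body
        return body
```

Used as `PoliteFetcher("ExampleBot/0.1 (+http://example.com/bot)").fetch(url)`, this would make at most one request per URL and at most one request per second. A real crawler would also want to honor robots.txt and check the Content-Type of what comes back, both of which are left out here.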
it’s a prototype, but thanks for the feedback
Still alive and still misbehaving the same way as of July 21-23. Blocked from the new domain too.