Forum Moderators: phranque

Proposed New Apache Result Code 666

Spider Go To Hell :)

incrediBILL

9:17 pm on Apr 2, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Since there is nothing official in the Apache codes to tell spiders they are completely unwelcome and should NEVER return, I propose a new result code '666' to tell unwanted crawlers to simply 'Go To Hell' and never come back :)

I'm only half joking here with the 666 number as many sure don't seem to understand that 403 forbidden on the root of the domain means they are forbidden from the entire domain.

Some also don't seem to get that being blocked in robots.txt also means keep out.

In particular, I've lately been getting a small attack from something in China [webmasterworld.com] that falsely claims to be "ia_archiver". It's blocked in robots.txt and gets a 403 forbidden from the entire domain, yet it keeps cranking up the number of requests per day. Not like it's a real DDoS or anything, but the volume and the number of new IPs it's coming from daily is quite distressing, as it looks like it could become a real problem. Obviously I could just drop China in the firewall on the server, which I've done on other servers, but I'm trying to avoid that on this particular box.

Anyway, I'm thinking we need to propose a new code that literally states in no uncertain terms "GO AWAY, STAY AWAY, AND NEVER RETURN". Wondering if any of them honor "retry-after" as I could give them a nice retry number like "31536000" which would be about a year.
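For anyone checking the arithmetic on that retry number, here is a minimal Python sketch. The helper name `retry_after_header` is made up for illustration, and nothing here implies any bot actually honors the header:

```python
# Hypothetical helper: build a Retry-After header value from a number of days.
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def retry_after_header(days):
    """Return a (name, value) pair with the delay expressed in seconds."""
    return ("Retry-After", str(days * SECONDS_PER_DAY))


# 365 days works out to the 31536000 figure mentioned above.
print(retry_after_header(365))
```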

In the "amusing myself further" category, I'm also considering trying a "402 Payment Required" with instructions and a link to PayPal, to see if anyone ever wants access badly enough to pay for it. Maybe offer access at the rate of $0.01 per page for a $1 access payment, or 3 pages per $0.01 for a $10 payment, etc. IMO this is a far better solution than a 403 or even my proposed 666. If they want it, pay to get it, or go away :)

Leosghost

9:26 pm on Apr 2, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



That would tell them that they should be "slouching their way towards Bethlehem"..by inference :)

lucy24

10:22 pm on Apr 2, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I want a code that says "This page has moved. Get it? Here's its new URL. Notice how the title, the filename, the content, and all subsidiary material are the same. It has MOVED. I'm not just pointing you to some random page to avoid a 404. IT'S THE SAME PAGE. If you just saw it in Cleveland, your brother will not see it half an hour from now in El Paso."

incrediBILL

6:14 am on Apr 4, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Seriously, there needs to be a code that says NEVER COME BACK.

I'm getting an escalating number of the same bots hitting multiple times per day, every single day, although they are being told to go away every time they visit.

How many bots get on this bandwagon before it ramps up to thousands of hits per day just telling the same idiots to go away?

Come on, there has to be a code for this!

phranque

7:53 am on Apr 4, 2012 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Wondering if any of them honor "retry-after"...

according to the HTTP protocol the Retry-After response header is not indicated for 403 status codes.
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.37:
The Retry-After response-header field can be used with a 503 (Service Unavailable) response to indicate how long the service is expected to be unavailable to the requesting client. This field MAY also be used with any 3xx (Redirection) response to indicate the minimum time the user-agent is asked wait before issuing the redirected request.
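
Read literally, the passage quoted above ties Retry-After to 503 and to 3xx responses only. A minimal Python sketch of that reading follows; the function name is an assumption for illustration, and it encodes only the RFC 2616 text quoted here, not how any particular server or bot behaves:

```python
def retry_after_defined_for(status):
    """True if the quoted RFC 2616 text mentions Retry-After for this status."""
    # 503 (Service Unavailable) or any 3xx (Redirection) response.
    return status == 503 or 300 <= status <= 399


for code in (301, 403, 503):
    print(code, retry_after_defined_for(code))
```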

g1smd

9:03 am on Apr 4, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Seriously, there needs to be a code that says NEVER COME BACK.

Set this code for all users.

Sell the domain.

How does the new owner undo this action?

"Never" is too long.

lucy24

10:10 am on Apr 4, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"Domain for Sale. Buyer must have deep-seated and ineradicable antipathy to Ukrainian robots, the entire nation of China, and all User Agents containing the string 'facebook'."

:: vague mental association with consequences of trying to get ex out of one's hair by saying "Ask me in six months" due to soft-hearted or -headed feeling that "When hell freezes over" is too harsh ::

! And what if the other domain gets sold? One day it's a slimy robot from Belarus, the next day it's a German university with unimpeachable reputation-- but they still can't read your definitive article on 18th-century widgets.

Some robots seem to react more powerfully to a redirect to 127.0.0.1 than to a simple [F]. I remember it had a dramatic effect on my Ukrainians. They're still around, but never fully recovered their strength.

incrediBILL

11:37 am on Apr 4, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



according to the HTTP protocol the Retry-After response header is not indicated for 403 status codes.


@phranque you're too literal. I knew that, and I don't think most bots even deal with it on a 503 either; I was just using it as an example of what I'd like.

Checking robots.txt 10+ times in a single day, or requesting the domain index page repeatedly and being given dozens of 403s, is just silly.

Maybe a code for "come back in x days" would work except every x days your server suddenly gets a mini-DDoS of all the bots you told to go away for x days.

StoutFiles

12:21 pm on Apr 4, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Seriously, there needs to be a code that says NEVER COME BACK.

I'm getting an escalating number of the same bots hitting multiple times per day, every single day, although they are being told to go away every time they visit.


As if they would listen. More and more bots are ignoring the almost worthless robots.txt file.

incrediBILL

7:05 am on Apr 5, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



More and more bots are ignoring the almost worthless robots.txt file.


The easy solution is to back up and reinforce those rules in .htaccess

Come for the robots.txt, stay for the .htaccess 403 forbidden
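
The two layers can be sketched in Python as a rough illustration. This is not an .htaccess rule, just a model of the idea: robots.txt is what a polite bot checks for itself, and the 403 is what the server enforces regardless. The block list, the function names, and the example.com URL are all assumptions made up for this sketch:

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt that shuts out one crawler entirely.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""

# Hypothetical server-side block list, standing in for the .htaccess rules.
BLOCKED_AGENT_SUBSTRINGS = ("ia_archiver",)


def polite_bot_may_fetch(user_agent, url):
    """Layer 1: what a well-behaved crawler concludes from robots.txt."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


def server_status_for(user_agent):
    """Layer 2: what the server answers no matter what the bot decided."""
    ua = user_agent.lower()
    if any(token in ua for token in BLOCKED_AGENT_SUBSTRINGS):
        return 403
    return 200


print(polite_bot_may_fetch("ia_archiver", "http://example.com/"))
print(server_status_for("ia_archiver"))
```

A bot that ignores the first layer still runs into the second, which is the whole point of backing up robots.txt with a server-side block.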