please use the back door

lucy24

6:22 pm on Jul 31, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



So help me. I cross-checked the raw logs and the error logs. They explicitly asked for this page. (It's my host's default name for a custom 403 page.) It no longer exists-- a relic of a former directory configuration. But hm, if people are going to go around asking to see it, maybe I need to return a 410.

95.108.158.238 - - [31/Jul/2011:00:55:52 -0700] "GET /games/forbidden.html HTTP/1.1" 404 561 "-" "Mozilla/5.0 (compatible; YandexImages/3.0; +http://yandex.com/bots)"


Error logs helpfully tell me that they were not allowed to see the resulting 404 page ("missing.html").

>> smiley I would use if we were allowed to (no worries, mods, it's a public site that allows hotlinking): [cosgan.de...] <<

g1smd

7:08 pm on Jul 31, 2011 (gmt 0)

The 404 says there's nothing to see.

However, return 410 if you want: it is technically "more correct" to do so.

tangor

8:59 pm on Jul 31, 2011 (gmt 0)

G does understand 410, but in the months following they still try to find it... again and again and again... Google (Bing, too) NEVER forget a URL they have met...

Want your logs to mean something? Make a copy (log.1), parse log.1 to strip any 4xx status results, and go from there. Saves a lot of hair pulling. I went that way too late, now bald of pate...
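
For anyone who wants to try the strip-the-4xx approach, here is a minimal sketch in Python. It assumes the log is in Apache combined format like the Yandex line quoted earlier in the thread; the function name strip_4xx and the regex are illustrative, not anything from a particular tool.

```python
import re

# Finds the quoted request followed by the status code and response size,
# as laid out in an Apache combined-format log line (assumed format).
STATUS_RE = re.compile(r'"[^"]*" (\d{3}) (?:\d+|-)')


def strip_4xx(lines):
    """Yield only the log lines whose status code is not 4xx."""
    for line in lines:
        match = STATUS_RE.search(line)
        # Lines that cannot be parsed are kept rather than silently dropped.
        if match is None or not match.group(1).startswith("4"):
            yield line
```

Run it over the copy (log.1), not the live log, so the original stays intact if the pattern turns out to miss something.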

lucy24

12:45 am on Aug 1, 2011 (gmt 0)

Fortunately g### does not know about this particular 403 page, having never met it. I should say
:: pointedly not looking at g1::
this particular url, since the page itself-- and its brother 404-- were simply duplicates of my regular 403 and 404 pages.

Yandex knows it extremely well, because they saw it on a regular basis for several months. Somewhere along the line their robot must have gotten terminally confused and decided that "forbidden.html" was the actual name of the file they were trying to get :)

Oh, and it's the 403 that says "nothing to see" :-P The 404 says "Whoops!" and the 500 says "Ouch!" The one benefit to excessive messing with your htaccess is that sooner or later you get to see first-hand that your custom 500 page works as intended.

It just occurred to me that Yandex's robot ended up seeing the very page it wanted, though in a different location. Since /games/forbidden.html no longer exists, it was served a 404, and since it is barred from the 404 page along with everything else, the server would have had to fall back on the regular 403. Which, as noted above, is identical to the one it asked for.

Hee, hee.

Edit:
The 40x's and 301s are actually the very first things I purge from my logs, but I eyeball them as I go-- and it's not every day you see a user request for "forbidden.html". Next to go are authorized robots, requests for subordinate files (like images), and visits from me (an advantage to using a highly uncommon browser). I've got it down to maybe 90, 95% humans by the time I take a closer look.
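
A rough sketch of that purge order in Python, for the curious. Everything configurable here is an assumption: the robot markers, the asset suffixes and the "ExampleRareBrowser" user-agent string are placeholders, and the regex assumes Apache combined log format.

```python
import re

# Pulls the request path, status code, and user agent out of a
# combined-format log line (assumed format).
LINE_RE = re.compile(
    r'"[A-Z]+ (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

# Hypothetical lists; substitute whatever fits your own site.
ROBOT_MARKERS = ("bot", "spider", "crawler")
ASSET_SUFFIXES = (".png", ".jpg", ".gif", ".css", ".js", ".ico")
MY_BROWSER = "ExampleRareBrowser"


def looks_human(line):
    """Return True if the line survives every purge step."""
    m = LINE_RE.search(line)
    if m is None:
        return False
    status = m.group("status")
    # Step 1: drop 40x errors and 301 redirects.
    if status.startswith("40") or status == "301":
        return False
    agent = m.group("agent").lower()
    # Step 2: drop robots, judged only by crude user-agent markers.
    if any(marker in agent for marker in ROBOT_MARKERS):
        return False
    # Step 3: drop requests for subordinate files such as images.
    path = m.group("path").split("?", 1)[0].lower()
    if path.endswith(ASSET_SUFFIXES):
        return False
    # Step 4: drop my own visits, identified by an uncommon browser.
    return MY_BROWSER.lower() not in agent
```

Matching robots on the user-agent string is only a heuristic, so what is left is "probably human" at best, much like the 90, 95% figure.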

Come to think of it, I'm surprised g### hasn't started indexing custom error pages. Surely they can find where they live...