
Increased crawl activity to non-existent pages, concern?

     
10:54 am on Apr 7, 2013 (gmt 0)

WebmasterWorld Senior Member sgt_kickaxe is a WebmasterWorld Top Contributor of All Time 5+ Year Member



Several months ago I worked over my site's .htaccess file to eliminate the possibility of returning a 200 response for malformed URLs. If a page doesn't exist, a 404 response is the result; the site no longer makes a best-guess redirect. Over the years this site had picked up countless links containing malformed URLs, so it was due for a cleanup.
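For anyone wanting to do the same, the exact rules depend on the site, but a minimal .htaccess sketch of the idea might look like this (the pattern is an example only, not the poster's actual rules):

```apache
# Hypothetical sketch: stop Apache's "best guess" URL matching and force a clean 404.
# MultiViews is the feature that lets a mangled URL still match a real file.
Options -MultiViews

RewriteEngine On
# Example pattern only: any request with trailing junk after ".html" is Not Found.
RewriteRule \.html.+$ - [R=404,L]
```

The R flag with a 4xx status tells mod_rewrite to return that error code directly instead of issuing a redirect.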

As expected, the number of 404 errors reported in GWT shot up, and I diligently went through the list and cleared it. After a couple of months of doing this daily (on March 15th, to be specific) I stopped logging into GWT. Monitoring these 404 errors in GWT had become a chore without benefit. On March 23rd I received a 404 error email warning, and every day since the crawl rate has been 3x normal.

All of the previously cleared 404 errors are back. My concern is that while I was clearing the errors the crawl rate was normal, but now that I've stopped it's three times normal. Should I go back to GWT and clear the errors even when 404 is the right response? I'm inclined to let Google get a good dose of the 404s and, eventually, stop crawling them, as they weren't intended to exist in the first place.

Clearing 404 errors in GWT can seemingly return the crawl rate to normal, but is it a good idea? Traffic, and rankings, remain unchanged.
12:37 pm on Apr 7, 2013 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



I'd be inclined to "let them figure it out" for a couple of months. The increased crawl rate means they are going to discover a lot of these dead URLs and clean up their database.

One thing I have found useful on occasion is to add a chunk of PHP that makes a note of every 404 the CMS serves to Google; the next time they, or anyone else, ask for the same URL, it returns 410, and every time thereafter.
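A minimal sketch of that idea, called from a CMS's "page not found" handler (the log filename is made up, and a real site would want something sturdier than a flat file):

```php
<?php
// Hypothetical sketch: remember every URL that has already 404ed,
// and serve 410 Gone on any repeat request for it.
$seen_file = __DIR__ . '/404-seen.txt';
$url = $_SERVER['REQUEST_URI'];

// Load the list of URLs we've already served a 404 for.
$seen = file_exists($seen_file)
    ? file($seen_file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES)
    : array();

if (in_array($url, $seen, true)) {
    // Second or later request for a known-dead URL: it's Gone.
    header($_SERVER['SERVER_PROTOCOL'] . ' 410 Gone');
} else {
    // First request: note the URL and serve a normal 404.
    file_put_contents($seen_file, $url . "\n", FILE_APPEND | LOCK_EX);
    header($_SERVER['SERVER_PROTOCOL'] . ' 404 Not Found');
}
```

The point of the 410 is that search engines treat "Gone" as more final than "Not Found" and tend to drop the URL from recrawl rotation sooner.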
1:08 pm on Apr 7, 2013 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



Huh. When I saw the subject line, I thought it was going to be about something I've noticed unusually often in recent weeks:

66.249.73.132 - - {timestamp} "GET /paintings/tundra/lenkoljgkuhz.html HTTP/1.1" 404 1302 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.75.169 - - {timestamp} "GET /fonts/qttlasvciisxs.html HTTP/1.1" 404 1302 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.76.169 - - {timestamp} "GET /paintings/catsrats/oectnbcxczuzirt.html HTTP/1.1" 404 1292 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

That is: the kind of requests you get when the search engine suspects you've been throwing soft 404s (which I've never done, and I haven't done an unusual amount of redirecting). Either that, or the googlebot has a cat.

Or, for variety's sake:

66.249.75.169 - - {timestamp} "GET /value.png HTTP/1.1" 404 1302 "-" "Googlebot-Image/1.0"


Huh what? I've never had image files flopping around loose in the top level, and I don't even own a PNG with this name! (I just checked. Not one on my entire HD.)

In the last few days they've also asked for several pages that have been 410 since January of 2012. You'd expect that of Bing, who likes 410s almost as much as it likes robots.txt, so it keeps coming back for more. Google tends to get bored and go away after two or three doses.

It makes me nervous when g### does something different. And I still haven't figured out what that blasted snippetbot wants, let alone why it wants it so many times.