Home / Forums Index / Google / Google SEO News and Discussion
Forum Library, Charter, Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

    
Increased crawl activity to non-existent pages, concern?
Sgt_Kickaxe




msg:4562323
 10:54 am on Apr 7, 2013 (gmt 0)

Several months ago I reworked my site's .htaccess file so that malformed URLs can no longer return a 200 response. If a page doesn't exist, the result is a 404; the site no longer makes a best-guess redirect. Over the years this site had picked up countless links containing malformed URLs, and it was due for a cleanup.

As expected, the number of 404 errors reported in GWT shot up, and I diligently went through the list regularly and cleared it. After a couple of months of doing this daily, I stopped logging into GWT on March 15th. Monitoring these 404 errors in GWT had become a chore without benefit. On March 23rd I received a 404 error email warning, and every day since then the crawl rate has been 3x normal.

All of the previously cleared 404 errors are back. My concern is that the crawl rate was normal while I was clearing the errors, but now that I'm not, it's three times higher than normal. Should I go back to GWT and clear the errors when 404 is the right response? I'm inclined to let Google get a good dose of the 404s and eventually stop crawling these URLs, as they were never intended to exist in the first place.

Clearing 404 errors in GWT can seemingly return the crawl rate to normal, but is it a good idea? Traffic and rankings remain unchanged.

 

g1smd




msg:4562344
 12:37 pm on Apr 7, 2013 (gmt 0)

I'd be inclined to "let them figure it out" for a couple of months. The increased crawl rate means they are going to discover a lot of these dead URLs and clean up their database.

One thing I have found useful on occasion is to add a chunk of PHP to the CMS that makes a note of every 404 served to Google. The next time Google or anyone else asks for the same URL, it returns 410, and does so every time thereafter.
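
For anyone who wants to try the 404-then-410 idea, here is a rough sketch of the logic, written in Python rather than PHP and not taken from any real CMS. The class name GoneTracker, the method status_for_missing, and the JSON file used as storage are all made up for illustration. It also assumes that a "Googlebot" substring in the user agent is good enough to identify Google, which is easy to spoof, so treat it as a starting point only.

```python
import json
from pathlib import Path


class GoneTracker:
    """Hypothetical helper: remember URLs that 404'd for Googlebot, then answer 410."""

    def __init__(self, store_path):
        # store_path is an assumed JSON file holding the list of noted URLs
        self.store_path = Path(store_path)
        if self.store_path.exists():
            self.seen = set(json.loads(self.store_path.read_text()))
        else:
            self.seen = set()

    def status_for_missing(self, path, user_agent):
        """Return the status code to send for a URL that does not exist."""
        if path in self.seen:
            # Already served as 404 to Googlebot once: everyone now gets 410
            return 410
        if "Googlebot" in user_agent:
            # First time Googlebot hits this dead URL: note it, still send 404
            self.seen.add(path)
            self.store_path.write_text(json.dumps(sorted(self.seen)))
        return 404
```

The CMS would call status_for_missing only after it has already decided that the requested URL does not exist, so real pages are never affected.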

lucy24




msg:4562348
 1:08 pm on Apr 7, 2013 (gmt 0)

Huh. When I saw the subject line, I thought it was going to be about something I've noticed unusually often in recent weeks:

66.249.73.132 - - {timestamp} "GET /paintings/tundra/lenkoljgkuhz.html HTTP/1.1" 404 1302 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.75.169 - - {timestamp} "GET /fonts/qttlasvciisxs.html HTTP/1.1" 404 1302 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.76.169 - - {timestamp} "GET /paintings/catsrats/oectnbcxczuzirt.html HTTP/1.1" 404 1292 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

That is: the kind of requests you get when the search engine suspects you've been throwing soft 404s (which I've never done, and I haven't done an unusual amount of redirecting). Either that, or the googlebot has a cat.

Or for variety's sake

66.249.75.169 - - {timestamp} "GET /value.png HTTP/1.1" 404 1302 "-" "Googlebot-Image/1.0"

Huh what? I've never had image files flopping around loose in the top level-- and I don't even own a png with this name! (I just checked. Not one on my entire HD.)
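
Requests like the ones quoted above can be tallied straight from the access log. The following is a minimal sketch, assuming the log uses the standard Apache combined format with a bracketed timestamp where the excerpts above show {timestamp}. The function name googlebot_404s is made up for illustration, and matching on the user-agent string does not prove the request really came from Google.

```python
import re
from collections import Counter

# Apache combined log format (assumed): ip, ident, user, [time],
# "request", status, bytes, "referer", "user agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]*)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def googlebot_404s(lines):
    """Count 404 responses per path for user agents containing 'Googlebot'."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m["status"] == "404" and "Googlebot" in m["agent"]:
            counts[m["path"]] += 1
    return counts
```

Passing an open log file to googlebot_404s and calling most_common() on the result lists the dead URLs Googlebot requests most often. Because "Googlebot-Image" also contains "Googlebot", image requests are counted too.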

In the last few days they've also asked for several pages that have been 410 since January of 2012. You'd expect that of Bing, which likes 410s almost as much as it likes robots.txt, so it keeps coming back for more. Google tends to get bored and go away after two or three doses.

It makes me nervous when g### does something different. And I still haven't figured out what that blasted snippetbot wants, let alone why it wants it so many times.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
© Webmaster World 1996-2014 all rights reserved