
Home / Forums Index / Google / Google SEO News and Discussion
Forum Library, Charter, Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

Strange Googlebot Spidering Activity
Why is Googlebot getting a 404

 2:16 pm on May 6, 2005 (gmt 0)

Recently one of my sites suddenly started to perform very badly in Google. I decided to have a look at the server logs, and I'm seeing some very strange behaviour.

Googlebot turns up and requests my homepage several times during the course of a day. However, when it requests the page, an HTTP status code of 404 is frequently returned. It comes back several hours later, requests the homepage again, and this time a 200 is returned.

There seems to be a loose correlation between the time of day and the status code returned - 404s seem to come back in the early morning (between 6 and 11), whereas the rest of the time a 200 is returned.

I don't understand why this is happening. As far as I'm aware there has been no server downtime, and even if the server had been down, it wouldn't have been able to write to the log at all.

I've had a look at the homepage through an HTTP viewer and a 200 is always returned. I also used Firefox to view the page with Googlebot's user agent, and the page is fine.

What's even weirder is that Slurp appears to have no problems with the site. It requests the homepage during the same time periods as Googlebot, and a 200 is always returned.

Has anyone ever seen anything like this?
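One way to pin down the pattern described above is to tally, per hour, the status codes Googlebot gets for the homepage straight from the access log. A minimal sketch, assuming a Combined Log Format log (the regex and the sample lines are illustrative, not from the actual site):

```python
# Parse access-log lines and tally (hour, status) for Googlebot
# requests to "/". Assumes Combined Log Format; adjust the regex
# if your server logs in a different format.
import re
from collections import Counter

LINE = re.compile(
    r'\S+ \S+ \S+ \[(?P<day>[^:]+):(?P<hour>\d{2})[^\]]*\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

def googlebot_homepage_statuses(log_lines):
    """Return a Counter keyed by (hour, status) for Googlebot hits on /."""
    tally = Counter()
    for line in log_lines:
        m = LINE.match(line)
        if m and "Googlebot" in m.group("agent") and m.group("path") == "/":
            tally[(m.group("hour"), m.group("status"))] += 1
    return tally

# Two made-up sample lines showing the shape of the output:
sample = [
    '66.249.66.1 - - [06/May/2005:06:12:01 +0000] "GET / HTTP/1.1" 404 209 '
    '"-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '66.249.66.1 - - [06/May/2005:14:03:55 +0000] "GET / HTTP/1.1" 200 5120 '
    '"-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
]
print(googlebot_homepage_statuses(sample))
```

If the 404s cluster in particular hours, as the logs here suggest, that points at something environmental (a scheduled job, a load balancer, or different crawl entry URLs) rather than random failures.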



 10:28 am on May 9, 2005 (gmt 0)


I've never seen that myself.

Very interesting though.



 10:53 am on May 9, 2005 (gmt 0)

What type of web server are you using?



 11:27 am on May 9, 2005 (gmt 0)

Are you sure that the requested page has an absolutely identical URL when you get a 404 served?


 11:37 am on May 9, 2005 (gmt 0)

IIS? If you can't find a rational reason for it, you might consider switching servers away from Microsoft. Recall the Opera thing they did? I wouldn't put it past MS to throw Google a few 404s.


 12:36 pm on May 9, 2005 (gmt 0)

Yes, it definitely gets a 404 on the homepage - index.asp - which, if I copy and paste it into my browser, gives me the homepage.


 12:48 pm on May 9, 2005 (gmt 0)

Try the Poodle Predictor once and see what happens.

Switching servers, obviously, is not exactly practical.


 1:02 pm on May 9, 2005 (gmt 0)

This can be caused by the www subdomain not being set up properly, and that may be what's happening here.

You need a 301 redirect to point example.com to www.example.com (or the other way around).

One single inbound link without the "www" subdomain in the URL would create what you're seeing in the logs.

Type http://yourdomain.com in your browser and see what you get.
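The rule being suggested here is simple to state in code: pick one canonical host, serve pages on it, and answer every request on the other host with a 301 pointing at the same path on the canonical one. A server-neutral sketch of that logic (the canonical hostname is a placeholder; on IIS the same rule would be expressed in the server configuration rather than application code):

```python
# Canonical-host redirect logic: given the Host header and request
# path, return None if the host is already canonical (serve normally),
# otherwise the absolute URL to put in a 301 Location header.
CANONICAL = "www.example.com"  # placeholder - pick one form and stick to it

def canonical_redirect(host, path):
    if host.lower() == CANONICAL:
        return None                      # already canonical: serve the page
    return f"http://{CANONICAL}{path}"   # else: 301 Moved Permanently here

print(canonical_redirect("example.com", "/index.asp"))
# -> http://www.example.com/index.asp
```

The important detail is that the redirect preserves the path, and that it is a 301 (permanent) rather than a 302, so Google consolidates the two hosts into one.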



 2:03 pm on May 9, 2005 (gmt 0)

Poodle Predictor worked fine on the site.

trillianjedi - I think you're spot on - browsing the site without the www gave me a 404 error. I'll sort that out and see what happens.

Thanks for your help.


 2:11 pm on May 9, 2005 (gmt 0)

browsing the site without the www gave me a 404 error

Yes, that'll be the problem then. Someone has linked to you without the "www" and googlebot is following the link.

Can't help you with a 301 in IIS I'm afraid - I suggest you head over to the Website Technology Issues forum to see if you can find out how it's done.



 2:13 pm on May 9, 2005 (gmt 0)



 2:25 pm on May 9, 2005 (gmt 0)

There you go - well hunted bcolflesh ;-)

I should just add that this is probably not the reason your site is performing badly in G, although it certainly won't do any harm to fix it and you definitely should - many of your repeat visitors will type in the URL without the WWW and think you've vanished.

Check your own internal link structure while you're about it. Correct any internal links you have to point to the "main" domain (either with or without the WWW sub - whatever you decide to do).



 2:47 pm on May 9, 2005 (gmt 0)

If you serve both non-www and www then you have duplicate content. Google will attach some pages to non-www and others to www each time, and these will vary on a random basis. The other version may appear as a URL-only listing or may not appear at all.

PR passed from www.domain.com/page1.html to www.domain.com/page2.html will be "lost" if for page 2 it is only domain.com/page2.html that is listed in the SERPs.

Use a 301 redirect to fix this. It will help a lot.

Additionally, when you link to an index file inside a folder, make sure that you use only the folder name followed by a trailing / on the URL. Do not include the actual filename of the index file in the link.
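That trailing-slash advice can be applied mechanically when cleaning up internal links. A small sketch - the set of index filenames is an assumption and should match the default documents configured on your server:

```python
# Normalize internal links so an index document is always referenced
# by its folder URL with a trailing slash, never by filename.
INDEX_FILES = {"index.html", "index.htm", "index.asp", "default.asp"}

def normalize_link(url):
    head, _, last = url.rpartition("/")
    if last.lower() in INDEX_FILES:
        return head + "/"    # /widgets/index.asp -> /widgets/
    return url               # non-index links pass through unchanged

print(normalize_link("/widgets/index.asp"))
print(normalize_link("/widgets/specs.asp"))
```

Linking only to the folder form means Google sees one URL per index page instead of two, for the same duplicate-content reason as the www/non-www split.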

WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved