Forum Moderators: open
I recently did a total "facelift" of my site. Added a bunch of new pages - most of these were in a new directory - and deleted a few old pages.
Today, Googlebot has been crawling the heck out of it. I lost track of how many times she went through all the pages - like she's stuck in a loop or something.
The "weird" thing is, Googlebot keeps requesting all the old page names (the majority of which are still there), including the deleted ones - repeatedly.
Also, I added a new directory containing lots of new content, and I want it to get crawled, but so far that's a no go. That directory is linked several times via questions on the FAQ page in my root directory, which HAS been crawled - repeatedly - this morning.
Is this normal? Does it usually take more than one crawl to get subdirectories of the root spidered?
Also, is it normal for the bot to get stuck in this never-ending crawl loop? I didn't notice this behavior - at least on my site - until after I did the facelift.
Just curious,
Jenny
It sounds like you are being visited by the "freshbots". These 'bots look at pages Google already knows about, and look for changes in the content. That's why they are requesting pages that are gone, and not requesting pages that are new. You can help them do their job more efficiently by including a last-modified header [webmasterworld.com] in your server responses.
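To illustrate the Last-Modified advice, here is a minimal sketch in Python of how a server-side script might emit that header from a file's modification time and honor a conditional GET. The function names and the temp-file "page" are hypothetical, purely for demonstration; a real setup would wire this into your server or let the web server do it automatically for static files.

```python
import os
import tempfile
from email.utils import formatdate

def last_modified_value(path):
    """Format a file's mtime as an RFC 1123 date for a Last-Modified header."""
    return formatdate(os.path.getmtime(path), usegmt=True)

def conditional_get(path, if_modified_since=None):
    """Return (status, Last-Modified); 304 when the client's copy is current."""
    stamp = last_modified_value(path)
    if if_modified_since == stamp:
        return 304, stamp  # Not Modified: bot can skip re-fetching the body
    return 200, stamp      # full response with a Last-Modified header

# Demo with a hypothetical page file standing in for a real document
with tempfile.NamedTemporaryFile(suffix=".html", delete=False) as f:
    f.write(b"<html>faq</html>")
    page = f.name

status, stamp = conditional_get(page)      # first crawl: full response
status2, _ = conditional_get(page, stamp)  # re-crawl sending the same date
print(status, status2)  # → 200 304
```

When the freshbot sends back the date you gave it and the file is unchanged, the 304 response lets it confirm freshness without downloading the page again.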
A sign that a page has been picked up by one of the freshbots is that a date appears next to your URL in the search results. These "fresh" listings are quite volatile, and often last only a few days. After that, the listing reverts to whatever it was previously (including disappearing if the page is new and wasn't listed during the previous month).
Your new pages will get picked up during the next big crawl after the end-of-month update, and will appear consistently in the search results after the update at the end of November. They may pop in and out of the results until then, again due to the action of the fresh 'bots working with the new pages found at the end of October.
The repeated requests for the same pages may be a result of the fact that there are several fresh 'bots working on your site at once, and they only get together and compare notes rarely (if ever). If you have access to your raw logs, you will likely find different REMOTE_HOST names for several of these 'bots.
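A quick way to see those separate 'bots is to tally requests per remote host in the raw access log. This is just a sketch using a few made-up common-log-format lines (the IP addresses are hypothetical); point it at your real log file instead.

```python
from collections import Counter

# Hypothetical common-log-format lines; in practice, read these from your
# server's raw access log.
log_lines = [
    '64.68.82.12 - - [15/Oct/2002:09:01:12 -0500] "GET /faq.html HTTP/1.0" 200 5120',
    '64.68.82.55 - - [15/Oct/2002:09:03:47 -0500] "GET /faq.html HTTP/1.0" 200 5120',
    '64.68.82.12 - - [15/Oct/2002:09:10:02 -0500] "GET /old-page.html HTTP/1.0" 404 512',
]

# The remote host is the first whitespace-separated field of each line.
hosts = Counter(line.split()[0] for line in log_lines)
for host, hits in hosts.most_common():
    print(host, hits)
# → 64.68.82.12 2
# → 64.68.82.55 1
```

If several distinct hosts show up making overlapping passes, that's consistent with multiple freshbots crawling independently rather than one bot stuck in a loop.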
HTH,
Jim
Thanks for the info!
You were exactly right; the IP addresses for Googlebot were different on the various "cycles".
Strangely enough, after about 20 passes through the site, right after I asked the question about picking up the new pages, I went back and checked the logs and the new ones all got spidered. (Wouldn't you know it?)
Thanks again,
Jenny