Forum Moderators: Robert Charlton & goodroi


Googlebot crawling lots of old NOINDEX pages

         

stricknine

10:21 am on Jun 27, 2011 (gmt 0)

10+ Year Member



Hi,

The last 2 days, Googlebot has been crawling my site like mad, and more interestingly, 99% of the pages it is crawling are old pages that I NOINDEXed about 1 year ago, or old 404 pages that have been gone a long time (and all links to them were removed long ago, too).

And now it is crawling all of them again 1 year later?

Usually my website gets about 1000 pages/day from Googlebot, and for the last 2 days, googlebot has crawled more than 5000 pages/day.

Does this mean some ranking changes are to be expected for my website soon?

Thanks,
Stricknine

balibones

2:18 pm on Jun 27, 2011 (gmt 0)

10+ Year Member



It is possible that this is a rogue bot that is spoofing Googlebot. If the same IP address shows up many times in your logs for this bot, search for that IP address on Google; if it's a rogue bot, there will usually be forum comments about it.
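Another way to tell a spoofed bot from the real one is the reverse-then-forward DNS check that Google itself describes: look up the hostname for the IP, make sure it ends in googlebot.com or google.com, then resolve that hostname and confirm it points back to the same IP. Here is a rough Python sketch of that check. The function name and the injectable lookup parameters are just my own illustration (they make it testable without a network), and the forward lookup used here is IPv4 only.

```python
import socket

# Hostname suffixes Google documents for its crawlers.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_real_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if ip reverse-resolves to a Google hostname
    that in turn resolves back to the same ip.

    reverse_lookup and forward_lookup are hypothetical hooks added for
    illustration; by default they use the standard socket functions.
    """
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]

    try:
        host = reverse_lookup(ip)
    except OSError:
        # No reverse DNS record at all: not a verifiable Googlebot.
        return False

    # The leading dot in each suffix rejects names like "fakegooglebot.com".
    if not host.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False

    try:
        # The forward lookup must include the original address.
        return ip in forward_lookup(host)
    except OSError:
        return False
```

Calling `is_real_googlebot("66.249.66.1")` with the default lookups does live DNS queries, so run it only against the handful of IPs that dominate your logs rather than on every request.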

If it is the real Googlebot you should make sure that the pages you have noindexed also have a nofollow tag (if you don't want them to be crawled). It seems odd that they would suddenly recrawl old 404 pages if there are no links out there. Make sure these pages aren't in any of your sitemaps or internal links as well.

No, I don't think it is an indication that your rankings are about to change, but it's certainly possible.

deadsea

2:35 pm on Jun 27, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Haven't specifically looked at my logs to see if something is up in the last couple days for me. However, that sounds like normal Googlebot behavior to me. Every once in a while Google seems to open the vault and find a reel of urls from my site that it hasn't crawled in a while. Then it goes nuts crawling them. The hallmarks of this crawl mode are:
* Crawls lots of urls that typically haven't been crawled in a while
* Crawls old urls with no current inbound links (even internal ones).
* Crawls urls in order of length, starting with the shortest urls.

This kind of crawl appears to happen in addition to other normal Googlebot behaviors: periodically recrawling higher-PR pages, and greedily crawling new pages and sections of the site.
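For anyone who wants to check their own logs for this kind of spike, here is a small Python sketch that counts requests per day whose user agent mentions Googlebot. It assumes the common Apache/Nginx "combined" log format; the regex and function name are my own, and the user agent string can be spoofed, so treat the counts as a first look only.

```python
import re
from collections import Counter

# Assumed layout: combined log format, e.g.
# 1.2.3.4 - - [27/Jun/2011:10:21:00 +0000] "GET /x HTTP/1.1" 200 512 "-" "agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<day>[^:]+):[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits_per_day(lines):
    """Count log lines per day whose user agent contains 'Googlebot'."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and "Googlebot" in match.group("agent"):
            counts[match.group("day")] += 1
    return counts
```

Pass it an open log file, for example `googlebot_hits_per_day(open("access.log"))` (the file name is a placeholder), and compare the daily totals against your usual crawl rate.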

g1smd

2:44 pm on Jun 27, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



It's normal periodic behaviour. They're just checking that the status of the pages hasn't changed. They can't do that unless they access them all again and they usually do that once or twice per year.

PCInk

2:46 pm on Jun 27, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



A no-index does not mean no-crawl. Perhaps Google is re-crawling pages that were once in their index to see if they still exist and if the no-index tag has changed.

After all, if you no-index a page by mistake and then change it to index, Google would never re-index the page if they never crawled it again. Also 404 pages can be an error and the page could be back up now.

If you don't want them to even see the page, you should specify this in your robots.txt, which Google obeys. A rogue bot would probably not obey this, however.
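If you want to sanity-check a robots.txt rule before relying on it, Python's standard `urllib.robotparser` can tell you whether a given user agent would be allowed to fetch a URL. A minimal sketch, where the `/old/` directory and the example.com URLs are made up for illustration:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block every compliant crawler from /old/.
rules = """\
User-agent: *
Disallow: /old/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# can_fetch() answers: may this user agent request this URL?
blocked = parser.can_fetch("Googlebot", "http://www.example.com/old/page.html")
allowed = parser.can_fetch("Googlebot", "http://www.example.com/new/page.html")

print("old page crawlable:", blocked)
print("new page crawlable:", allowed)
```

This only shows how a well-behaved crawler would read the rules; it does nothing to stop a bot that ignores robots.txt.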