A little over a year ago I did a massive renaming of most of the pages on one of my sites. With careful planning and judicious use of mod_rewrite, by the end of the year Google was crawling the new URLs and had (seemingly) forgotten the old ones, my rankings were not negatively affected, and the change was virtually transparent to new and old visitors.
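For anyone who hasn't done a rename like this: in practice, "judicious use of mod_rewrite" usually means answering every old URL with a 301 (permanent) redirect to its new location. A minimal .htaccess sketch, with made-up paths rather than anything from the site described above:

    # .htaccess sketch (example paths only)
    RewriteEngine On
    # single renamed page: old URL returns a 301 pointing at its new home
    RewriteRule ^old-widgets\.html$ /widgets/overview.html [R=301,L]
    # whole renamed section: carry the rest of the path across
    RewriteRule ^oldsection/(.*)$ /newsection/$1 [R=301,L]

The permanent status is what tells Google the move is for good, which is why the old URLs eventually drop out of the crawl while the rankings carry over.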
Over two years ago, I deleted several pages and let them die a 404 death. Google (and others) caught on to this pretty quickly, and of course the pages were dropped.
Earlier today, in a real "blast from the past," Google requested all these old URLs, filling my error log with 404s.
FWIW, all the bad requests came from the "new" "Mozilla/5.0 (compatible; Googlebot/2.1;[...]" bot, coming from the 66.249.65.x range. This new bot has previously scraped the site successfully.
>sets his tin foil hat back on the desk.
Also, my index page is showing again (as well as www.domain.com). It's just a URL without a snippet, but it has a PR3. I had earlier gone to a lot of trouble to get rid of it by ensuring internal links pointed to ROOT. And anyway, I thought Google had fixed this double-entry problem long ago.
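For what it's worth, besides pointing every internal link at the root, a common way to collapse the index-page/root duplicate is to 301 explicit requests for the index file back to "/". A sketch, assuming Apache with index.html as the DirectoryIndex (not necessarily what was done on the site above):

    # .htaccess sketch: send direct requests for /index.html back to the root URL
    RewriteEngine On
    # match only real client requests, not Apache's internal DirectoryIndex subrequest
    RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.html
    RewriteRule ^index\.html$ / [R=301,L]

That way both spellings of the home page resolve to a single URL for Google to index.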
I thought they were long since removed from Google's index, but I wondered whether they might just have been doing a massive "spring clean," double-checking all their old URLs before deleting them for good?
>Mozilla/5.0 (compatible; Googlebot/2.1;[...]" bot, coming from the 66.249.65.x range. This new bot has previously scraped the site successfully.
Don't forget this version of Googlebot now requests gzip-compressed pages over HTTP/1.1 instead of HTTP/1.0, so it can typically run about four times faster. The old Googlebot did not request gzip'd pages.
On Sep 30th, Oct 6th and Oct 28th I noticed the new bot requesting gzip'd pages in my logs.
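If you want to check the same thing in your own logs, one option (a sketch, assuming Apache's standard mod_log_config; the format name and log path are just examples) is to record the request line and the Accept-Encoding header, since the request line shows HTTP/1.0 vs HTTP/1.1 and the header shows whether the client asked for gzip:

    # httpd.conf sketch: log protocol and Accept-Encoding so gzip-capable crawlers stand out
    LogFormat "%h %t \"%r\" %>s %b \"%{User-Agent}i\" \"%{Accept-Encoding}i\"" crawlcheck
    CustomLog logs/crawl_check_log crawlcheck

Hits from the new bot should show HTTP/1.1 at the end of the request line and gzip in the Accept-Encoding field; hits from the old-style crawler should show HTTP/1.0 and no gzip.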
Google has also got to be working on the hijacked-websites problem, so perhaps this is somehow related: crawling old pages to figure out who owned the material first?
>I thought they were long since removed from Google's index, but I wondered whether they might just have been doing a massive "spring clean," double-checking all their old URLs before deleting them for good?
Hmmmm, perhaps they are on a crusade to delete as many old URLs as possible as a stopgap measure to allow some newer pages into the main index.
If they are just asking for old pages, then it is a status check of their old data and nothing to be worried about at all. If, however, they are putting page references back into their index for pages that don't actually exist, then they have a big problem.
>If, however, they are putting page references back into their index for pages that don't actually exist, then they have a big problem.
I think they have already done so, and are now trying to sort it out. Many old URLs seemed to reappear at the time of the last PR update, and as has been noted in other threads, there were anomalies in the updated PR.
There was a massive crawl just before (or during?) the update, and now there's another massive crawl. Possibly they are attempting to repeat the process and this time get it right.
I think we should expect hiccups like this. The number of pages and links has grown enormously and is still growing. Google will probably have to continuously modify its procedures in order to cope.
Yes, they most definitely have added some of my long-gone pages back into the index as supplemental results. Three dead pages that weren't there yesterday are there today. No title or ransom note/snippet, just a URL.
This situation makes me wish I had allowed Google to cache my pages. I'd love to see what date they'd report.
I can find the pages with just site:www.example.com, but trying site:www.example.com UniqueWidgetFromDeadPage fails to bring them up. That's a small comfort since I expect the dead pages will not show up in anyone's regular SERPs. (But on the other hand, they'll miss out on my nice 404 page.)