A couple of years ago I added an entire classifieds section to my site, with both user-submitted listings and listings pulled from other sites like Craigslist and eBay. My niche is relatively small, so there were only 3500 or so pages in the section.
Since many of them were duplicates from other sites, and the original ones used a repeating template with useless information, much like a standard profile page, I opted for the noindex meta tag.
The site has been steady at PR4 for two years, but last week Webmaster Tools showed 3500 404 error pages, which was the entire classifieds section. That section has been completely gone from my site for at least 10 months now. It didn't work as intended, so I removed it long ago.
PR has gone up to 5 with the last update. I'm not suggesting that Google crawling every page and not finding the classifieds is what caused the increase, but I am curious as to the reason old URLs with noindex meta tags (when the pages were live) would have been crawled and reported as missing. Do they figure in PageRank calculation anyway?
Google never forgets a URL, so "purge" may be the wrong word; "disassociate" may be more appropriate.
the reason old URLs with noindex meta tags (when the pages were live) would have been crawled and reported as missing. Do they figure in PageRank calculation anyway?
Yes - noindex pages still circulate PageRank unless the robots meta is also "nofollow". And since the meta tags are on the page itself, the URLs do need to be spidered even though they are not added to the public index - it's the only way the meta tags can be read and confirmed over time.
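To make the point about the tag living on the page concrete, here is a minimal sketch in Python of how a crawler-like script could read the robots meta directives once it has fetched the HTML. The function name `robots_directives` and the sample page are made up for illustration; this says nothing about how Google actually implements it.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        # The content attribute is a comma-separated list, e.g. "noindex, follow".
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html):
    """Return the set of robots meta directives found in an HTML string (hypothetical helper)."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Made-up sample page: kept out of the index, links still followed.
page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
directives = robots_directives(page)
print("noindex" in directives, "nofollow" in directives)
```

The directives only become known after the page body has been downloaded and parsed, which is why a noindex URL still has to be fetched.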
I checked, it says not available across the board. No links to remove.
I'll leave them as 404 as well, since that's the correct code for pages that don't exist anymore.
I noticed a spike in crawl activity around the same time, so I investigated. Google didn't just pull up every page on my site; they tried to fetch all the old URLs and a bunch of URLs that nobody would ever want indexed, e.g. core files for WordPress and Drupal, even though the site isn't likely to be both.
xmlrpc.php was fetched repeatedly but I don't use remote publishing, perhaps a standard security test?
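For anyone who wants to do the same kind of digging, here is a rough Python sketch that tallies which paths a Googlebot user agent requested and what status it got back. It assumes the server writes the common "combined" access log format; the function name `googlebot_hits` and the log file name are made up. The user agent string can be spoofed, so a match on "Googlebot" alone does not prove the request came from Google.

```python
import re
from collections import Counter

# Assumes the "combined" log format: ... "METHOD /path HTTP/x" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'"(?:GET|POST|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) .*"(?P<agent>[^"]*)"$'
)


def googlebot_hits(lines):
    """Count (path, status) pairs for requests whose user agent mentions Googlebot."""
    hits = Counter()
    for line in lines:
        match = LINE_RE.search(line.strip())
        if match and "Googlebot" in match.group("agent"):
            hits[(match.group("path"), int(match.group("status")))] += 1
    return hits


if __name__ == "__main__":
    # "access.log" is a placeholder; point it at your own log file.
    with open("access.log", encoding="utf-8", errors="replace") as log:
        for (path, status), count in googlebot_hits(log).most_common(20):
            print(count, status, path)
```

Sorting by count makes repeated fetches of a single file such as xmlrpc.php stand out quickly.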