
Using XML Sitemap to speed removal of blocked pages?

   
1:52 pm on Mar 5, 2013 (gmt 0)

10+ Year Member



I'm in the process of blocking thousands of files on a dynamic site with meta robots noindex.
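
To be specific, each blocked page now serves the standard noindex tag in its <head>:

    <meta name="robots" content="noindex">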

I want these removed from the index as soon as possible.

The site has a few hundred thousand pages indexed, but is not getting crawled very heavily by Google. I have a feeling that it will take Google months or years to hit some of these files and discover the noindex.

The URLs I'm blocking all carry parameters, so I can't remove them in batches via Google Webmaster Tools, since directory removal there does not support wildcards.

So:
I thought of creating an XML sitemap with a dump of all the URLs I blocked with noindex, hoping it will speed up the removal process.
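
Roughly like this (the URLs here are just placeholders for my real parameterized ones):

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>http://www.example.com/page?session=abc123</loc>
      </url>
      <url>
        <loc>http://www.example.com/page?sort=price&amp;dir=asc</loc>
      </url>
      <!-- ...and so on for every noindexed URL... -->
    </urlset>

Since the sitemap protocol caps a single file at 50,000 URLs, I'd split the dump across multiple files under a sitemap index.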

Is this a legitimate approach? Will Google actually hit the files and notice the noindex, or will these files (or the entire XML sitemap) just get ignored?

If this is not a valid approach, does anyone have any ideas on how to speed up the removal process?

Would it make sense to create an HTML index to these files instead?
5:06 pm on Mar 7, 2013 (gmt 0)

WebmasterWorld Senior Member andy_langton is a WebmasterWorld Top Contributor of All Time 10+ Year Member



It's a perfectly reasonable approach. People confuse sitemaps with the idea of presenting Google with a complete 'view' of your site that it will then follow. In fact, sitemaps are just a way of adding data to the usual crawl process, so it doesn't matter whether you have a 'positive' or 'negative' reason to submit one.
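
Submit it through GWT as usual, or just reference it from robots.txt (the filename here is arbitrary):

    Sitemap: http://www.example.com/noindexed-urls.xml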

The other method would be to create a page with links, as you suggest. If you do so, I would recommend using the "submit to index" feature in GWT. But to be honest, creating something with enough equity to get thousands of URLs crawled quickly would not be a good idea IMO.
6:58 pm on Mar 7, 2013 (gmt 0)

WebmasterWorld Administrator robert_charlton is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



The URLs I'm blocking all carry parameters, so I can't remove them in batches via Google Webmaster Tools, since directory removal there does not support wildcards.

There's a current thread discussing how to deal with irrelevant parameters, and you might want to take a look at it....

Canonical Question - About multiple querystring with similar content
http://www.webmasterworld.com/google/4551376.htm [webmasterworld.com]
 
