
Google Webmaster Tools URL removal tool

     
4:00 pm on Feb 28, 2011 (gmt 0)

Junior Member

10+ Year Member

joined:Jan 10, 2005
posts:124
votes: 0


Hi,

We currently have many hundreds of thousands of URLs that we need to get rid of, as we believe they have damaged our rankings.

Because of the scale and complexity of the problem, it's not possible to list each URL in the WMT URL removal tool; it would take months to input them all.

We are currently investigating the use of the "directory" removal option. Let me give you a quick scenario:

e.g. we have 1000 URLs like so:

www.domain.com/a/keyword/
www.domain.com/a/keyword/b/123/ - set to meta NOINDEX
www.domain.com/a/keyword/b/456/ - set to meta NOINDEX
www.domain.com/a/keyword/b/789/

We want to remove the directory at the top level, and to do that we would disallow it in robots.txt as follows:

Disallow: /a/keyword/
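
For completeness, a minimal sketch of what the full robots.txt file would look like for that scenario (the User-agent line is assumed here, since only the Disallow rule is quoted above):

User-agent: *
Disallow: /a/keyword/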

The main question is: will Google remove the whole directory, or will it only remove the pages that have a meta NOINDEX set on them?

Has anyone had experience removing hundreds of thousands of URLs inside a directory where many URLs need protecting from removal? Would it be easier to just remove the whole lot and start from scratch? Are there ranking issues when good URLs are fully removed and then reindexed? How quickly do the good URLs regain their rankings?

Really appreciate your help.

Cheers,
SS
6:20 pm on Feb 28, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:May 7, 2003
posts:753
votes: 0


Google would remove the whole directory. Pages that have noindex would get removed. Pages that are currently crawlable would get removed.

It's a bit unclear to me from your post, but it sounds like you have some pages in that directory that you still want to be crawled and indexed. If that is the case, robots.txt is NOT the way to go.
6:38 pm on Feb 28, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:May 26, 2000
posts:37301
votes: 0


Googlebot has good support for pattern-matching wildcards (the asterisk and the dollar sign) in the robots.txt file.

1. See [google.com...] and expand the plus sign for "Manually create a robots.txt file."

2. Before you go live with the file, use the robots.txt tool in your Webmaster Tools account to make sure your directives achieve exactly what you want.
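
For illustration, a rough sketch of what those pattern-matching rules can look like (the paths here are hypothetical, not taken from the site in question):

User-agent: Googlebot
# * matches any sequence of characters within the URL path
Disallow: /a/*/b/
# $ anchors the pattern to the end of the URL
Disallow: /*.pdf$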
10:19 pm on Feb 28, 2011 (gmt 0)

Junior Member

10+ Year Member

joined:Jan 10, 2005
posts:124
votes: 0


@deadsea Yeah, you are right. I just tested it, and unfortunately it removes the lot. I have some pages that need to remain in a directory that has over 2 million URLs, but many need to be removed, so the URL removal tool is pretty much redundant for this type of work. The URLs in question have been NOINDEXed via meta tag (they haven't been disallowed in robots.txt; that was only for testing a specific directory, as above), but they could take 6 months to disappear naturally and will continue to hurt rankings in the meantime.

@tedster Unfortunately, the wildcards just don't help in this case, as we have one main directory with many thousands of different URL permutations inside it that need removing.
10:34 pm on Feb 28, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:May 26, 2000
posts:37301
votes: 0


I hear you on the wild cards. Sometimes there just is no pattern you can use for leverage. Whenever I create new URL structures, that is one of the points I try to keep in mind - how easily can I address various parts of the website if I use this structure. That goes for robots.txt and analytics, too.
5:46 am on Mar 1, 2011 (gmt 0)

Senior Member

joined:Dec 29, 2003
posts:5428
votes: 0


The URLs in question have been NOINDEXed via meta tag (they haven't been disallowed in robots.txt that was for testing a specific directory as above), but these could take 6 months to disappear naturally and continue to hurt rankings.


Submit a sitemap with their URLs to Google; maybe that speeds up reindexing.

On 2/24 I too suspected very thin, tag-like pages as a culprit, so I removed their /dir/ via robots.txt. The next day the removal was complete and nothing could be seen on Google.com. Now, however, I have removed the robots.txt entry and will let Google crawl them, since they are noindexed. Maybe it's better this way, just in case removal via Webmaster Central only hides them from the public but they still count in the algorithm (unlikely, but you never know).
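
If you try the sitemap route, the file can be generated straight from a flat list of the noindexed URLs. A minimal Python sketch, assuming a hypothetical input file called noindexed-urls.txt with one URL per line (keep in mind a single sitemap file is limited to 50,000 URLs, so several hundred thousand would need to be split across multiple files plus a sitemap index):

# minimal sketch: build sitemap.xml from a flat list of URLs so Google
# recrawls the noindexed pages sooner; the input file name is hypothetical
from xml.sax.saxutils import escape

with open("noindexed-urls.txt") as f:
    urls = [line.strip() for line in f if line.strip()]

with open("sitemap.xml", "w") as out:
    out.write('<?xml version="1.0" encoding="UTF-8"?>\n')
    out.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
    for url in urls:
        out.write("  <url><loc>%s</loc></url>\n" % escape(url))
    out.write("</urlset>\n")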
5:45 pm on Mar 7, 2011 (gmt 0)

Junior Member

10+ Year Member

joined:Jan 10, 2005
posts:124
votes: 0


Hi all,

Can anyone tell me the maximum number of URLs you can submit to the URL removal tool each day?

We have several hundred thousand that we want rid of, and are trying to work out the best method (we already have NOINDEX meta, but the tool is much quicker!)

Thanks in advance.
6:33 pm on Mar 7, 2011 (gmt 0)

Full Member

10+ Year Member

joined:Sept 30, 2006
posts:332
votes: 0


As others have mentioned, the best approach is to figure out a wild-card system you can use to bang them out en masse.

You can do 100 URLs at a time, as I recall. I once had thousands and thousands to do, and I had to do them all manually, so I understand your dilemma. ; ) I set up a program to create text files, each containing 100 URLs... Then, one by one, I cut and pasted the files into the URL Removal tool. Totally sucked. Took forever. Then I discovered a Firefox Greasemonkey add-on that would do some of the monotonous work. That helped. Go get that script online somewhere. Beyond that, the next problem you will have is that Google is always nervous about bots, so I was constantly running into issues: I was going so fast with the sets of 100 URLs that, after 20-30 sets, it would lock me out for a few hours. Obviously, I was trying to solve a problem, and that delay just added to the stress of the situation. ; )

Anyway, in the end, it all worked out.

And yes, first, fix your site with noindex. Do that first. Start telling Googlebot now to pull those URLs out via noindex. Then use the URL Removal tool to force them out without having to wait for Googlebot to discover the noindex.

You see, when you use the URL Removal tool, it is temporary. I think it used to be 90 days. After the 90 days, it wants to put them all back in the index. But if you already have them all noindexed, they stay out after the 90 days. I blocked some with robots.txt, but I find that messy, and I don't like the 'errors' posted in my Webmaster account. I'm suspicious that too many errors may label my site poorly somehow with Google ;), so I prefer the noindex as the long-term solution. I cleaned out my robots.txt file and just leaned on the noindex.

Again, I speak from experience. This worked for us for thousands and thousands of URLs.

Get the noindex in. Do wild-card removals wherever you can. Then, for whatever URLs are left, create a pile of txt files with 100 URLs in each, and use the Greasemonkey script to cut and paste them in for you. Get organized for the job, and stick with it. For the unique URLs that we had to cut and paste in, it took us about 6 weeks to eventually get them all out. But every time you upload a list of 100 URLs, they are gone in less than 24 hours, so slowly but surely you will see the net result.
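
For the bookkeeping step described above, a minimal Python sketch that splits a master list into 100-URL text files (the names all-urls.txt and batch-NNNN.txt are hypothetical):

# minimal sketch: split a master URL list into text files of 100 URLs each,
# ready to paste into the removal tool; file names are hypothetical
with open("all-urls.txt") as f:
    urls = [line.strip() for line in f if line.strip()]

for i in range(0, len(urls), 100):
    with open("batch-%04d.txt" % (i // 100), "w") as out:
        out.write("\n".join(urls[i:i + 100]) + "\n")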
6:36 pm on Mar 7, 2011 (gmt 0)

Full Member

10+ Year Member

joined:Sept 30, 2006
posts:332
votes: 0


P.S. As I recall also, you can do DIRECTORIES ('wildcards') in the list of URLs, not just individual URLs, so you can input 100 DIRECTORIES at a time too. That may help you...
6:38 pm on Mar 7, 2011 (gmt 0)

Full Member

10+ Year Member

joined:Sept 30, 2006
posts:332
votes: 0


P.P.S. Sorry. Also, as I recall, you may have to initially block it in robots.txt for the URL removal tool to accept it. But once it does and it is gone, if your noindex is in, I'd pull it out of robots.txt. Up to you. Maybe you don't trust your team over the months/years ahead to leave the noindex in your code, so maybe you need the added insurance of the robots.txt block to keep them out too.
6:38 pm on Mar 7, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


The low-maintenance solution is for non-valid URLs to serve an error page with the correct HTTP 404 or 410 status, with the error page containing onward links pointing to the right content.

With the correct response, most of the URLs will be gone from the index within weeks. For those that remain, the error page helps the visitor find useful content.
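
As one way to wire that up at the application level, here is a minimal Python WSGI sketch; the /a/keyword/ prefix is borrowed from the opening post purely for illustration, and in practice the same thing is usually configured in whatever web server or framework the site already runs:

# minimal sketch: serve 410 Gone plus an onward-links error page for retired
# URLs; the RETIRED_PREFIXES value is an assumption based on the opening post
from wsgiref.simple_server import make_server

RETIRED_PREFIXES = ("/a/keyword/",)

ERROR_PAGE = (b"<html><body><h1>This page has been removed</h1>"
              b'<p>Try the <a href="/">home page</a> instead.</p></body></html>')

def app(environ, start_response):
    path = environ.get("PATH_INFO", "/")
    if path.startswith(RETIRED_PREFIXES):
        start_response("410 Gone", [("Content-Type", "text/html")])
        return [ERROR_PAGE]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"normal content would be served here"]

if __name__ == "__main__":
    make_server("", 8000, app).serve_forever()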
6:47 pm on Mar 7, 2011 (gmt 0)

Full Member

10+ Year Member

joined:Sept 30, 2006
posts:332
votes: 0


Yes, that is the preferred solution. I tried that, and it does take weeks. At best. Sometimes you are in such deep s**t you cannot bear to wait weeks. The URL Removal Tool does in less than 24 hours, with 100% accuracy, what the proper approach of 404s and 410s will take weeks to do, and with 404s and 410s even that is not guaranteed, because you need to wait for a crawl. And if you are already in deep s**t, and these are dead/thin pages that Googlebot doesn't crawl often, that crawl might seemingly never come. Screw it. If you're 100% sure the URLs have to go, do everyone a favor and dump them now.
8:43 pm on Mar 7, 2011 (gmt 0)

Junior Member

10+ Year Member

joined:Jan 10, 2005
posts:124
votes: 0


Unfortunately we cannot 404 these pages, as they are pages which are triggered by our users, which is why they still exist but have NOINDEX.

The wildcard is just not possible for these URLs as they are individually constructed.

We've thought about 301 redirecting them to a specific directory so we can hit delete on them, but this would take too long.

So, we may have no alternative but to go down the URL removal tool route, but at 100 entries before a lockdown, this too could take too long based on the number of URLs we need to delete.

Think we could be well and truly snookered...
8:50 pm on Mar 7, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


If they have the noindex meta tag, then they'll soon be dropped.
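
For reference, the noindex referred to throughout this thread is the standard robots meta tag placed in each page's head section:

<meta name="robots" content="noindex">

For Googlebot to see the tag, the pages must remain crawlable, i.e. not be blocked in robots.txt.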
10:04 pm on Mar 7, 2011 (gmt 0)

Junior Member

10+ Year Member

joined:Jan 10, 2005
posts:124
votes: 0


@helpnow

P.S. As I recall also, you can do DIRECTORIES ('wildcards') in the list of URLs, not just individual URLs, so you can input 100 DIRECTORIES at a time too. That may help you...

Do you mean you can add a wildcard to a directory, such as /xyz/*/123/ and it will remove all within these directories including ALL the different permutations in *?
10:11 pm on Mar 7, 2011 (gmt 0)

Full Member

10+ Year Member

joined:Sept 30, 2006
posts:332
votes: 0


@speedshopping Well, if you do /xyz/*, it should remove everything below /xyz, right?

I think /xyz/*/123/ is moot. 99% sure it ignores everything after the *, so it isn't a true wildcard.

Are you saying, for example, you want to remove /xyz/abc/123/, but not delete /xyz/abc/456/? In which case I don't think /xyz/*/123/ would preserve /xyz/abc/456/. I think /xyz/*/123/ would be treated as /xyz/*.

Obviously, be very careful before you go crazy with the delete tool. Do small tests before you really open up the wildcards and mass directory deletions. ; )

Oh, you can do more than 100 at a time before a lockdown. Pretty sure I was getting 20 to 30 sets of 100 (2000-3000) at a time before I'd get locked out for a bit...
10:14 pm on Mar 7, 2011 (gmt 0)

Full Member

10+ Year Member

joined:Sept 30, 2006
posts:332
votes: 0


In your opening post, you said:

www.domain.com/a/keyword/
www.domain.com/a/keyword/b/123/ - set to meta NOINDEX
www.domain.com/a/keyword/b/456/ - set to meta NOINDEX
www.domain.com/a/keyword/b/789/

Are you trying to wipe out:

www.domain.com/a/keyword/b/123/ - set to meta NOINDEX
www.domain.com/a/keyword/b/456/ - set to meta NOINDEX

... but preserve:

www.domain.com/a/keyword/b/789/

?
10:43 am on Mar 8, 2011 (gmt 0)

Junior Member

10+ Year Member

joined:Jan 10, 2005
posts:124
votes: 0


Thanks for your info helpnow...

Oh, you can do more than 100 at a time before a lockdown. Pretty sure I was getting 20 to 30 sets of 100 (2000-3000) at a time before I'd get locked out for a bit...

How long is a bit? A day, an hour?
10:45 am on Mar 8, 2011 (gmt 0)

Full Member

10+ Year Member

joined:Sept 30, 2006
posts:332
votes: 0


As I recall, the lockout was for a few hours. This was in the summer of 2010; your experience may be different. ; )
 
