WMT URL parameters - Anyone had luck with these working?

4:39 pm on Oct 10, 2011 (gmt 0)

Junior Member

5+ Year Member

joined:Oct 25, 2010
posts:81
votes: 0


About a month ago I used the URL parameter tool in Google Webmaster Tools to tell Google not to index any URLs with a certain parameter. I immediately saved a site:mysite.com inurl:parameter search, and it showed around 1,000 URLs in the index. After one month, that site search still shows the same number of URLs, even though I told Google not to index any of them.

Has anyone had luck with this working for their site? Or does it usually take longer than a month to take effect?

I know these settings are only suggestions to Google, but I expected at least a little progress after telling them not to index these URLs - not none at all.
7:06 am on Oct 11, 2011 (gmt 0)

Preferred Member

5+ Year Member

joined:Mar 29, 2007
posts:592
votes: 0


I also tried to remove our site search URLs, but nothing happened even though we had blocked them through robots.txt.

The only thing that happened, after about 3-4 months, is that all the site search URLs now appear in the crawl error report as 404s...
8:48 pm on Oct 11, 2011 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:13210
votes: 347


Changes to your gwt prefs tend not to be retroactive. That is, they won't index new stuff but they'll keep the old stuff unless you explicitly tell them to delete it. Has anyone tried removing pages with parameters from the existing cache and index? Can it even be done, or do you just get a message saying "We've already removed that page"?

If you formerly had 10,000 and still have 10,000 I suspect something is working, because otherwise it would be 15,000 or 20,000 by now.
9:11 pm on Oct 11, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:May 26, 2000
posts:37301
votes: 0


Changes to your gwt prefs tend not to be retroactive.

Right - exactly. If a type of URL is already indexed (such as Site Search) I use a 2-step approach for the clean-up. First, add a robots noindex to the template for about 4 weeks. Then add a robots.txt Disallow rule.
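
For illustration - and assuming, hypothetically, that the site search results live under a /search path - the two steps might look like this.

Step 1, in the <head> of the search results template:

<meta name="robots" content="noindex">

Step 2, about 4 weeks later, in robots.txt:

User-agent: *
Disallow: /search

The order matters: once a URL is disallowed in robots.txt, Google can no longer crawl it to see the noindex tag, so the tag has to do its work first.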
12:10 am on Oct 12, 2011 (gmt 0)

Junior Member

5+ Year Member

joined:Oct 25, 2010
posts:81
votes: 0


Hmm, OK, that is interesting - I assumed it would be retroactive. The parameter I am trying to remove comes from dynamic URLs that were created by an indexing problem we had a year ago. That has been fixed so it can't happen again, but the URLs themselves are still valid. It would be almost impossible to add a robots noindex to each of these pages because they are dynamic. Is there a better way to do it? Can you disallow a parameter in robots.txt?

Thanks for all of the great info!
5:18 am on Oct 12, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:May 26, 2000
posts:37301
votes: 0


Can you disallow a parameter in robots.txt?

Most likely, if the character string that's used as your parameter name doesn't also appear in the file and directory structure that the site uses.

This requires the pattern-matching wildcard "*" within the Disallow rule - an extension of the original robots.txt specification that Google supports. So imagine you want to disallow crawling of any URL that uses the parameter "pdq".

The rule Disallow: /*pdq would do it. But if your parameter is "sch" and you also have a URL like /kirschwasser.php - then you're in a bit of trouble.
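
To see why, here is a minimal Python sketch of how this style of wildcard matching behaves - an approximation for sanity-checking rules, not Google's actual parser:

import re

def matches_disallow(rule, path):
    # With '*' standing for any run of characters, a Disallow rule
    # matches when the pattern covers a prefix of the path + query.
    pattern = '.*'.join(re.escape(part) for part in rule.split('*'))
    return re.match(pattern, path) is not None

print(matches_disallow('/*pdq', '/results.php?pdq=1'))  # True - blocked, as intended
print(matches_disallow('/*sch', '/results.php?sch=1'))  # True - blocked, as intended
print(matches_disallow('/*sch', '/kirschwasser.php'))   # True - blocked by accident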
6:26 am on Oct 12, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


If it is the first parameter then these would work:

Disallow: /*?pdq
Disallow: /*?sch
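
If the parameter name could also begin a longer name or a file name (say "sch" versus "schedule"), including the equals sign narrows the match further - assuming, of course, that the parameter always appears with a value:

Disallow: /*?pdq=
Disallow: /*?sch=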
11:16 am on Oct 12, 2011 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:13210
votes: 347


If it is a non-first parameter, could you use these?

Disallow: /*&pdq
Disallow: /*&sch

Or does the ampersand have a robots.txt-specific meaning that I've forgotten about?
3:05 pm on Oct 12, 2011 (gmt 0)

Junior Member

5+ Year Member

joined:Oct 25, 2010
posts:81
votes: 0


So if the problem string is amp;amp; - left over from the way the system escaped the & character - then setting up Disallow: /*amp;amp; should take care of the problem?
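
For example (a made-up URL), something like /widgets.php?color=red&amp;amp;size=blue contains the amp;amp; string that the rule would need to match.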

Thanks!
6:53 pm on Oct 12, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


Yes. All those code snippets look like they would each be valid for their specific purposes.
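
Pulling the thread's snippets together - with pdq standing in for the real parameter name - a robots.txt covering the first-parameter, non-first-parameter, and double-escaped-ampersand cases might read:

User-agent: *
Disallow: /*?pdq=
Disallow: /*&pdq=
Disallow: /*amp;amp;

It's worth testing rules like these against real URLs (for example with the robots.txt testing tool in Webmaster Tools) before relying on them.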