
Home / Forums Index / Search Engines / Sitemaps, Meta Data, and robots.txt
Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

Editing Robots.txt to block parameters

 3:03 pm on Nov 26, 2010 (gmt 0)

I own an affiliate website where I cannot change much, because the script comes from the vendor.
But I can edit the robots.txt file.
My problem is that I have two pages with the same content:

Is there a way to block the second page with robots.txt?
Will Disallow: /?lc work?



 5:31 pm on Nov 29, 2010 (gmt 0)

It depends on how your website is set up. I have a feeling that /?lc=en is a mirror of your index page. If that is the case, you might want to try using a canonical tag. Also make sure no links point to the duplicate page.

Depending on your situation, you could also use robots.txt wildcards, a.k.a. pattern matching.
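For the canonical option, a minimal sketch (assuming the duplicate really is /?lc=en, and with example.com standing in for your domain), placed in the <head> of the duplicate page:

```html
<!-- Sketch: tells search engines that /?lc=en is a copy of the root URL -->
<link rel="canonical" href="http://www.example.com/" />
```

Unlike a robots.txt block, this lets the duplicate keep being crawled but consolidates it onto the canonical URL.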


 6:36 pm on Nov 29, 2010 (gmt 0)

Disallow: /*?lc=en

will also do it.

The * is the crucial character; with it, the rule also matches /index.php?lc=en.


 7:41 pm on Nov 29, 2010 (gmt 0)

Just did it, g1smd. Thanks, it works, and the URL has already been removed via Google Webmaster Tools. But I only added

Disallow: /?lc=en

without the *.


 12:31 pm on Nov 30, 2010 (gmt 0)

What happens when you request the default directory index document with that query string?
For example, if your index document is index.php, do you 301 redirect www.example.com/index.php?lc=en to www.example.com/?lc=en, or does it resolve with a 200 OK?
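If it does return 200 OK, a commonly used mod_rewrite sketch for that redirect (assuming Apache with index.php as the DirectoryIndex; adjust to your own setup):

```apache
RewriteEngine On
# Only redirect when the client literally asked for /index.php,
# not when an internal DirectoryIndex lookup mapped / to it.
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /index\.php
# The query string (?lc=en etc.) is carried over by default.
RewriteRule ^index\.php$ / [R=301,L]
```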


 8:14 pm on Mar 4, 2011 (gmt 0)

Regarding Disallow: /*?lc=en: can you also use

Disallow: /*?

to disallow multiple parameters in one shot?


 8:52 pm on Mar 4, 2011 (gmt 0)

The rule is still potentially incomplete.

Disallow: /?lc=en does not block requests for /index.php?lc=en, for example.

That's another reason why
Disallow: /*?lc=en was suggested.
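The difference between the two patterns comes down to prefix-plus-wildcard matching: a plain Disallow value matches URL paths from the left, and * matches any run of characters. A minimal Python sketch of that logic (simplified; robots_match is a hypothetical helper written for illustration, not a library function):

```python
import re

def robots_match(pattern: str, path: str) -> bool:
    """Return True if a robots.txt Disallow pattern matches the URL path.

    Simplified sketch of wildcard matching: '*' matches any run of
    characters, everything else is literal, and plain patterns are
    prefix matches anchored at the start of the path.
    """
    regex = ""
    for ch in pattern:
        if ch == "*":
            regex += ".*"
        else:
            regex += re.escape(ch)
    return re.match(regex, path) is not None

# "Disallow: /?lc=en" is a prefix match from the root only:
print(robots_match("/?lc=en", "/?lc=en"))            # True
print(robots_match("/?lc=en", "/index.php?lc=en"))   # False

# "Disallow: /*?lc=en" matches the parameter on any path:
print(robots_match("/*?lc=en", "/index.php?lc=en"))  # True
```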

 9:52 pm on Mar 4, 2011 (gmt 0)

Disallow: /*?

will disallow all requests for URLs with parameters.


 10:37 pm on Mar 4, 2011 (gmt 0)

Webmaster Tools has a parameter handling page.

1) Can wildcards, or simply "?", be entered there?
2) Is it better to use their tool? robots.txt? Both?


 11:01 pm on Mar 4, 2011 (gmt 0)

robots.txt (note case) works for all search engines and is preferred.

 12:58 am on Mar 5, 2011 (gmt 0)

Got it! Thanks! And thanks also for a post you made back in December re: .htaccess. I just came across it -- really helped me out. Thanks for all your valuable posts.


 6:24 pm on Mar 15, 2011 (gmt 0)

Is there any syntax to disallow all URLs that (mistakenly) contain double slashes, like www.domain.com/category//productname.html?

I was trying to add a slash at the end of category and mistakenly added two, just as Googlebot came by. The // has been corrected, but I'd like Google to stop looking for it.


 8:16 pm on Mar 15, 2011 (gmt 0)

Disallow: *// would stop those URLs from being crawled, but it won't remove them from the SERPs all that fast.

This code in your .htaccess file would be far more effective:

RewriteRule // - [G]
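Spelled out with context, and assuming the rule lives in a per-directory .htaccess file, where Apache may already have collapsed consecutive slashes in the path the pattern sees, matching against the raw request line is a safer sketch:

```apache
RewriteEngine On
# THE_REQUEST holds the raw request line, so the double slash is
# still visible even if the per-directory path has been collapsed.
RewriteCond %{THE_REQUEST} //
# [G] returns 410 Gone, telling crawlers the URL is permanently dead.
RewriteRule ^ - [G]
```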

 11:04 pm on Mar 15, 2011 (gmt 0)

Thank you!

I had tried *// but the robots.txt testing area in GWT didn't disallow it. I appreciate the .htaccess code. Panda kicked my butt and I'm trying everything.
