Forum Moderators: goodroi

Editing Robots.txt to block parameters

     
3:03 pm on Nov 26, 2010 (gmt 0)

Full Member

10+ Year Member

joined:Jan 14, 2009
posts: 281
votes: 2


I own an affiliate website where I cannot change much because the script comes from the vendor.
But I can edit the robots.txt file.
My problem is that I have two pages with the same content:
www.mywebsiteexample.com/
www.mywebsiteexample.com/?lc=en

Is there a way to block the second page with robots.txt?
Will Disallow: /?lc work?
5:31 pm on Nov 29, 2010 (gmt 0)

Administrator from US 

WebmasterWorld Administrator goodroi is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:June 21, 2004
posts:3493
votes: 380


It depends on how your website is set up. I have a feeling that /?lc=en is a mirror of your index page. If that is the case, you might want to try using a canonical tag. Also make sure no links point to the duplicate page.

Depending on your situation, you could also use robots.txt wildcards, also known as pattern matching.
6:36 pm on Nov 29, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


Disallow: /*?lc=en


will also do it.

The * is the crucial character: it makes the rule match /index.php?lc=en as well.
7:41 pm on Nov 29, 2010 (gmt 0)

Full Member

10+ Year Member

joined:Jan 14, 2009
posts: 281
votes: 2


Just did it, g1smd. Thanks, it works, and the URL has already been removed through Google Webmaster Tools. But I only added
Disallow: /?lc=en

without *
12:31 pm on Nov 30, 2010 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:11817
votes: 236


What happens when you request the default directory index document with that query string?
For example, if your index document is index.php, do you 301 redirect www.example.com/index.php?lc=en to www.example.com/?lc=en, or does it resolve with a 200 OK?
8:14 pm on Mar 4, 2011 (gmt 0)

Junior Member

10+ Year Member

joined:Mar 13, 2006
posts: 103
votes: 0


- Disallow: /*?lc=en -

Can you also...

Disallow: /*?

to disallow multiple parameters in one shot?
8:52 pm on Mar 4, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


The rule is still potentially incomplete.

Disallow: /?lc=en
does not block requests for
/index.php?lc=en
for example.

That's another reason why
Disallow: /*?lc=en
was suggested.
9:52 pm on Mar 4, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


Disallow: /*?


will disallow crawling of every URL that contains a ? (that is, every URL with a query string).
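To see how the three rules discussed in this thread differ, here is a minimal Python sketch of the wildcard matching the major search engines document: a rule matches from the start of the path, * stands for any run of characters, and a trailing $ anchors the end. The helper names rule_to_regex and is_blocked are made up for illustration, and the sketch ignores Allow lines, rule precedence, and percent-encoding.

```python
import re

def rule_to_regex(rule):
    """Translate a robots.txt path rule into a compiled regex (hypothetical helper)."""
    anchored = rule.endswith("$")      # a trailing $ means "the URL must end here"
    if anchored:
        rule = rule[:-1]
    # Escape every literal piece, then turn each * into "any run of characters".
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_blocked(rule, path):
    """Return True if the Disallow rule matches the path plus query string."""
    # re.match anchors at the start, mirroring robots.txt prefix matching.
    return rule_to_regex(rule).match(path) is not None

if __name__ == "__main__":
    urls = [
        "/",
        "/?lc=en",
        "/index.php?lc=en",
        "/page.html?id=7",
        "/index.php?foo=1&lc=en",
    ]
    for rule in ["/?lc=en", "/*?lc=en", "/*?"]:
        blocked = [u for u in urls if is_blocked(rule, u)]
        print(f"Disallow: {rule:<10} blocks {blocked}")
```

One thing the sketch makes visible: /*?lc=en only catches lc=en when it is the first parameter. If the vendor script can emit URLs such as /index.php?foo=1&lc=en, an extra rule along the lines of Disallow: /*&lc=en would probably be needed.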
10:37 pm on Mar 4, 2011 (gmt 0)

Junior Member

10+ Year Member

joined:Mar 13, 2006
posts: 103
votes: 0


Webmaster Tools has a parameter handling page.

1) Can wildcards, or simply "?" be input?
2) Is it better to use their tool? Robots.txt? Both?
11:01 pm on Mar 4, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


Using robots.txt (note case) works for all search engines and is preferred.
12:58 am on Mar 5, 2011 (gmt 0)

Junior Member

10+ Year Member

joined:Mar 13, 2006
posts: 103
votes: 0


Got it! Thanks! And thanks also for a post you made back in December re: .htaccess. I just came across it -- really helped me out. Thanks for all your valuable posts.
6:24 pm on Mar 15, 2011 (gmt 0)

Junior Member

10+ Year Member

joined:Mar 13, 2006
posts: 103
votes: 0


Is there any syntax to disallow all URLs that (mistakenly) contain double slashes, like this: www.domain.com/category//productname.html ?

I was trying to add a slash at the end of "category" and mistakenly added two just as Googlebot came by. The // has been corrected, but I'd like Google to stop looking for it.
8:16 pm on Mar 15, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


Disallow: *//
would stop those URLs being crawled.
That won't help remove those URLs from the SERPs all that fast.

This code in your .htaccess file would be far more effective:
RewriteRule // - [G]
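
For readers unfamiliar with the [G] flag: it makes Apache answer 410 Gone, which signals that the URL has been removed on purpose. The Python sketch below mimics the intended effect of that rule, not how Apache actually evaluates it. The function name status_for and the example URLs are assumptions for illustration only.

```python
from urllib.parse import urlsplit

GONE = 410   # the status the [G] flag sends
OK = 200     # stand-in for "serve the page normally"

def status_for(url):
    """Return 410 if the URL's path contains a double slash, else 200 (hypothetical helper)."""
    # Look only at the path: RewriteRule patterns are not matched against the query string.
    path = urlsplit(url).path
    return GONE if "//" in path else OK

if __name__ == "__main__":
    for url in [
        "http://www.domain.com/category//productname.html",
        "http://www.domain.com/category/productname.html",
    ]:
        print(status_for(url), url)
```

It is worth requesting one of the double-slash URLs after deploying the rule and checking the response code, since server configurations differ in how they normalize repeated slashes.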
11:04 pm on Mar 15, 2011 (gmt 0)

Junior Member

10+ Year Member

joined:Mar 13, 2006
posts: 103
votes: 0


Thank you!

I had tried *// but the robots.txt testing area in GWT didn't disallow it. I appreciate the .htaccess code. Panda kicked my butt and I'm trying everything.
 
