
Editing Robots.txt to block parameters

   
3:03 pm on Nov 26, 2010 (gmt 0)

5+ Year Member



I run an affiliate website where I can't change much, because the script comes from the vendor.
But I can edit the robots.txt file.
My problem is that I have two pages with the same content:
www.mywebsiteexample.com/
www.mywebsiteexample.com/?lc=en

Is there a way to block the second page with robots.txt?
Will Disallow: /?lc work?
5:31 pm on Nov 29, 2010 (gmt 0)

WebmasterWorld Administrator goodroi is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



It depends on how your website is set up. I have a feeling that /?lc=en is a mirror of your index page. If that is the case, you might want to try using a canonical tag. Also make sure no links point to the duplicate page.

Depending on your situation, you could also use robots.txt wildcards, aka pattern matching.
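
If you can get even one line into the page <head>, the canonical tag is simple. A minimal sketch (using the example domain from the first post), assuming the vendor script lets you edit the page template:

<!-- placed in the <head> of both / and /?lc=en; tells search engines
     which URL is the preferred version of this content -->
<link rel="canonical" href="http://www.mywebsiteexample.com/">

If you can't touch the templates at all, the robots.txt wildcard route is the fallback.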
6:36 pm on Nov 29, 2010 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Disallow: /*?lc=en


will also do it.

The * is the crucial character: it makes the rule match /index.php?lc=en as well.
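
A complete robots.txt along those lines might look like this. It's a sketch, assuming Google-style wildcard matching (the * is an extension honoured by the major engines, not part of the original robots.txt standard):

User-agent: *
# block the language-parameter duplicate wherever it appears,
# e.g. /?lc=en and /index.php?lc=en
Disallow: /*?lc=en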
7:41 pm on Nov 29, 2010 (gmt 0)

5+ Year Member



Just did it, g1smd. Thanks, it works, and the duplicate URL has already been removed via Google Webmaster Tools. But I only added
Disallow: /?lc=en

without the *.
12:31 pm on Nov 30, 2010 (gmt 0)

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



What happens when you request the default directory index document with that query string?
For example, if your index document is index.php, do you 301 redirect www.example.com/index.php?lc=en to www.example.com/?lc=en, or does it resolve with a 200 OK?
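
If it currently answers 200 OK, one way to consolidate it is a redirect along these lines. This is only a sketch, assuming Apache with mod_rewrite enabled and index.php as the directory index; test it before relying on it:

RewriteEngine On
# only fire when the visitor literally asked for /index.php, so the
# server's internal mapping of / to index.php doesn't cause a loop
RewriteCond %{THE_REQUEST} ^[A-Z]+\s/index\.php[?\s]
# 301 to the bare root URL; the query string (?lc=en) is carried over
# automatically because the substitution contains no "?" of its own
RewriteRule ^index\.php$ / [R=301,L]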
8:14 pm on Mar 4, 2011 (gmt 0)

5+ Year Member



Disallow: /*?lc=en

Can you also use

Disallow: /*?

to disallow multiple parameters in one shot?
8:52 pm on Mar 4, 2011 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



The rule is still potentially incomplete.

Disallow: /?lc=en
does not block requests for
/index.php?lc=en
for example.

That's another reason why
Disallow: /*?lc=en
was suggested.
9:52 pm on Mar 4, 2011 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Disallow: /*?


will disallow crawling of any URL that contains a query string, i.e. any parameters at all.
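
In other words, under Google-style wildcard matching, a sketch of its effect:

User-agent: *
# /?lc=en               blocked
# /index.php?lc=en      blocked
# /page.html?sort=asc   blocked
# /page.html            still crawlable
Disallow: /*?

It's a blunt instrument, so check that none of your legitimate content is reachable only through parameterised URLs before you use it.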
10:37 pm on Mar 4, 2011 (gmt 0)

5+ Year Member



Webmaster Tools has a parameter handling page.

1) Can wildcards, or simply "?" be input?
2) Is it better to use their tool? Robots.txt? Both?
11:01 pm on Mar 4, 2011 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Using robots.txt (note the lower case) works for all search engines and is preferred.
12:58 am on Mar 5, 2011 (gmt 0)

5+ Year Member



Got it! Thanks! And thanks also for a post you made back in December re: .htaccess. I just came across it -- really helped me out. Thanks for all your valuable posts.
6:24 pm on Mar 15, 2011 (gmt 0)

5+ Year Member



Is there any syntax to disallow all URLs that (mistakenly) have double slashes, like this: www.domain.com/category//productname.html ?

I was trying to add a slash at the end of /category/ and mistakenly added two just as Googlebot came by. The // has been corrected, but I'd like Google to stop looking for it.
8:16 pm on Mar 15, 2011 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Disallow: *//
would stop those URLs being crawled. That won't help remove those URLs from the SERPs all that fast, though.

This code in your .htaccess file would be far more effective:
RewriteRule // - [G]
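
If the bare rule doesn't fire from a per-directory .htaccess, a variant that tests the original request line is a common workaround. A sketch, assuming Apache with mod_rewrite; the [G] flag sends 410 Gone:

RewriteEngine On
# THE_REQUEST holds the raw request line, e.g. "GET /category//product.html HTTP/1.1",
# so the double slash is still visible there even if the path gets normalised
RewriteCond %{THE_REQUEST} //
RewriteRule ^ - [G]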
11:04 pm on Mar 15, 2011 (gmt 0)

5+ Year Member



Thank you!

I had tried *// but the robots.txt testing area in GWT didn't disallow it. I appreciate the .htaccess code. Panda kicked my butt and I'm trying everything.
 
