
Sitemaps, Meta Data, and robots.txt Forum

    
Editing Robots.txt to block parameters
serenoo




msg:4235298
 3:03 pm on Nov 26, 2010 (gmt 0)

I own an affiliate website where I cannot change much because the script comes from the vendor.
But I can edit the robots.txt file.
My problem is that I have two pages with the same content:
www.mywebsiteexample.com/
www.mywebsiteexample.com/?lc=en

Is there a way to block the second page with robots.txt?
Will Disallow: /?lc work?

 

goodroi




msg:4236350
 5:31 pm on Nov 29, 2010 (gmt 0)

It depends on how your website is set up. I have a feeling that /?lc=en is a mirror of your index page. If that is the case, you might want to try using a canonical tag. Also make sure no links point to the duplicate page.

Depending on your situation, you could also use robots.txt wildcards, also known as pattern matching.

g1smd




msg:4236385
 6:36 pm on Nov 29, 2010 (gmt 0)

Disallow: /*?lc=en

will also do it.

The * is the crucial character, as it makes the rule work for /index.php?lc=en too.
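To illustrate why the * matters, here is a minimal Python sketch of wildcard-style robots.txt matching. It assumes the commonly documented semantics (a rule is a prefix match, * matches any run of characters, a trailing $ anchors the end); the function name rule_matches and the sample URLs are made up for illustration, and real crawlers may differ in details.

```python
import re

def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches a URL path plus query.

    Assumed semantics: '*' matches any run of characters, a trailing '$'
    anchors the end of the URL, and everything else is a literal prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece and join the pieces with '.*' for each '*'.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start, which gives the prefix behaviour.
    return re.match(regex, path) is not None

# Hypothetical URLs (path plus query string) to test against both rules.
urls = ["/", "/?lc=en", "/index.php?lc=en", "/index.php"]
for rule in ("/?lc=en", "/*?lc=en"):
    blocked = [u for u in urls if rule_matches(rule, u)]
    print(f"Disallow: {rule} blocks {blocked}")
```

Under these assumptions, the rule without the * only catches /?lc=en, while the wildcard rule also catches /index.php?lc=en.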

serenoo




msg:4236410
 7:41 pm on Nov 29, 2010 (gmt 0)

Just did it, g1smd. Thanks, it works, and the URL has already been removed through Google Webmaster Tools. But I only added
Disallow: /?lc=en

without the *.

phranque




msg:4236745
 12:31 pm on Nov 30, 2010 (gmt 0)

What happens when you request the default directory index document with that query string?
For example, if your index document is index.php, do you 301 redirect www.example.com/index.php?lc=en to www.example.com/?lc=en, or does it resolve with a 200 OK?

spina45




msg:4276784
 8:14 pm on Mar 4, 2011 (gmt 0)

- Disallow: /*?lc=en -

Can you also use

Disallow: /*?

to disallow multiple parameters in one shot?

g1smd




msg:4276810
 8:52 pm on Mar 4, 2011 (gmt 0)

The rule is still potentially incomplete.

Disallow: /?lc=en does not block requests for /index.php?lc=en, for example.

That's another reason why Disallow: /*?lc=en was suggested.

g1smd




msg:4276854
 9:52 pm on Mar 4, 2011 (gmt 0)

Disallow: /*?

will disallow all requests for URLs with parameters.
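As a rough illustration of how broad that rule is compared with the narrower /*?lc=en, here is a small Python sketch. It assumes * matches any run of characters and that rules are otherwise literal prefix matches; is_disallowed and the sample paths are hypothetical names chosen for this example.

```python
import re

def is_disallowed(rule: str, path: str) -> bool:
    """Assumed matching: '*' is a wildcard, the rest is a literal prefix."""
    regex = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.match(regex, path) is not None

# Hypothetical paths: two without a query string, three with one.
samples = [
    "/",
    "/widgets.html",
    "/?lc=en",
    "/index.php?lc=fr",
    "/search?q=robots&page=2",
]

for path in samples:
    broad = is_disallowed("/*?", path)         # any URL containing '?'
    narrow = is_disallowed("/*?lc=en", path)   # only URLs containing '?lc=en'
    print(f"{path:28} /*? -> {broad!s:5}  /*?lc=en -> {narrow}")
```

Under these assumptions the broad rule catches every URL with a query string, so it is only a good fit if none of the parameterised URLs should be crawled.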

spina45




msg:4276866
 10:37 pm on Mar 4, 2011 (gmt 0)

Webmaster Tools has a parameter handling page.

1) Can wildcards, or simply "?" be input?
2) Is it better to use their tool? Robots.txt? Both?

g1smd




msg:4276874
 11:01 pm on Mar 4, 2011 (gmt 0)

Using robots.txt (note case) works for all search engines and is preferred.

spina45




msg:4276921
 12:58 am on Mar 5, 2011 (gmt 0)

Got it! Thanks! And thanks also for a post you made back in December re: .htaccess. I just came across it -- really helped me out. Thanks for all your valuable posts.

spina45




msg:4282069
 6:24 pm on Mar 15, 2011 (gmt 0)

Is there any syntax to disallow all URLs that (mistakenly) have double slashes like this: www.domain.com/category//productname.html ?

I was trying to add a slash at the end of "category" and mistakenly added two just as Googlebot came by. The // has been corrected, but I'd like Google to stop looking for it.

g1smd




msg:4282150
 8:16 pm on Mar 15, 2011 (gmt 0)

Disallow: *// would stop those URLs being crawled.
That won't help remove those URLs from the SERPs all that fast.

This code in your .htaccess file would be far more effective:
RewriteRule // - [G]

spina45




msg:4282212
 11:04 pm on Mar 15, 2011 (gmt 0)

Thank you!

I had tried *// but the robots.txt testing area in GWT didn't disallow it. I appreciate the .htaccess code. Panda kicked my butt and I'm trying everything.
