
Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

    
Editing Robots.txt to block parameters
serenoo

5+ Year Member



 
Msg#: 4235296 posted 3:03 pm on Nov 26, 2010 (gmt 0)

I own an affiliate website where I cannot change much because the script comes from the vendor.
But I can edit the robots.txt file.
My problem is that I have two pages with the same content:
www.mywebsiteexample.com/
www.mywebsiteexample.com/?lc=en

Is there a way to block the second page with robots.txt?
Will Disallow: /?lc work?

 

goodroi

WebmasterWorld Administrator, Top Contributor of All Time, 10+ Year Member, Top Contributor of the Month



 
Msg#: 4235296 posted 5:31 pm on Nov 29, 2010 (gmt 0)

It depends on how your website is set up. I have a feeling that /?lc=en is a mirror of your index page. If that is the case, you might want to try using a canonical tag. Also make sure no links point to the duplicate page.

Depending on your situation, you could also use robots.txt wildcards, a.k.a. pattern matching.

g1smd

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 4235296 posted 6:36 pm on Nov 29, 2010 (gmt 0)

Disallow: /*?lc=en

will also do it.

The * is the crucial character, as it means the rule will also work for /index.php?lc=en.
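To see why the * matters, here is a minimal Python sketch of wildcard matching. It is not any search engine's actual parser; it assumes that * matches any run of characters, that a trailing $ anchors the end of the URL, and that a rule is matched from the start of the path. The names rule_to_regex and is_blocked are made up for this illustration.

```python
import re


def rule_to_regex(rule: str) -> re.Pattern:
    """Translate a robots.txt path rule into a compiled regex.

    Assumed semantics (a simplification): '*' matches any run of
    characters, a trailing '$' anchors the end of the URL, and every
    other character is taken literally.
    """
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    # Escape the literal pieces and join them with '.*' for each '*'.
    pattern = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(pattern + ("$" if anchored else ""))


def is_blocked(rule: str, path: str) -> bool:
    # re.match anchors at the start of the path, like a prefix rule.
    return rule_to_regex(rule).match(path) is not None


for path in ("/?lc=en", "/index.php?lc=en", "/"):
    print(path, is_blocked("/?lc=en", path), is_blocked("/*?lc=en", path))
```

Under these assumptions, /?lc=en only covers the root URL with that query string, while /*?lc=en also covers /index.php?lc=en. Neither rule touches the plain / page.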

serenoo

5+ Year Member



 
Msg#: 4235296 posted 7:41 pm on Nov 29, 2010 (gmt 0)

Just did it, g1smd. Thanks, it works, and the URL has already been removed through Google Webmaster Tools. But I only added
Disallow: /?lc=en

without *

phranque

WebmasterWorld Administrator, Top Contributor of All Time, 10+ Year Member, Top Contributor of the Month



 
Msg#: 4235296 posted 12:31 pm on Nov 30, 2010 (gmt 0)

What happens when you request the default directory index document with that query string?
For example, if your index document is index.php, do you 301 redirect www.example.com/index.php?lc=en to www.example.com/?lc=en, or does it resolve with a 200 OK?

spina45

5+ Year Member



 
Msg#: 4235296 posted 8:14 pm on Mar 4, 2011 (gmt 0)

- Disallow: /*?lc=en -

Can you also use

Disallow: /*?

to disallow multiple parameters in one shot?

g1smd

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 4235296 posted 8:52 pm on Mar 4, 2011 (gmt 0)

The rule is still potentially incomplete.

Disallow: /?lc=en does not block requests for /index.php?lc=en for example.

That's another reason why Disallow: /*?lc=en was suggested.

g1smd

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 4235296 posted 9:52 pm on Mar 4, 2011 (gmt 0)

Disallow: /*?

will disallow all requests for URLs with parameters.
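As a rough illustration of how broad that rule is, the Python sketch below filters a handful of made-up paths against /*?. It is only a simplified model: it assumes * stands for any run of characters and that the rule is anchored at the start of the path, and it ignores the $ end anchor. The function name matches and the sample paths are hypothetical.

```python
def matches(rule: str, path: str) -> bool:
    """Return True if `path` is covered by `rule`.

    Assumption: '*' stands for any run of characters and the rule is
    anchored at the start of the path; '$' is not handled here.
    """
    first, *rest = rule.split("*")
    if not path.startswith(first):
        return False
    pos = len(first)
    # Each remaining literal piece must appear, in order, after the last one.
    for piece in rest:
        pos = path.find(piece, pos)
        if pos == -1:
            return False
        pos += len(piece)
    return True


# Hypothetical sample paths, not taken from any real site.
paths = [
    "/",
    "/?lc=en",
    "/index.php?lc=en",
    "/page.html?sort=price&page=2",
    "/about.html",
]
blocked = [p for p in paths if matches("/*?", p)]
print(blocked)
```

In this model every path containing a ? ends up in blocked, including any parameterised pages you might actually want crawled, so it is worth checking the list of affected URLs before using such a broad rule.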

spina45

5+ Year Member



 
Msg#: 4235296 posted 10:37 pm on Mar 4, 2011 (gmt 0)

Webmaster Tools has a parameter handling page.

1) Can wildcards, or simply "?", be entered there?
2) Is it better to use their tool? Robots.txt? Both?

g1smd

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 4235296 posted 11:01 pm on Mar 4, 2011 (gmt 0)

Using robots.txt (note case) works for all search engines and is preferred.

spina45

5+ Year Member



 
Msg#: 4235296 posted 12:58 am on Mar 5, 2011 (gmt 0)

Got it! Thanks! And thanks also for a post you made back in December re: .htaccess. I just came across it -- really helped me out. Thanks for all your valuable posts.

spina45

5+ Year Member



 
Msg#: 4235296 posted 6:24 pm on Mar 15, 2011 (gmt 0)

Is there any syntax to disallow all URLs that (mistakenly) have double slashes like this: www.domain.com/catagory//productname.html ?

I was trying to add a slash at the end of "category" and mistakenly added two, just as Googlebot came by. The // has been corrected, but I'd like Google to stop looking for it.

g1smd

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 4235296 posted 8:16 pm on Mar 15, 2011 (gmt 0)

Disallow: *// would stop those URLs from being crawled.
However, that won't help remove those URLs from the SERPs all that fast.

This code in your .htaccess file would be far more effective:
RewriteRule // - [G]

spina45

5+ Year Member



 
Msg#: 4235296 posted 11:04 pm on Mar 15, 2011 (gmt 0)

Thank you!

I had tried *// but the robots.txt testing area in GWT didn't disallow it. I appreciate the .htaccess code. Panda kicked my butt and I'm trying everything.
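For readers unfamiliar with the [G] flag: it makes the server answer with 410 Gone. The Python sketch below models only the intent of the .htaccess suggestion, namely "any path containing // is gone". It does not reproduce how Apache itself normalises or matches paths, and the helper name status_for is invented for this example.

```python
from urllib.parse import urlsplit


def status_for(url: str) -> int:
    """Hypothetical helper: 410 Gone if the URL path contains '//', else 200.

    Only the path is inspected, so the '//' after the scheme is ignored.
    """
    path = urlsplit(url).path
    return 410 if "//" in path else 200


print(status_for("http://www.domain.com/catagory//productname.html"))
print(status_for("http://www.domain.com/category/productname.html"))
```

A 410 response tells crawlers the URL has been removed on purpose, which is why it was suggested as more effective than simply blocking the crawl in robots.txt.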

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved