
Sitemaps, Meta Data, and robots.txt Forum

    
How to block this type of URL
iCyborg

5+ Year Member



 
Msg#: 3815763 posted 7:29 pm on Dec 29, 2008 (gmt 0)

I want to block these URLs:

http://example.com/blog/page/6/?theme=xyz
http://example.com/blog/this-is-post/?theme=xyz

but keep these URLs:
http://example.com/blog/page/6/
http://example.com/blog/this-is-post/

Basically, I want to block every URL ending in "?theme=xyz", because these URLs cause unnecessary duplicate content.

 

iCyborg

5+ Year Member



 
Msg#: 3815763 posted 8:16 pm on Dec 29, 2008 (gmt 0)

I want to prevent them from being indexed

netchicken1

5+ Year Member



 
Msg#: 3815763 posted 6:32 am on Dec 30, 2008 (gmt 0)

Disallow: theme

(I think)

iCyborg

5+ Year Member



 
Msg#: 3815763 posted 12:03 pm on Dec 30, 2008 (gmt 0)

Shouldn't there be a * in there too?

firefoxin

5+ Year Member



 
Msg#: 3815763 posted 7:52 am on Jan 1, 2009 (gmt 0)

How can I block this type of URL in robots.txt?

http://example.com/member.php?action=email&id=10039

[or]

http://example.com/comments.php?shownews=605&highlight=

[or]

http://example.com/member.php?action=list

g1smd

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3815763 posted 3:10 pm on Jan 1, 2009 (gmt 0)

First Question:

Disallow: /*theme
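
To see what a wildcard rule like this catches, here is a minimal Python sketch. It assumes the wildcard semantics that Google documents for robots.txt (the rule is anchored at the start of the path, `*` matches any run of characters, a trailing `$` anchors the end); other crawlers may behave differently. The function name `rule_matches` and the `/blog/new-theme-released/` URL are made up for illustration, and the narrower `/*?theme=` rule is only a suggestion, not something posted in this thread.

```python
import re

def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches a URL path.

    Assumption: Google-style matching. The pattern is anchored at the
    start of the path, '*' matches any run of characters, and a trailing
    '$' anchors the end of the path.
    """
    anchored_end = pattern.endswith("$")
    if anchored_end:
        pattern = pattern[:-1]
    # Escape literal pieces and join them with '.*' where '*' appeared.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored_end:
        regex += "$"
    return re.match(regex, path) is not None

urls = [
    "/blog/page/6/?theme=xyz",
    "/blog/this-is-post/?theme=xyz",
    "/blog/page/6/",
    "/blog/this-is-post/",
]

# Paths that "Disallow: /*theme" would block under the assumption above.
blocked = [u for u in urls if rule_matches("/*theme", u)]

# A hypothetical post slug that merely contains the word "theme".
slug = "/blog/new-theme-released/"
slug_blocked_by_wide_rule = rule_matches("/*theme", slug)
slug_blocked_by_narrow_rule = rule_matches("/*?theme=", slug)
```

Under these assumptions only the two `?theme=xyz` URLs land in `blocked`. The wide rule would also catch a slug that happens to contain "theme", which is why a narrower rule such as `/*?theme=` may be worth considering.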


Second Question:

Which URLs are similar to those, but do not need to be blocked?

Otherwise is this what you want?

Disallow: /member
Disallow: /comments


Be aware that robots.txt only stops crawling: Google may still list "blocked" URLs as URL-only entries in the SERPs.

firefoxin

5+ Year Member



 
Msg#: 3815763 posted 12:47 pm on Jan 2, 2009 (gmt 0)

Thanks, g1smd.

/comments.php?id=3740&ocid=30562&replyid=0&catid=1 [remove]
/comments.php?id=3740&replyid=30574&catid=1 [remove]
/comments.php?shownews=3740 [OK]

I want to remove the first two URLs; the third one is my primary link.

I think I need to use this code:

User-agent: googlebot
Disallow: /*id
Disallow: /*replyid

Is that right?

g1smd

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3815763 posted 7:56 pm on Jan 2, 2009 (gmt 0)

Your last suggestion is probably too wide. It will block anything with "id" or "replyid" anywhere in the URL or in the parameters. That's likely going to block some stuff that you don't want blocked. That is, "id" would block anything with "catid" and "ocid" and "docid" as well.
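
A quick way to check this over-matching is the Python sketch below. It assumes a rule of the form `/*fragment` matches any path that starts with `/` and contains the fragment somewhere after it, which is how Google describes wildcard rules. The helper name `wildcard_rule_hits` and the two URLs marked hypothetical are made up for illustration.

```python
def wildcard_rule_hits(fragment: str, path: str) -> bool:
    """Approximate 'Disallow: /*<fragment>' (assumed Google-style matching).

    The rule matches any path that begins with '/' and contains the
    fragment anywhere after that first slash.
    """
    return path.startswith("/") and fragment in path[1:]

paths = [
    "/comments.php?id=3740&ocid=30562&replyid=0&catid=1",  # from the thread
    "/comments.php?id=3740&replyid=30574&catid=1",         # from the thread
    "/comments.php?shownews=3740",                         # from the thread
    "/article.php?catid=7",                                # hypothetical
    "/docs.php?docid=12",                                  # hypothetical
]

# Every path that 'Disallow: /*id' would block under the assumption above.
hit_by_id = [p for p in paths if wildcard_rule_hits("id", p)]
```

Only the `shownews` URL survives. The hypothetical `catid` and `docid` URLs are blocked too, even though they have no `id` parameter of their own.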

Do you know if the parameters ever appear in a different order?

Do URLs with shownews ever carry additional parameters, and if so, do you want those left unblocked?

Otherwise, I would do:

User-agent: *
Disallow: /comments.php?id=
Disallow: /comments.php?action=
Disallow: /comments.php?highlight=
Disallow: /comments.php?ocid=
Disallow: /comments.php?replyid=
Disallow: /comments.php?catid=

Maybe others too?
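
As a rough check of the list above, this Python sketch treats each wildcard-free Disallow line as a plain prefix match against the path plus query string, which is the basic robots.txt matching rule. The function name `first_blocking_rule` is made up for illustration; the URLs are the ones quoted earlier in this thread.

```python
from typing import List, Optional

RULES = [
    "/comments.php?id=",
    "/comments.php?action=",
    "/comments.php?highlight=",
    "/comments.php?ocid=",
    "/comments.php?replyid=",
    "/comments.php?catid=",
]

def first_blocking_rule(path: str, rules: List[str]) -> Optional[str]:
    """Return the first Disallow prefix that matches the path, else None."""
    for rule in rules:
        if path.startswith(rule):
            return rule
    return None

results = {
    path: first_blocking_rule(path, RULES)
    for path in [
        "/comments.php?id=3740&ocid=30562&replyid=0&catid=1",
        "/comments.php?id=3740&replyid=30574&catid=1",
        "/comments.php?shownews=3740",
        "/comments.php?shownews=605&highlight=",
    ]
}
```

The first two URLs are caught by the `?id=` rule and the primary `shownews` URL stays crawlable. Note that the `shownews=605&highlight=` URL is not caught either, because prefix rules only look at the first parameter. That is why the order and combination of parameters matter.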

You need to be aware of every possible format that could be requested.

You also need to be aware that with parameters in a different order you have a duplicate URL for the same content.
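
To illustrate the duplicate-URL point, here is a small Python sketch that sorts the query parameters so that reordered variants collapse to one key. The function name `canonical_key` and the reordered URL are made up for illustration. It says nothing about how the site's script actually handles parameters.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

def canonical_key(url: str) -> str:
    """Return the path plus the query parameters in sorted order."""
    parts = urlsplit(url)
    params = sorted(parse_qsl(parts.query, keep_blank_values=True))
    return parts.path + "?" + urlencode(params)

original = "/comments.php?id=3740&replyid=30574&catid=1"   # from the thread
reordered = "/comments.php?catid=1&id=3740&replyid=30574"  # hypothetical variant

# Both strings reduce to the same key, so they point at the same content
# even though a crawler sees two distinct URLs.
same_content = canonical_key(original) == canonical_key(reordered)
```

A check like this, run over the URLs in a server log, could help spot which parameter orders are actually being requested before deciding which Disallow lines are needed.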

Jonesy

5+ Year Member



 
Msg#: 3815763 posted 9:32 pm on Jan 4, 2009 (gmt 0)

Yes, the last suggestion is too wide (by far) and doubly redundant. :)
Referring to:

Disallow: /*id
Disallow: /*replyid

The rule "/*id" will disallow
/avoidupois, /ridiculous, /rancid, /recidivism,
/riddle, /typhoid, /zircofluoride, etc., usw.,
as well as anything that "/*replyid" would match, so the second rule adds nothing.
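
The word list can be verified with Python's standard `fnmatch` module, as in the sketch below. It assumes Google-style wildcard rules and appends `*` to the rule because robots.txt rules match prefixes while `fnmatch` matches whole strings. The helper name `wildcard_rule_matches` is made up for illustration. `fnmatch` also treats `?` and `[...]` in a pattern as special, so this shortcut only suits rules without those characters.

```python
from fnmatch import fnmatchcase

def wildcard_rule_matches(rule: str, path: str) -> bool:
    """Check a robots.txt wildcard rule against a path (assumed semantics).

    robots.txt rules are prefix matches, so '*' is appended to let
    fnmatch accept any trailing characters.
    """
    return fnmatchcase(path, rule + "*")

words = [
    "/avoidupois", "/ridiculous", "/rancid", "/recidivism",
    "/riddle", "/typhoid", "/zircofluoride",
]

# Which of the example paths 'Disallow: /*id' would block.
caught_by_id = [w for w in words if wildcard_rule_matches("/*id", w)]

# Anything matched by '/*replyid' contains 'id', so '/*id' matches it as well.
reply_url = "/comments.php?id=3740&replyid=30574&catid=1"
caught_by_both = (
    wildcard_rule_matches("/*replyid", reply_url)
    and wildcard_rule_matches("/*id", reply_url)
)
```

Every word in the list ends up in `caught_by_id`, and the reply URL is matched by both rules, which is the redundancy described above.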


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved