
Sitemaps, Meta Data, and robots.txt Forum

    
Robots.txt: best way to block URLs with ? in them?
MrBlack (msg:3936872), 4:46 pm on Jun 19, 2009 (gmt 0)

I have a site that uses a ? in the URL for pagination.

e.g. mywidgets.com/widget.html is the base URL, and the paginated URLs are mywidgets.com/widget.html?page=2, mywidgets.com/widget.html?page=3, etc.

So I am thinking that to block the spiders from these paginated URLs (which all share exactly the same metadata, title, etc.) I need to add the following line to my robots.txt:

Disallow: /*?

What do you guys think? Is this the best method?
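For crawlers that do support wildcard matching (Google and the other major engines), the directive would normally be written out as a full record. A minimal sketch, using the mywidgets.com example from the question (note that `*` and `$` in robots.txt paths are an extension popularized by Google, not part of the original robots.txt standard):

```
User-agent: *
Disallow: /*?
```

This tells wildcard-aware crawlers to skip any URL whose path-plus-query contains a `?`.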

 

goodroi (msg:3937922), 11:43 pm on Jun 21, 2009 (gmt 0)

The metadata and the title are the same, but is the text on the actual page the same? If the content is different, I would not want to block it from the search engines. If the content is the same, you may want to block it.

Using wildcards in robots.txt is an easy way to deal with the situation, but smaller bots do not support them. My personal preference would be to change the URL structure of your site.
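The point about smaller bots is easy to demonstrate. Python's standard-library robots.txt parser, for example, follows the original 1994-style spec, which does simple path-prefix matching and has no wildcard support, so a `Disallow: /*?` rule does not block query-string URLs for it (mywidgets.com is the hypothetical domain from the thread):

```python
# Sketch: show that a spec-conformant, non-wildcard parser ignores
# the "*?" wildcard rule that Googlebot and other major engines honor.
import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /*?",
])

# A wildcard-aware crawler would treat this URL as blocked; the
# stdlib parser matches the rule's path literally and allows it.
print(rp.can_fetch("*", "http://mywidgets.com/widget.html?page=2"))  # True
```

So any bot built on a parser like this will crawl the paginated URLs regardless of the wildcard rule, which is one argument for a crawler-independent fix such as restructuring the URLs.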

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved