
Robots.txt: best way to block URLs with a ? in them?

4:46 pm on Jun 19, 2009 (gmt 0)

Junior Member

10+ Year Member

joined: July 12, 2006
posts: 53
votes: 2


I have a site that uses a ? in the URL for pagination.

E.g. mywidgets.com/widget.html is the base URL, and the paginated URLs are mywidgets.com/widget.html?page=2, mywidgets.com/widget.html?page=3, etc.

So I am thinking that to block spiders from these paginated URLs (which all share exactly the same metadata, title, etc.) I need to add the following to my robots.txt:

User-agent: *
Disallow: /*?

What do you guys think? Is this the best method?
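
In case it helps, this is how I'd expect the pattern to behave (assuming Google-style wildcard matching, where * matches any sequence of characters):

Blocked:     mywidgets.com/widget.html?page=2
Blocked:     mywidgets.com/widget.html?page=3
Not blocked: mywidgets.com/widget.html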

11:43 pm on June 21, 2009 (gmt 0)

Administrator from US 

WebmasterWorld Administrator goodroi, 10+ Year Member

joined: June 21, 2004
posts: 3154
votes: 129


The metadata and the title are the same, but is the text on the actual pages the same? If the content is different, I would not want to block it from the search engines. If the content is the same, you may want to block it.

Using wildcards in robots.txt is an easy way to deal with the situation, but wildcard matching is an extension supported by the major engines, not part of the original robots.txt standard, so smaller bots do not support it. My personal preference would be to change the URL structure of your site.
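
To make the compatibility point concrete, here is a minimal Python sketch (hypothetical helper names, not any bot's actual code) contrasting the original prefix-only matching with the wildcard extension the major engines support:

import re

def matches_literal(rule: str, path: str) -> bool:
    # Original 1994 robots.txt spec: a Disallow value is a plain path
    # prefix, so "*" is just an ordinary character and "/*?" matches
    # nothing useful - a prefix-only bot would keep crawling the pages.
    return path.startswith(rule)

def matches_wildcard(rule: str, path: str) -> bool:
    # Google-style extension: "*" matches any sequence of characters.
    # (Real matchers also support a "$" end anchor, omitted here.)
    regex = "".join(".*" if c == "*" else re.escape(c) for c in rule)
    return re.match(regex, path) is not None

path = "/widget.html?page=2"
print(matches_literal("/*?", path))   # False: rule read as a literal prefix
print(matches_wildcard("/*?", path))  # True: wildcard-aware bots block it

So the same Disallow line that blocks the paginated URLs for Google is a no-op for a bot that only implements the original spec, which is why restructuring the URLs is the more robust fix.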