
Forum Moderators: goodroi


Denying one page with multiple variables

page.html?item=1, page.html?item=2, page.html?item=3

2:18 pm on Apr 30, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Oct 25, 2003
posts:664
votes: 0


Hey everyone,

I was curious if there is an entry you can put into your robots.txt to deny the same page with multiple variables. For example, I want to deny page.html, but I also want to deny page.html?item=1 and page.html?item=2. Is it possible to put something in your robots.txt file that will deny all of those pages instead of specifying each page with the item number? I have over a thousand items, so that would take up a lot of room in my robots.txt file.

Is it possible?

Thanks!

2:39 pm on Apr 30, 2007 (gmt 0)

New User

10+ Year Member

joined:Apr 27, 2007
posts:11
votes: 0


At least for Google, a robots.txt as follows would exclude anything starting with "/page.html" including "/page.html?item=1".

User-agent: *
Disallow: /page.html

But don't take my word for it since I'm completely new to this; check it in Google Webmaster Tools, which has a robots.txt analysis tool that lets you test robots.txt rules against specific URLs. Very handy indeed.
Dave.
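One way to sanity-check this prefix behaviour offline is Python's standard-library robots.txt parser (just a sketch, not a substitute for Google's own tester; the example.com URLs are placeholders):

```python
from urllib import robotparser

# Load the two-line robots.txt from the post above.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /page.html",
])

# The rule is a prefix match, so every query-string variant is blocked too.
print(rp.can_fetch("*", "http://example.com/page.html"))         # False
print(rp.can_fetch("*", "http://example.com/page.html?item=1"))  # False
print(rp.can_fetch("*", "http://example.com/other.html"))        # True
```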

2:20 pm on May 1, 2007 (gmt 0)

Administrator from US 

WebmasterWorld Administrator goodroi is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:June 21, 2004
posts:3241
votes: 202


robots.txt can block entire directories. If you place all of those pages in one directory, you can block them with just one line in robots.txt.

PS: don't forget that Google, Yahoo, MSN and all the other robots act differently from each other. Just because something works for Google does not guarantee it will work for the others.
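For example, if the item pages all lived under a single directory (the /items/ path below is an assumption for illustration, not from the thread), one rule would cover every page inside it:

```
User-agent: *
Disallow: /items/
```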

3:07 pm on May 1, 2007 (gmt 0)

New User

10+ Year Member

joined:Apr 27, 2007
posts:11
votes: 0


Any idea where you can find out the differences between bots in how they interpret robots.txt rules?
3:18 pm on May 1, 2007 (gmt 0)

Administrator from US 

WebmasterWorld Administrator goodroi is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:June 21, 2004
posts:3241
votes: 202


The big search engines all have help centers with FAQs, but I don't know of a single document that clearly shows the differences.
6:10 pm on May 3, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Oct 25, 2003
posts:664
votes: 0


Thank you for all your replies. It looks like Google has removed the viewcart.html?item_id= pages for all items.

I guess the Disallow: /viewcart.html in the robots.txt works for all variables and values appended to the file name.

Thanks for your help! :)

6:21 pm on May 7, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


Disallow performs a match starting from the left.

If the start of the URL completely matches the path in the Disallow rule, the URL is disallowed, even when the real URL is longer than the rule.

So, Disallow: /page will disallow any URL whose path starts with the characters "/page", for example /page.html, /page.html?item=1, and even /pages/index.html.
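This left-anchored matching can be illustrated with Python's standard-library parser (example.com and the paths are placeholders); note that it also catches URLs like /pages/ that merely share the prefix:

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /page"])

# Matching is anchored at the left of the path: anything whose path begins
# with the characters "/page" is disallowed, even when the URL is longer.
for path in ("/page", "/page.html?item=5", "/pages/index.html", "/about.html"):
    print(path, rp.can_fetch("*", "http://example.com" + path))
# /page False
# /page.html?item=5 False
# /pages/index.html False
# /about.html True
```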