
Forum Moderators: goodroi

Denying one page with multiple variables

page.html?item=1, page.html?item=2, page.html?item=3

2:18 pm on Apr 30, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member

Hey everyone,

I was curious whether there is an entry you can put in your robots.txt to deny the same page with multiple variables. For example, I want to deny page.html, but I also want to deny page.html?item=1 and page.html?item=2. Is it possible to put something in your robots.txt file that will deny all of those pages instead of specifying each page with its item number? I have over a thousand items, so listing them individually would take up a lot of room in my robots.txt file.

Is it possible?

Thanks!

2:39 pm on Apr 30, 2007 (gmt 0)

5+ Year Member

At least for Google, a robots.txt like the following would exclude anything starting with "/page.html", including "/page.html?item=1":

User-agent: *
Disallow: /page.html

But don't take my word for it, since I'm completely new to this; check it in Google Webmaster Tools, which has a robots.txt analysis tool that lets you test rules against specific URLs. Very handy indeed.
Dave.
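
If you want to sanity-check rules offline, Python's standard urllib.robotparser applies the same left-anchored prefix matching described in this thread; here is a minimal sketch, using www.example.com as a placeholder host (real engines may interpret robots.txt slightly differently):

from urllib.robotparser import RobotFileParser

# Parse the two-line robots.txt suggested above.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /page.html",
])

# Prefix matching means the query-string variants are covered too.
print(rp.can_fetch("*", "http://www.example.com/page.html"))         # False (blocked)
print(rp.can_fetch("*", "http://www.example.com/page.html?item=1"))  # False (blocked)
print(rp.can_fetch("*", "http://www.example.com/other.html"))        # True  (allowed)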

2:20 pm on May 1, 2007 (gmt 0)

WebmasterWorld Administrator goodroi is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

robots.txt can block entire directories. If you place all of those pages in one directory, you can block them with just one line in robots.txt.
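
For example, assuming the item pages were all moved into a hypothetical /items/ directory, a single rule would cover every one of them:

User-agent: *
Disallow: /items/

(The trailing slash keeps the rule from also matching a file such as /items.html.)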

PS: Don't forget that Google, Yahoo, MSN and all the other robots act differently from each other. Just because something works for Google does not guarantee it will work for the others.

3:07 pm on May 1, 2007 (gmt 0)

5+ Year Member

Any idea where you can find out the differences between bots in how they interpret robots.txt rules?
3:18 pm on May 1, 2007 (gmt 0)

WebmasterWorld Administrator goodroi is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

The big search engines all have help centers with FAQs; I don't know of a document that clearly shows the differences.
6:10 pm on May 3, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member

Thank you for all your replies. It looks like Google has removed the viewcart.html?item_id= pages for all item values.

I guess Disallow: /viewcart.html in robots.txt covers every variable and value appended to the file name.
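
For reference, the working entry was presumably filed under the catch-all user-agent (the User-agent: * line here is an assumption; only the Disallow line was quoted above):

User-agent: *
Disallow: /viewcart.html

Being a left-anchored prefix match, that one line covers /viewcart.html?item_id=1, /viewcart.html?item_id=2, and so on.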

Thanks for your help! :)

6:21 pm on May 7, 2007 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

Disallow performs a match starting from the left.

If the start of the URL completely matches the path in the Disallow rule, the URL is disallowed even when it is longer than the rule.

So, Disallow: /page will disallow any URL that starts with the characters /page, for example /page.html, /pages/, or /page2.php.
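
In other words, the match is nothing more than a string prefix test on the URL path. Here is a minimal Python sketch of that behavior (is_disallowed is a made-up helper for illustration, not a real API):

def is_disallowed(path, rule):
    # Left-anchored match: the rule must be a prefix of the URL path.
    return path.startswith(rule)

for path in ["/page.html", "/page.html?item=2", "/pages/", "/page2.php", "/other/page"]:
    print(path, is_disallowed(path, "/page"))

The first four all begin with /page and would be disallowed; /other/page would not, because the match is anchored at the start of the path.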