| 2:39 pm on Apr 30, 2007 (gmt 0)|
At least for Google, a robots.txt rule along the lines of Disallow: /page.html would exclude anything starting with "/page.html", including "/page.html?item=1".
But don't take my word for it since I'm completely new to this; check it in Google Webmaster Tools, which has a robots.txt analysis tool that lets you test robots.txt rules against specific URLs. Very handy indeed.
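For illustration, a minimal robots.txt along those lines could look like the sketch below (the /page.html path is just a stand-in for whatever page you want blocked):

    User-agent: *
    Disallow: /page.html

Because the matching is done on the URL prefix, /page.html, /page.html?item=1 and /page.html?item=2 would all be excluded by that single rule.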
| 2:20 pm on May 1, 2007 (gmt 0)|
robots.txt can block entire directories. If you place all of those pages in one directory, you can block them with just one line in robots.txt.
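For example, if all the cart pages lived in a hypothetical /cart/ directory, this single rule would cover every URL under it:

    User-agent: *
    Disallow: /cart/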
PS: Don't forget that Google, Yahoo, MSN and all the other robots act differently from each other. Just because something works for Google does not guarantee it will work for the others.
| 3:07 pm on May 1, 2007 (gmt 0)|
Any idea where you can find out the differences between bots in how they interpret robots.txt rules?
| 3:18 pm on May 1, 2007 (gmt 0)|
The big search engines all have help centers with FAQs; I don't know of a single document that clearly shows the differences.
| 6:10 pm on May 3, 2007 (gmt 0)|
Thank you for all your replies. It looks like Google has removed the viewcart.html?item_id= pages for all items.
I guess the Disallow: /viewcart.html rule in robots.txt works for any variables and values appended to the file name.
Thanks for your help! :)
| 6:21 pm on May 7, 2007 (gmt 0)|
Disallow performs a match starting from the left.
If the start of the URL completely matches the path in the Disallow statement, the URL is disallowed, even if the real URL is longer than the rule.
So, for example, Disallow: /page will disallow any URL that starts with the characters /page.
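If you want to sanity-check that left-match behaviour outside of Google's tool, Python's standard urllib.robotparser module implements the same basic prefix matching; here is a small sketch (the example.com URLs are placeholders):

    import urllib.robotparser

    # Parse an in-memory robots.txt with a single prefix rule
    rp = urllib.robotparser.RobotFileParser()
    rp.parse([
        "User-agent: *",
        "Disallow: /page",
    ])

    # Both URLs start with the /page prefix, so both are disallowed
    print(rp.can_fetch("*", "http://www.example.com/page.html"))         # False
    print(rp.can_fetch("*", "http://www.example.com/page.html?item=1"))  # False

    # Different prefix, so this one is still allowed
    print(rp.can_fetch("*", "http://www.example.com/other.html"))        # True

As noted above, though, each engine's crawler may layer its own extensions (wildcards, Allow lines and so on) on top of this basic behaviour.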