Home / Forums Index / Search Engines / Sitemaps, Meta Data, and robots.txt
Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

Denying one page with multiple variables
page.html?item=1, page.html?item=2, page.html?item=3

 2:18 pm on Apr 30, 2007 (gmt 0)

Hey everyone,

I was curious whether there is an entry you can put in your robots.txt to deny the same page with multiple query-string variables. For example, I want to deny page.html, but I also want to deny page.html?item=1 and page.html?item=2. Is it possible to put something in your robots.txt file that will deny all of those pages instead of listing each item number? I have over a thousand items, so listing them individually would take up a lot of room in my robots.txt file.

Is it possible?




 2:39 pm on Apr 30, 2007 (gmt 0)

At least for Google, a robots.txt as follows would exclude anything starting with "/page.html" including "/page.html?item=1".

User-agent: *
Disallow: /page.html

But don't take my word for it, since I'm completely new to this; check it in Google Webmaster Tools, which has a robots.txt analysis tool that lets you test robots.txt rules against specific URLs. Very handy indeed.
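You can also do a quick local check with Python's standard urllib.robotparser, which implements the same left-anchored prefix matching (the example.com URLs below are just placeholders, not from the original site):

```python
from urllib import robotparser

# Parse the two-line robots.txt from the post above.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /page.html",
])

# Every URL sharing the /page.html prefix is blocked,
# with or without a query string.
for url in ("http://example.com/page.html",
            "http://example.com/page.html?item=1",
            "http://example.com/page.html?item=2"):
    print(url, "allowed:", rp.can_fetch("*", url))
```

All three URLs come back as not allowed, while a URL such as /other.html, which doesn't share the prefix, stays fetchable.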


 2:20 pm on May 1, 2007 (gmt 0)

robots.txt can block entire directories. If you place all of those pages in one directory, you can block them with just one line in robots.txt.
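For example, if the item pages all lived under a hypothetical /items/ directory (the directory name here is just an illustration, not from the original site), one rule would cover them all:

```
User-agent: *
Disallow: /items/
```

Note the trailing slash: because matching is by prefix, Disallow: /items without the slash would also block a file named /items.html.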

P.S. Don't forget that Google, Yahoo, MSN and all the other robots behave differently from each other. Just because something works for Google does not guarantee it will work for the others.


 3:07 pm on May 1, 2007 (gmt 0)

Any idea where you can find out the differences between bots in how they interpret robots.txt rules?


 3:18 pm on May 1, 2007 (gmt 0)

The big search engines all have help centers with FAQs; I don't know of a document that clearly shows the differences.


 6:10 pm on May 3, 2007 (gmt 0)

Thank you for all your replies. It looks like Google has removed the pages viewcart.html?item_id= for all items.

I guess Disallow: /viewcart.html in the robots.txt works for all variables and values appended to the file name.

Thanks for your help! :)


 6:21 pm on May 7, 2007 (gmt 0)

Disallow performs a match starting from the left.

If the start of the URL completely matches the path in the Disallow rule, the URL is disallowed, even if the real URL is longer than the rule.

So Disallow: /page will disallow any URL that starts with the characters /page, for example.
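That left-anchored match can be sketched in a few lines of Python (a simplified model of the classic Disallow rule, ignoring the wildcards and Allow lines some engines support):

```python
def is_disallowed(path: str, rule: str) -> bool:
    """Classic robots.txt matching: the rule is a plain prefix of the URL path."""
    return path.startswith(rule)

RULE = "/page"
for path in ("/page", "/page.html", "/page.html?item=1",
             "/pages/old.html", "/other.html"):
    print(path, "disallowed:", is_disallowed(path, RULE))
```

Note that /pages/old.html is also caught, because it too starts with the characters /page; keep rules as specific as possible to avoid blocking pages by accident.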


