How do I use a wild card for many pages?

wild card?

         

maha

5:38 am on Mar 3, 2006 (gmt 0)

10+ Year Member



My current robots.txt looks like this:

User-agent: *
Disallow: /admin/
Disallow: /download/
Disallow: /temp/
Disallow: /pub/
Disallow: /product1product_info.html
Disallow: /product1/product_info.html?manufacturers_id=6
Disallow: /product2product_info.html
Disallow: /product2/product_info.html?manufacturers_id=6
Disallow: /product3product_info.html
Disallow: /product3/product_info.html?manufacturers_id=6
Disallow: /product4product_info.html
Disallow: /product4/product_info.html?manufacturers_id=6
Disallow: /product5product_info.html
Disallow: /product5/product_info.html?manufacturers_id=6

I need to add Disallow lines for about 250 more products. Is it possible to use a wildcard so I don't have to enter every single one of my product pages?

Say, something like:

Disallow: /product*product_info.html
Disallow: /product*/product_info.html?manufacturers_id=6

Is this valid?

Thanks in advance!

ChadSEO

8:51 pm on Mar 3, 2006 (gmt 0)

10+ Year Member



Maha,

Wildcards are not valid in the robots.txt standard. If you don't want 500 lines in your robots.txt file, another option would be to add the appropriate meta tag to the <head> section of the product_info page:

<META NAME="ROBOTS" CONTENT="NOINDEX">

Chad

coho75

9:32 pm on Mar 3, 2006 (gmt 0)

10+ Year Member



maha, you may want to play with this tool a bit:
[searchengineworld.com ]

maha

9:39 pm on Mar 3, 2006 (gmt 0)

10+ Year Member



Hi Chad,

Thanks for the reply. Unfortunately, adding the meta tag is not an option, since these product pages are generated by the shopping cart software.

I guess I just have to enter each product one by one into the robots.txt. :-(

Is there a size limit to robots.txt file?
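
If the entries do have to be listed one by one, they need not be typed by hand. Here is a minimal sketch in Python that prints the lines, assuming the product directories are numbered consecutively the way the five posted above are. The function name, the two path templates copied from that robots.txt, and the range 6 to 255 are assumptions for illustration only.

```python
# Path templates copied verbatim from the robots.txt posted above;
# {n} stands for the product number. Adjust them if the real paths differ.
TEMPLATES = (
    "/product{n}product_info.html",
    "/product{n}/product_info.html?manufacturers_id=6",
)


def build_disallow_lines(first, last):
    """Return the Disallow lines for products first..last (inclusive)."""
    lines = []
    for n in range(first, last + 1):
        for template in TEMPLATES:
            lines.append("Disallow: " + template.format(n=n))
    return lines


if __name__ == "__main__":
    # Hypothetical range: products 6 to 255, i.e. "about 250 more".
    print("\n".join(build_disallow_lines(6, 255)))
```

The output can be pasted under the existing User-agent: * block.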


Dijkgraaf

10:37 am on Mar 4, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You might want to restructure your site so that everything you don't want spidered sits below a single (virtual) folder. Then you just need one line to exclude that folder, and you are done.

watercrazed

1:38 am on Mar 12, 2006 (gmt 0)

10+ Year Member



While explicit wildcards are not valid in the standard, implicit wildcards are: every Disallow path is matched as a prefix. So instead of

Disallow: /product*product_info.html
Disallow: /product*/product_info.html?manufacturers_id=6

a single line

Disallow: /product

would get rid of everything whose path starts with /product, as if it were written /product*
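
The prefix behaviour can be checked locally with Python's standard urllib.robotparser module, which follows the original standard and matches Disallow paths as plain prefixes. This is only a rough sketch: the helper name is_allowed and the sample rules are assumptions for illustration, and a real crawler may behave differently.

```python
from urllib.robotparser import RobotFileParser

# Sample rules for illustration: one prefix rule instead of 500 lines.
RULES = [
    "User-agent: *",
    "Disallow: /product",
]


def is_allowed(robots_lines, path, agent="*"):
    """Return True if `agent` may fetch `path` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse from memory, no network access
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for path in (
        "/product1/product_info.html",
        "/product5/product_info.html?manufacturers_id=6",
        "/products.html",
        "/index.html",
    ):
        print(path, is_allowed(RULES, path))
```

Note that the prefix also catches any other URL that begins with /product, such as /products.html, so it only suits sites where nothing under that prefix should be crawled.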

Googlebot does recognize wildcards, though I'm not sure about ones in the middle of an expression.

Be sure to test with one of the validators.
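
For crawlers that honour *, Google's documentation describes it as matching any sequence of characters, with the rest of the rule still treated as a prefix. A rough sketch of that matching in Python follows. It is an illustration under those assumptions, not Googlebot's actual implementation, and the function name wildcard_rule_matches is made up for the example.

```python
import re


def wildcard_rule_matches(rule_path, url_path):
    """Return True if url_path is covered by rule_path, where '*' in
    rule_path stands for any run of characters (possibly empty)."""
    # Escape the literal pieces so '.' and '?' are not treated as regex
    # operators, then join the pieces with '.*' for each '*'.
    pattern = ".*".join(re.escape(part) for part in rule_path.split("*"))
    # re.match anchors only at the start, which keeps prefix semantics.
    return re.match(pattern, url_path) is not None


if __name__ == "__main__":
    rule = "/product*/product_info.html?manufacturers_id=6"
    print(wildcard_rule_matches(rule, "/product3/product_info.html?manufacturers_id=6"))
    print(wildcard_rule_matches(rule, "/admin/index.html"))
```

Crawlers that follow only the original standard would read the * literally, so rules written this way protect nothing from them.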