How can I allow ia_archiver to index just the index.html page and maybe one or two others? Although Googlebot seems to support more sophisticated commands and wildcards, I'm not sure what ia_archiver can understand.
I have too many pages to disallow individually, and I don't want to allow everything, since in my experience ia_archiver can be a bandwidth hog if left unchecked.
physics
6:58 am on Dec 13, 2006 (gmt 0)
Welcome to WebmasterWorld.com, sosoo!
You might not be able to do this with robots.txt but you can with .htaccess if you're using Apache (are you?).
phranque
7:08 am on Dec 13, 2006 (gmt 0)
you might find some answers in this relevant thread: [webmasterworld.com...]
sosoo
7:15 am on Dec 13, 2006 (gmt 0)
Yes, I'm using Apache, and I could use .htaccess.
It just seems to make perfect sense if you could do something like this:
rules. you must either put the allowed files in a directory above the disallowed files, or specifically disallow a list of files after allowing all others. "Allow" is not part of the original robots.txt standard, and wildcards are not supported in path names. the correct way to allow all paths is
Disallow:
(without a path). the correct way to disallow all paths is
Disallow: /
please see this for details and examples: [robotstxt.org...]
you might also try the google webmaster tools for robots.txt verification and testing.
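One rough way to check the "allowed files above, disallowed files in a directory" approach locally is Python's standard urllib.robotparser. This is only a sketch under assumptions: the /content/ directory and the page names are hypothetical, ia_archiver is assumed to be the user-agent token the crawler matches on, and the parser's answers show how the rules read under the basic standard, not how any particular crawler actually behaves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical layout: index.html stays at the site root, everything
# else is moved under /content/ (an assumed directory name).
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /content/

User-agent: *
Disallow:
"""


def build_parser(text):
    """Parse robots.txt text held in a string, without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# ia_archiver may fetch the root page but nothing under /content/.
print(parser.can_fetch("ia_archiver", "/index.html"))
print(parser.can_fetch("ia_archiver", "/content/page1.html"))

# Other robots fall through to the "*" record, where an empty
# Disallow line allows every path.
print(parser.can_fetch("Googlebot", "/content/page1.html"))

# "Disallow: /" blocks every path for the robots it applies to.
block_all = build_parser("User-agent: *\nDisallow: /\n")
print(block_all.can_fetch("ia_archiver", "/index.html"))
```

The sample deliberately avoids Allow lines and wildcards, so it stays within what the original standard describes.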