Msg#: 3596679 posted 12:51 pm on Mar 12, 2008 (gmt 0)
Yeah, this does the trick, however I was hoping to avoid listing my site's structure for security reasons. I wonder why the specs are missing the "allow" directive. It seems logical to be able to block the entire site and allow individual files or folders...
Msg#: 3596679 posted 1:11 pm on Mar 12, 2008 (gmt 0)
If you don't want to reveal your site structure, remember that robots.txt matches partial names. You don't need to put the full directory name, just the first letter (except for 'i' and 's', since you're allowing files that start with those).
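To illustrate the prefix matching, here is a rough Python sketch using the standard library's urllib.robotparser. The paths (/administrator/, /secret/) are made up for the example and are not from your site, and the rules only cover lowercase first letters, so treat it as a sketch rather than a finished robots.txt.

```python
import string
from urllib.robotparser import RobotFileParser

# First letters that should stay crawlable (e.g. index.html).
# These two come from the post above; everything else is hypothetical.
ALLOWED_INITIALS = {"i", "s"}

# One single-letter Disallow rule for every other lowercase letter.
# Note: paths starting with digits or uppercase letters are NOT covered.
rules = ["User-agent: *"] + [
    f"Disallow: /{letter}"
    for letter in string.ascii_lowercase
    if letter not in ALLOWED_INITIALS
]

parser = RobotFileParser()
parser.parse(rules)


def is_allowed(path: str) -> bool:
    """Return True if a generic bot may fetch the given path."""
    return parser.can_fetch("*", path)


if __name__ == "__main__":
    for path in ("/index.html", "/administrator/index.php", "/secret/notes.html"):
        print(path, is_allowed(path))
```

Note the side effect: anything else that starts with 'i' or 's' (the made-up /secret/ above, for instance) stays crawlable too, so you may need a slightly longer prefix for those.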
Msg#: 3596679 posted 1:14 am on Mar 13, 2008 (gmt 0)
You could use .htaccess to 'physically' block access to anything but index.html
The files I'm trying to block from the bot need to be accessible by index.html. The way I architected the site, index.html loads dynamic pages from the Joomla CMS into a dynamic DIV. This way I can control what Google and other bots index on my site, as I don't wish every page of my site to be indexed. Another reason is that I am able to load pages into the DIV without having to refresh the entire page.
I think setting .htaccess to block access to these files and folders may cause the site to malfunction.