IP of requester: 209.249.67.146
What they are asking for: "GET /robots.txt HTTP/1.0"
Status: 404
User-agent: *
Disallow:
This will allow robots to spider all your files. Reasons to exclude files include sensitive information that is not meant for the public, and executable files that would fire on a page request.
Also, how do you know who the codes belong to? e.g. (209.248.67.146)
That bit of code is known as the IP address. You cannot really trace the individual person, but you can trace the company/ISP. A good site for tracing IPs is:
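To see the difference between the allow-all file and one with exclusions, here is a minimal sketch using Python's standard `urllib.robotparser`. The directory names `/cgi-bin/` and `/private/`, the bot name `ExampleBot`, and the helper `is_allowed` are made up for illustration; they are not from anyone's actual site.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, path: str, agent: str = "ExampleBot") -> bool:
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# The two-line file from the post: an empty Disallow excludes nothing.
ALLOW_ALL = "User-agent: *\nDisallow:\n"

# Hypothetical exclusions for scripts and sensitive files.
EXCLUDE_SOME = "User-agent: *\nDisallow: /cgi-bin/\nDisallow: /private/\n"

print(is_allowed(ALLOW_ALL, "/index.html"))
print(is_allowed(EXCLUDE_SOME, "/private/report.html"))
print(is_allowed(EXCLUDE_SOME, "/index.html"))
```

This only shows how a well-behaved robot would read the file. A robot that ignores robots.txt is not stopped by it.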
InetCheck [dataphone.se]
Hope this helps
ratman
Do this in Notepad or a text editor. Then upload it to your root directory. Make sure it's at www.yoursite.com/robots.txt
Make sure the name is robots.txt also.
404 is an error code meaning the document cannot be found.
When dealing with robots.txt and similar "special files" like .htaccess, the devil is in the details. If someone says "open it in Notepad" and you can't find Notepad, then it's time for another question - MS Word won't do. After you get the file created and uploaded, use Brett's robots.txt validator [searchengineworld.com] to test it. One typo (including a missing space) in that file can cause you major problems, including being dropped from all search engines!
eBay has the best prices on the korkus2000 model, last time I checked.
But I've heard that Service Pack 1 for KorkusXP makes it much more stable, and so it might be worth the upgrade if your system is relatively recent. :)
Jim
One typo (including a missing space) in that file can cause you major problems, including being dropped from all search engines!
You may also want to use a different filename, like "robotsx.txt", while working on this. After you get it working, rename it to robots.txt. This will prevent a real robot from coming by your site and reading your robots.txt while it is invalid. The robots.txt validator allows you to check any filename for exactly this reason.
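The draft-then-rename routine could be sketched in Python roughly as follows. This is not the validator mentioned above, just a rough local check under my own assumptions: the field list in `KNOWN_FIELDS` and the function names `find_problems` and `publish_if_clean` are made up for illustration, and a real validator checks far more.

```python
import os

# Assumed set of field names; some robots honour others, some honour fewer.
KNOWN_FIELDS = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}


def find_problems(text: str) -> list[str]:
    """Return a list of human-readable problems found in robots.txt text."""
    problems = []
    seen_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if ":" not in line:
            problems.append(f"line {number}: missing ':' separator")
            continue
        field, _, value = line.partition(":")
        field = field.strip().lower()
        if field not in KNOWN_FIELDS:
            problems.append(f"line {number}: unknown field {field!r}")
        elif field == "user-agent":
            seen_agent = True
            if not value.strip():
                problems.append(f"line {number}: empty User-agent value")
        elif field in ("disallow", "allow") and not seen_agent:
            problems.append(f"line {number}: rule before any User-agent line")
    return problems


def publish_if_clean(draft_path: str, live_path: str) -> list[str]:
    """Rename the draft (e.g. robotsx.txt) to the live name only if it is clean."""
    with open(draft_path, encoding="utf-8") as handle:
        problems = find_problems(handle.read())
    if not problems:
        os.replace(draft_path, live_path)
    return problems


# A typo in the first line (the colon is missing) is reported, not published.
print(find_problems("User-agent *\nDisallow: /private/\n"))
```

The rename happens only when the problem list is empty, so a robot never sees the half-finished file under its real name.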
Jim
The point being that you should at least consider a blank robots.txt or one which contains only:
User-agent: *
Disallow:
This will prevent filling up your server logs with a whole bunch of 404-Not Found errors as robots try to fetch robots.txt while they spider your site. And since it contains no filenames in Disallow directives, I doubt it poses a security risk.
You could also build the robots.txt without regard to security issues, and then use second-tier techniques to secure your site, such as using .htaccess or scripting to "trap" access attempts which should not have been made by any User-agent which obeys robots.txt. I use a mixture of these techniques, to good effect.
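As a rough illustration of the "trap" idea, here is a Python sketch that scans access-log lines for requests to a path that robots.txt disallows. Everything specific here is an assumption for the example: the log lines are invented samples in Common Log Format with documentation-range IP addresses, `/trap/` is a hypothetical disallowed directory, and `find_violators` is a made-up helper. It is not the .htaccess or scripting setup I actually run.

```python
import re

# Assumes Common Log Format; adjust the pattern for other log layouts.
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3})'
)


def find_violators(log_lines, trap_prefixes):
    """Map each IP to the disallowed paths it requested."""
    violators = {}
    for line in log_lines:
        match = LOG_PATTERN.match(line)
        if not match:
            continue  # skip lines that do not look like log entries
        path = match.group("path")
        if any(path.startswith(prefix) for prefix in trap_prefixes):
            violators.setdefault(match.group("ip"), []).append(path)
    return violators


# Invented sample entries, for illustration only.
SAMPLE_LOG = [
    '198.51.100.4 - - [01/Jan/2003:10:00:00 +0000] "GET /robots.txt HTTP/1.0" 200 26',
    '192.0.2.7 - - [01/Jan/2003:10:00:05 +0000] "GET /trap/index.html HTTP/1.0" 200 512',
    '192.0.2.7 - - [01/Jan/2003:10:00:06 +0000] "GET /index.html HTTP/1.0" 200 1024',
]

print(find_violators(SAMPLE_LOG, ["/trap/"]))
```

Any address that shows up in the result asked for something a robots.txt-obeying agent would have skipped, which makes it a candidate for blocking by whatever means you prefer.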
Jim