wilderness

msg:1529603 | 12:46 am on Jun 6, 2003 (gmt 0) |
Andrea, welcome to WebmasterWorld.

If you have pages on your website(s) that have no links pointing to them anywhere on the Internet, I would suggest NOT listing those pages as disallowed in robots.txt. Why point a devious bot or person to something you would rather keep out of public view? Your robots.txt is, for the most part, a public document.

I'll give you a couple of examples. At one point I was bold enough to use my regular email address and a two-line signature in my Usenet posts. I was looking for a way to build a six-generation family database structure in MS Access, and in my question I pointed to an example file displayed on a page in a folder that is basically private and used sparingly by a few friends. Some four years later, I still get an occasional referral from Google's Usenet groups archive from someone looking for that no-longer-existent example. :(

On another occasion, somebody on Usenet was asking about the free web pages provided with a discount registrar. To show that person that a pop-up had been added, I gave that domain name in a Usenet post. Again, four years later I still get referrals from that post. When I first created the folder, my sole intent was to keep it private from the mainstream Internet. These two slips taught me valuable lessons.

Don
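To see why a Disallow line can backfire, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the `/private-dev/` folder, the `ExampleBot` agent name, and `example.com` are all hypothetical. A well-behaved crawler checks the rules and stays out of the folder, but anyone who simply reads the same file learns that the folder exists.

```python
from __future__ import annotations

from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that "hides" a private folder by disallowing it.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private-dev/
"""


def parse_rules(text: str) -> RobotFileParser:
    """Build a robots.txt parser from text, without fetching anything."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def disallowed_paths(text: str) -> list[str]:
    """What a curious human or rogue bot sees: every Disallow value in plain text."""
    paths = []
    for line in text.splitlines():
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


rules = parse_rules(ROBOTS_TXT)
# A polite crawler asks first and stays out of the disallowed folder.
print(rules.can_fetch("ExampleBot", "http://www.example.com/private-dev/page.html"))  # False
print(rules.can_fetch("ExampleBot", "http://www.example.com/index.html"))  # True
# But the file itself advertises the folder to anyone who reads it.
print(disallowed_paths(ROBOTS_TXT))  # ['/private-dev/']
```

The same line that keeps polite bots out is a signpost for everyone else, which is the point Don is making.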
|
andrea edwards

msg:1529604 | 2:35 pm on Jun 7, 2003 (gmt 0) |
Hi Don,

Thanks very much for the reply. If you are still around, I would like to try to clarify your answer, because I am new to all this and still find it a bit confusing. Are you saying that if I disallow my private/development files in robots.txt, I will make the existence of these files known? If I don't put these directories in the robots.txt file, then the spiders won't know they exist and can't request them (unless of course they are linked to, which they aren't).

Many thanks again,
Andrea
|
wilderness

msg:1529605 | 3:53 pm on Jun 7, 2003 (gmt 0) |
Andrea,

Good robots will honor your robots.txt in most instances. The ONLY way spiders/bots can become aware of a page is if a link to that page exists somewhere else on the web. If you are the only one aware of the existence of these files/folders, there is no way a bot can find them. (If you sticky me, I'll provide an example.)

| Are you saying that if I disallow my private/development files in robots.txt, I will make the existence of these files known? If I don't put these directories in the robots.txt file, then the spiders won't know they exist and can't request them (unless of course they are linked to, which they aren't). |
| Yes. DEVIOUS robots in most instances don't even bother with robots.txt. However, there is no need to take the chance of pointing devious bots to folders or files that neither your websites nor any other websites link to. So yes, I'm suggesting that if you have pages that are either works in progress or "private/development files," you do NOT list them in your robots.txt.

A much safer solution would be to put all those files in a folder that denies access to almost everybody except the IP ranges you choose to allow. Jim provided me with a nice rewrite a short while back when I began working on the Oceanic IP ranges. Although it requires some caution, it has been effective. If you're interested, sticky me and I'll try to help you set it up.

Don
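Jim's rewrite itself isn't reproduced in this thread, so the following is only a Python sketch of the same allow-list idea, using the standard `ipaddress` module: requests for a private folder are refused unless the client address falls inside a trusted range. The folder name `/private-dev/` and the function `is_allowed` are hypothetical, and the ranges are reserved documentation blocks standing in for whatever ranges you actually trust. On a live site this kind of check would typically be done in the web server's own configuration rather than in a script.

```python
import ipaddress

# Hypothetical trusted ranges (reserved documentation blocks used as placeholders).
ALLOWED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
]

# Hypothetical private folder; nothing outside it is restricted.
PRIVATE_PREFIX = "/private-dev/"


def is_allowed(client_ip: str, path: str) -> bool:
    """Deny the private folder unless the client IP is inside an allowed range."""
    if not path.startswith(PRIVATE_PREFIX):
        return True  # the rest of the site stays public
    try:
        addr = ipaddress.ip_address(client_ip)
    except ValueError:
        return False  # malformed address: refuse
    # Membership test; an IPv6 address is never "in" an IPv4 network.
    return any(addr in net for net in ALLOWED_RANGES)


print(is_allowed("192.0.2.15", "/private-dev/notes.html"))    # True
print(is_allowed("203.0.113.9", "/private-dev/notes.html"))   # False
print(is_allowed("203.0.113.9", "/index.html"))               # True
print(is_allowed("198.51.100.200", "/private-dev/notes.html"))  # False (outside the /25)
```

Unlike a robots.txt entry, this approach never names the folder publicly, and it doesn't rely on the visitor choosing to behave.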
|
jdMorgan

msg:1529606 | 9:37 pm on Jun 7, 2003 (gmt 0) |
Andrea,

Welcome to WebmasterWorld! It has been rumored that visiting a page with the Google Toolbar installed can trigger a visit from Googlebot. Because of this, I'd recommend adding the <meta name="robots" content="noindex"> tag to those pages you absolutely don't want disclosed.

If you'd like to write a more compact robots.txt, I posted some suggestions here [webmasterworld.com].

Jim
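The robots meta tag works differently from robots.txt: it lives inside the page itself, so it only tells a compliant crawler that has already fetched the page not to index it, and it reveals nothing to someone reading robots.txt. As a minimal sketch, assuming a crawler written in Python, here is how such a crawler might detect the tag with the standard `html.parser` module. The names `RobotsMetaParser` and `should_index` are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, but not attribute values.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            for token in attributes.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def should_index(html: str) -> bool:
    """Return False when the page asks robots not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return "noindex" not in parser.directives


print(should_index('<head><meta name="robots" content="noindex"></head>'))  # False
print(should_index("<head><title>Public page</title></head>"))  # True
```

Note that this only protects the page from being indexed by crawlers that honor the tag; a bot that ignores it, or a person who has the URL, can still fetch the page.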
|
rbs10025

msg:1529607 | 10:22 pm on Jun 7, 2003 (gmt 0) |
DEVIOUS robots in most instances don't even bother with robots.txt. However, there is no need to take a chance on pointing devious bots to either folders or files of which your websites or any other websites DO NOT have links pointing to. |
| And FWIW, I have on occasion seen entries in my server logs indicating real humans viewing my robots.txt files and then deliberately checking out the "disallowed" directories.
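The pattern described above can be searched for in an access log. This is a minimal Python sketch, assuming Apache-style common/combined log lines (client IP first, then the quoted request line) and a hypothetical list of disallowed prefixes that mirrors your robots.txt. It flags client IPs that fetched `/robots.txt` and later requested a disallowed path. Real log formats vary, so the regular expression is an assumption, and a hit is only a hint worth a closer look, not proof of bad intent. Compliant crawlers should not show up, because they read robots.txt and then stay out.

```python
import re

# Assumes Apache common/combined log format: IP, ident, user, [time], "METHOD path ...".
LOG_LINE = re.compile(r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?:GET|HEAD|POST) (?P<path>\S+)')

# Hypothetical disallowed prefixes; keep in sync with your robots.txt.
DISALLOWED_PREFIXES = ("/private-dev/",)


def find_robots_snoopers(lines):
    """Return IPs (in first-seen order) that read robots.txt and then hit a disallowed path."""
    read_robots = set()
    snoopers = []
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines in an unexpected format
        ip, path = match.group("ip"), match.group("path")
        if path == "/robots.txt":
            read_robots.add(ip)
        elif ip in read_robots and path.startswith(DISALLOWED_PREFIXES) and ip not in snoopers:
            snoopers.append(ip)
    return snoopers


# Made-up sample lines for illustration only.
sample = [
    '203.0.113.7 - - [07/Jun/2003:22:01:00 +0000] "GET /robots.txt HTTP/1.0" 200 120',
    '203.0.113.7 - - [07/Jun/2003:22:01:30 +0000] "GET /private-dev/notes.html HTTP/1.0" 200 2048',
    '198.51.100.4 - - [07/Jun/2003:22:05:00 +0000] "GET /private-dev/notes.html HTTP/1.0" 200 2048',
]
print(find_robots_snoopers(sample))  # ['203.0.113.7']
```

The second visitor in the sample also reached the private folder but never read robots.txt, so it isn't flagged; that visitor may have followed a link from somewhere, which is exactly how Don's Usenet referrals arrived.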
|
wilderness

msg:1529608 | 11:33 am on Jun 8, 2003 (gmt 0) |
I can't install the thing, Jim. I have ActiveX turned off, and the toolbar requires it.
|
andrea edwards

msg:1529609 | 7:53 pm on Jun 8, 2003 (gmt 0) |
Hello,

Thank you to Don and Jim for your replies. I am definitely wiser than before, and thanks for the link to the great page about reducing the size of a robots.txt file. Don, I am interested in finding out about the script to restrict access to a page. I will mail you about this.

Many thanks,
Andrea
|