Home / Forums Index / Search Engines / Sitemaps, Meta Data, and robots.txt
Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

    
Blocking .log file with robots.txt
Is it necessary, and syntax...?
Robert Charlton
msg:1527941 - 2:29 am on Nov 10, 2003 (gmt 0)

Haven't seen this covered anywhere. My .log file is in the root directory and is a hidden file.

a) Because it's hidden, is it necessary to block it?

b) Would its starting with a period affect the syntax? Would this be correct?...

Disallow: /.log

 

jdMorgan
msg:1527942 - 3:07 am on Nov 10, 2003 (gmt 0)

Robert_Charlton,

a) Is there a link to it? Same answer.
(If there is no link to your .log file, then why "advertise" its existence by publishing its name in robots.txt? I would be more inclined to simply block HTTP access to it and to other sensitive files such as .htpasswd and .htaccess by using directives in httpd.conf or in .htaccess itself to deny access. This would make them inaccessible via HTTP, but still accessible via FTP. Alternatively, you could move .log to a subdirectory, and password-protect that subdirectory if HTTP access to the log file is required.)
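
A minimal sketch of such a deny rule, assuming an Apache server and 1.3/2.0-era access-control syntax (the filename pattern is an assumption; adjust it to your actual files):

```apache
# Deny HTTP access to dot-files such as .log, .htaccess, and .htpasswd.
# Sketch only -- Apache typically ships with a similar rule for .ht*
# already; widen or narrow the pattern as needed.
<FilesMatch "^\.(log|ht)">
  Order allow,deny
  Deny from all
</FilesMatch>
```

With this in place the files stay readable via FTP or a shell login; only HTTP requests for them are refused.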

b) /.log is the correct pathname, so your syntax is correct.
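
As a sanity check, the rule can be tested with Python's standard urllib.robotparser, which applies the same prefix matching that compliant crawlers use (the domain here is illustrative):

```python
# Sketch: verify that "Disallow: /.log" blocks the log file for all
# robots.txt-compliant crawlers, using Python's standard library.
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow: /.log
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# The leading period is just part of the path; it needs no special escaping.
print(rp.can_fetch("*", "https://www.example.com/.log"))        # False (blocked)
print(rp.can_fetch("*", "https://www.example.com/index.html"))  # True (allowed)
```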

Jim

Robert Charlton
msg:1527943 - 3:29 am on Nov 10, 2003 (gmt 0)

> If there is no link to your .log file, then why "advertise" its existence by publishing its name in robots.txt?

Jim - Thanks. Your rhetorical question suggests, though, that I'm not fully understanding what spiders can spider.

With test pages I put up, for example... even though they're not linked to from anywhere, I now keep them out of the root and put them in a blocked "test" directory. I started doing this because, on one site, I saw some of them in my Google backlinks. My assumption was, therefore, that a spider doesn't need a link to a file, but that a link to the root directory would suffice... and that a link to mydomain.com/ in fact accomplished this. What am I misunderstanding?

As for taking these out of the root, I tried to go there once before, and it starts to get political (and also a little over my head), so if I can accomplish what I need to with robots.txt alone, that would be preferable.

jdMorgan
msg:1527944 - 4:19 am on Nov 10, 2003 (gmt 0)

> If there is no link to your .log file, then why "advertise" its existence by publishing its name in robots.txt?
>
> Jim - Thanks. Your rhetorical question suggests, though, that I'm not fully understanding what spiders can spider.


Spiders can spider anything they find a link to - a link in the conventional sense. The thrust of my warning about "advertising" is this:

Assume I am your competition. I decide to check out your site. I type in www.example.com/robots.txt. Bingo, there's your .log file listed right there in robots.txt! Wow! That sucker's not even password-protected -- probably because of some political turf battle between competing technically-incompetent department managers or V.P.s at example.com. OK, set up a little script to download this bad boy once an hour, send the results upstairs for competitive analysis, and ask the boss for a raise...

Fun, huh?

Logs should be password-protected, or at least not flaunted.
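
A minimal password-protection sketch, assuming Apache with .htaccess/.htpasswd; the path and realm name here are illustrative, not from this thread:

```apache
# .htaccess in the directory holding the log file (illustrative values)
AuthType Basic
AuthName "Logs"
AuthUserFile /full/path/to/.htpasswd
Require valid-user
```

The matching credentials file can be created with Apache's htpasswd utility, e.g. `htpasswd -c /full/path/to/.htpasswd someuser`.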

As to how your test pages got exposed, I'm not sure. I keep all my test pages behind a firewall so they are not accessible to the 'net, and it's never happened to me. We have stats and log theories, Google Toolbar tracking theories, and various other theories about how our 'secret' pages get exposed, but I've no direct experience, and therefore I'll decline to speculate.

Jim

Robert Charlton
msg:1527945 - 5:06 am on Nov 10, 2003 (gmt 0)

> Bingo...

Got it. Thanks.

> That sucker's not even password-protected -- probably because of some political turf battle between competing technically-incompetent department managers or V.P.s at example.com.

You got it too. ;)

Mohamed_E
msg:1527946 - 1:58 pm on Nov 10, 2003 (gmt 0)

> As to how your test pages got exposed, I'm not sure. I keep all my test pages behind a firewall so they are not accessible to the 'net, and it's never happened to me. We have stats and log theories, Google Toolbar tracking theories, and various other theories about how our 'secret' pages get exposed, but I've no direct experience, and therefore I'll decline to speculate.

I think that Jim hit a very important nail right on the head.

There is enough anecdotal evidence of files with "no links" being spidered to cause any prudent (not paranoid!) person to stop and think. How they get spidered is irrelevant; they apparently occasionally do. So sensitive material loaded onto a web server must be protected, by a password or otherwise.

Robert Charlton
msg:1527947 - 6:59 am on Nov 11, 2003 (gmt 0)

Thanks. A discussion with the client has in fact raised some interest in putting our .log file, etc., into a password-protected directory. :)

Rather than get way off the topic area of this forum, I'm posting a follow-up question about how to proceed in the Apache forum, at the following thread:

Using .htpasswd with .htaccess
Some elementary questions on password protecting a directory
[webmasterworld.com...]


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved