Forum Moderators: phranque


Robots.txt

         

qimqim

8:40 am on Mar 9, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



Hi

I've inserted the following in the .htaccess file

#hide .txt files
<Files ~ "\.txt">
Order allow,deny
Deny from all
</Files>


But now I am wondering if this denies access to the robots.txt to the likes of Googlebot.

If so, how can I add something to the code to make an exception and allow access to the robots.txt?

Thank you

LATER:

I tried to add this to the above, but it does not seem to work

#hide .txt files
<Files ~ "\robots.txt">
Order allow,deny
Allow from all
</Files>

lucy24

3:49 pm on Mar 9, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



But now I am wondering if this denies access to the robots.txt to the likes of Googlebot.

Yes, it does. It's the precise opposite of what you want. robots.txt is the one file you never want to hide or deny in any way. If you wanted to omit a file from auto-indexing, that's a different directive-- and it would never apply to robots.txt, since that's normally located at the root, the one place you'll always have an index.html file.

Why not simply
<Files "robots.txt">
?

If you need a ~ it's better to go with FilesMatch. (Even the Apache docs say so.) But you don't need it when you're only naming one specific file.

Anything involving \r will always fail, because \r is a "hard" carriage return-- a character that is not likely to occur at all, and certainly not at the beginning of a filename.
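Putting those suggestions together, a corrected version of the original snippet might look like the sketch below (untested; this is 2.2-style mod_access syntax-- on Apache 2.4 the Require directives replace Order/Allow/Deny):

```apache
# Deny all .txt files (anchored regex; FilesMatch per the Apache docs)
<FilesMatch "\.txt$">
Order allow,deny
Deny from all
</FilesMatch>

# Re-allow the one file that must stay public. "Order deny,allow"
# makes the Allow win over the inherited Deny for robots.txt.
<Files "robots.txt">
Order deny,allow
Allow from all
</Files>
```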

qimqim

3:55 pm on Mar 9, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



Hi Lucy

What I want is to deny access to certain .txt files on my site, but of course I want to give access to the robots.txt.

So, can I use the code I pasted above (the first) and add an "allow" for the robots.txt?

not2easy

4:39 pm on Mar 9, 2015 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



So, can I use the code I pasted above (the first) and add an "allow" for the robots.txt?
No, because the "deny" applies to IPs, not filenames.

Wouldn't the simple answer be to put all the .txt files into one folder and deal with folders instead? Your robots.txt would stay at the root, all other .txt files in their /text/ folder and disallow/no-index that folder. If you have links to these .txt files, you want to nofollow those links or they will be crawled anyway.

If there are hundreds of these .txt files this might not be done so easily, but for even a dozen or two it might be worth it.
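As a concrete sketch of that approach (the /text/ folder name is just an example), the robots.txt at the root would carry something like:

```
User-agent: *
Disallow: /text/
```

Compliant crawlers then skip everything under /text/, while robots.txt itself stays readable at the root.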

qimqim

4:59 pm on Mar 9, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



Wouldn't the simple answer be to put all the .txt files into one folder and deal with folders instead?


Yes, that would be fine if I can put a rule in .htaccess to deny access. What I do not want is for visitors to enter the URL and see the file.

So, what do I have to put in the .htaccess to deny access to one particular folder?

LATER

Could the answer be to create another .htaccess, place it in the folder holding the .txt files and write simply

Deny from all


or do I have to add anything else?
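For what it's worth, on Apache 2.2 a bare Deny from all in the folder's own .htaccess generally works (the default Order is deny,allow), though being explicit does no harm; on Apache 2.4 the native syntax is different:

```apache
# .htaccess inside the folder holding the private .txt files
# Apache 2.2 / mod_access_compat:
Order deny,allow
Deny from all

# Apache 2.4 native syntax would instead be:
# Require all denied
```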

LATER STILL...

Well, I've put a second .htaccess in that folder and it seems to be working. I hope I am not doing something stupid by having TWO .htaccess files.

DISASTER!

Now the .txt file that is needed for a redirect no longer works when called from the other webpage. So, this system does not work. Help....

qimqim

6:02 pm on Mar 9, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



I think I may have found a solution. I expect that I can rename myfile.txt to myfile.log and use the .htaccess rules below:

#hide .log files
<Files ~ "\.log">
Order allow,deny
Deny from all
</Files>


Can you see any hiccups? Will it affect robots.txt in any way?

lucy24

8:00 pm on Mar 9, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



A rule aimed at the .log extension cannot possibly affect robots.txt in any way whatsoever.

But what are your .log files doing in publicly accessible directories in the first place? They should be locked away somewhere that http requests can't reach them.

:: uneasily thinking that anyone who knew the URL format (and who wasn't from a blocked IP) could look at my logged header files-- but heck, what would be the point? ::

I hope I am not doing something stupid by having TWO .htaccess files

No, that's fine. Since you can't have a <Directory> section in htaccess, it is often necessary to have more than one htaccess when you need to set different rules for different directories. (Just don't do it with RewriteRules. Those really need to be collected in a single physical file.) It will have no effect on requests or access time, if that's what you are concerned about. Once htaccess is permitted at all, the server then has to look for an .htaccess file in every directory; there's no command that says "There will be no further htaccess files beyond this point".
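To illustrate (folder names are hypothetical), the layout being discussed would look roughly like:

```
/ (web root)
|-- .htaccess      <- site-wide rules; all RewriteRules collected here
|-- robots.txt
`-- text/
    |-- .htaccess  <- only the Deny rules for this folder
    `-- somefile.txt
```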

not2easy

8:16 pm on Mar 9, 2015 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



If the files you don't want crawled and indexed are renamed from ".txt" to ".log", you don't need to add anything to your htaccess file. In robots.txt you can disallow that folder and/or disallow .log files.
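A sketch of that robots.txt addition (the /text/ folder name is an example; note that the * and $ wildcards are extensions honored by Googlebot and most major crawlers, not part of the original robots exclusion standard):

```
User-agent: *
Disallow: /text/
Disallow: /*.log$
```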

If your crawled pages have links to those files, you want to nofollow those links to prevent compliant bots from following the links. The only way to prevent non-compliant bots/scrapers is to prevent access though.

If these are private files you may want to use better security and maybe password-protect the folder where they are held.
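A minimal Basic-auth sketch for that folder's .htaccess, assuming a hypothetical password file at /home/user/.htpasswd kept outside the web root:

```apache
# .htaccess in the protected folder
AuthType Basic
AuthName "Private files"
AuthUserFile /home/user/.htpasswd
Require valid-user
```

The password file would be created with Apache's htpasswd tool, e.g. htpasswd -c /home/user/.htpasswd username.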

Can you see any hiccups?

Adding that to your htaccess file will block access for the same people you were blocking by restricting the folder. It will deny access to "Everyone".

Will it affect robots.txt in any way?

Not at all.

qimqim

8:49 pm on Mar 9, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



Thank you both.

It is working, but I need a fresh brain on a fresh day to check all's well.

Now, time for bed!

Regards