Forum Moderators: phranque


htaccess doubts

         

qimqim

4:45 pm on Nov 20, 2014 (gmt 0)

10+ Year Member Top Contributors Of The Month



Hi

Some months ago you were very kind to help me sort out my .htaccess file.

I have two little doubts:

a) Webmaster Tools keeps throwing up an error regarding http://example.net/

It is the final slash that is apparently not covered by the following code:

#4a index redirect

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.html
RewriteRule ^(.*)index.html$ http://example.net/$1 [R=301,L]

#4b domain-name canonicalization redirect

RewriteCond %{HTTP_HOST} !^(mysite\.net)?$ [NC]
RewriteRule ^(.*)$ http://example.net/$1 [R=301]

b) I've added a line to the file to get rid of buttons-for-website.com.
Is the code correct, or have I repeated [NC,OR] unnecessarily?


#1 # block visitors referred from indicated domains

RewriteCond %{HTTP_REFERER} semalt\.com [NC,OR]
RewriteCond %{HTTP_REFERER} buttons\-for\-website\.com [NC,OR]

RewriteCond %{HTTP_USER_AGENT} libwww-perl
RewriteRule .* - [F]

[edited by: Ocean10000 at 6:06 pm (utc) on Nov 20, 2014]
[edit reason] Examplified [/edit]

not2easy

5:43 pm on Nov 20, 2014 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I took a look at the original resolution and it looks like your #4a still has no anchor to capture the pattern that is appended in the Rule. Back here: [webmasterworld.com...] lucy24 posted:
The order is right, but the labels are a little inaccurate. The pattern for the index redirect has no closing anchor, so this rule includes requests for "index.html/" alongside the ordinary "index.html". Rule 4c then scoops up any remaining requests for "something-else.html/". Or, technically, .html with any kind of appended garbage; it just happens to be / here.

What else is in the current version?

Please use example.net or example.com in place of mysite (example.com can't link). That will prevent the accidental linking that makes the post hard to read.

qimqim

6:00 pm on Nov 20, 2014 (gmt 0)

10+ Year Member Top Contributors Of The Month



Hi not2easy

I'm afraid I never managed to understand htaccess despite all the long and patient effort you and Lucy put in. The brain is no longer what it used to be...

Here goes what I have now. Any help correcting it will be much appreciated.

#Use PHP5.4 Single php.ini as default

AddHandler application/x-httpd-php54s .php

#
AddType text/x-component .htc


#Do not allow access to the directories -For security reasons, Option followsymlinks cannot be overridden.

Options -Indexes +SymLinksIfOwnerMatch
RewriteEngine on




#1 # block visitors referred from indicated domains

RewriteCond %{HTTP_REFERER} semalt\.com [NC,OR]
RewriteCond %{HTTP_REFERER} buttons\-for\-website\.com [NC,OR]

RewriteCond %{HTTP_USER_AGENT} libwww-perl
RewriteRule .* - [F]

#2 bandwidth theft
RewriteCond %{HTTP_REFERER} !^http://example\.net/

RewriteRule .*\.(jpe?g|gif|png|bmp)$ - [F]

#3 redirects from file that changed name

#3a
RewriteRule ^Pinto/oldindex\.html http://example.net/Pinto/oldindex.php [R=301,L]

# 3b
RewriteRule ^Asia/Indonesia/bali\.html http://example.net/Asia/Indonesia/bali.php [R=301,L]

# 3c
RewriteRule ^Asia/Indonesia/indonesia\.html http://example.net/Asia/Indonesia/indonesia.php [R=301,L]

# 3d
RewriteRule ^Americas/DomRepublic/DomRepublic\.html http://example.net/Americas/DomRepublic/StoDomingo.php [R=301,L]



#4a index redirect

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.html
RewriteRule ^(.*)index.html$ http://example.net/$1 [R=301,L]

#4b domain-name canonicalization redirect

RewriteCond %{HTTP_HOST} !^(example\.net)?$ [NC]
RewriteRule ^(.*)$ http://example.net/$1 [R=301]

#5
# BEGIN EXPIRES
<IfModule mod_expires.c>
ExpiresActive On
ExpiresDefault "access 1 week"
ExpiresByType text/css "access plus 1 week"
ExpiresByType css/js "access plus 1 week"
ExpiresByType text/plain "access plus 1 week"
ExpiresByType image/gif "access plus 1 month"
ExpiresByType image/png "access plus 1 month"
ExpiresByType image/jpg "access plus 1 month"
ExpiresByType image/jpeg "access plus 1 month"
ExpiresByType application/x-javascript "access plus 1 month"
ExpiresByType text/javascript "access plus 2 month"
ExpiresByType application/javascript "access plus 1 week"
ExpiresByType application/x-icon "access plus 1 year"
</IfModule>
# END EXPIRES



<IfModule mod_deflate.c>
<FilesMatch "\.(js|css|html|php)$">
SetOutputFilter DEFLATE
</FilesMatch>
</IfModule>


#Set charset

<filesMatch "\.(htm|html|css|js|php)$">
AddDefaultCharset UTF-8
</filesMatch>

not2easy

6:57 pm on Nov 20, 2014 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



We're just trying to help you sort it all out. I see that the catch-all #4c from the previous recommendations was left off. I need to go back and review some of that, but it looked like it was supposed to catch requests and deal with the slash. Rule #4a here does not have a closing anchor at the end of the rule; it should have $, or ?$ if it captures a query string.

Please don't run and change it, because it may not fit what you want and before I could offer real help I do need to go back and get notes about what exactly you are wanting to happen. Then we can see why it isn't.

Maybe lucy24 can take a quick look and remember right away but I am not familiar with this exact situation.

BTW - it is a good idea for readability to leave a line between rules, but not between conditions and rules, like this:
#2 bandwidth theft
RewriteCond %{HTTP_REFERER} !^http://example\.net/

RewriteRule .*\.(jpe?g|gif|png|bmp)$ - [F]

lucy24

7:18 pm on Nov 20, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



leave a line between rules

Easiest point first ;) In Apache, unlike in robots.txt, a blank line never has syntactic meaning. (Careful! A blank space within a line does have meaning.) So you can and should use blank lines to make it easier for a human-- including yourself-- to read your htaccess.

a) Webmaster Tools keeps throwing up an error regarding http://example.net/

it is the final slash that is apparently not covered

No, this is a misunderstanding. Coincidentally it's one that has come up in several recent threads. This specific slash -- the one immediately after the hostname -- is not seen by the server and therefore has no effect on RewriteRules. mod_rewrite only sees the part after the slash. So, in htaccess,

RewriteRule ^$ blahblah
=
request for
example.com/
and/or
example.com
depending on the whim of the browser.
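To make that concrete, here is a rough Python sketch of what actually reaches a RewriteRule in a document-root htaccess. It assumes Python's `re` behaves like Apache's PCRE for a pattern this simple, and `request_path` and `rewriterule_input` are hypothetical helpers, not anything Apache exposes. The browser sends "/" whether or not you typed the slash, and in per-directory context mod_rewrite strips that leading slash before testing the pattern.

```python
import re
from urllib.parse import urlsplit


def request_path(url: str) -> str:
    """Path the browser puts in the request line; an empty path is sent as "/"."""
    return urlsplit(url).path or "/"


def rewriterule_input(path: str) -> str:
    """String a RewriteRule in a document-root .htaccess is tested against:
    mod_rewrite strips the per-directory prefix, here the leading "/"."""
    return path[1:] if path.startswith("/") else path


for url in ("http://example.com", "http://example.com/"):
    seen = rewriterule_input(request_path(url))
    # Both URLs end up as the empty string, so ^$ matches either way.
    print(repr(url), "->", repr(seen), "| ^$ matches:", re.search(r"^$", seen) is not None)
```

Both lines print an empty string and `True`, which is the "same page" point in code form.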

If you need to distinguish between example.com and example.net, that's a RewriteCond looking at %{HTTP_HOST}. This, in turn, can only be done if the htaccess file covers more than one hostname or (sub)domain.
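A quick Python check of the host test used in rule #4b (`RewriteCond %{HTTP_HOST} !^(example\.net)?$ [NC]`), again with `re` as a stand-in for Apache's regex engine and `re.IGNORECASE` playing the part of [NC]. `needs_host_redirect` is a hypothetical name. The `?` after the group lets an empty Host header through, so a request that sends no hostname at all is not redirected.

```python
import re

# Pattern from rule #4b; the "!" in RewriteCond negates it, [NC] ignores case.
CANONICAL_HOST = re.compile(r"^(example\.net)?$", re.IGNORECASE)


def needs_host_redirect(http_host: str) -> bool:
    """True when rule #4b's negated condition on %{HTTP_HOST} would succeed."""
    return not CANONICAL_HOST.search(http_host)


for host in ("example.net", "EXAMPLE.NET", "www.example.net", ""):
    print(repr(host), needs_host_redirect(host))
```

Only `www.example.net` comes out `True`; the canonical name (in any case) and the empty host are left alone.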

Maybe it was a badly chosen example. Are you back on the issue of
/directory
vs.
/directory/
or possibly
/directory/pagename.html/
? Or is there some entirely different wmt issue that actually has nothing to do with the final slash?

qimqim

8:12 pm on Nov 20, 2014 (gmt 0)

10+ Year Member Top Contributors Of The Month



Hi Lucy

Thank you for helping me.

All I am trying to do is stop Webmaster Tools from continually not being able to find http://example.net/

I'm not sure how important that is, or whether I should just dismiss it or ask you if I could change the htaccess file to take it into consideration, i.e. to accept example.net/ and open the page example.net.

not2easy

9:00 pm on Nov 20, 2014 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



If you paste the http://example.net/ URL in your browser's address bar do you go to the home page? If so, not to worry. If you get a 404 that is a serious problem.

@ lucy24 - Not to take this OT, but the suggestion to leave a blank line between rules to help with cleanup was not referring to syntax, but readability. I was referring to:
#2 bandwidth theft
RewriteCond %{HTTP_REFERER} !^http://example\.net/

RewriteRule .*\.(jpe?g|gif|png|bmp)$ - [F]


which seems more readable as:
#2 bandwidth theft
RewriteCond %{HTTP_REFERER} !^http://example\.net/
RewriteRule .*\.(jpe?g|gif|png|bmp)$ - [F]

Just trying to avoid confusing the OP :)

lucy24

9:30 pm on Nov 20, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



to accept example.net/ and open the page e[x]ample.net

Again: These are the same page. Your server simply cannot see that final slash.

If you enter
http://example.net/
in your browser, what do you get?

Webmaster Tools ... continually not being able to find http://example.net/

What does "not able to find" mean? I haven't added a site in a while, so I don't remember what their error messages look like. Are you sure they're saying they can't find the site at all? Or are they saying that they can't find the verification file that proves you own the site? Remember, you have to leave the file there permanently, because they will keep checking for it. Also keep in mind that in order to tell wmt that www.example.net and example.net are the same site, you have to create accounts for both, even if one form redirects to the other (as it should).

I was referring to

No worries, not2easy, I was agreeing with you ;) In fact it's even worse, because I recently found by experiment that you can take directives that have nothing to do with mod_rewrite and shove them into the middle of a ruleset without affecting rule execution in any way. (This was definitely an unnerving discovery. But it's why I have a test site.)

qimqim

9:41 pm on Nov 20, 2014 (gmt 0)

10+ Year Member Top Contributors Of The Month



Hi

Well, although I believe that in the past I was getting a 404 when I wrote example.net/, the fact is that right now it opened my page. I have resubmitted the URL to Fetch as Google; let's see what happens. From what I gathered from your messages, there is nothing that can or should be done in the htaccess file about this slash.

Could you confirm, please, that the line I added in #1 is ok?

#1 # block visitors referred from indicated domains

RewriteCond %{HTTP_REFERER} semalt\.com [NC,OR]
RewriteCond %{HTTP_REFERER} buttons\-for\-website\.com [NC,OR]

RewriteCond %{HTTP_USER_AGENT} libwww-perl
RewriteRule .* - [F]

lucy24

6:36 am on Nov 21, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yup, but get rid of that empty line in the middle. It doesn't affect the rule in any way whatsoever, but may cause you unwanted confusion in the future.

Where you say .* in the rule you can instead say .? but this is a matter of nanoseconds.

Also: leave off [NC] unless you particularly need it. It's only four* bytes of code, but it means the server is really looking for (for example)
[Ss][Ee][Mm][Aa]
... et cetera, I got tired just trying to type it!
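A small Python illustration of the [NC] point, with `re.IGNORECASE` standing in for [NC] and a made-up referer string. The bracketed pattern is lucy24's picture of what case-insensitive matching amounts to, not a claim about how Apache's engine works internally. The last lines show that `.*` and `.?` both succeed on any path, including an empty one, which is why either works in a rule meant to fire on everything.

```python
import re

referer = "http://SEMALT.com/crawler"  # made-up upper-case referer

# Without [NC] the match is case-sensitive, so this referer slips through.
print(bool(re.search(r"semalt\.com", referer)))                              # False
# With [NC] (re.IGNORECASE here) it is caught...
print(bool(re.search(r"semalt\.com", referer, re.IGNORECASE)))               # True
# ...which gives the same result as spelling out both cases by hand.
print(bool(re.search(r"[Ss][Ee][Mm][Aa][Ll][Tt]\.[Cc][Oo][Mm]", referer)))  # True

# ".*" and ".?" both succeed on every path, including the empty one.
for path in ("", "index.html", "Asia/Indonesia/bali.html"):
    assert re.search(r".*", path) and re.search(r".?", path)
```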

buttons\-for\-website\.com

Hyphens do not need to be escaped in Regular Expressions except when they occur inside grouping brackets (a reversal of the usual pattern where most things don't need escaping inside brackets). Escaping hyphens will not have unintended consequences; it just isn't necessary and adds two bytes to the filesize.
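The hyphen point in Python terms (`re` standing in for Apache's PCRE, which treats these cases the same way): outside brackets the escaped and unescaped spellings match identically, while inside brackets a bare hyphen builds a range. Putting the hyphen first or last inside the brackets also makes it literal.

```python
import re

escaped = r"buttons\-for\-website\.com"
plain = r"buttons-for-website\.com"

# Outside brackets the two spellings behave identically.
for ref in ("http://buttons-for-website.com/", "http://example.net/"):
    assert bool(re.search(escaped, ref)) == bool(re.search(plain, ref))

# Inside brackets a bare hyphen means a range...
print(bool(re.fullmatch(r"[a-c]", "b")))   # True: "b" lies in the range a..c
# ...so escape it when you mean a literal hyphen.
print(bool(re.fullmatch(r"[a\-c]", "b")))  # False: only "a", "-" or "c"
print(bool(re.fullmatch(r"[a\-c]", "-")))  # True
```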

Now, personally I like to constrain [F] rules to requests for pages:
RewriteRule (^|/|\.html)$ - [F]

so the server doesn't have to stop and evaluate Conditions on every single request ever. It's exceedingly rare for a robot to request non-page files-- especially when it hasn't been allowed to see the page the supporting files belong to. But that's a judgement call depending on your individual site.
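The page-only pattern can be tried out in Python against paths as an htaccess RewriteRule sees them (no leading slash); the paths are taken from, or modeled on, the file posted earlier in the thread. One assumption worth flagging: qimqim's site also serves .php pages (see the #3 redirects), and the pattern as written only treats .html as a page, so `PAGE_ONLY_WITH_PHP` is a hypothetical variant, not something suggested in the thread.

```python
import re

PAGE_ONLY = re.compile(r"(^|/|\.html)$")
# Hypothetical variant for a site that also serves .php pages.
PAGE_ONLY_WITH_PHP = re.compile(r"(^|/|\.html|\.php)$")

paths = ["", "Asia/", "Asia/Indonesia/bali.html", "Pinto/oldindex.php",
         "images/logo.png", "style.css"]
for path in paths:
    # Columns: path, matched by PAGE_ONLY, matched by PAGE_ONLY_WITH_PHP
    print(repr(path), bool(PAGE_ONLY.search(path)), bool(PAGE_ONLY_WITH_PHP.search(path)))
```

Images and stylesheets never reach the Conditions, which is the saving lucy24 describes.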


* Yes, OK, 3-5 bytes depending on what other flags are present.

qimqim

6:56 am on Nov 21, 2014 (gmt 0)

10+ Year Member Top Contributors Of The Month



#1 # block visitors referred from indicated domains

RewriteCond %{HTTP_REFERER} semalt\.com [OR]
RewriteCond %{HTTP_REFERER} buttons\-for\-website\.com [OR]
RewriteCond %{HTTP_USER_AGENT} libwww-perl
RewriteRule (^|/|\.html)$ - [F]



Hi Lucy

Thank you. Your reply is too technical for me... but I think I got the gist of it.

At the beginning you write "Where you say .* in the rule you can instead say .? "

but then at the end you suggest "RewriteRule (^|/|\.html)$ - [F] "

As the only .* that I can see is on that line I guess I can now ignore it.

My main concern then is the buttons\-for\-website\.com . I don't even know what you mean by "escaping". Is it the slashes? How should I write the url, then?

Many thanks

wilderness

12:48 pm on Nov 21, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



RewriteCond %{HTTP_REFERER} semalt\.com [OR]
RewriteCond %{HTTP_REFERER} buttons\-for\-website\.com [OR]


FWIW, you don't need the entire name and/or domain.
You could easily combine these on one line.

# Refer contains buttons or semalt
RewriteCond %{HTTP_REFERER} (buttons|semalt) [OR]

qimqim

1:42 pm on Nov 21, 2014 (gmt 0)

10+ Year Member Top Contributors Of The Month



Hi

RewriteCond %{HTTP_REFERER} (buttons|semalt) [OR]


But how does the system know if semalt is .com or .net or .co.uk?
Also buttons is only the beginning of their name which is buttons-for-website.com

Thanks, but I am forever confused with this htaccess business...

wilderness

1:54 pm on Nov 21, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



It really doesn't matter what the domain extension is, at least in these instances.

The match applied is a 'contains' match (there is no anchor), which means the keyword can appear anywhere in the referring URL.
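In Python terms (`re.search` is unanchored, like a RewriteCond pattern), a 'contains' test looks like this; the referer strings are invented for illustration. The fourth one shows the flip side of a short keyword: any referring URL that merely contains the word is caught too.

```python
import re

# Unanchored: matches the keyword anywhere in the Referer value.
BAD_REFERER = re.compile(r"(buttons|semalt)")

referers = [
    "http://semalt.com/",
    "http://semalt.semalt.com/crawler.php?u=example.net",
    "http://buttons-for-website.com/",
    "https://www.google.com/search?q=buttons",  # hypothetical innocent referer
    "http://example.net/Asia/",
]
for ref in referers:
    print(bool(BAD_REFERER.search(ref)), ref)
```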

What's the possibility of a good/valid visitor coming from a domain with the same name and a different extension?

Even Googlebot ONLY uses COM and not NET, so why would you make an allowance for NET?

BTW, the buttons referers that I'm seeing ONLY come from a COM as well, so why would you allow NET, or even be worried about the potential of a valid visitor from what would be a sister domain?

When using htaccess restrictions for visitors you must understand basic 'anchors'.
You should also keep in mind the versatility of patterns that catch many visitors at once, rather than focusing on the exact name of a single visitor application.

qimqim

3:40 pm on Nov 21, 2014 (gmt 0)

10+ Year Member Top Contributors Of The Month



Hi

I've just realized that what Google was moaning about was http://example.net/index.html/

I guess I can ignore this as I cannot see any visitor getting to my site typing that url

not2easy

3:56 pm on Nov 21, 2014 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



What happens if you type/paste that into your browser? Google may be following someone's link to your site or an old link somewhere on your site to the home page? It needs to 301 to your home page.

qimqim

3:59 pm on Nov 21, 2014 (gmt 0)

10+ Year Member Top Contributors Of The Month



That link produces a 404. What do I have to add to the htaccess, please?

lucy24

7:38 pm on Nov 21, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



At the beginning you write "Where you say .* in the rule you can instead say .? "

but then at the end you suggest "RewriteRule (^|/|\.html)$ - [F] "

Two different approaches. Use one or the other depending on situation.

But how does the system know if semalt is .com or .net or .co.uk?

The chances are pretty minute that there exists a non-spam-referring domain called semalt.some-other-tld. In the case of "buttons", you do want to stick with the full (semalt|buttons-for-website). Yes, "escape" in Regular Expressions means the preceding backslash. It's used when some character has a special meaning, for example ? question mark means "this bit is optional", and you need to say "here I mean a literal question mark".
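A few Python one-liners showing what escaping changes; `re.fullmatch` is used so the whole string has to match, and the sample strings are made up. The last pair is the same issue as the unescaped dot in `index.html$` in rule #4a.

```python
import re

# Unescaped "?" makes the preceding character optional...
print(bool(re.fullmatch(r"why?", "wh")))     # True
print(bool(re.fullmatch(r"why?", "why")))    # True
# ...while "\?" stands for a literal question mark.
print(bool(re.fullmatch(r"why\?", "why?")))  # True
print(bool(re.fullmatch(r"why\?", "why")))   # False

# Unescaped "." matches any character; "\." matches only a dot.
print(bool(re.fullmatch(r"index.html", "indexXhtml")))   # True
print(bool(re.fullmatch(r"index\.html", "indexXhtml")))  # False
```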

I've just realized that what Google was moaning about was http://example.net/index.html/

I guess I can ignore this as I cannot see any visitor getting to my site typing that url

This is part of a pretty big category "Problems you don't need to think about unless they actually happen." The fact that google is complaining about it means that the URL has been found somewhere. You can choose to ignore it and let it return a natural 404. Or, if it occurs a lot, you can make a rule like this (the Condition is, paradoxically, to save the server work)

RewriteCond %{REQUEST_URI} ^([^.]+\.html)
RewriteRule \.html. http://www.example.com%1 [R=301,L]
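Here is a rough Python model of that Condition and Rule, assuming a document-root htaccess and using `re` in place of Apache's engine. `html_junk_redirect` is a hypothetical name, and www.example.com is just the placeholder from the rule. Two details it makes visible: mod_rewrite tests the RewriteRule pattern before it looks at the Condition, and %{REQUEST_URI} keeps its leading slash, so the captured %1 starts with "/" and belongs directly after the hostname.

```python
import re
from typing import Optional

RULE = re.compile(r"\.html.")         # RewriteRule pattern
COND = re.compile(r"^([^.]+\.html)")  # RewriteCond %{REQUEST_URI} pattern


def html_junk_redirect(request_uri: str, host: str = "www.example.com") -> Optional[str]:
    """Return the 301 target for e.g. /dir/page.html/junk, or None if the rule doesn't fire."""
    # In a document-root .htaccess the RewriteRule sees the path without its leading "/".
    rule_input = request_uri[1:] if request_uri.startswith("/") else request_uri
    if not RULE.search(rule_input):  # the pattern is tested first...
        return None
    m = COND.search(request_uri)     # ...and only then the Condition
    if not m:
        return None
    # %1 is m.group(1); it already starts with "/", e.g. "/dir/page.html".
    return "http://" + host + m.group(1)


for uri in ("/Asia/Indonesia/bali.html/more", "/Asia/Indonesia/bali.html", "/Asia/"):
    print(uri, "->", html_junk_redirect(uri))
```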


Did you mean literally "index.html/" or did you just make that up for the example? If it really is "index.html/" you don't need another rule. You probably already have an index redirect. Just omit the closing anchor and the problem will go away.

qimqim

1:57 pm on Nov 22, 2014 (gmt 0)

10+ Year Member Top Contributors Of The Month



Hi

I've updated the following:
#1 # block visitors referred from indicated domains


RewriteCond %{HTTP_REFERER} (semalt|buttons-for-website) [OR]
RewriteCond %{HTTP_USER_AGENT} libwww-perl
RewriteRule .* - [F]


but in which category shall I add


RewriteCond %{REQUEST_URI} ^([^.]+\.html)
RewriteRule \.html. http://www.example.com%1 [R=301,L]


in this category?


#4a index redirect

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.html
RewriteRule ^(.*)index.html$ http://example.net/$1 [R=301,L]

#4b domain-name canonicalization redirect

RewriteCond %{HTTP_HOST} !^(example\.net)?$ [NC]
RewriteRule ^(.*)$ http://example.net/$1 [R=301]

lucy24

7:54 pm on Nov 22, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



#4a index redirect

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.html
RewriteRule ^(.*)index.html$ http://example.net/$1 [R=301,L]

If you omit the $ after \.html (ahem) in the Rule, this existing rule will also cover requests for /directory/index.html/ and so on.

but in which category shall I add

If you do need to handle requests for /filename.html/more-stuff-here then that would become 4b, while the index redirect remains 4a and domain-name-canonicalization is 4c. The idea is to intercept requests for index.html (with or without trailing slash) before requests for filename.html/more-stuff, because index.html is a more specific case.
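The ordering argument can be sketched in Python as a function that tries the three redirects in turn and returns the first target it reaches, which is what the [L] flags on the earlier rules are for. This is a simplified model under stated assumptions: `first_redirect` is hypothetical, `re` stands in for Apache's engine, THE_REQUEST is rebuilt as a plain GET line from the path the visitor asked for, and the middle rule is written against the RewriteRule path rather than %{REQUEST_URI}.

```python
import re
from typing import Optional

CANONICAL = "example.net"

INDEX_REQUEST = re.compile(r"^[A-Z]{3,9}\ /([^/]+/)*index\.html")  # 4a condition (THE_REQUEST)
INDEX_RULE = re.compile(r"^(.*)index\.html")                       # 4a pattern, no closing $
HTML_JUNK = re.compile(r"^([^.]+\.html).")                         # 4b: page.html/anything
CANONICAL_HOST = re.compile(r"^(example\.net)?$", re.IGNORECASE)   # 4c condition (negated)


def first_redirect(host: str, path: str) -> Optional[str]:
    """First 301 target for a request, or None; path has no leading slash."""
    the_request = f"GET /{path} HTTP/1.1"  # simplified: the visitor's own request line
    m = INDEX_RULE.search(path)
    if m and INDEX_REQUEST.search(the_request):
        return f"http://{CANONICAL}/{m.group(1)}"
    m = HTML_JUNK.search(path)
    if m:
        return f"http://{CANONICAL}/{m.group(1)}"
    if not CANONICAL_HOST.search(host):
        return f"http://{CANONICAL}/{path}"
    return None


# Wrong host and index.html/ fixed in a single hop:
print(first_redirect("www.example.net", "Asia/index.html/"))  # http://example.net/Asia/
```

Because the specific index rule runs first, a request such as www.example.net/Asia/index.html/ lands on its final URL in one redirect instead of two or three.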

qimqim

9:56 pm on Nov 22, 2014 (gmt 0)

10+ Year Member Top Contributors Of The Month



Thank you Lucy, but I can't quite understand.

first can you confirm that the #1 is correct?

As for the rest, all I want to do is cover the example.net/index.html/ URL, which Webmaster Tools keeps flagging as an error.
Your comment about $ left me confused. Are you saying I should add it?

lucy24

11:28 pm on Nov 22, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



first can you confirm that the #1 is correct?

Yup, that looks fine.

Your comment about $ left me confused. Are you saying I should add it?

No, you should remove it. Just the single $ character, nothing else. If you omit $ from the pattern in your existing rule
RewriteRule ^(.*)index.html$ http://example.net/$1 [R=301,L]

vs.
RewriteRule ^(.*)index\.html http://example.net/$1 [R=301,L]

then the index redirect will also take care of any "index.html/" requests. I don't much care for (.*) in this location, but that's an unrelated issue.
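The effect of dropping the $ is easy to check in Python, with `re` standing in for Apache's engine and paths given as the rule sees them in a document-root htaccess:

```python
import re

WITH_ANCHOR = re.compile(r"^(.*)index\.html$")
WITHOUT_ANCHOR = re.compile(r"^(.*)index\.html")

for path in ("index.html", "index.html/", "Asia/index.html/"):
    print(repr(path), "with $:", bool(WITH_ANCHOR.search(path)),
          "without $:", bool(WITHOUT_ANCHOR.search(path)))
```

Without the $, whatever follows index.html is simply not part of the match, and since the substitution only uses $1, the trailing slash is dropped from the redirect target.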

This is assuming that your google issue is specifically "index.html/" as opposed to "some-other-page-name.html/"

qimqim

8:09 am on Nov 23, 2014 (gmt 0)

10+ Year Member Top Contributors Of The Month



Hi Lucy

many thanks

From the above thread I take it that the larger box below is correct, even if semalt and buttons... could be on the same line as in

RewriteCond %{HTTP_REFERER} (semalt|buttons-for-website) [OR]



#1 # block visitors referred from indicated domains

RewriteCond %{HTTP_REFERER} semalt\.com [NC,OR]
RewriteCond %{HTTP_REFERER} buttons\-for\-website\.com [NC,OR]

RewriteCond %{HTTP_USER_AGENT} libwww-perl
RewriteRule .* - [F]



and rule #4 should now become


#4a index redirect

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.html
RewriteRule ^(.*)index\.html http://example.net/$1 [R=301,L]

#4b domain-name canonicalization redirect

RewriteCond %{HTTP_HOST} !^(example\.net)?$ [NC]
RewriteRule ^(.*)$ http://example.net/$1 [R=301]



PS: Meanwhile, after amending the .htaccess file, I pasted the URL http://mysite.net/index.html/ and got a page headed "Example Domain", and the URL changed itself to example.net. In this case "example" was the actual URL...

And I get the same with mysite.net/index.html without the trailing slash.


PPS: I put the old code back in, but I'm still getting the same: the URL that includes index.html, with or without the slash, jumps to the real example.net with this message:

Example Domain

This domain is established to be used for illustrative examples in documents. You may use this domain in examples without prior coordination or asking for permission.

More information...

[edited by: Ocean10000 at 7:02 pm (utc) on Nov 24, 2014]
[edit reason] examplified [/edit]

lucy24

9:03 am on Nov 23, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



the url that includes index with or without slash jumps to a real example.net

Uhm...

In these forums, example.com (or example.net or example.any-tld) is used as a stand-in for any and all actual domain names.

Two reasons.

One, you're not allowed to mention your own domain. In fact, even naming other people's domains is frowned upon unless it's a Recognized Source like apache.org. "example.com" is a special reserved name; it will never be used by a real domain.

Two, anything beginning in
http://
is automatically converted into a clickable link. This makes posted rules unreadable-- a serious problem in the Apache subforum.

So for posting purposes you say example.com, but in your actual htaccess you change it to your real domain name.

qimqim

12:21 pm on Nov 23, 2014 (gmt 0)

10+ Year Member Top Contributors Of The Month



Hi Lucy

My threads seem to go on forever...

I am starting to think that this is something that disappears once you clear the cache.
But could you try

http://example.net/index.html/

but substitute my domain name

Thank you

[edited by: Ocean10000 at 7:23 pm (utc) on Nov 24, 2014]
[edit reason] examplified [/edit]

lucy24

8:17 pm on Nov 23, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I am starting to think that this is something that disappears once you clear the cache.

If things work as intended after you've cleared the cache, everything is fine. Redirect responses are cached by the browser, so changing htaccess may or may not show new results immediately.

"Clear your cache" is one of those all-purpose Fixes For What Ails You, like "Replace your tenor" or "Tighten the wing nuts".

But could you try

Nope, I get the 404 page. If I try it without the ending slash, I end up on example.net, meaning that you have forgotten to fix your htaccess. Open the file in a text editor and
(1) globally replace "example" with your actual site name
(2) delete the $ after "index.html"

qimqim

9:40 pm on Nov 23, 2014 (gmt 0)

10+ Year Member Top Contributors Of The Month



All very confusing!

Ok, I've changed the file with your recommendation. It is throwing up the example page with both index.html and index.html/

I hope by morning it will clear and start behaving properly.

Now, it's time for bed!

Thanks

lucy24

11:21 pm on Nov 23, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I tried in a different browser to avoid any possibility of caching. I now get example.net whether or not I use a slash after "index.html". Are you positive you've replaced every single occurrence of "example.net" with your actual domain name?

qimqim

5:52 am on Nov 24, 2014 (gmt 0)

10+ Year Member Top Contributors Of The Month



Hi Lucy

I need my head seen to...

I placed the line of code in the file and did not change from example to my own domain name.

I am getting different results now but I think it is starting to work.

Very sorry!

#4a index redirect

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.html
RewriteRule ^(.*)index\.html http://example.net/$1 [R=301,L]


#4b domain-name canonicalization redirect

RewriteCond %{HTTP_HOST} !^(example\.net)?$ [NC]
RewriteRule ^(.*)$ http://example.net/$1 [R=301]

[edited by: qimqim at 6:53 am (utc) on Nov 24, 2014]

lucy24

6:39 am on Nov 24, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yup, it seems to work :)

The next passing moderator will replace all occurrences of {sitename} in the foregoing post with "example.net". But in the meantime you can see why exemplifying is necessary.