I use SSI includes on most of my websites. I design them in from the start of almost every project these days because they make site-wide updates to headers, footers, and menu bars much easier.
Today I discovered something in my logs that has all but panicked me! Someone requested a URL with the .html page name repeated, and rather than the 404 error I would expect, they received a 200 OK. I investigated and found that my pages come up if the .html file name appears twice in the URL, e.g.
www.example.com/page1.html/page1.html
At each place on the page where I call an include, "page1.html" starts over, nested inside "page1.html".
Here is how I set up the SSI include in my pages:
<!--#include virtual="menubar.html" -->
My .htaccess file has these lines near the bottom:
#This is to parse all .html files for SSI includes
AddType text/html .html
AddHandler server-parsed .html
My SSI includes work perfectly well when the correct URL is used, e.g. www.example.com/page1.html: they pull in the correct code from the menubar.html file.
If I remove those lines from the .htaccess file, my includes of course no longer pull into the page, but at least I get a proper 404 error for the non-existent doubled URL www.example.com/page1.html/page1.html
This nesting thing really has me scratching my head. Does anyone have a suggestion for an alternate .htaccess method that doesn't allow this to happen?
If you want to explicitly prevent this, you can. But it involves adding code to look for "/" after "." and truncate the URL at that point.
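An alternative to truncating the URL, where the server supports it, may be to refuse the extra path outright with the core AcceptPathInfo directive. This is only a minimal sketch and a different technique from the one described above. It assumes Apache 2.0.30 or later (the directive does not exist in Apache 1.3, where putting it in .htaccess would cause a server error) and that AllowOverride includes FileInfo for this directory.

```apache
# Assumption: Apache 2.0.30+ and AllowOverride FileInfo.
# Reject requests that carry trailing path information after a real file,
# such as /page1.html/page1.html, with a 404 Not Found.
AcceptPathInfo Off
```

Under those assumptions, www.example.com/page1.html should still be parsed for includes, while www.example.com/page1.html/page1.html should return a 404.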
Of more concern is, "Who is linking to your site with that malformed URL?" If it's your own site, then you'll have to find the cause and fix it. If it's an external site, ask them to fix it. If that link's already been indexed by search engines, then you'll need to add code to 301-redirect it. Something like this mod_rewrite code snippet:
RewriteRule ^([^.]+)\.([^/]+)/ http://www.example.com/$1.$2 [R=301,L]
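For context, a slightly fuller sketch of how a rule like this might sit in the site-root .htaccess file. It assumes mod_rewrite is available, that the host allows rewriting from .htaccess, and that only .html pages are affected. The narrower .html-only pattern is an illustrative variation, not part of the rule above.

```apache
# Sketch only: assumes mod_rewrite is loaded and that .htaccess overrides
# permit rewriting (commonly this also needs Options +FollowSymLinks).
RewriteEngine On

# Match "<path>.html/<anything>" and 301-redirect to "<path>.html".
# Limiting the match to .html is an assumption made for this example;
# it leaves directories whose names contain a dot untouched.
RewriteRule ^([^.]+\.html)/ http://www.example.com/$1 [R=301,L]
```

With this in place, a request for www.example.com/page1.html/page1.html should be redirected to www.example.com/page1.html.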
I'd be checking for malformed <base href> tags, use of page-relative versus server-relative links, etc.
Jim
And you are absolutely right. After I found this problem, I realized that there could be far-reaching implications: someone might eventually screw up a real link to my website, and I wondered what that would mean if a search engine came into the site that way.
I appreciate your feedback. The 301 solution might be the only thing I can implement at this point while still using SSI includes.