SSI includes nesting multiple times

Is my .htaccess syntax wrong?


cws3di

7:44 pm on Oct 7, 2005 (gmt 0)

I posted this question earlier in the HTML forum, but the more I test and try different code to solve this problem, the more I am convinced that something is wrong with the syntax in my .htaccess file.

I use SSI includes on most of my websites - I design them in from the start of almost every project these days because it makes things easier for site-wide updating of headers, footers, or menu bars.

Today I discovered something in my logs that has all but panicked me! Someone came into the site with the .html filename repeated in the URL, and rather than a 404 error (which I would expect!) they received a 200 OK. I investigated and found that my pages come up if you repeat the .html filename, e.g.

www.example.com/page1.html/page1.html

At each place on the page where I call an include, page1.html starts over, nested inside page1.html.

Here is how I set up the SSI include in my pages:

<!--#include virtual="menubar.html" -->

My .htaccess file has these lines near the bottom:

#This is to parse all .html files for SSI includes
AddType text/html .html
AddHandler server-parsed .html

My SSI includes work perfectly when the correct URL is used, e.g. www.example.com/page1.html: they pull in the correct code from the menubar.html file.

If I remove those lines from the .htaccess file, my includes of course no longer pull into the page, but at least I get a proper 404 error for the non-existent double-typed URL www.example.com/page1.html/page1.html

This nesting thing really has me scratching my head. Does anyone have a suggestion for an alternate .htaccess method that doesn't allow this to happen?

jdMorgan

9:51 pm on Oct 7, 2005 (gmt 0)

This probably isn't related to your .htaccess code at all; it's a matter of how the server interprets URLs. The server walks the requested path until it reaches a real file, such as one ending in ".html", and then stops: anything after the following "/" is treated as extra path information rather than as part of the filename. So, technically, you could access /index.html/anything_you_want_here, and the server will serve /index.html.

If you want to explicitly prevent this, you can. But it involves adding code to look for "/" after "." and truncate the URL at that point.
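
On Apache 2.0.30 or later, one way to do this without any pattern matching may be the core AcceptPathInfo directive, which controls whether trailing path information after a real filename is accepted. A minimal sketch, assuming an Apache 2.x server and that FileInfo overrides are allowed in .htaccess (neither is confirmed in this thread; Apache 1.3 does not know this directive and would return a 500 error for it):

```apache
# Assumption: Apache 2.0.30+ with "AllowOverride FileInfo" (or All) in effect.
# Reject requests that carry extra path information after an existing file,
# so /page1.html/page1.html returns 404 instead of serving /page1.html.
AcceptPathInfo Off

# The SSI setup from the original post stays unchanged.
AddType text/html .html
AddHandler server-parsed .html
```

If the server is older than that, the mod_rewrite approach is the fallback.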

Of more concern is, "Who is linking to your site with that malformed URL?" If it's your own site, then you'll have to find the cause and fix it. If it's an external site, ask them to fix it. If that link's already been indexed by search engines, then you'll need to add code to 301-redirect it. Something like this mod_rewrite code snippet:


RewriteRule ^([^.]+)\.([^/]+)/ http://www.example.com/$1.$2 [R=301,L]
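
For completeness, a rule like this only runs when the rewrite engine is switched on. A sketch of how the whole block might look in a site-root .htaccess, assuming mod_rewrite is loaded and that www.example.com stands in for the real hostname:

```apache
# Assumption: mod_rewrite is available and this file sits in the document root.
# Some hosts also require "Options +FollowSymLinks" for per-directory rewrites.
RewriteEngine On

# In .htaccess the leading "/" is stripped before matching, so
# "page1.html/page1.html" is what the pattern sees.
#   $1 = everything before the first "."   (page1)
#   $2 = the extension up to the next "/"  (html)
# Anything after that "/" is dropped and the visitor is sent to the clean URL.
RewriteRule ^([^.]+)\.([^/]+)/ http://www.example.com/$1.$2 [R=301,L]
```

Note that the pattern keys on the first "." in the path, so a directory name containing a dot (for example /v1.2/page.html) would also be redirected; tighten the pattern if the site has any such directories.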

My main point here is that the 404 behaviour is a separate issue from the real cause of the problem.

I'd be checking for malformed <base href> tags, use of page-relative versus server-relative links, etc.

Jim

cws3di

10:06 pm on Oct 7, 2005 (gmt 0)

Whoever it was came into the web page from a link in a Yahoo e-mail, according to the User-Agent in the web stats log. I assume somebody found my site and sent an e-mail to a friend or something. (I do not advertise by e-mail, but I have good content that people bookmark and tell others about.)

And you are absolutely right. After I found this problem, I realized that there could be far-reaching implications if someone eventually mangles a real link to my website and a search engine comes into the site that way.

I appreciate your feedback. The 301 solution might be the only thing that I can implement at this point while still using SSI includes.