Any SEO implications in using AddType/AddHandler for SSI?

...concerned about possible mod_mime side effects

         

Robert Charlton

2:27 am on Apr 27, 2004 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I've searched through the forums about converting extensions of Server Side Include pages from shtml to html. Most threads I could find suggested adding the following 2 lines to an .htaccess file placed in the root directory:

AddType text/html .html 
AddHandler server-parsed .html

In searching for more info on what AddType and AddHandler do, I found these are associated with the Apache mod_mime module... and here I'm at the limits of my understanding, and I'd like to get more comfortable.

First, can anyone clarify what this module, AddType, and AddHandler each do?

What I'm concerned about is that I believe I've encountered mod_mime before, on a host where I think it was enabled by default... and perhaps configured incorrectly. It seemed to inspect a file and, if the contents looked like html, return it as html under whatever filename had been requested.

I discovered this when Google dropped a page called "domain.com/products.html" in favor of "domain.com/products" (without an extension)... for no reason that I could discover. E.g., there were no links to "domain.com/products" that I could find.

I was told by the host's "support" that this was due to the mod_mime module, and that it was a system wide setting that couldn't be changed. We've changed hosts, so I never did diagnose that problem.

I want to make sure now that, by modifying the SSI extensions with AddType/AddHandler as described above, I'm not also inadvertently creating an infinite number of mirror pages that might get indexed.

Where do I need to be cautious?

jdMorgan

2:44 am on Apr 27, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Robert,

I think your host was confused. The "automatic" function that caused you problems was most likely mod_mime_magic, not mod_mime. Check out the Apache documentation [httpd.apache.org] for both.

AddType simply associates a file extension with a MIME-type. In simple terms, it tells the server what MIME-type (Content-Type) header to return when a file with that extension is served. This is the header you can view with the Server Header checker. It tells the client browser what kind of file it is, so the browser can decide whether to handle it internally, use a plug-in, or pass the data to an external application for display. For example, html files can be displayed directly, but pdf files need to be handled by a browser plug-in or passed to Adobe Reader for display.

AddHandler tells Apache that you want that file 'handled' in some special way prior to -- or instead of -- serving it directly to the client. For example, you might want .html extension files to be parsed (scanned) for SSI includes, so you use AddHandler to inform Apache of that fact.
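
As a minimal sketch of how the two directives fit together in a root .htaccess file (the Options line is an assumption on my part: includes must be enabled for the directory, and your host may already have done that or may not allow the override):

```apache
# Enable SSI processing for this directory tree (needs AllowOverride Options)
Options +Includes

# Serve .html files with a Content-Type of text/html
AddType text/html .html

# Have mod_include parse .html files for SSI directives before serving them
AddHandler server-parsed .html
```

Neither directive creates new URLs; they only change how files that already match the extension are labelled and processed.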

mod_mime_magic, however, is a completely different beast, and could cause the problem you describe if there was just one link pointed to an extensionless version of your page. It 'infers' the MIME-type of a file by reading a few bytes from the file, and sets the MIME-type server response header accordingly. On a system with mod_mime_magic unconditionally enabled, file extensions become meaningless as MIME-type associations, and the MIME-type header in the server response is set based on the *content* of the files.
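
For comparison, mod_mime_magic is normally switched on in the main server configuration rather than in .htaccess. A rough sketch, with the module and magic-file paths as assumptions that vary by installation:

```apache
# httpd.conf (server config or virtual host context, not .htaccess)
LoadModule mime_magic_module modules/mod_mime_magic.so

# The module only becomes active once a magic file is named
MimeMagicFile conf/magic
```

Because MimeMagicFile is not allowed in .htaccess, a setup like this would be consistent with your host describing it as a system-wide setting.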

Jim

Robert Charlton

5:06 am on Apr 27, 2004 (gmt 0)


Hi Jim,

Thanks for another of your typically thorough and enlightening answers.

> I think your host was confused. The "automatic" function that caused you problems was most likely mod_mime_magic, not mod_mime.

No, I'm the one who was confused. ;-(

I'd remembered the hosting situation from over a year ago, and I vaguely remembered the word "magic," along with "mod" and "mime"... but with those underscores in the module names, a partial search match didn't do it. I kept coming up with "use some htaccess magic" and the like when searching for the term in relation to the SSI and .htaccess info I was unearthing.

I think I understand the syntax of the AddType line...

In the second line, I assume that all .html files become server-parsed...

AddHandler server-parsed .html

Does "server-parsed" apply only to SSI includes, or are there other things the server might look for when it scans the file?

In the case of the site we'll be using this on, all of the pages will have includes, so there aren't any unnecessary operations... but, if, say, only 10 of 1000 pages had includes, and the rest were plain vanilla hard-coded html, would the server still be scanning each file it served, even when most weren't SSI pages?

The way I had assumed this conversion would be done would be that the pages with includes would have shtml extensions, and that these only would be converted or renamed. This obviously isn't the way Apache does it. If there is simple further background info on why it doesn't work that way, I'd find it interesting and probably helpful.

jdMorgan

2:53 pm on Apr 27, 2004 (gmt 0)


> In the second line, I assume that all .html files become server-parsed...
>
> AddHandler server-parsed .html
>
> Does "server-parsed" apply only to SSI includes, or are there other things the server might look for when it scans the file?


Hmmm... Not absolutely sure, actually. I believe server-parsed only applies to SSI, though.

> In the case of the site we'll be using this on, all of the pages will have includes, so there aren't any unnecessary operations... but, if, say, only 10 of 1000 pages had includes, and the rest were plain vanilla hard-coded html, would the server still be scanning each file it served, even when most weren't SSI pages?

It would be scanning each filetype specified in the AddHandler directive, yes.

For 'sparse' SSI parsing, a better solution might be to use XBitHack; see the Apache mod_include documentation.
The downside of XBitHack is that your maintenance staff needs to be sharp -- or they need to work from a formal maintenance procedures script -- to make sure the files with Xbits set keep them set and those without don't get them set unexpectedly.
Another option is to set AddHandler on a per-directory basis (.htaccess) so that only files in certain subdirectories are parsed for SSI.
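
A rough sketch of both alternatives; the directory and file names are made up for illustration:

```apache
# Option 1: XBitHack, in the root .htaccess
# Only text/html files with the user-execute bit set are parsed for SSI.
Options +Includes
XBitHack on

# Option 2: an .htaccess placed only in one subdirectory (say /articles/),
# so that .html files there, and nowhere else, are parsed.
Options +Includes
AddHandler server-parsed .html
```

With XBitHack, a page is marked for parsing from the shell with something like chmod u+x page.html (page.html being a placeholder).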

> The way I had assumed this conversion would be done would be that the pages with includes would have shtml extensions, and that these only would be converted or renamed. This obviously isn't the way Apache does it. If there is simple further background info on why it doesn't work that way, I'd find it interesting and probably helpful.

Not sure I understand this q. Add SSI to a page. Rename the page to .shtml. Change all on-site links to it to reflect this new name. Maybe add a Redirect to fix requests using external links to the old page. In the absence of an AddHandler directive covering other extensions, only .shtml pages will be parsed for SSI; this is the conventional setup on hosts with SSI enabled. Otherwise, you can use XBitHack or AddHandler, either globally or on a per-directory basis, to achieve what you want (or at least a close approximation). In the context of this discussion, XBitHack and AddHandler are useful to avoid having to rename the files at all.
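
If you do go the renaming route, the redirect could look something like this in .htaccess; the file names and example.com are placeholders, not taken from your site:

```apache
# mod_alias: permanently redirect the old URL to the renamed SSI page
Redirect 301 /about.html http://www.example.com/about.shtml
```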

Jim

Robert Charlton

6:40 am on May 1, 2004 (gmt 0)


Jim - Again, thanks.

>> The way I had assumed this conversion would be done would be that the pages with includes would have shtml extensions, and that these only would be converted or renamed....

> Not sure I understand this q.

I was just re-inventing Apache in my mind, in a way that ultimately wouldn't be useful. ;)

If the server changed all the ssi page extensions from .shtml to .html on the fly, you'd have a hell of a time building the site and testing the links without a server running. As it is, I assume you need a server to try out your includes.