.htaccess extension hiding breaks W3C validation

How to hide the extension, add trailing slash and stay valid?

zbbtt

2:10 pm on Jul 1, 2008 (gmt 0)

10+ Year Member



Hello,

I'm using the following to hide the .xhtml extension from a site's addresses.

AddType text/html .xhtml
DirectoryIndex perfil.xhtml
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}\.xhtml -f
RewriteRule ^(.*)$ $1.xhtml

The .xhtml files themselves are valid, but the missing extension produces this error (and other related ones) when I try to validate the pages with the W3C's validation tool:


1. Unable to Determine Parse Mode!

The validator can process documents either as XML (for document types such as XHTML, SVG, etc.) or SGML (for HTML 4.01 and prior versions). For this document, the information available was not sufficient to determine the parsing mode unambiguously, because:
* the MIME Media Type (text/html) can be used for XML or SGML document types
* the Document Type (-//W3C//DTD XHTML; 1.0 Strict//EN) is not in the validator's catalog
* No XML declaration (e.g <?xml version="1.0"?>) could be found at the beginning of the document.

As a default, the validator is falling back to SGML mode.
2. Warning Namespace Found in non-XML Document

Namespace "" found, but the -//W3C//DTD XHTML; 1.0 Strict//EN document type is not an XML document type!

Validation Output: 78 Error
1. Error Line 237, Column 27: omitted tag minimization parameter can be omitted only if OMITTAG NO is specified.

... and the list goes on and on with omitted-tag errors (again, if I validate the .xhtml files directly, they're valid).

Someone over at Kirupa's forum said I should try:


AddType text/html .xhtml
DirectoryIndex perfil.xhtml
Options +FollowSymLinks +Indexes
RewriteEngine on
RewriteBase /
RewriteRule ^([^.]+)\.xhtml$ $1 [L]

But this gives me a 404 error.
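A likely reason for the 404: the suggested rule works in the opposite direction. Its pattern `^([^.]+)\.xhtml$` only matches requests that already end in `.xhtml`, and it rewrites them to the extensionless name. A request for `/perfil` doesn't match at all, so (assuming MultiViews is off) Apache looks for a file literally named `perfil` and returns 404; a request for `/perfil.xhtml` is rewritten to `/perfil`, which doesn't exist either. If the goal is also to move visitors off the old `.xhtml` URLs, a rough sketch pairs an external redirect with the internal rewrite. It assumes the .htaccess sits in the document root and that the host allows `Options` overrides:

```apache
Options +FollowSymLinks -MultiViews
RewriteEngine on
RewriteBase /

# 1. External redirect: if the client itself asked for "something.xhtml",
#    send a 301 to the extensionless URL. Testing THE_REQUEST (the original
#    request line, e.g. "GET /perfil.xhtml HTTP/1.1") instead of the current
#    path means the internal rewrite in step 2 can't re-trigger this rule,
#    which would otherwise cause a redirect loop.
#    %1 comes from the raw request line and is already URL-encoded, so NE
#    stops mod_rewrite from escaping it a second time.
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^?\ ]+)\.xhtml[?\ ]
RewriteRule \.xhtml$ /%1 [NE,R=301,L]

# 2. Internal rewrite: if the request is not a directory and a matching
#    .xhtml file exists, serve that file without changing the visible URL.
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}\.xhtml -f
RewriteRule ^(.*)$ $1.xhtml [L]
```

While testing, R=302 is the safer choice, because browsers cache 301 (permanent) redirects and a mistake can stick around.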
How can I:
1. Hide the xhtml extension?
2. Automatically add a slash at the end of the address, if one doesn't exist?

I've found a thread on this forum about it [webmasterworld.com], but it's not working either (gets stuck in an infinite loop, adding slashes to the end of the address).

Thank you very much for any assistance!

jdMorgan

3:50 pm on Jul 1, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



A "page" should never have a trailing slash, so let's leave that for later...

What were the results of adding the page and server headers (such as "<?xml version="1.0"?>") requested by the validator?

Be aware that AddType works by filename, so the fact that your URLs are extensionless will not stop it from working properly. However, the content type you add should be a valid XML type, not "text/html".
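Following that suggestion, the type mapping would point `.xhtml` at the registered XHTML media type, `application/xhtml+xml`, rather than `text/html`. A minimal sketch; bear in mind that Internet Explorer doesn't render pages served with this type, which is the problem raised a little further down the thread:

```apache
# Map the .xhtml extension to the XHTML media type (registered in RFC 3236)
# instead of text/html. AddType matches on the name of the file actually
# served, so it still applies after mod_rewrite appends ".xhtml" to an
# extensionless URL.
AddType application/xhtml+xml .xhtml
DirectoryIndex perfil.xhtml
```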

Jim

zbbtt

4:24 pm on Jul 1, 2008 (gmt 0)

10+ Year Member



Hi,

I'm wondering about the slash because adding a slash gives me an internal server error (500). People will often type a slash at the end of addresses, so I don't want them to get a nasty error page thrown back at them. I suppose another way would be removing any slashes if the user is trying to reach a page and not a directory? That would work too, and if it's more correct, all the better.

The files themselves are valid (validated them before trying to hide the extension) and have all the necessary headers, so the problem is with the .htaccess settings I'm using. I added the text/html bit because I was having trouble with IE 8 not opening xhtml files and found an article that had that as a solution -- it works, but is it not correct? (It's forcing .xhtml files to be treated as text/html ones?)

Sorry, Apache is really not my area of expertise, this is just a personal site and I'm trying to get things done as properly as possible. :)
Just not quite sure if I understand all that I'm trying to accomplish.

Thanks very much for your reply.

g1smd

11:14 pm on Jul 1, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



A URL that ends in a slash is for a folder.

You can't fight the HTTP specifications on that one!

jdMorgan

11:30 pm on Jul 1, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Well, are these pages XHTML, or not? If they're XML/XHTML, then you wouldn't expect a browser to open them... They should be parsed on the server, and the results output in HTML if you're trying to respond to a browser.

XML and XHTML *are not* "the latest and greatest version" of HTML -- so you'll need to decide what you're doing and why, here. The fact that IE does not support XHTML should be telling you something.
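One compromise commonly used for exactly this IE problem (it isn't something proposed in this thread, so treat it as an assumption to test on your own host) is to keep `text/html` as the default and switch to `application/xhtml+xml` only for browsers whose `Accept` request header says they accept it. A sketch using mod_rewrite's `T` flag:

```apache
RewriteEngine on

# Default type for .xhtml files: text/html, which every browser renders.
AddType text/html .xhtml

# If the browser's Accept header lists application/xhtml+xml, override the
# Content-Type for .xhtml files. The "-" substitution leaves the URL untouched;
# the T flag only changes the MIME type sent with the response.
RewriteCond %{HTTP:Accept} application/xhtml\+xml
RewriteRule \.xhtml$ - [T=application/xhtml+xml]
```

Combined with the extension-hiding rules, the `\.xhtml$` pattern should still match, since after the internal rewrite it is tested against the rewritten file name. Because the same URL now returns different types depending on the request, caches should also see a `Vary: Accept` response header; it's worth checking what your server actually sends.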

Yes, you can redirect file requests to remove a trailing slash, but as I stated, let's leave that for later... There is no use trying to solve two unrelated problems at once, and the result is often a very confusing thread.
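For reference, a commonly used pattern for that redirect is sketched here. It assumes it is placed above the extension-hiding rules in the same .htaccess, so that `/perfil/` is first redirected to `/perfil`, which those rules then map to perfil.xhtml. Real directories (and the bare `/`) are left alone:

```apache
RewriteEngine on

# If the requested path is not an existing directory...
RewriteCond %{REQUEST_FILENAME} !-d
# ...and the URL ends in a slash, capture everything before that slash...
RewriteCond %{REQUEST_URI} ^(.+)/$
# ...then redirect to the same URL without it. %1 is the group captured by
# the previous RewriteCond; any query string is passed through unchanged.
RewriteRule ^ %1 [R=301,L]
```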

The first snippet of code you posted above, with the directory and file-exists checks, should work, assuming that you have "Options +FollowSymLinks -MultiViews" already set on your server.
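Here is the same snippet with those options added and each line annotated. It's a sketch, not a drop-in guarantee: if the host's `AllowOverride` setting doesn't permit `Options` in .htaccess, that line alone will trigger a 500 error and will have to come out. The `[L]` flag is an addition that stops any later rules from being applied in that pass:

```apache
# mod_rewrite in .htaccess requires FollowSymLinks (or SymLinksIfOwnerMatch).
# MultiViews is switched off so content negotiation doesn't try to map
# /perfil to perfil.xhtml on its own, independently of the rules below.
Options +FollowSymLinks -MultiViews

# Original type mapping and index page, unchanged.
AddType text/html .xhtml
DirectoryIndex perfil.xhtml

RewriteEngine on

# Only act when the request is NOT an existing directory...
RewriteCond %{REQUEST_FILENAME} !-d
# ...and when appending ".xhtml" to the requested path names an existing file.
RewriteCond %{REQUEST_FILENAME}\.xhtml -f
# Then serve that file internally; the visitor's URL stays extensionless.
RewriteRule ^(.*)$ $1.xhtml [L]
```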

Jim

zbbtt

12:14 am on Jul 2, 2008 (gmt 0)

10+ Year Member



Hi again,

> A URL that ends in a slash is for a folder.

I want to solve this for the benefit of people who might type the slash at the end (when you have short addresses, people will often type them by hand). And people do that quite often, as I discovered on a previous site that used the same gimmick. So why not get one step ahead and stop it from happening? Users get frustrated quite easily. I guess I'll try stripping the slash instead; that's a better idea.

> XML and XHTML *are not* "the latest and greatest version" of HTML -- so you'll need to decide what you're doing and why, here. The fact that IE does not support XHTML should be telling you something.

It makes me ask why all the other browsers do. I want to use XHTML because it seems to be stricter and, in the long run, easier, because it doesn't let your mistakes slide and therefore teaches you to write code without them. How do I set them up, then? The only way I know is the one posted above, and Googling turned up no other pointers.

> The first snippet of code you posted above, with the directory and file-exists checks, should work

Yes, the snippet I posted works; it just breaks the validator and I don't know why. You're being a bit vague, which, sorry to say, doesn't help me much. If I had experience with server configuration (I don't have access to the main config file anyway; I can only make changes through .htaccess), I wouldn't be posting this... and yet, there are still some things that need to get done. Where's "Apache for Dummies"? ;)

jdMorgan

1:43 am on Jul 2, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm being vague because it's not clear why you're trying to use XHTML and then return a MIME-type of text/html, and also because you have not commented on what you did about the validator errors and recommendations:

* the Document Type (-//W3C//DTD XHTML; 1.0 Strict//EN) is not in the validator's catalog
* No XML declaration (e.g <?xml version="1.0"?>) could be found at the beginning of the document.

It wasn't clear from this error report whether or not you specified a DTD reference in your DocType declaration, either.

So unfortunately, I'm being vague because there are just too many loose ends to grab... :(

You need a valid DocType including a DTD link before your page <head>. The MIME-type (HTTP Content-Type header) should agree with that DocType, and you also need an xmlns namespace declaration before the page <head>. Only then can you avoid the validator going into fallback mode and throwing all those errors at you. Your extensionless-URL mod_rewrite code should work just fine, and I suspect that the reason you're now getting errors is that the validator can no longer fall back on the file extension to figure out what kind of page it's looking at in the absence of the other required headers.

I don't know if this will help, but I've only ever used xml+xhtml for mobile-device pages. Here's the stuff in the first three lines of each page before the <head> section, as described above. But again, these are mobile site declarations, and would need to be adjusted for your site:


<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE html PUBLIC "-//WAPFORUM//DTD XHTML Mobile 1.0//EN" "http://www.wapforum.org/DTD/xhtml-mobile10.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US">

As an aside, look for the new version of HTML, coming soon to a Web near you. It would be more appropriate if all you need is stricter HTML document syntax checking.

There is, alas, no such thing as "Apache for Dummies." Web server configuration cannot be both flexible and simple at the same time.

Jim

zbbtt

12:17 pm on Jul 2, 2008 (gmt 0)

10+ Year Member



Hi again,

I'm sorry if I wasn't clear enough in my reply. The valid doctype was there, but now I see that I didn't add the xml declaration on the page I was validating. Since it was validating before the .htaccess tinkering (guess if the file has the extension, the validator is happy enough to continue?) I didn't notice this. At least now I know the source of the problem and corrected it.

It's possible to explain practical examples to people who just want to tweak a thing here or there. But generally speaking, all the information I've found on mod_rewrite on the web has one of two problems: it's either too technical, with cryptic explanations, or too simple, with no explanations at all. Very few examples are commented along the lines of "this bit does _this_, which achieves _this_ result", and sometimes that's really all people need to understand. I think something in between would be ideal!

I figure if I stick with XHTML, later on I can start doing more complex things with XML. So I'd rather stay on this path and learn as I go.

Thanks for your patience in trying to explain this to me!

jdMorgan

2:17 pm on Jul 2, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> (guess if the file has the extension, the validator is happy enough to continue?)

Yes, as I noted above: "I suspect that the reason you're now getting errors is that the validator can no longer fall back on the file extension to figure out what kind of page it's looking at in the absence of the other required headers."

Mod_rewrite is, unfortunately, rather cryptic. First, it uses regular expressions, which are in themselves a big challenge to learn (and to learn well). Second, mod_rewrite code is unique in that there is no other 'similar' language -- only the most fundamental programming techniques carry over from previous experience. Third, mod_rewrite code modifies your server's behavior -- often in complex and unintended ways. Without a good grounding and lots of experience, it can be very difficult to diagnose and debug.

I'm not saying mod_rewrite is rocket science -- It is accessible with effort. But the documentation is terse and cryptic in order to satisfy two requirements: First, to *not* be 2000 pages long, and second, to avoid violating Einstein's dictum: "Make everything as simple as possible, but no simpler." :)

Jim

zbbtt

3:51 pm on Jul 2, 2008 (gmt 0)

10+ Year Member



Those mysterious errors are frustrating, especially if you don't know what you're doing (that would be me). I wish the error messages would give more information, like what line is causing the problem. I'm wondering why most hosting companies don't set up something to make these configurations easier for people like me, who just need to get things done for themselves... well, my host already does with things like hotlinking and showing the indexes or not, but that's about it, a bit limited. :(

I've done some string manipulation with regular expressions, but it's been so long that I've forgotten most of it. Like a lot of people, I find the more complex expressions make my head hurt; I'm not fond of maths, and they start to look too math-like.

Thanks again for your explanation.

g1smd

8:35 pm on Jul 2, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



*** I want to use XHTML ***

I see no gain over validated HTML 4.01 Strict coding.