Regex to convert input tags to xhtml - Website Technology Issues forum at WebmasterWorld

Forum Moderators: phranque


Regex to convert input tags to xhtml

I need help to construct a regex expression :)

snowweb

2:52 am on Mar 24, 2008 (gmt 0)

10+ Year Member

Hi Chaps,

I have a website with many pages written to html 4.01 and need to convert it to xhtml. My hardest task is to add the trailing '/' to the input tags on my forms. There are hundreds of input tags!

I can use my editor (PhpED) to search and replace, using regular expressions, but I can't get the regular expression correct (I have VERY limited experience in Regex).

What makes it more difficult is that many (but not all) of the input tags also have some PHP code embedded in them. Please see the example below:


<input type = "text" name = "f27d"       size = "20" value = "<?php if(isset($f27d)){ echo $f27d ;}else {echo @$tax_agent_acc_atty_roll_no ; }?>"       maxlength = "20" style = "text-align:center;">

There may also be white space or line breaks at any point in the tag due to code formatting by the text editor.

It also needs to ignore tags which have already been converted.

The expression I have so far is nowhere near sophisticated enough and I don't think it is even 'a start'!


<input[^>]*!([(/)])>

Please can someone help educate this old dog!

Thanks buddies.

peter
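For reference, the kind of pattern being asked for can be sketched in Python rather than in PhpED's search-and-replace dialog. This is only a rough illustration under stated assumptions: the names INPUT_TAG and close_input_tags are made up for this sketch, embedded PHP is assumed to sit in <?php ... ?> (or <? ... ?>) blocks, and a PHP string that itself contains "?>" would defeat it. Tags that already end in "/>" are left untouched.

```python
import re

# Hypothetical pattern (not from the thread): an <input ...> tag whose body
# may contain embedded PHP blocks, which in turn may contain '>' characters.
INPUT_TAG = re.compile(
    r"""<input\b          # tag name
        (?:
            <\?.*?\?>     # an embedded PHP block, consumed as one unit
          | [^<>]         # any other character, including line breaks
        )*?
        >                 # the '>' that really closes the tag
    """,
    re.IGNORECASE | re.DOTALL | re.VERBOSE,
)


def close_input_tags(html: str) -> str:
    """Append ' />' to input tags that are not already self-closed."""

    def fix(match: re.Match) -> str:
        tag = match.group(0)
        body = tag[:-1].rstrip()  # drop the final '>' and trailing space
        if body.endswith("/"):
            return tag  # already converted, leave as-is
        return body + " />"

    return INPUT_TAG.sub(fix, html)


if __name__ == "__main__":
    sample = (
        '<input type = "text" name = "f27d" '
        'value = "<?php if(isset($f27d)){ echo $f27d ;}?>"\n'
        '       maxlength = "20">'
    )
    print(close_input_tags(sample))
```

Treating each PHP block as a single unit is what stops a ">" inside the PHP from being mistaken for the end of the tag. It is worth running something like this on a copy of the site and reviewing the diff before trusting it.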

jtara

5:15 am on Mar 24, 2008 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

I think this is way beyond the capability of a regex expression.

Surely, a number of programs exist to make this conversion. Why re-invent the wheel?

But.... why convert? What's wrong with HTML 4.01?

I'm curious as to whether there's some specific application for XHTML that is so compelling that it overrides its many drawbacks.

snowweb

6:36 am on Mar 24, 2008 (gmt 0)

10+ Year Member

I think this is way beyond the capability of a regex expression.

I'm not sure I agree with you there. Finding patterns of text is what regex was designed for. The only limitation, as far as I'm aware, is a limitation on the part of the programmer concerning the construction of the expression. I suspect that any application that may be capable of performing the task I need would also use regex internally to match the relevant tags.

Surely, a number of programs exist to make this conversion. Why re-invent the wheel?

Several reasons. First of all, I have already searched for such applications and can't find one that I can be sure can cope with the embedded PHP. Secondly, the cost of such an application is also an issue when converting only one website, which is a non-profit website offering a public service.

But.... why convert? What's wrong with HTML 4.01?
I'm curious as to whether there's some specific application for XHTML that is so compelling that it overrides its many drawbacks.

XHTML is a web standard which has superseded HTML 4.01, and therefore the w3c recommends that it is used to maintain accessibility in the future, as the other standards are being deprecated.

What if we had all stuck to HTML 3.0? The web today would look a bit flat! We need to keep up with the technology.

Furthermore, the new editor that I have just purchased from NuSphere (PhpED) enters tags in XHTML format (i.e. <br />), even when declaring an HTML 4.01 doctype. When I mentioned this to them, all they could suggest was, "Perhaps it's time you moved to XHTML".

I believe there are other reasons why we should start to adopt the new standards, but they're way beyond the scope of this discussion. You might be able to find more information about that on the w3c website.

Concerning the 'drawbacks' of using XHTML, I'm not aware of any. Perhaps if I'm making a mistake here, someone could point out what these many drawbacks are please?

Rather than trying to blow against the wind, I have decided to go along with the conversion (despite NuSphere's rather cop-out answer!).

Does anyone else have any suggestions here on how to do this, or perhaps, can anyone recommend an open source or freeware application which is capable of doing what I need please?

Kind regards

pete

Habtom

6:41 am on Mar 24, 2008 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

I believe there are other reasons why we should start to adopt the new standards...

There are also a few reasons not to [webmasterworld.com].

bill

10:15 am on Mar 24, 2008 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member


XHTML is a web standard which has superseded HTML 4.01, and therefore the w3c recommends that it is used to maintain accessibility in the future, as the other standards are being deprecated.

You might be surprised to learn that the W3C is adopting HTML 5 [webmasterworld.com], which confirms that the approved path for HTML development is towards HTML 5 and not XHTML 2.0.

jtara

3:58 pm on Mar 24, 2008 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

I'm not sure I agree with you there. Finding patterns of text is what regex was designed for.

Yes, finding patterns in text. NOT parsing a computer language!

I suspect that any application that may be capable of performing the task I need would also use regex internally to match the relevant tags.

Very, very unlikely! And if it did, I'd run the other way and fast!

They would typically use an HTML parsing package for their favorite scripting language. Under the hood, these will include a real parser. Parsing methods are too complex to go into in detail in this post, but I assure you, they don't use regexes, except in a supporting role. Typically, you use a parser and a lexer. The lexer recognizes lexemes, or language elements, such as a number or an identifier. Regex could be a fine basis for a lexer, but has no use in a parser.
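To illustrate the supporting role described here, a regex-based lexer might look roughly like the following Python sketch. The token names (NUMBER, IDENT, OP) and the tiny toy language are assumptions chosen for illustration, not part of any real HTML or PHP toolchain. Note that it only splits text into lexemes; it knows nothing about how they nest or relate, which is the parser's job.

```python
import re

# Hypothetical token set for a toy language: one named group per lexeme type.
TOKEN = re.compile(
    r"""
    (?P<NUMBER>\d+(?:\.\d+)?)   # integer or decimal number
  | (?P<IDENT>[A-Za-z_]\w*)     # identifier
  | (?P<OP>[-+*/=()])           # single-character operator or parenthesis
  | (?P<SKIP>\s+)               # whitespace, discarded
    """,
    re.VERBOSE,
)


def lex(text: str) -> list[tuple[str, str]]:
    """Split text into (token type, lexeme) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


if __name__ == "__main__":
    print(lex("total = price * 2"))
```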

Basically, a parser decorates a tree with the language elements found. It understands syntax - or how language elements are arranged and relate to each other.

There are two common approaches to working with parsed HTML - one or both might be supported by a given parsing package.

One is for your code to receive a "callback" as each element is found. The other is to use a kind of "query language", such as XPath (for XML, but there are similar approaches for HTML), that allows you to query the tree once it is created.
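As a rough sketch of the callback style, here is what it can look like with the HTMLParser class from Python's standard library html.parser module. The InputCollector class and its found list are hypothetical names for this example, and the sample markup deliberately contains no PHP: this parser knows nothing about embedded PHP, which is exactly the complication raised in this thread.

```python
from html.parser import HTMLParser


class InputCollector(HTMLParser):
    """Records each <input> tag as (name attribute, already self-closed?)."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # Called for an ordinary start tag such as <input ...>
        if tag == "input":
            self.found.append((dict(attrs).get("name"), False))

    def handle_startendtag(self, tag, attrs):
        # Called for an XHTML-style tag such as <input ... />
        if tag == "input":
            self.found.append((dict(attrs).get("name"), True))


if __name__ == "__main__":
    collector = InputCollector()
    collector.feed(
        '<form><input type="text" name="a">'
        '<input type="text" name="b" /></form>'
    )
    print(collector.found)
```

The parser calls back into your code once per tag, so the code never has to work out where a tag starts or ends.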

What you are asking for is basically a compiler. The input is HTML4.01+PHP. The output is XHTML+PHP. Oh, the input is probably HTML4.01+PHP+errors+tag soup.

I know something about this stuff, having written a couple of compilers.

Several reasons. First of all I have already searched for such applications and can't find one that I can be sure can cope with the embedded PHP

So now you need TWO parsers - one for HTML and one for PHP!

Now, go back and read why you probably don't want to use XHTML anyway...