Home / Forums Index / Hardware and OS Related Technologies / Website Technology Issues
Forum Library, Charter, Moderators: phranque

Website Technology Issues Forum

    
Regex to convert input tags to xhtml
I need help to construct a regex expression :)
snowweb

5+ Year Member



 
Msg#: 3608861 posted 2:52 am on Mar 24, 2008 (gmt 0)

Hi Chaps,

I have a website with many pages written in HTML 4.01 and need to convert them to XHTML. My hardest task is to add the trailing '/' to the input tags on my forms. There are hundreds of input tags!

I can use my editor (PhpED) to search and replace, using regular expressions, but I can't get the regular expression correct (I have VERY limited experience in Regex).

What makes it more difficult, is that many (but not all) of the input tags, also have some PHP code embedded in them. Please see the example below:


<input type = "text" name = "f27d" size = "20" value = "<?php if(isset($f27d)){ echo $f27d ;}else {echo @$tax_agent_acc_atty_roll_no ; }?>" maxlength = "20" style = "text-align:center;">

There may also be white space or line breaks at any point in the tag due to code formatting by the text editor.

It also needs to ignore tags which have already been converted.

The expression I have so far is nowhere near sophisticated enough and I don't think it is even 'a start'!


<input[^>]*!([(/)])>
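For reference, a pattern along the lines being asked for can be sketched in Python (an assumption; PhpED's search dialog may use a different regex dialect). The idea is to consume each `<?php ... ?>` block as one unit so that the `>` inside it cannot end the tag, and to leave tags that already end in `/>` alone. The names `INPUT_TAG` and `close_inputs` are made up for this illustration, and the sketch does not cope with a bare `>` inside an attribute value outside PHP.

```python
import re

# Hypothetical pattern: matches one <input ...> tag, PHP blocks included.
INPUT_TAG = re.compile(
    r"""
    (?P<open><input\b)      # tag name, any letter case
    (?P<body>
        (?: <\?.*?\?>       # an embedded PHP block, consumed as one unit
          | [^<>]           # or any character that cannot end the tag
        )*?
    )
    (?P<slash>/?)           # present if the tag is already XHTML-style
    \s*>
    """,
    re.VERBOSE | re.DOTALL | re.IGNORECASE,
)


def close_inputs(source: str) -> str:
    """Add a trailing ' />' to every <input> tag that lacks one."""

    def fix(match: re.Match) -> str:
        if match.group("slash"):
            return match.group(0)  # already converted, leave untouched
        return match.group("open") + match.group("body").rstrip() + " />"

    return INPUT_TAG.sub(fix, source)


if __name__ == "__main__":
    tag = '<input type = "text" value = "<?php echo $f27d ; ?>" maxlength = "20">'
    print(close_inputs(tag))
```

Because `[^<>]` also matches newlines, tags that the editor has wrapped over several lines are handled the same way. Running it on a copy of the files first and diffing the result would be prudent.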

Please can someone help educate this old dog!

Thanks, buddies.

peter

 

jtara

WebmasterWorld Senior Member, Top Contributor of All Time, 5+ Year Member



 
Msg#: 3608861 posted 5:15 am on Mar 24, 2008 (gmt 0)

I think this is way beyond the capability of a regex expression.

Surely, a number of programs exist to make this conversion. Why re-invent the wheel?

But.... why convert? What's wrong with HTML 4.01?

I'm curious as to whether there's some specific application for XHTML that is so compelling that it overrides its many drawbacks.

snowweb

5+ Year Member



 
Msg#: 3608861 posted 6:36 am on Mar 24, 2008 (gmt 0)


I think this is way beyond the capability of a regex expression.

I'm not sure I agree with you there. Finding patterns of text is what regex was designed for. The only limitation, as far as I'm aware, is on the part of the programmer constructing the expression. I suspect that any application capable of performing the task I need would also use regex internally to match the relevant tags.


Surely, a number of programs exist to make this conversion. Why re-invent the wheel?

Several reasons. First of all, I have already searched for such applications and can't find one that I can be sure can cope with the embedded PHP. Secondly, the cost of such an application is an issue when converting only one website, which is a non-profit website offering a public service.


But.... why convert? What's wrong with HTML 4.01?

I'm curious as to whether there's some specific application for XHTML that is so compelling that it overrides its many drawbacks.

XHTML is a web standard which has superseded HTML 4.01, and therefore the W3C recommends that it is used to maintain accessibility in the future as the other standards are being deprecated.

What if we had all stuck to HTML 3.0? The web today would look a bit flat! We need to keep up with the technology.

Furthermore, the new editor that I have just purchased from NuSphere (PhpED) enters tags in XHTML format (i.e. <br />), even when declaring an HTML 4.01 doctype. When I mentioned this to them, all they could suggest was that, "Perhaps it's time you moved to XHTML".

I believe there are other reasons why we should start to adopt the new standards, but they're way beyond the scope of this discussion. You might be able to find more information about that on the w3c website.

Concerning the 'drawbacks' of using XHTML, I'm not aware of any. Perhaps if I'm making a mistake here, someone could point out what these many drawbacks are please?

Rather than trying to blow against the wind, I have decided to go along with the conversion (despite NuSphere's rather cop-out answer!).

Does anyone else have any suggestions here on how to do this, or perhaps, can anyone recommend an open source or freeware application which is capable of doing what I need please?

Kind regards

pete

Habtom

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3608861 posted 6:41 am on Mar 24, 2008 (gmt 0)

I believe there are other reasons why we should start to adopt the new standards...

There are also a few reasons not to [webmasterworld.com].

bill

WebmasterWorld Administrator, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3608861 posted 10:15 am on Mar 24, 2008 (gmt 0)

XHTML is a web standard which has superseded HTML 4.01, and therefore the W3C recommends that it is used to maintain accessibility in the future as the other standards are being deprecated.

You might be surprised to learn that the W3C is adopting HTML 5 [webmasterworld.com], which confirms that the approved path for HTML development is towards HTML 5 and not XHTML 2.0.

jtara

WebmasterWorld Senior Member, Top Contributor of All Time, 5+ Year Member



 
Msg#: 3608861 posted 3:58 pm on Mar 24, 2008 (gmt 0)

I'm not sure I agree with you there. Finding patterns of text is what regex was designed for.

Yes, finding patterns in text. NOT parsing a computer language!

I suspect that any application that may be capable of performing the task I need would also use regex internally to match the relevant tags.

Very, very unlikely! And if it did, I'd run the other way and fast!

They would typically use an HTML parsing package for their favorite scripting language. Under the hood, these will include a real parser. Parsing methods are too complex to go into in detail in this post, but I assure you, they don't use regexes, except in a supporting role. Typically, you use a parser and a lexer. The lexer recognizes lexemes, the basic language elements such as a number or an identifier. Regex could be a fine basis for a lexer, but has no use in a parser.
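To make the lexer idea concrete, here is a minimal sketch in Python (chosen only for illustration) of a regex-based lexer that splits mixed HTML and PHP source into three kinds of token. The token names `php`, `tag`, and `text` and the function `lex` are assumptions of this example, not part of any real parsing package, and a real lexer would distinguish far more cases.

```python
import re

# Each alternative is one lexeme class; the first one that matches wins.
TOKEN = re.compile(
    r"(?P<php><\?.*?\?>)"    # a whole PHP block, up to the first ?>
    r"|(?P<tag><[^<>]+>)"    # an HTML tag with no PHP inside it
    r"|(?P<text>[^<]+)",     # anything else up to the next <
    re.DOTALL,
)


def lex(source: str) -> list[tuple[str, str]]:
    """Return (token_kind, token_text) pairs in document order."""
    return [(m.lastgroup, m.group()) for m in TOKEN.finditer(source)]


if __name__ == "__main__":
    for kind, text in lex("<p>Hi <?php echo $a->b; ?></p>"):
        print(kind, repr(text))
```

The lexer only labels pieces of text. Deciding how those pieces nest and relate to each other is the parser's job, which is the part a regex cannot do.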

Basically, a parser decorates a tree with the language elements found. It understands syntax - or how language elements are arranged and relate to each other.

There are two common approaches to working with parsed HTML - one or both might be supported by a given parsing package.

One is for your code to receive a "callback" as each element is found. The other is to use a kind of "query language", such as XPath (for XML, but there are similar approaches for HTML), that allows you to query the tree once it is created.
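The callback style can be sketched with Python's standard `html.parser.HTMLParser`, which calls `handle_starttag` for `<input ...>` and `handle_startendtag` for `<input ... />`. The class `InputFinder` and the helper `find_inputs` are hypothetical names for this example, and the sample markup deliberately contains no PHP, since this parser knows nothing about PHP blocks inside attribute values.

```python
from html.parser import HTMLParser


class InputFinder(HTMLParser):
    """Collects the name attribute of each <input>, split by closing style."""

    def __init__(self):
        super().__init__()
        self.unclosed = []  # names of <input ...> tags (HTML 4.01 style)
        self.closed = []    # names of <input ... /> tags (XHTML style)

    def handle_starttag(self, tag, attrs):
        # Callback fired for an ordinary start tag.
        if tag == "input":
            self.unclosed.append(dict(attrs).get("name"))

    def handle_startendtag(self, tag, attrs):
        # Callback fired for a self-closing tag such as <input ... />.
        if tag == "input":
            self.closed.append(dict(attrs).get("name"))


def find_inputs(markup: str):
    """Return (unclosed_names, closed_names) for the given markup."""
    finder = InputFinder()
    finder.feed(markup)
    finder.close()
    return finder.unclosed, finder.closed


if __name__ == "__main__":
    page = '<form><input type="text" name="a"><input type="text" name="b" /></form>'
    print(find_inputs(page))
```

Here the parser does the work of deciding where each tag starts and ends, and the calling code only reacts to the elements it cares about.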

What you are asking for is basically a compiler. The input is HTML4.01+PHP. The output is XHTML+PHP. Oh, the input is probably HTML4.01+PHP+errors+tag soup.

I know something about this stuff, having written a couple of compilers.

Several reasons. First of all I have already searched for such applications and can't find one that I can be sure can cope with the embedded PHP

So now you need TWO parsers - one for HTML and one for PHP!

Now, go back and read why you probably don't want to use XHTML anyway...


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved