preg_replace but ignore IMG?

ahmedtheking

11:53 pm on Jul 19, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I've got a simple preg_replace script that adds an acronym tag to terms in my body text. But how can I make the script change only the body text and not, for example, the inside of an img tag (usually the title!)?

Here's the script:

<?php
// this page adds acronyms
$acronymsfind = array(
"/LDAP/",
"/XML/",
"/PHP/",
"/Access Keys/",
"/OOP/",
"/XHTML/",
//"/HTML/",
//"/HTM/",
"/CSS/",
"/W3C/",
"/WCAG/",
"/PDF/",
"/MySQL/",
"/ASP/",
"/JSP/",
"/ActionScript/",
"/Lingo/",
"/CD/",
"/DVD/",
"/On Demand/",
"/CGI/",
"/Perl/",
"/Python/",
"/RSS/",
"/RDF/",
"/Atom/"
);
$acronymsreplace = array(
"<acronym title=\"Lightweight Directory Access Protocol\">LDAP</acronym>",
"<acronym title=\"eXtensible Markup Language\">XML</acronym>",
"<acronym title=\"PHP: Hypertext Preprocessor (HTML-embedded scripting language)\">PHP</acronym>",
"<acronym title=\"Navigate the site without a mouse\">Access Keys</acronym>",
"<acronym title=\"Object Oriented Programming\">OOP</acronym>",
"<acronym title=\"Extensible Hypertext Markup Language\">XHTML</acronym>",
//"<acronym title=\"Hypertext Markup Language\">HTML</acronym>",
//"<acronym title=\"Hypertext Markup Language\">HTM</acronym>",
"<acronym title=\"Cascading Style Sheet\">CSS</acronym>",
"<acronym title=\"World Wide Web Consortium\">W3C</acronym>",
"<acronym title=\"Web Content Accessibility Guidelines\">WCAG</acronym>",
"<acronym title=\"Portable Document Format\">PDF</acronym>",
"<acronym title=\"Structured Query Language (database query language)\">MySQL</acronym>",
"<acronym title=\"Active Server Page(s) (Microsoft web scripting language and file extension)\">ASP</acronym>",
"<acronym title=\"Java Server Pages\">JSP</acronym>",
"<acronym title=\"Macromedia Flash Programming Language\">ActionScript</acronym>",
"<acronym title=\"Macromedia Director/Shockwave Programming Language\">Lingo</acronym>",
"<acronym title=\"Compact Disc\">CD</acronym>",
"<acronym title=\"Digital Versatile Disc (formerly Digital Video Disc)\">DVD</acronym>",
"<acronym title=\"Content (eg video, music, etc...) there when you want it\">On Demand</acronym>",
"<acronym title=\"Common Gateway Interface (web scripting facility)\">CGI</acronym>",
"<acronym title=\"Practical Extraction and Report Language\">Perl</acronym>",
"<acronym title=\"Programming Language Similar to Perl\">Python</acronym>",
"<acronym title=\"A format for syndicating news and the content of news-like sites\">RSS</acronym>",
"<acronym title=\"Resource Description Framework\">RDF</acronym>",
"<acronym title=\"Much like RSS\">Atom</acronym>"
);
$thebody = preg_replace($acronymsfind,$acronymsreplace,$thebody);

?>

(feel free to copy/steal/use it!)

Here is an example of the code that comes out with the errors:

... <img width="71" height="71" alt="<acronym title="PHP: Hypertext Preprocessor (HTML-embedded scripting language)">PHP</acronym> Acceleration" src="/images/builds/2/main/pages/cl_index/legend/php-acceleration.gif" /> ...

And the url of the page is available on request! :D

Regards,

Ahmed Nuaman

[edited by: jatar_k at 12:12 am (utc) on July 20, 2005]
[edit reason] no sigs thanks [/edit]

ahmedtheking

9:06 pm on Jul 23, 2005 (gmt 0)




anyone?

ergophobe

11:10 pm on Jul 23, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I think you would need to use a negative lookbehind assertion

Something like
((?<=>[^<])|(?<!<img[^>]+))LDAP

Er.. if I have that right (BIG IF), that should be
-find 'LDAP'
-when either
  -preceded by a closing tag that is not followed by an opening tag
  -or when not preceded by '<img....' with no closing tag.

The first part of the OR is so that it doesn't have to go all the way back to an image tag and find its closing tag.

I think there must be a way that would be way more efficient, but ... no time for testing or playing at the moment.
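One caveat on the pattern above: as far as I know, PCRE (which preg_replace uses) wants lookbehinds to have a bounded length, so the `[^>]+` inside `(?<!...)` is likely to be rejected when the pattern compiles. A common workaround is to look ahead instead: only match a term when the next angle bracket after it is not a `>`. Here is a minimal sketch of that idea. The helper name `outside_tags_pattern`, the `\b` word boundaries, and the sample markup are my own assumptions, not part of the original script, and it assumes reasonably well-formed HTML with no stray `<` or `>` in text or attribute values.

```php
<?php
// Hypothetical helper: builds a pattern that matches $term only outside tags.
function outside_tags_pattern(string $term): string
{
    // \b (an addition, not in the original patterns) stops "CD" matching inside "ABCD".
    // (?![^<>]*>) fails when the next angle bracket is ">", i.e. we are inside a tag.
    return '/\b' . preg_quote($term, '/') . '\b(?![^<>]*>)/';
}

// Shortened sample data; the real script has many more entries.
$acronyms = array(
    'PHP' => 'PHP: Hypertext Preprocessor',
    'CSS' => 'Cascading Style Sheet',
);

$thebody = '<p>PHP and CSS</p><img alt="PHP Acceleration" src="/images/php.gif" />';

foreach ($acronyms as $term => $title) {
    $replacement = '<acronym title="' . htmlspecialchars($title) . '">' . $term . '</acronym>';
    // Note: "$" or "\" in a title would need escaping in the replacement string.
    $thebody = preg_replace(outside_tags_pattern($term), $replacement, $thebody);
}

echo $thebody;
```

Because the lookahead also skips text inside the acronym tags inserted by earlier passes, a later term should not get wrapped inside an earlier title attribute.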

ahmedtheking

10:20 pm on Jul 24, 2005 (gmt 0)




Lol, thanks for the help! Yeah that kinda works, but then it kinda doesn't!

Where we're telling it to ignore the <img tag at the start and so on, it's not ignoring the title attr of the image tag!

Is there a way to, for example, strip html tags (via strip_tags), add the acronyms and then add the html tags?
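One way to get the effect of "strip the tags, add the acronyms, put the tags back" might be to skip strip_tags (which throws the tags away for good) and instead split the markup into tag and text pieces with preg_split, run the replacement on the text pieces only, and glue everything back together. The sketch below assumes that approach; the function name `replace_outside_tags` and the sample markup are made up for illustration, and it assumes no attribute value contains a literal `>`.

```php
<?php
// Hypothetical helper: applies preg_replace only to text between tags.
function replace_outside_tags(array $find, array $replace, string $html): string
{
    // With PREG_SPLIT_DELIM_CAPTURE the pieces alternate: text, tag, text, tag, ...
    $parts = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);

    foreach ($parts as $i => $part) {
        // Even indexes are text (possibly empty); odd indexes are the tags themselves.
        if ($i % 2 === 0) {
            $parts[$i] = preg_replace($find, $replace, $part);
        }
    }

    return implode('', $parts);
}

// Shortened versions of the arrays from the original script.
$acronymsfind = array("/PHP/", "/CSS/");
$acronymsreplace = array(
    "<acronym title=\"PHP: Hypertext Preprocessor\">PHP</acronym>",
    "<acronym title=\"Cascading Style Sheet\">CSS</acronym>",
);

$thebody = '<p>PHP and CSS</p><img alt="PHP Acceleration" src="/images/php.gif" />';
$thebody = replace_outside_tags($acronymsfind, $acronymsreplace, $thebody);

echo $thebody;
```

One thing to watch: within a single text piece the patterns still run one after another, so a later pattern can match inside a title attribute inserted by an earlier one. Ordering the arrays carefully should avoid that.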