Forum Moderators: coopster
At the moment it will strip all unwanted attributes from a given tag. After stripping all the other attributes, it knocks the rest of the tag off - this is because of the variety of types of inputs the form can take.
Can anybody show me how to replace the = signs in a dynamic url, when that url is a part of an img or a tag?
Example:
<a href="http://www.webmasterworld.com/postv4.cgi?action=new&forum=88" title ="Woo" target="_blank">Some text here</a>
What I want to do is convert that to:
<a href="http://www.webmasterworld.com/postv4.cgi?action_SOMETHING-ELSE_new&forum_SOMETHING-ELSE_88" title ="Woo" target="_blank">Some text here</a>
I've been trying to do it with the likes of regular expressions... In my example though it is imperative that only the = signs in the url are changed - nothing else.
If any of you have been down this road... I would be beyond grateful for whatever solution you could provide.
Thanks in advance.
explode('=', trim($attrSet[$i]));
That's why it's breaking the urls. I'd rather work around it than change that filter as it's very paranoid...
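To illustrate why that explode() call breaks the urls, here is a minimal sketch. The $attr string is a hypothetical stand-in for one entry of $attrSet, since the filter itself isn't shown in this thread. The third argument to explode() is a standard PHP parameter that caps the number of pieces. It is shown only for comparison, as the poster would rather not change the filter.

```php
<?php
// Hypothetical attribute string, standing in for one trimmed entry of $attrSet.
$attr = 'href="http://www.webmasterworld.com/postv4.cgi?action=new&forum=88"';

// Without a limit, every "=" is a split point, so the url is cut into pieces.
$parts = explode('=', trim($attr));
echo count($parts), "\n"; // 4

// With a limit of 2, only the first "=" (between name and value) splits.
$pair = explode('=', trim($attr), 2);
echo $pair[0], "\n"; // href
echo $pair[1], "\n"; // the full quoted url, "=" signs intact
```

Replacing the = signs inside the url before the filter runs, as discussed in this thread, sidesteps the problem without touching the explode() call.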
I'm trying to process things more like this:
<a href="http://www.webmasterworld.com/postv4.cgi?action=reply&forum=88& discussion=3441821"><img ilo-full-src="http://www.searchengineworld.com/gfx/logo.png" src="http://www.SearchEngineWorld.com/gfx/logo.png" alt="http://www.webmasterworld.com" title="http://www.webmasterworld.com" align="left" border="0" hspace="7" vspace="0"></a>
I need them to look like this:
<a href="http://www.webmasterworld.com/postv4.cgi?action_somethingelse_reply& forum_somethingelse_88&discussion_somethingelse_3441821"><img ilo-full-src="http://www.searchengineworld.com/gfx/logo.png" src="http://www.SearchEngineWorld.com/gfx/logo.png" alt="http://www.webmasterworld.com" title="http://www.webmasterworld.com" align="left" border="0" hspace="7" vspace="0"></a>
Now... I want to convert the = signs in the url itself:
action=reply&forum=88
But I don't want to convert the other equals signs:
src="http://
border="0
align="l
I will probably get it myself in the end, but it needs to do something along the lines of:
- IF there is an alphanumeric character on BOTH sides of the equals sign (with or without whitespace), and the RHS character is not part of the string "http" THEN convert it.
The reasoning.....:
- Some tags may be malformed in not having " or ' around attributes, so I can't use those characters as criteria for a match
- http is the protocol that will most often be submitted, so at the very least, the filter must exclude that from matches.
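The rule above can be sketched as a single regular expression. This is only one reading of the rule, not a tested filter: the function name replaceUrlEquals and the default token are made up for the example. It uses a lookbehind for the left-hand alphanumeric character and lookaheads for the "not http" and right-hand checks.

```php
<?php
// Hypothetical helper: applies the "alphanumeric on both sides, RHS not http" rule.
function replaceUrlEquals(string $html, string $token = '_somethingelse_'): string
{
    // (?<=[a-z0-9])  an alphanumeric character directly before any whitespace
    // \s*=\s*        the equals sign, with or without whitespace around it
    // (?!http)       the right-hand side must not start with "http"
    // (?=[a-z0-9])   the right-hand side must start with an alphanumeric character
    $pattern = '/(?<=[a-z0-9])\s*=\s*(?!http)(?=[a-z0-9])/i';
    return preg_replace($pattern, $token, $html);
}

$in = '<a href="http://www.webmasterworld.com/postv4.cgi?action=reply&forum=88">'
    . '<img src="http://www.SearchEngineWorld.com/gfx/logo.png" align="left" border="0"></a>';
echo replaceUrlEquals($in), "\n";
```

Quoted attributes are left alone because the character after their = sign is a quote, and an unquoted src=http://... is skipped by the http check. The weak spot is the malformed case mentioned above: an unquoted attribute such as border=0 has alphanumeric characters on both sides, so this rule would convert it as well.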
I'm trying to vandalize only the = signs in the url, and not touch any of the others. I'm not sure that I'll be able to get it 100% but if you've seen some expression or function that will do this, I'd love to hear about it.
If not, and I make one myself, I will bring it back. It may be that a (non?)working example would be the best way to explain it.
Thank you at least for taking the time to reply :-)
[edited by: dreamcatcher at 6:32 am (utc) on Sep. 6, 2007]
[edit reason] Fixed side scroll. [/edit]
<a href="http://www.webmasterworld.com/postv4.cgi?action=reply&forum=88& discussion=3441821"><img ilo-full-src="http://www.searchengineworld.com/gfx/logo.png" src="http://www.SearchEngineWorld.com/gfx/logo.png" alt="http://www.webmasterworld.com" title="http://www.webmasterworld.com" align="left" border="0" hspace="7" vspace="0"></a>
I need them to look like this:
<a href="http://www.webmasterworld.com/postv4.cgi?action_somethingelse_reply& forum_somethingelse_88&discussion_somethingelse_3441821"><img ilo-full-src="http://www.searchengineworld.com/gfx/logo.png" src="http://www.SearchEngineWorld.com/gfx/logo.png" alt="http://www.webmasterworld.com" title="http://www.webmasterworld.com" align="left" border="0" hspace="7" vspace="0"></a>
I think the easiest way to do that is to break it down in smaller steps and not try to do everything in one block.
First, get the href attribute content in a variable:
Something like this should do the job: /(href=)(.+)(\s|>)/Ui
The assumption here is that every url follows an "href" attribute, and that the href attribute ends with either a space or a ">" sign. Note the modifiers U (ungreedy) and i (caseless).
Collect the content of the 2nd parenthesis in a variable. If there are several URLs (most likely), store them all in an array.
Run through the array and strip " or ' if they exist at the beginning and end of the string. Put the stripped versions in a different array.
Now that you have isolated the url, we can safely assume that any "=" sign must be replaced by "_somethingelse_". That's easy, you have given several methods for it.
By now you should have one array containing all the original urls and one array containing all the new urls. Add some " to the beginning and end of the new urls. Replace the values of the old array with the values of the new array and you are done.
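A minimal sketch of those steps, using the pattern suggested above. The function name convertHrefEquals is made up for the example, and it inherits the stated assumption that an href value ends at the first space or ">", so a url that itself contains a space would be cut short.

```php
<?php
// Hypothetical helper that follows the steps: extract, strip quotes, replace, put back.
function convertHrefEquals(string $html, string $token = '_somethingelse_'): string
{
    // Collect the raw content of every href attribute (2nd parenthesis).
    preg_match_all('/(href=)(.+)(\s|>)/Ui', $html, $matches);

    $search = [];
    $replace = [];
    foreach ($matches[2] as $i => $raw) {
        // Strip " or ' from both ends, then replace "=" inside the bare url.
        $url = trim($raw, "\"'");
        $newUrl = str_replace('=', $token, $url);

        // Keep the "href=" prefix in both arrays so only the attribute is touched.
        $search[] = $matches[1][$i] . $raw;
        $replace[] = $matches[1][$i] . '"' . $newUrl . '"';
    }

    // Swap each original attribute for its converted version.
    return str_replace($search, $replace, $html);
}

$in = '<a href="http://www.webmasterworld.com/postv4.cgi?action=new&forum=88" title ="Woo" target="_blank">Some text here</a>';
echo convertHrefEquals($in), "\n";
```

Including the "href=" prefix in the search strings is a small departure from the steps as written. It keeps str_replace() from touching the same url text where it appears outside an href attribute.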
Cheers,
Sylver
I don't expect a lot of trouble from my technique in future although I think using the method you've provided will simplify things if I ever do it again.
What I used in the end was provided to me by Ketan Kulkarni from carvingIT.com
It's a short script involving a regular expression and it works like this more or less:
// Group 1 is "src=" plus the following text that contains no "?", group 2 the optional "?",
// and groups 3 and 4 are the text on either side of a single "=".
$pattern = '#(src=[^\?]*)([\?]?)([^=\s]+)=([^=\s\&]+)#i';
$replacement = '$1$2$3_EQUAL_SIGN_$4';
$output = preg_replace($pattern,$replacement,$output);
// Each match rewrites one "=", so repeat until the pattern no longer matches.
while(preg_match($pattern,$output)){
    $output = preg_replace($pattern,$replacement,$output);
}
It doesn't separate the url, it only manipulates it. The benefit of this is that I'm able to keep the context. It's allowed me to preserve the url itself while performing operations on the rest of it (without keeping the string in an array for the duration of the script.)
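For comparison, here is a sketch of an in-place alternative that limits the replacement to the attribute value, using preg_replace_callback(). It is not the script described above. The function name is made up, it handles href as well as src, and it assumes the attribute value is quoted. One reason to scope it this way: as far as I can tell, when a src url has no "?" in it, the pattern above can run on into the next attribute and rewrite its = sign too.

```php
<?php
// Hypothetical helper: replaces "=" only inside quoted href/src attribute values.
function replaceEqualsInUrlAttributes(string $html, string $token = '_EQUAL_SIGN_'): string
{
    // (?<![\w-])   do not match inside longer names such as "ilo-full-src"
    // (href|src)   the attribute name
    // (\s*=\s*)    the attribute's own "=", which is kept as-is
    // (["\'])      the opening quote, matched again by \3 as the closing quote
    // (.*?)        the attribute value, the only place "=" is replaced
    $pattern = '#(?<![\w-])(href|src)(\s*=\s*)(["\'])(.*?)\3#is';

    return preg_replace_callback($pattern, function (array $m) use ($token): string {
        return $m[1] . $m[2] . $m[3] . str_replace('=', $token, $m[4]) . $m[3];
    }, $html);
}

$in = '<a href="http://www.webmasterworld.com/postv4.cgi?action=reply&forum=88">'
    . '<img src="http://www.SearchEngineWorld.com/gfx/logo.png" alt="logo" border="0"></a>';
echo replaceEqualsInUrlAttributes($in), "\n";
```

Because every = sign in a value is replaced inside the callback, a single pass is enough and no while loop is needed. Unquoted attribute values are not matched at all, so malformed tags would still need separate handling.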
Thanks again for all your help, everyone.
(And thanks for stopping my prior post from sidescrolling)