One of the projects I am working on is almost done except for one hurdle. If I carry on banging my head against it I'll eventually work it out somehow, but the project is well overdue and I don't want to waste more time, so here is my problem:
If you have seen or worked with the autolink module (now called Multihook) from Zikula, PHP-Nuke and similar CMSs, that's what I want to achieve. The difference is that it has to work on remote sites hosted on our hosting service: they get a JavaScript snippet and place it in their header templates or HTML/PHP pages. The snippet calls the JS script on our site, which in turn calls PHP scripts that scan their page and replace certain keywords with links to recommended partner sites. Basically, if it finds "widget", it turns it into a hyperlink whose target URL is the partner site and whose title is a message such as "Visit our partner about widgets".
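The keyword-to-link step itself can be sketched in plain JavaScript. This is a minimal sketch under stated assumptions, not how Multihook or the PHP scripts described here actually work: the function name linkKeywords and the shape of the partners table ({ keyword: { url, title } }) are hypothetical, keywords are assumed to start and end with word characters, and the input is assumed to be plain text (the content of a text node, not HTML), which is why every piece of the output is HTML-escaped.

```javascript
// Hypothetical sketch: turn partner keywords in plain text into links.
// partners: { keyword: { url: "...", title: "..." } } (assumed shape)

function escapeHtml(s) {
  return String(s)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

function escapeRegExp(s) {
  // Make characters like "." or "+" in a keyword match literally
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

function linkKeywords(text, partners) {
  // Case-insensitive lookup table keyed by the lowercased keyword
  var lookup = {};
  Object.keys(partners).forEach(function (k) {
    if (k.length > 0) lookup[k.toLowerCase()] = partners[k];
  });
  var keys = Object.keys(lookup);
  if (keys.length === 0) return escapeHtml(text);

  // Longest keywords first, so "blue widget" is preferred over "widget"
  keys.sort(function (a, b) { return b.length - a.length; });
  var re = new RegExp('\\b(' + keys.map(escapeRegExp).join('|') + ')\\b', 'gi');

  var out = '';
  var last = 0;
  var m;
  while ((m = re.exec(text)) !== null) {
    var partner = lookup[m[1].toLowerCase()];
    out += escapeHtml(text.slice(last, m.index));
    out += '<a href="' + escapeHtml(partner.url) + '" title="' +
      escapeHtml(partner.title) + '">' + escapeHtml(m[0]) + '</a>';
    last = re.lastIndex;
  }
  return out + escapeHtml(text.slice(last));
}
```

The result is an HTML string, so on a live page it would be turned into nodes and swapped in for the original text node rather than concatenated into the whole body's innerHTML.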
For simplicity's sake, replace.php is the main PHP script. It calls the various functions and include files that do the search and replace on $_GET['url'] (or $_REQUEST['url']), stored in $url.
I use $data = file_get_contents($url); to fetch the page, run the search and replace on it, and echo the result at the end of the script; that output is what replaces the content of the user's page.
replace.js is the JavaScript file. Users place
<script type="text/javascript" src="http://mysite.com/replace/replace.js"></script> on their pages and change the body tag from <body> to <body id='src' onload="doHttpRequest();">. doHttpRequest is the Ajax function, which contains .....replace.open("GET", "http://mysite.com/replace/replace.php?url="+url, true);..... with the url variable being location.href. When I test it on my localhost machine with many local sites, it does the job in every respect, EXCEPT:
When you are on a site that has, say, a form with a preview page, and the form's action is POST with hidden values, the URL (url = location.href; in the doHttpRequest function) stays the same as on the previous page. The form values are sent in the body of the POST request, not in the URL, so the Ajax request above sends the URL without them, just as normal users can't see them in the address bar. I end up with the preview page of news articles, forum posts etc. being refreshed back to the original URL with empty fields. For example, on /postnews.php you fill in the news form and press PREVIEW; the preview is built from the posted values (action=preview&id=1234&whatever=something), but you only see /postnews.php, and that's fine because the action was POST. The problem is that when the JS script is on the page, the doHttpRequest function grabs /postnews.php and sends that to the PHP S&R script instead of /postnews.php?action=preview&id=1234&whatever=something.
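Why location.href never contains the preview data can be shown with a small sketch of what a browser does on submit: with method="get" the fields are serialized into the URL's query string, with method="post" they go into the request body and the URL is just the form's action. describeSubmission is a hypothetical helper for illustration only; it ignores details such as a query string already present in the action URL.

```javascript
// Illustration only: where a browser puts form fields on submit.
// GET  -> fields are appended to the URL (so they show up in location.href)
// POST -> fields travel in the request body; the URL is just the action
function describeSubmission(action, method, fields) {
  var encoded = new URLSearchParams(fields).toString();
  if (method.toLowerCase() === 'get') {
    return { url: action + '?' + encoded, body: null };
  }
  return { url: action, body: encoded };
}

var fields = { action: 'preview', id: '1234', whatever: 'something' };
console.log(describeSubmission('/postnews.php', 'GET', fields).url);
// -> /postnews.php?action=preview&id=1234&whatever=something
console.log(describeSubmission('/postnews.php', 'POST', fields).url);
// -> /postnews.php
```

Since replace.php only receives the URL, its file_get_contents call is a fresh GET request without the POST body, so the server renders the empty form again.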
I can't blame the PHP search and replace functions: they get $url from the Ajax call as $_GET['url'], and that URL is /postnews.php.
I know I'm close: the Ajax request needs a way to get the URL as if the form's action were GET. I tried changing the form action on our news and forum pages to "GET" and everything works on our sites, but of course I can't ask other sites to change their forms from POST to GET.
I'd appreciate step-by-step help solving this. PM me if you wish to help, and give me your favorite charity's details once it's all solved. I promise I'll still read these forums and try to help in other areas where I am more qualified.
Below, in brief, are the scripts (not their full content, except for the Ajax calls):
1) the Ajax file, lodfunctions.js:
var xmlhttp
function rePlace(){
xmlhttp=GetXmlHttpObject();
if (xmlhttp==null){
alert ("Your browser does not support AJAX!");
return;
}
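// encodeURIComponent, unlike escape, encodes "+" (which PHP would decode as a space)
// and encodes non-ASCII characters as UTF-8 rather than as %uXXXX
var page = encodeURIComponent(location.href);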
var dest = "?url=" + page;
var url="http://localhost/ads/replace.php" +dest;
xmlhttp.onreadystatechange=stateChanged;
xmlhttp.open("GET",url,true);
xmlhttp.send(null);
}
function stateChanged(){
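// only use the response once the request is complete AND succeeded (HTTP 200)
if (xmlhttp.readyState==4 && xmlhttp.status==200){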
document.getElementById("theID").innerHTML=xmlhttp.responseText;
}
}
function GetXmlHttpObject(){
if (window.XMLHttpRequest){
// code for IE7+, Firefox, Chrome, Opera, Safari
return new XMLHttpRequest();
}
if (window.ActiveXObject){
// code for IE6, IE5
return new ActiveXObject("Microsoft.XMLHTTP");
}
return null;
}
2) the test HTML or PHP page, test.html or test.php
<html>
<head>
.........
.........
<script type="text/javascript" src="http://localhost/ads/lodfunctions.js"></script>
</head>
<body id="theID" onload="rePlace();">
....................
...................
</body>
</html>
3) the replace.php script, which has other PHP include files doing the regex and filtering, database stuff etc.:
<?php
$file = $_REQUEST['url']; // or $file = $_GET['url'];
.................
$data = file_get_contents($file);
....................
// a few include files with functions that analyze and parse $data and return the result as $content
..............................
.............................
echo $content;?>
In short: test.html loads lodfunctions.js, which sends the page's URL (location.href) to replace.php; the PHP script fetches that URL, and its output, the parsed content with the keywords replaced, is written back into the page body. All works OK except on previews of submitted forms, as explained above.
Passing document.body.innerHTML instead of location.href does make sense, but when I tried the change below, PHP reported errors from file_get_contents:
I tried var page = document.body.innerHTML; instead of building page from location.href
and
document.body.innerHTML=xmlhttp.responseText; instead of document.getElementById("theID").innerHTML=xmlhttp.responseText;
I can see theID is no longer used, so I'm not sure whether I should change
<body id="theID" onload="rePlace();">
to:
<body onload="rePlace();">
I am also aware that I should use POST for the Ajax calls, sending the data URL-encoded with the relevant headers, but that doesn't seem to be the issue.
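If the page content itself is sent (as in the document.body.innerHTML attempt), it belongs in a POST body rather than in ?url=, both because GET query strings have practical length limits and because replace.php would then have to use the posted text directly: file_get_contents expects a filename or URL, so handing it a block of HTML is a likely source of the errors mentioned above. A minimal sketch of the encoding step follows, assuming a hypothetical field name html, which PHP would expose as $_POST['html']. Note also that once the snippet runs on a customer's domain, the request to mysite.com is cross-origin, so the browser will only give the response to the script if replace.php sends a suitable Access-Control-Allow-Origin header.

```javascript
// Encode an object as application/x-www-form-urlencoded,
// the format PHP parses into $_POST.
function encodeForm(fields) {
  return Object.keys(fields).map(function (name) {
    return encodeURIComponent(name) + '=' + encodeURIComponent(fields[name]);
  }).join('&');
}

// In the browser the request would then be sent roughly like this
// ("html" is a hypothetical field name; url is the replace.php address):
//   xmlhttp.open("POST", url, true);
//   xmlhttp.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
//   xmlhttp.send(encodeForm({ html: document.body.innerHTML }));
```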
Any pointers will be greatly appreciated. I was never a JavaScript programmer and know little about client-side programming; Perl is my main language for solid back-end work and PHP for front-end site design, but I admit client-side code is back in fashion because of Ajax/jQuery, which can't be ignored.
Also, sending the page as a URL which must then be parsed by your PHP page will not work for pages that process form POST data, and maybe not for some other pages as well. If a site has any sort of server-side script that checks the referrer to deliver customized content, that won't work with your PHP script either.
The best alternative is to have your JavaScript send requests back to the server, passing the text to be replaced as a parameter instead of passing a URL to be parsed. For example, have your JavaScript walk the DOM and send each text node to your PHP page. The PHP page would then do the lookup for any keywords in the database, and return the modified value, and your JavaScript would replace the appropriate node. That's really the only way that will work and will be safe from destroying existing event handlers, and will work for pages that display POSTed data.
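Sending each text node in its own request would mean one HTTP round trip per node, so one way to apply this suggestion is to batch the text of all collected nodes into a single JSON request and expect a JSON array back, one entry per node in the same order. The payload format and the names buildPayload and parseResults are assumptions for illustration; the PHP side could read the raw request body with file_get_contents('php://input'), decode it with json_decode and answer with json_encode.

```javascript
// Hypothetical batching helpers. textNodes are DOM text nodes
// (anything with a .data string works the same way).

// One request for all nodes: a JSON array of their text, in document order
function buildPayload(textNodes) {
  return JSON.stringify(textNodes.map(function (node) { return node.data; }));
}

// Expect a JSON array of strings, one per node sent, in the same order
function parseResults(responseText, expectedLength) {
  var results = JSON.parse(responseText);
  if (!Array.isArray(results) || results.length !== expectedLength) {
    throw new Error('Reply does not match the number of text nodes sent');
  }
  results.forEach(function (r) {
    if (typeof r !== 'string') throw new Error('Each result must be a string');
  });
  return results;
}
```

Each returned string may contain HTML (the new links), so on the page it would be converted into nodes, for example through a temporary element's innerHTML and a document fragment, and swapped in with parentNode.replaceChild.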
Have you considered how this will affect sites that have their own scripts and event handlers within the <body></body> tags? You are essentially replacing any scripts and/or event handlers when you replace the innerHTML on the body, so that could have a major impact. Replacing the innerHTML of the body is not a good idea.
Yes, I've done a lot of filtering work with Perl-style regexes in the PHP scripts, so that only the keywords or phrases are replaced and any tags, script calls etc. are ignored.
Also, sending the page as a URL which must then be parsed by your PHP page will not work for pages that process form POST data
Fotiman, this is the reason I started this thread: all works OK except on a page with POST form data with submit button to preview the page. As I said above, if test.php has a form to fill in, say a forum post or reply with method=post, you press preview, which I expected to go to test.php?action=preview&id=234, for example. It's normal to see only test.php in the address bar because the action isn't GET, but the Ajax function sends location.href to PHP's file_get_contents function for processing, and that location.href is test.php. So the call becomes file_get_contents('test.php'); WHEN it should be file_get_contents('test.php?action=preview&id=234');. If the form action is GET, all works OK.
astupidname suggested document.body.innerHTML instead of location.href, which does make sense, since the content to be processed really is the current page the user is on. However, I'm not sure how to implement that; so far I get a few errors from the file_get_contents function in the PHP script.
I also thought of doing all the required regex work on the content in PHP, then printing/echoing the filtered keywords $search and $replace back to the calling page in a JS snippet, placed dynamically on the page itself like this:
<script type="text/javascript">
onload = function()
{
document.body.innerHTML = document.body.innerHTML.replace(/WIDGET/gi, 'REPLACED');
}
</script>
....document.body.innerHTML.replace(/<?php print $search; ?>/gi, <?php print $replace; ?>);
OR ....document.body.innerHTML.replace(/<?php echo $search; ?>/gi, <?php echo $replace; ?>);
but this did not work either. I even tried document.write with the above wrapped in script tags.
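One likely reason the PHP-generated snippet failed is that <?php print $replace; ?> is emitted without quotes, so the browser receives something like replace(/widget/gi, <a href=...>widget</a>), which is a JavaScript syntax error. Both values need to be written out as JavaScript string literals. A sketch of the same idea in JavaScript, with hypothetical names buildReplaceSnippet and jsString: the search text is treated as a regular expression pattern, and "</" is rewritten as "<\/" so that a closing tag inside the string cannot end an inline script block early. This only fixes the syntax; replacing document.body.innerHTML still has the event-handler problem described earlier in the thread.

```javascript
// Hypothetical helper: build the inline JS that the PHP page tried to print,
// with both values emitted as proper string literals.
function jsString(value) {
  // JSON.stringify yields a valid JS string literal; "<\/" keeps "</script>"
  // inside the string from closing the surrounding <script> element
  return JSON.stringify(String(value)).replace(/<\//g, '<\\/');
}

function buildReplaceSnippet(search, replacement) {
  return 'document.body.innerHTML = document.body.innerHTML.replace(' +
    'new RegExp(' + jsString(search) + ', "gi"), ' + jsString(replacement) + ');';
}

console.log(buildReplaceSnippet('widget', '<a href="http://partner.example/">widget</a>'));
```

In the PHP page, the equivalent would be to print json_encode($search) and json_encode($replace) in place of the bare variables; json_encode escapes "/" as "\/" by default, which also covers the closing-tag case.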
I could go for an all-JS solution, but that means rewriting everything in JavaScript, which I'm shaky about. I would get lost once the database has to be queried and other scripts have to interact, let alone adjusting my Perl and PHP knowledge to the way JS code is structured.
document.body.innerHTML=xmlhttp.responseText; instead of document.getElementById("theID").innerHTML=xmlhttp.responseText;
//lodfunctions.js file
var scriptOriginSiteName_lodfunctions = {
xmlhttp:null,
GetXmlHttpObject:function () {
if (window.XMLHttpRequest){ //IE7+, Firefox, Chrome, Opera, Safari
return new XMLHttpRequest();
} else if (window.ActiveXObject){ //code for IE6, IE5
return new ActiveXObject("Microsoft.XMLHTTP");
}
return null;
},
stateChanged:function () { //note the references to 'this.xmlhttp', as xmlhttp is now a property of this lodfunctions object
if (this.xmlhttp.readyState==4 && this.xmlhttp.status == 200){ //note should be checking for status == 200 o.k.
document.getElementById("theID").innerHTML = this.xmlhttp.responseText;
}
},
rePlace:function () {
var O = this; //we'll need a reference to the 'this' object for inside the onreadystatechange function
O.xmlhttp = O.GetXmlHttpObject();
if (O.xmlhttp==null){
alert ("Your browser does not support AJAX!");
return;
}
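var page = encodeURIComponent(location.href); //escape() would leave "+" unencoded and produce %uXXXX for non-ASCII characters, which PHP does not decode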
var dest = "?url=" + page;
var url="http://localhost/ads/replace.php" +dest;
O.xmlhttp.onreadystatechange = function () { O.stateChanged(); };
O.xmlhttp.open("GET",url,true);
O.xmlhttp.send(null);
},
init:function () {
var O = this,
W = window,
f = function () { O.rePlace(); }; //the method is named rePlace (capital P)
if (W.attachEvent) {
W.attachEvent("onload", f);
} else if (W.addEventListener) {
W.addEventListener("load", f, false );
}
}
};scriptOriginSiteName_lodfunctions.init();
//end of lodfunctions.js
Yes, I've done a lot of filtering work with Perl-style regexes in the PHP scripts, so that only the keywords or phrases are replaced and any tags, script calls etc. are ignored.
all works OK except on a page with POST form data with submit button to preview the page,
And suppose processForm.php does something like this:
if(!isset($_POST['city'])) {
// display error message and/or redirect back to form
}
else {
// stay at this URL and output some HTML
}
Here's a helper function:
function isTextNode(n) {
return (n.nodeType === 3);
}
So you could walk through all of the nodes in the DOM and call isTextNode on each one to create your array of nodes:
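var nodes = [];
// walk every node under <body> (depth first), skipping script/style contents
(function walk(parent) {
for (var n = parent.firstChild; n; n = n.nextSibling) {
if (isTextNode(n)) {
nodes.push(n);
} else if (n.nodeType === 1 && !/^(script|style)$/i.test(n.nodeName)) {
walk(n);
}
}
})(document.body);
// send the text of each node (nodes[i].data) to the PHP for processing,
// get the results back in an array of replacement nodes (results), then:
for (var i = 0; i < nodes.length; i++) {
nodes[i].parentNode.replaceChild(results[i], nodes[i]);
}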
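One correction to the init function: it calls O.replace(), but the method is named rePlace, so
f = function () { O.replace(); };
should be:
f = function () { O.rePlace(); }; (capital P instead of lowercase p)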
Looks much neater, thanks. However, I'm back where I started: when a form is posted for preview, the page just empties and refreshes to the original page. I guess, as Fotiman said, it can't work on forms with a POST action, and it's not an error in the code.
Fotiman explained it nicely with the php example:
Then when your PHP page tries to access processForm.php, that page's processing might redirect back to the form page, or it might display an error message, neither of which will match what the end user sees who is already on that page. You simply can not do it this way... it will never work.
Fotiman, I'd like to have a go at using your isTextNode function with astupidname's example, even if it needs modifying, to see whether that resolves the problem, but I'm not sure where to put those functions in the lodfunctions.js file.
astupidname:
Note also that if you want to let users configure which elements get the replacement, there are a few ways to do it. The simplest would be to have them give the element(s) they want processed a particular class name, such as class="lodFuncReplace".
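Whether a given text node falls inside an element marked for processing can be checked by walking up its ancestors. A minimal sketch, assuming the lodFuncReplace class name suggested above; hasClassInAncestry is a hypothetical name. It only relies on nodeType, className and parentNode, so it works on DOM nodes and, for testing, on plain objects with the same properties.

```javascript
// Hypothetical helper: is `node` (e.g. a text node) inside an element whose
// class attribute contains `className`? Walks up through parentNode.
function hasClassInAncestry(node, className) {
  var needle = ' ' + className + ' ';
  for (var n = node; n; n = n.parentNode) {
    // Only element nodes (nodeType 1) carry a class attribute; the string check
    // skips SVG elements, whose className is an object rather than a string
    if (n.nodeType === 1 && typeof n.className === 'string') {
      var classes = (' ' + n.className + ' ').replace(/\s+/g, ' ');
      if (classes.indexOf(needle) !== -1) return true;
    }
  }
  return false;
}
```

Alternatively, since the findAndReplace function in the example below accepts a starting node as its third argument, each marked element could be passed to it directly, for instance by looping over document.getElementsByClassName('lodFuncReplace').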
<span name="NoReplace">.....</span>
<html>
<head>
<title>Find and Replace Test</title>
<style type="text/css">
.theword { font-weight: bold; }
.brownDog { background-color: brown; }
.redFox { background-color: red; }
.greyDog { background-color: #CCC; }
</style>
</head>
<body>
<p>
The little brown dog jumped over the red fox, then
the the grey dog chased after the two and each of
the brown dog and the red fox ran away from the
grey dog....<br />
The little brown dog jumped over the red fox, then
the the grey dog chased after the two and each of
the brown dog and the red fox ran away from the
grey dog....
</p>
<script type="text/javascript">
function findAndReplace(searchText, replacement, s) {
if (!searchText || typeof replacement === 'undefined') {
// Throw error here if you want...
return;
}
var parent,
frag,
oldNode,
searchNode = s || document.body,
regex = (((typeof searchText) === 'string') ? new RegExp(searchText, 'g') : searchText),
currentNode = searchNode.firstChild,
excludes = 'html,head,style,title,link,meta,script,object,iframe';
while (currentNode != null) {
if (currentNode.nodeType === 1 && (excludes + ',').indexOf(currentNode.nodeName.toLowerCase() + ',') === -1) {
// Element node that's not excluded
findAndReplace(searchText, replacement, currentNode);
}
if (currentNode.nodeType !== 3) {
// Not a text node, so move on to the next
currentNode = currentNode.nextSibling;
continue;
}
// Still here so we have a text node
if (!currentNode.data.match(regex)) {
// Text node doesn't contain a match, so move on to the next
currentNode = currentNode.nextSibling;
continue;
}
// Looks like we can do the replacement
parent = currentNode.parentNode;
frag = (function () {
var html = currentNode.data.replace(regex, replacement),
wrap = document.createElement('div'),
frag = document.createDocumentFragment();
wrap.innerHTML = html;
while (wrap.firstChild) {
frag.appendChild(wrap.firstChild);
}
return frag;
})();
oldNode = currentNode;
currentNode = currentNode.nextSibling;
parent.replaceChild(frag, oldNode);
}
}
function doreplace() {
var i, n, newword, oldword = ['brown dog', 'red fox', 'grey dog'];
for (i = 0, n = oldword.length; i < n; i++) {
newword = ' <span class="theword">' + oldword[i] + '<\/span> ';
switch (oldword[i]) {
case 'brown dog':
newword = ' <span class="brownDog"> ' + newword + ' <\/span> ';
break;
case 'red fox':
newword = ' <span class="redFox"> ' + newword + ' <\/span> ';
break;
case 'grey dog':
newword = ' <span class="greyDog"> ' + newword + ' <\/span> ';
break;
}
findAndReplace(oldword[i], newword);
}
}
window.onload = function () {
doreplace();
};
</script>
</body>
</html>
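findAndReplace turns a string searchText into a regular expression as-is, so a keyword containing characters such as ".", "+" or "(" is treated as a pattern, and "dog" would also match inside "dogma". Since the function also accepts a RegExp, one option is to build the pattern beforehand. A sketch with a hypothetical helper, keywordRegex: the keyword is matched literally, and a word boundary is added only at an end of the keyword that is a word character.

```javascript
// Escape regex metacharacters so the keyword is matched literally
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: literal keyword, with \b added at each end that is a
// word character (so "dog" no longer matches inside "dogma").
// flags defaults to 'g', the same flag findAndReplace uses for string input.
function keywordRegex(keyword, flags) {
  var pattern = escapeRegExp(keyword);
  if (/^\w/.test(keyword)) pattern = '\\b' + pattern;
  if (/\w$/.test(keyword)) pattern = pattern + '\\b';
  return new RegExp(pattern, flags === undefined ? 'g' : flags);
}
```

In doreplace this would be used as findAndReplace(keywordRegex(oldword[i]), newword); the default 'g' flag keeps every occurrence in a text node being replaced, as in the original.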
<html>
<head>
<title>Find and Replace Test</title>
<style type="text/css">
.theword { font-weight: bold; }
.brownDog { background-color: brown; }
.redFox { background-color: red; }
.greyDog { background-color: #CCC; }
</style>
</head>
<body>
<p>
The little brown dog jumped over the red fox, then
the the grey dog chased after the two and each of
the brown dog and the red fox ran away from the
grey dog....<br />The little brown dog jumped over
the red fox, then the the grey dog chased after the
two and each of the brown dog and the red fox ran
away from the grey dog....
</p>
<p>
The little brown dog jumped over the red fox, then
the the grey dog chased after the two and each of
the brown dog and the red fox ran away from the
grey dog....<br />The little brown dog jumped over
the red fox, then the the grey dog chased after the
two and each of the brown dog and the red fox ran
away from the grey dog....
</p>
<script type="text/javascript">
var replaced = {}; // Keeps track of what's been replaced
function findAndReplace(searchText, replacement, s) {
if (!searchText || typeof replacement === 'undefined') {
// Throw error here if you want...
return;
}
var parent,
frag,
oldNode,
searchNode = s || document.body,
regex = (((typeof searchText) === 'string') ? new RegExp(searchText) : searchText),
currentNode = searchNode.firstChild,
excludes = 'html,head,style,title,link,meta,script,object,iframe';
while (currentNode != null) {
if (currentNode.nodeType === 1 && (excludes + ',').indexOf(currentNode.nodeName.toLowerCase() + ',') === -1) {
// Element node that's not excluded
findAndReplace(searchText, replacement, currentNode);
}
if (currentNode.nodeType !== 3) {
// Not a text node, so move on to the next
currentNode = currentNode.nextSibling;
continue;
}
// Still here so we have a text node
if (!currentNode.data.match(regex)) {
// Text node doesn't contain a match, so move on to the next
currentNode = currentNode.nextSibling;
continue;
}
if (replaced.hasOwnProperty(searchText)) {
// Already did this replacement
return;
}
// Looks like we can do the replacement
parent = currentNode.parentNode;
frag = (function () {
var html = currentNode.data.replace(regex, replacement),
wrap = document.createElement('div'),
frag = document.createDocumentFragment();
wrap.innerHTML = html;
while (wrap.firstChild) {
frag.appendChild(wrap.firstChild);
}
return frag;
})();
oldNode = currentNode;
currentNode = currentNode.nextSibling;
parent.replaceChild(frag, oldNode);
replaced[searchText] = true;
}
}
function doreplace() {
var i, n, newword, oldword = ['brown dog', 'red fox', 'grey dog'];
for (i = 0, n = oldword.length; i < n; i++) {
newword = ' <span class="theword">' + oldword[i] + '<\/span> ';
switch (oldword[i]) {
case 'brown dog':
newword = ' <span class="brownDog"> ' + newword + ' <\/span> ';
break;
case 'red fox':
newword = ' <span class="redFox"> ' + newword + ' <\/span> ';
break;
case 'grey dog':
newword = ' <span class="greyDog"> ' + newword + ' <\/span> ';
break;
}
findAndReplace(oldword[i], newword);
}
}
window.onload = function () {
doreplace();
};
</script>
</body>
</html>