
PHP Server Side Scripting Forum

Bag-O-Tricks for PHP II
some code snippets that should be helpful for all in creating dynamic sites
andreasfriedrich




msg:1296699
 2:40 pm on Jan 30, 2003 (gmt 0)

This thread continues the collection of PHP tricks in our Php bag of tricks [webmasterworld.com], which among others is referenced in the Perl and PHP CGI Scripting library [webmasterworld.com].


Validating an URI

Validating a URI is a task that comes up quite often. You may have a form where users have to enter a valid URL. You might want to check that a referrer [webmasterworld.com] that you will echo on a page is valid, to prevent injection of bad code and links to bad sites.

While PHP provides the parse_url [php.net] function to parse a URL and return its components, it still lacks some functionality that comes in handy when validating a URI.

-o0o-

Features that my is_url function provides:

  • Lets you specify which components are required
  • Reports which components are missing
  • Cleans up the parts to comply with RFC2396 and return them as an array
  • Returns true if all required components are present

-o0o-

How to use my is_url function:

For the impatient among you here´s a complete example first. It checks whether the HTTP_REFERER is valid and converts it into an absolute URI before using it on our page.


$cleaned = array();
$error = 0;
if (is_url($_SERVER['HTTP_REFERER'], PATH, $error, $cleaned)) {
    if ($error & (SCHEME + AUTHORITY)) {
        $_SERVER['HTTP_REFERER'] = make_abs($_SERVER['HTTP_REFERER'],
                                            $_SERVER['SCRIPT_URI']);
    } elseif ($cleaned['authority'] != $_SERVER['SERVER_NAME']) {
        echo "Referer is from other domain. We do not include it.";
    } else {
        $_SERVER['HTTP_REFERER'] = make_uri($cleaned);
        echo "<br>$_SERVER[HTTP_REFERER]";
    }
} else {
    echo "errors: $error<br>";
}

-o0o-

Step by step guide through the above example:

Now let´s have a closer look at how it does just that.


$cleaned = array();
$error = 0;
if (is_url($_SERVER['HTTP_REFERER'], PATH, $error, $cleaned)) {

After initializing the $cleaned and $error variables we call the is_url() function. The first parameter is the HTTP_REFERER as contained in the Referer header field of the client's request. The value of this field may be either an absolute or a relative URI. Of course, since the client controls this header, it could just as well contain any code that the user wants to pass along.

The second parameter specifies which components need to be present for is_url to consider the URI valid. We only pass the PATH constant since that is all that is required for a relative URI. If we were to check for an absolute URI we would use SCHEME+AUTHORITY+PATH as the second argument.

As the third and fourth argument we pass references to the $error and $cleaned variables. Those will be filled with a value indicating the missing components and the cleaned components of the URI.


if ($error & (SCHEME + AUTHORITY)) {
$_SERVER['HTTP_REFERER'] = make_abs($_SERVER['HTTP_REFERER'],
$_SERVER['SCRIPT_URI']);
}

If there is no SCHEME and no AUTHORITY we know that we have a relative URI which we need to turn into an absolute one. The base URI that we use to resolve it is the requested URI which is contained in $_SERVER['SCRIPT_URI'].


elseif ($cleaned['authority']!= $_SERVER['SERVER_NAME']) {
echo "Referer is from other domain. We do not include it.";
}

Now we can be reasonably sure [1] that we have an absolute URI. If the authority component is not equal to our server name, the referrer is from another domain and we will not use it, since we do not want to link to some other site.


} else {
$_SERVER['HTTP_REFERER'] = make_uri($cleaned);
echo "<br>$_SERVER[HTTP_REFERER]";
}

When we have an absolute URI that is from our domain we assemble the cleaned parts to form a URI again.

-o0o-

Code of my is_url function:

Here´s the code for is_url:


define('SCHEME', 1);
define('AUTHORITY', 2);
define('PATH', 4);
define('QUERY', 8);
define('FRAGMENT', 16);
define('AUTHORITY_WF', 32); # AUTHORITY_WELLFORMED
function is_url($string, $components, &$error, &$cleaned) {
    $error = 0; // first clear error variable
    $_error = 0;
    #
    // note: ereg() was removed in PHP 7; preg_match() is its replacement
    $ret = ereg("^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?",
                $string, $regs);
    #
    // return false if we were not even able to parse uri
    if (!$ret) return false;
    #
    // check the separate parts
    if (empty($regs[2])) $_error += SCHEME;
    if (empty($regs[4])) $_error += AUTHORITY;
    if (!empty($regs[4]) and strcmp($regs[2], 'http') == 0) {
        // do we have an ok hostname?
        if (!ereg("((([a-z0-9]+)[a-z0-9_]|\\-)+\\.)+". // subdomain + domain
                  "[a-z]{2,4}".                         // TLD
                  ":?[0-9]{0,5}$",                      // port
                  $regs[4])) {
            $_error += AUTHORITY_WF;
        }
    }
    if (empty($regs[5])) $_error += PATH;
    if (empty($regs[7])) $_error += QUERY;
    if (empty($regs[9])) $_error += FRAGMENT;
    #
    if ($cleaned != '') {
        $cleaned['scheme'] = $regs[2];
        $cleaned['authority'] = $regs[4];
        $cleaned['path'] =
            preg_replace("{[^-/:@&=+$,_.!~*()'a-zA-Z0-9]}", '', $regs[5]);
        $cleaned['query'] =
            preg_replace("{[^-;/?:@&=+$,_.!~*'()A-Za-z0-9%]}", '',
                         urlencode_querystring($regs[7]));
        $cleaned['fragment'] =
            preg_replace("{[^-;/?:@&=+$,_.!~*'()A-Za-z0-9%]}", '',
                         urlencode($regs[9]));
    }
    #
    // collect the components that are both required and missing
    foreach (array(SCHEME, AUTHORITY, AUTHORITY_WF, PATH, QUERY, FRAGMENT)
             as $comp) {
        if ($components & $comp and $_error & $comp) $error += $comp;
    }
    #
    if ($error > 0) {
        $error = $_error;
        return false;
    }
    $error = $_error;
    return true;
}

-o0o-

Step by step guide through the above code:


define('SCHEME', 1);
define('AUTHORITY', 2);
define('PATH', 4);
define('QUERY', 8);
define('FRAGMENT', 16);
define('AUTHORITY_WF', 32); # AUTHORITY_WELLFORMED

We define some constants that we use to specify the required parts and that is_url() uses to encode which parts of the URI are missing.
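Since each constant is a distinct power of two, the value stored in $error can be taken apart again with a bitwise AND. The following is only a minimal sketch of that decoding step; the helper name missing_components() and the URI_FLAGS table are made up for this example and are not part of the code in this post.

```php
<?php
// Hypothetical helper (not from the original post): turns the numeric
// value that is_url() stores in $error into a list of component names.
const URI_FLAGS = [
    1  => 'SCHEME',
    2  => 'AUTHORITY',
    4  => 'PATH',
    8  => 'QUERY',
    16 => 'FRAGMENT',
    32 => 'AUTHORITY_WF',
];

function missing_components(int $error): array
{
    $names = [];
    foreach (URI_FLAGS as $bit => $name) {
        if ($error & $bit) {    // bit set => component missing or malformed
            $names[] = $name;
        }
    }
    return $names;
}

// 1 + 2 + 16 = 19: scheme, authority and fragment are flagged
print_r(missing_components(19));
```

The numeric values mirror the define() calls above, so the helper keeps working as long as those values are not changed.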


function is_url($string, $components, &$error, &$cleaned) {

The parameters are:

  • $string is the URI that we want to check.
  • $components is a numeric value specifying which components are required for a valid URI. Use the constants defined above.
  • $error is a variable that is passed by reference. It will contain a numeric value specifying which components were missing. Note that it reports all components that are missing, not just those that were required. Use the constants to decode that value.
  • $cleaned is an array that is passed by reference. It will contain the cleaned parts of the URI, i.e. they will contain only allowed characters. Invalid characters in the query and fragment components are URL-encoded.


$ret = ereg("^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?",
$string, $regs);
#
// return false if we were not even able to parse uri
if (!$ret) return false;

We split the URI into its components using the regular expression given in RFC2396. We could have used PHP's parse_url, but I liked the RE better.

If the RE does not match at all, the given URI is something else and we return immediately with a return value of false.
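For comparison, here is a small sketch of the same split done with PHP's built-in parse_url(). It is not what is_url() uses; the example URIs are made up. Note that parse_url() reports the authority as separate 'host' and 'port' entries and simply leaves out the keys of components that are absent.

```php
<?php
// Sketch only: splitting a URI with parse_url() instead of the RFC2396 RE.
$uri   = 'http://www.example.com:8080/other/chaps.html?name=aaron#top';
$parts = parse_url($uri);

// A relative reference yields neither a 'scheme' nor a 'host' key.
$relative = parse_url('../aaron/carter.html');

print_r($parts);
print_r($relative);
```

Checking for missing components then becomes a matter of isset() on the returned array rather than empty() on numbered sub-matches.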


if (empty($regs[2])) $_error += SCHEME;
if (empty($regs[4])) $_error += AUTHORITY;
if (!empty($regs[4]) and strcmp($regs[2], 'http') == 0) {
    // do we have an ok hostname?
    if (!ereg("((([a-z0-9]+)[a-z0-9_]|\\-)+\\.)+". // subdomain + domain
              "[a-z]{2,4}".                         // TLD
              ":?[0-9]{0,5}$",                      // port
              $regs[4])) {
        $_error += AUTHORITY_WF;
    }
}
if (empty($regs[5])) $_error += PATH;
if (empty($regs[7])) $_error += QUERY;
if (empty($regs[9])) $_error += FRAGMENT;

Here we check the parts returned by the RE and build the $_error variable.


$cleaned['scheme'] = $regs[2];
$cleaned['authority'] = $regs[4];
$cleaned['path'] =
preg_replace("{[^-/:@&=+$,_.!~*()'a-zA-Z0-9]}", '', $regs[5]);
$cleaned['query'] =
preg_replace("{[^-;/?:@&=+$,_.!~*'()A-Za-z0-9%]}", '',
urlencode_querystring($regs[7]));
$cleaned['fragment'] =
preg_replace("{[^-;/?:@&=+$,_.!~*'()A-Za-z0-9%]}", '',
urlencode($regs[9]));

Here we check the components for illegal characters, which are simply deleted from the URI. Note that the scheme is not checked. It should be! The authority is not cleaned up either. You can check whether $error contains AUTHORITY_WF to tell whether there are illegal characters. Better yet, add some clean-up code as well.


foreach (array(SCHEME, AUTHORITY, AUTHORITY_WF, PATH, QUERY, FRAGMENT)
as $comp) {
if ($components & $comp and $_error & $comp) $error += $comp;
}
#
if ($error > 0) {
$error = $_error;
return false;
}
$error = $_error;
return true;

So far we just checked which components were missing. Now we need to determine whether any required parts are missing. This is done in the foreach loop. When a component is required and it is missing we add that component's numeric value to the $error variable. When $error is larger than zero we return false. Otherwise we return true. In both cases we assign the $error variable the value of our internal $_error variable, which contains all the missing elements.
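As long as only the six flag values are used, the test made by the foreach loop comes down to a single bitwise AND of the two numbers. A minimal sketch, with a function name invented for this example:

```php
<?php
// Sketch: $components = flags the caller requires,
//         $missing    = flags is_url() found missing.
function has_missing_required(int $components, int $missing): bool
{
    return ($components & $missing) !== 0;
}

// Required: PATH (4). Missing: SCHEME + AUTHORITY (1 + 2).
var_dump(has_missing_required(4, 1 + 2));

// Required: SCHEME + AUTHORITY + PATH (7). Missing: SCHEME (1).
var_dump(has_missing_required(1 + 2 + 4, 1));
```

The first call reports that nothing required is missing (the relative URI case from the example at the top), the second reports a problem because the scheme was required.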

-o0o-

Have fun giving it a try.

Andreas

-o0o-


Note: The WebmasterWorld posting software deletes spaces preceding the exclamation point "!" character. It also replaces a solid vertical pipe symbol with a broken vertical pipe "¦" symbol. Both of these changes will need to be undone in any code you copy from WebmasterWorld. Make sure to include a space preceding the "!" in mod_rewrite code, and always replace "¦" with a solid vertical pipe.

[1] To be absolutely sure one would need to check for cases where there is an AUTHORITY but no SCHEME.

 

lorax




msg:1296700
 2:55 pm on Jan 30, 2003 (gmt 0)

Showoff...;)

But seriously, thanks A.

G.

andreasfriedrich




msg:1296701
 7:28 pm on Jan 31, 2003 (gmt 0)

Here´s another one, lorax ;-)

Resolving a relative URI*

Imagine you encounter a link like this when parsing an HTML document which you fetched from http://www.ac.com/other/chaps.html.


<a href="../aaron/carter.html">Aaron</a>

Why should I care:

When you need to insert that URI into a database or build a link to it on your own page you need to use an absolute URI, i.e. you need to resolve the relative URI against the base URI. Since http://www.ac.com/other/chaps.html did not contain a base element, the base URI is the document's URI.

One approach to resolving a relative URI would be not to resolve it at all and let the server or browser do that. You could just use a URI like this:


[ac.com...]

This would work just fine. However, it is not very slick and you need to store or transmit more data. The right way would be to resolve the URI yourself. That´s what make_abs($rel, $base) does.

-o0o-

Code of my make_abs function:

Here´s the code for make_abs:


function make_abs($rel_uri, $base, $REMOVE_LEADING_DOTS = true) {
    // scheme://authority of the base URI
    preg_match("'^([^:]+://[^/]+)/'", $base, $m);
    $base_start = $m[1];
    if (preg_match("'^/'", $rel_uri)) {
        return $base_start . $rel_uri;
    }
    // drop the last segment of the base path and append the relative URI
    $base = preg_replace("{[^/]+$}", '', $base);
    $base .= $rel_uri;
    $base = preg_replace("{^[^:]+://[^/]+}", '', $base);
    $base_array = explode('/', $base);
    if (count($base_array) and !strlen($base_array[0]))
        array_shift($base_array);
    $i = 1;
    while ($i < count($base_array)) {
        if ($base_array[$i - 1] == ".") {
            array_splice($base_array, $i - 1, 1);
            if ($i > 1) $i--;
        } elseif ($base_array[$i] == ".." and $base_array[$i - 1] != "..") {
            array_splice($base_array, $i - 1, 2);
            if ($i > 1) {
                $i--;
                if ($i == count($base_array)) array_push($base_array, "");
            }
        } else {
            $i++;
        }
    }
    // PHP has no negative array indexes, so address the last element explicitly
    $last = count($base_array) - 1;
    if ($last >= 0 and $base_array[$last] == ".")
        $base_array[$last] = "";
    /* How do we treat the case where there are still some leading ../
    segments left? According to RFC2396 we are free to handle that
    any way we want. The default is to remove them.
    #
    "If the resulting buffer string still begins with one or more
    complete path segments of "..", then the reference is considered
    to be in error. Implementations may handle this error by
    retaining these components in the resolved path (i.e., treating
    them as part of the final URI), by removing them from the
    resolved path (i.e., discarding relative levels above the root),
    or by avoiding traversal of the reference."
    #
    [faqs.org...] 5.2.6.g
    */
    if ($REMOVE_LEADING_DOTS) {
        while (count($base_array) and preg_match("/^\.\.?$/", $base_array[0])) {
            array_shift($base_array);
        }
    }
    return($base_start . '/' . implode("/", $base_array));
}

-o0o-

Usage of make_abs:

For an example on how to use this function see my previous post on Validating an URI [webmasterworld.com].
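To see the resolution steps in isolation, here is a simplified, self-contained sketch. It is not the make_abs() function above: the name resolve_relative() is made up, it relies on parse_url(), it assumes an absolute http-style base, and it ignores query strings, fragments and some RFC2396 edge cases (for example a trailing ".." does not leave a trailing slash).

```php
<?php
// Simplified sketch, not the make_abs() from this post.
function resolve_relative(string $rel, string $base): string
{
    $b      = parse_url($base);
    $origin = $b['scheme'] . '://' . $b['host']
            . (isset($b['port']) ? ':' . $b['port'] : '');

    if ($rel !== '' && $rel[0] === '/') {
        $path = $rel;                       // absolute path replaces the base path
    } else {
        $basePath = $b['path'] ?? '/';
        // drop the last segment of the base path (the "file name")
        $dir  = substr($basePath, 0, strrpos($basePath, '/') + 1);
        $path = $dir . $rel;
    }

    $out = [];
    foreach (explode('/', $path) as $seg) {
        if ($seg === '.') {
            continue;                       // "." means "same directory"
        }
        if ($seg === '..') {
            if (count($out) > 1) {
                array_pop($out);            // go up one level, never above the root
            }
            continue;
        }
        $out[] = $seg;
    }
    if (count($out) === 1) {
        $out[] = '';                        // keep at least "/" as the path
    }
    return $origin . implode('/', $out);
}

echo resolve_relative('../aaron/carter.html', 'http://www.ac.com/other/chaps.html');
```

Like the default of make_abs(), the sketch discards ".." segments that would climb above the root.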

Andreas


* To freshen up your knowledge of the different parts of a URI have a look at Path Information - The Base Element [webmasterworld.com].
andreasfriedrich




msg:1296702
 5:41 pm on Feb 26, 2003 (gmt 0)

Getting rid of those query strings

There are quite a few people who realize the problem of having query strings in a URL only after they have finished developing their site. While it is certainly better to plan query string-less URLs right from the beginning, PHP [php.net] provides an easy solution to get rid of the query string in those other instances as well.

Caveats

This requires output buffering to be turned on which increases the load on your server. On Apache2 creating a postprocessing handler will be much more efficient.

Code

Getting rid of the query string is as easy as putting the following few lines at the start of your PHP [php.net] scripts.


ob_start('post_process');
#
function post_process($buffer) {
    return preg_replace("'script.php\?([^\"\']+)'e",
        "'script-'.implode('-', preg_split('/&|=/', '\\1')).'.html'",
        $buffer); # originally posted as $test; see the correction later in this thread
}

How it works

ob_start() [php.net] turns on output buffering and defines a call back function that is to be called when the buffer gets flushed. This will happen when ob_end_flush() [php.net] is called or at the end of your script. That is why you do not need to put anything at the end of your script.

The call back function is called and passed the content of the buffer. This buffer is then searched for any occurrences of script.php?... The query string is considered to go on until a ' or " is encountered ([^"'] being a negative character class). It is "stored" in the backreference \1 and then split up using preg_split() [php.net]. The array returned is then implode() [php.net]d. Then 'script-' and '.html' get concatenated to the start and end of the resulting string.

Usually the second parameter of preg_replace() [php.net] is just a replacement string. When you use the e pattern modifier the replacement string is considered to be PHP [php.net] code that gets evaluated and the result of that evaluation is then substituted for the text that was matched by the regular expression.
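A note for readers on current PHP versions: the e modifier was deprecated in PHP 5.5 and removed in PHP 7. The sketch below shows one way to get the same result with preg_replace_callback(), which passes each match to a function instead of evaluating a string of PHP code. It is an adaptation, not the code from this post; "script.php" is the example name used throughout.

```php
<?php
// Sketch of the same rewrite without the e modifier.
function post_process(string $buffer): string
{
    return preg_replace_callback(
        '/script\.php\?([^"\']+)/',
        function (array $m): string {
            // name=aaron&age=15  ->  name-aaron-age-15
            $parts = preg_split('/&|=/', $m[1]);
            return 'script-' . implode('-', $parts) . '.html';
        },
        $buffer
    );
}

echo post_process('<a href="script.php?name=aaron&age=15">Aaron</a>');
```

The function can be handed to ob_start('post_process') in exactly the same way as the original callback.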

Example

When your PHP [php.net] script would have output the following HTML document


<html>
<p>
This is a standard compliant HTML document containing two links:
<a href="script.php?name=aaron&age=15">Aaron</a>
<a href="script.php?name=aschenbach&age=dead">Gustav</a>
</p>
</html>

it will now output


<html>
<p>
This is a standard compliant HTML document containing two links:
<a href="script-name-aaron-age-15.html">Aaron</a>
<a href="script-name-aschenbach-age-dead.html">Gustav</a>
</p>
</html>

Getting them back

Since there are no such files on your server you will need a way to undo these changes when the UA requests them.


# first rule as corrected later in this thread ($1.html instead of $1)
RewriteRule (script.*)-([^-]+)-([^-]+)\.html$ $1.html?$2=$3 [N,QSA]
RewriteRule script\.html script.php

Those two rules will turn the fake script name back into the real thing with query strings, just like your scripts will expect them.

HTH Andreas


dhdweb




msg:1296703
 9:09 pm on Feb 26, 2003 (gmt 0)

Who needs php.net? We have andreasfriedrich to guide us :)

Another great post andreas, thank you!

bcolflesh




msg:1296704
 9:17 pm on Feb 26, 2003 (gmt 0)

Dear Andreas,
RE: Getting rid of those query strings -

Is there any advantage to this over using mod_rewrite? Or does this code pre-suppose that mod_rewrite is not available?

Regards,
Brent

andreasfriedrich




msg:1296705
 1:26 pm on Feb 27, 2003 (gmt 0)

>>[does] this code pre-suppose that mod_rewrite is not available?

No, not at all. If you look at the Getting them back section you will see that mod_rewrite [httpd.apache.org] is still used to turn the nice looking but fake URL (nURL) back into the real ugly one with a query string (qURL).

mod_rewrite [httpd.apache.org] does not take care of turning qURLs into nURLs. It is a one way street (nURL -> qURL BUT NOT qURL -> nURL). When Apache [httpd.apache.org] receives a request it will have to translate the URI into a local filename. Before this is done mod_rewrite [httpd.apache.org] allows you to change the URL that Apache [httpd.apache.org] will use in that translation phase.

mod_rewrite [httpd.apache.org] does not see any content that Apache [httpd.apache.org] sends to a UA. So it is usually the developer´s job to make sure that the application produces nURLs. But quite often the requirement to have nURLs instead of qURLs is only discovered after the application has been written. I provided the above method as a quick fix for such situations.

Andreas

bcolflesh




msg:1296706
 2:13 pm on Feb 27, 2003 (gmt 0)

Dear Andreas,
Thanks for the explanation & all the code snippets!

Regards,
Brent

hakre




msg:1296707
 5:19 pm on Feb 28, 2003 (gmt 0)

eh andreas,

didn't know you're that firm with php. if i've got a question with regex, i tend to ask you... ;) i think i need to write some tuts soon...

bzprod




msg:1296708
 12:56 am on Mar 1, 2003 (gmt 0)

Wow...this seems exactly like something that I could use right now(query strings). I am very new to this so I have two quick questions.

1.)Shall I put the code into each one of my .php pages?

ob_start('post_process');
#
function post_process($buffer) {
    return preg_replace("'script.php\?([^\"\']+)'e",
        "'script-'.implode('-', preg_split('/&|=/', '\\1')).'.html'",
        $test);
}

This is a typical dynamic page that is generated on my site.

view.php?PHPSESSID=ec5b133945ec65a501791ae4367af713&sr=0&pp=1&cp=1&ut=1

I have noticed that the following will call up the same page when entered directly into the address bar.
?i=1,?i=2,?i=3, etc...where the number is the user

The above super long url however is the one that is always showing in the url. Will this little script help me get everything down?

2.)Where shall I put the next two lines?


RewriteRule (script.*)-([^-]+)-([^-]+)\.html$ $1?$2=$3 [N,QSA]
RewriteRule script\.html script.php

Would this go into each .php page as well? Where about on the pages?

Thanks everyone for all the great information. Please forgive me if I am totally off the mark.

God Bless,
Patrick

hakre




msg:1296709
 10:17 am on Mar 1, 2003 (gmt 0)

hi Patrick,

1.) Shall I put the code into each one of my .php pages?

yes, you should put it on every page, exactly as described above (on top of the page)

2.) Where shall I put the next two lines?

these 2 lines are placed in the .htaccess file.

andreasfriedrich




msg:1296710
 10:52 am on Mar 1, 2003 (gmt 0)

>>these 2 lines are placed in the .htaccess file.

And since mod_rewrite [httpd.apache.org] will loop over any RewriteRule [httpd.apache.org]s from the top of the file down to the first of these rules you should put them on top of all other RewriteRule [httpd.apache.org]s you may already have in that file.

Andreas

bzprod




msg:1296711
 11:39 am on Mar 1, 2003 (gmt 0)

Thanks! You guys are great help here.

God Bless,

Julian Patrick Stockwell

Onza




msg:1296712
 4:54 pm on Mar 4, 2003 (gmt 0)

This looks very interesting. Can I do this on any server?

Where exactly do I place the code? Will I only have to include it once on every page or within every php script? I have a page that works with loads of includes.

Sorry, I don't really know too much about php coding.

ruserious




msg:1296713
 9:01 pm on Mar 4, 2003 (gmt 0)

Great work, andreas :)

However I would advise people to either not use Session-ids in the querystring (disable trans-sid) or to at least exclude them from being put in the URI (instead leave it in the querystring or just drop it). The reason is explained in this document: Common HTTP Implementation Problems [webmasterworld.com]

You don't want people using (bookmarking, copying, sending to friends) URLs where the session-id is included in the filename. IMHO. This not only goes against the idea and essence of what a URL/URI is, but it might very well throw up errors when the session dies...

andreasfriedrich




msg:1296714
 9:49 pm on Mar 4, 2003 (gmt 0)

>>Great work, andreas :)

Thanks, ruserious.

>>I would advise people to [...] not use Session-ids in the URI

I don´t mind having session ids in the URL for certain applications. And I do not believe that generally advising people against it is really wise. But I do admit that choosing your method of transmitting the session id should be an informed decision. You need to know the pros and cons of each approach. And when in doubt people should not use them in the URI ;).

>>goes against the idea and essence of what a URL/URI is

Not necessarily so. Since I am looking for a job right now I created an online resume and portfolio. The site is password protected and the session id is stored in the URI. Now considering that a URI is used to point to a certain resource somewhere on the web, that is exactly what my URIs containing a session id do. If you remove the session id it will point to a totally different resource, since all these pages are tailored to specific potential employers. Thus the session id does not only serve to identify who is accessing the page but also when somebody is accessing the page. Depending on the time, resources may look different as well. Were the session id contained in a cookie, the URL would not sufficiently identify a resource at any given time. Now that is what I'd call going against the idea and essence of what a URL/URI is ;).

Putting the session id into the URL eliminated any problems with company firewalls filtering out cookies, cookies being turned off, browsers popping up a dialogue box to ask whether they should accept cookies, etc. All external links remove the session id from the URI to keep it from appearing in referrer logs. Sessions time out after 10 minutes of inactivity. I do believe this to be the best possible solution for this application.

Of course things would be different if I wanted these pages to be spidered by SEs.

So it seems that I would advise against session ids in the URL, too. I´d just add: unless you know what you are doing ;).

>>when the session dies

My sessions expire or time out but they never just die ;) Seriously, what kind of problems are you thinking about?

Andreas

andreasfriedrich




msg:1296715
 10:17 pm on Mar 4, 2003 (gmt 0)

>>Where exactly do I place the code?

You need to place the PHP [php.net] right at the top of your code before any output is sent to the browser since we need to catch it all for post processing.

The RewriteRule [httpd.apache.org]s go into either your httpd.conf file, if you have root access to your server, or the .htaccess file in the root folder of your web site. You should put them right at the top, before any other RewriteRule [httpd.apache.org]s, since those two rules loop over all parameters in the URL and you most certainly do not want to loop over your other RewriteRule [httpd.apache.org]s as well.

>>Will I only have to include it once on every page

Yes, just once per page at the top of your script. Do NOT put it in every include() [php.net]ed file.

HTH Andreas

ruserious




msg:1296716
 12:03 am on Mar 5, 2003 (gmt 0)

Hmm, makes me think. You have very valid arguments. I see that it totally depends on the application/website whether it's a smart thing to do or not.
I was having a site much like this one in mind with forums, where users are just tracked most of the time and have the ability to do things like post/private messaging etc. In that scenario having the session-id in the url when displaying topics etc. is IMHO a 'drawback'. But I see I was overgeneralizing. ;)

With expired sessions in the above scenarios, I was thinking of "Your session expired" pages instead of the content pages (threads, messages etc.), however that was not thoroughly thought out, as this does not have to do with the modification you proposed, but with the general design of the application/website.

You got me on the relation between session-ids and SEs, since I mostly hang out over on the Google News forum. *G*

So I guess you laid out my 'points' better than I did, which is why i can simply agree instead of having to (re-)type all of what you said. :p ;)

andreasfriedrich




msg:1296717
 8:12 pm on Mar 11, 2003 (gmt 0)

Highlighting Search Terms

Highlighting the terms searched for in Google or Yahoo on your own page is really easy. It involves reading the Referrer string, parsing the search terms and searching for those terms in the text on your page.

If you get your text from a database all you need to do is run the text through the following code after retrieving it from the database. But even for other methods of generating your page there is an easy solution that we have already used to Get rid of those query strings [webmasterworld.com]. If you put the following code into the output buffering callback function you specified with ob_start() [php.net] all search terms in all of PHP [php.net]´s output will be highlighted.

Code

Put this code either into your output buffering callback function or use it on text retrieved from a database.


$terms = ''; // stays empty when the referrer is not a known search engine
if (preg_match("'google\.|brisbane\.'", $_SERVER['HTTP_REFERER'])) {
    preg_match("'q=([^&]+)'", $_SERVER['HTTP_REFERER'], $match);
    $terms = $match[1];
} elseif (preg_match("'yahoo\.'", $_SERVER['HTTP_REFERER'])) {
    preg_match("'p=([^&]+)'", $_SERVER['HTTP_REFERER'], $match);
    $terms = $match[1];
}
if ($terms != '') {
    $terms = preg_split('/\s+|\++/', $terms);
    for ($i = 0; $i < count($terms); $i++) {
        $terms[$i] = "'\b(" . preg_quote(htmlentities($terms[$i]), "'")
                   . ")\b'i";
    }
    $line = preg_replace($terms, "<em>\\1</em>", $line);
}

How it works

If $_SERVER['HTTP_REFERER'] contains Google, Brisbane, or Yahoo we retrieve the search terms from the query string and store them in $terms. Given a referrer of http://www.google.de/search?q=Aaron+Carter&ie=ISO-8859-1&hl=de&meta= $terms will contain Aaron+Carter.

If $terms is unequal '' we split the search terms on whitespace (\s) or plus (+) into separate words and then loop over the array of terms and build a regular expression for each term. To continue the example from above our $terms array will look like this:


$terms[0] = "'\b(Aaron)\b'i";
$terms[1] = "'\b(Carter)\b'i";

The terms array is then fed into the preg_replace() [php.net] function which will replace it in $line with <em>\1</em>. A text like Aaron Carter rules! will get changed to read <em>Aaron</em> <em>Carter</em> rules!.
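The extraction step can also be written with parse_url() and parse_str() instead of hand-written regular expressions. The following is only a sketch under a few assumptions: the function name is invented, the host-to-parameter table simply mirrors the engines named above, and, unlike the code in this post, it looks at the host name only and returns the terms already URL-decoded.

```php
<?php
// Sketch: pull the search words out of a referrer URL.
function search_terms_from_referrer(string $referrer): array
{
    $params = ['google.' => 'q', 'brisbane.' => 'q', 'yahoo.' => 'p'];
    $host   = (string) parse_url($referrer, PHP_URL_HOST);
    $query  = (string) parse_url($referrer, PHP_URL_QUERY);

    foreach ($params as $needle => $param) {
        if (stripos($host, $needle) !== false) {
            parse_str($query, $vars);       // also decodes "+" and %xx
            $terms = $vars[$param] ?? '';
            if (!is_string($terms)) {
                return [];                  // e.g. q[]=... arrives as an array
            }
            return preg_split('/\s+/', $terms, -1, PREG_SPLIT_NO_EMPTY);
        }
    }
    return [];
}

print_r(search_terms_from_referrer(
    'http://www.google.de/search?q=Aaron+Carter&ie=ISO-8859-1&hl=de&meta='
));
```

For the Google referrer from the example this yields the two words Aaron and Carter, ready to be turned into patterns.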

Some improvements

The above code will emphasize search terms but there is no way to distinguish between search terms. To achieve that you might want to build a parallel $replacements array which contains entries like these: <em class="one">Aaron</em> and <em class="two">Carter</em>. The preg_replace() [php.net] line would then read:


$line = preg_replace($terms, $replacements, $line);

Beware though that this will add quite a bit of processing to each request coming from a search engine. The nice effect it might have on your visitors may or may not make up for that slow down.
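One possible way to build the two parallel arrays is sketched below. The function name and the class names beyond "one" and "two" are made up for this example. One caveat of this approach: later patterns also run over the markup inserted by earlier ones, so a search term such as "em" or "class" would damage the tags.

```php
<?php
// Sketch of the parallel-array idea: one pattern and one replacement per word.
function highlight_terms(string $line, array $words): string
{
    $classes      = ['one', 'two', 'three', 'four'];
    $patterns     = [];
    $replacements = [];
    foreach (array_values($words) as $i => $word) {
        $patterns[]     = '/\b(' . preg_quote(htmlentities($word), '/') . ')\b/i';
        $class          = $classes[$i % count($classes)];   // reuse classes if needed
        $replacements[] = '<em class="' . $class . '">\\1</em>';
    }
    return preg_replace($patterns, $replacements, $line);
}

echo highlight_terms('Aaron Carter rules!', ['Aaron', 'Carter']);
```

The classes can then be given different background colours in the style sheet.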

Andreas

geckofuel




msg:1296718
 6:46 pm on Mar 13, 2003 (gmt 0)

Andreas,

This is in reference to your "Getting rid of those query strings"

When you call the preg_replace function, I'm having trouble figuring out what the $test variable makes reference to.

At php.net, this variable is mentioned as the "subject" that gets searched for a particular expression. Where does $test get defined prior to its being passed to the preg_replace function?

Thanks.
Gecko

andreasfriedrich




msg:1296719
 7:12 pm on Mar 13, 2003 (gmt 0)

Thanks for reading my post really closely and discovering that error, Gecko.

Of course $test should read $buffer. I believe things are a bit clearer now. If not, feel free to ask away :).

Andreas

geckofuel




msg:1296720
 10:05 pm on Mar 13, 2003 (gmt 0)

Andreas,
Could you walk me through a few of the elements of your Rewrite rules?

RewriteRule (script.*)-([^-]+)-([^-]+)\.html$ $1?$2=$3 [N,QSA]
RewriteRule script\.html script.php

In particular, could you explain this part:

$1?$2=$3

andreasfriedrich




msg:1296721
 12:00 am on Mar 14, 2003 (gmt 0)

You did it again Gecko. There is an error in the first RewriteRule [httpd.apache.org] as well. It should read:


RewriteRule (script.*)-([^-]+)-([^-]+)\.html$ $1.html?$2=$3 [N,QSA]


Those two RewriteRule [httpd.apache.org]s turn the nice URI back into the form the script expects them.

Imagine a request for http://www.ac.tld/script-name-aaron-age-15.html. The URL matches the first RewriteRule [httpd.apache.org]. script-name-aaron will be stored in $1, age in $2, and 15 in $3. So the URL we rewrite to looks like this: script-name-aaron.html?age=15. The QSA flag tells mod_rewrite [httpd.apache.org] to append the query string to an already existing one. The N flag causes mod_rewrite [httpd.apache.org] to start the rewriting process again but use the rewritten URL as the input.

So this time our URL looks like this http://www.ac.tld/script-name-aaron.html. The query string has already been stripped off and is available in mod_rewrite [httpd.apache.org] via the %{QUERY_STRING} environment variable. Now our RewriteRule [httpd.apache.org] matches again. script will be stored in $1, name in $2, and aaron in $3. So the URL we rewrite to looks like this: script.html?name=aaron. Again the query string is appended and rewriting starts again.

Now our URL looks like this: http://www.ac.tld/script.html. The first RewriteRule [httpd.apache.org] does not match. The second RewriteRule [httpd.apache.org] matches and rewrites script.html to script.php. So in the end mod_rewrite [httpd.apache.org] causes Apache [httpd.apache.org] to deliver the resource identified by the following URL: http://www.ac.tld/script.php?name=aaron&age=15. This is exactly the URL we started out with in the PHP [php.net] script. All this rewriting happens transparently to the user.
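The round trip described above can be imitated in a few lines of PHP. This is only an illustration of the logic, not of how mod_rewrite works internally; the function name is made up. Each pass peels the last "-key-value" pair off the file name and puts it in front of the query string collected so far, just as the N and QSA flags do.

```php
<?php
// Sketch that imitates what the two RewriteRules do.
function nurl_to_qurl(string $url): string
{
    $query = '';
    // same pattern as the first rule: (script.*)-([^-]+)-([^-]+)\.html$
    while (preg_match('/(script.*)-([^-]+)-([^-]+)\.html$/', $url, $m)) {
        $pair  = $m[2] . '=' . $m[3];
        $query = $query === '' ? $pair : $pair . '&' . $query;
        $url   = $m[1] . '.html';
    }
    // second rule: script.html -> script.php
    $url = preg_replace('/script\.html/', 'script.php', $url);
    return $query === '' ? $url : $url . '?' . $query;
}

echo nurl_to_qurl('script-name-aaron-age-15.html');
```

For the example request this gives back script.php?name=aaron&age=15, the form the script expects.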

Andreas

geckofuel




msg:1296722
 2:58 pm on Mar 14, 2003 (gmt 0)

Andreas,

First, the good news. I've got a raw, beta static UBB system up and running using much of the information you provided in this thread. I plan to post the code as soon as I make a few refinements.

One problem you could help me with is this:

return preg_replace("'script.php\?([^\"\']+)'e", "'script-'.implode('-', preg_split('/&¦=/', '\\1')).'.html'", $test);

This code performs a replacement on any URL that contains "script.php?"

What I need is to be able to do a replacement on any URL that contains "script.php" (minus the ?)

I tried taking out the "\?" but that didn't work. I'm sure this is an easy problem to fix but I'm running into walls trying to do so.

andreasfriedrich




msg:1296723
 3:18 pm on Mar 14, 2003 (gmt 0)

>>This code performs a replacement on any URL that contains
>>"script.php?"

To be precise it performs a replacement on any URL that contains script.php? followed by one or more characters that are not " or '. That's the reason why simply removing the \? did not work. It tried to match script.php followed by one or more characters that are not " or '.

>>What I need, is to be able to do a replacement on any URL
>>that contains "script.php" (minus the ?)

Why? Where´s the query string that you want to replace then? Posting the URL you want to match might help me understand what you are trying to do.

Andreas

geckofuel




msg:1296724
 3:38 pm on Mar 14, 2003 (gmt 0)

Andreas,
Here's the issue:

UBB posts, on every page, the following link:

[mydomain.org...]

I want the script to turn the above link into:

[mydomain.org...]

The reason for this is complex, and would take a lot of explaining to do. Basically, all I want, is for the preg_replace rule to be more inclusive. I want it to continue to perform all the replacements it does, but also to take script.php and convert it to script.html

This doesn't make sense in the context of your example, but it does in the context of the problem I'm trying to resolve.

andreasfriedrich




msg:1296725
 4:03 pm on Mar 14, 2003 (gmt 0)

OK Gecko, that makes sense :).

With preg_replace you can use arrays for both pattern and replacement. This makes it very easy to just create a second pattern and replacement to do what you want to do. Including this case in the first pattern would have been a bit more complicated, since the replacement gets evaluated as PHP code and we would have had to ensure that it runs ok. Simply adding another pattern and replacement saves us this hassle.


return preg_replace(array(
"'script\.php\?([^\"\']+)'e",
"'script\.php(?=[\"\'])'"
), array(
"'script-'.implode('-', preg_split('/&|=/', '\\1')).'.html'",
"script.html"
), $buffer);

The second pattern will match script.php when it is followed by either a single or a double quote. The quote sits in a lookahead, so it is not swallowed by the replacement.
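
On PHP versions that no longer support the /e modifier, the same two replacements could be written with preg_replace_callback. This is only a sketch under that assumption; the function name staticize_links is made up for illustration.

```php
<?php
// Hypothetical /e-free variant of the two replacements above.
function staticize_links($buffer)
{
    // script.php?a=b&c=d  ->  script-a-b-c-d.html
    $buffer = preg_replace_callback(
        '/script\.php\?([^"\']+)/',
        function ($m) {
            // Split the query string on & and = and rejoin the pieces with dashes.
            return 'script-' . implode('-', preg_split('/&|=/', $m[1])) . '.html';
        },
        $buffer
    );
    // A bare script.php directly before a closing quote -> script.html.
    // The lookahead leaves the quote itself in place.
    return preg_replace('/script\.php(?=["\'])/', 'script.html', $buffer);
}
```

Used as an output callback, something like ob_start('staticize_links') should post-process the whole page in one go.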

HTH Andreas

andreasfriedrich




msg:1296726
 2:26 pm on Mar 21, 2003 (gmt 0)

Encoding email addresses

While this is certainly not a very safe way to keep spam bots from picking up your email addresses, it works for the time being: those bots are written with speed in mind, so they generally do not decode entities before parsing the HTML code.


$html = preg_replace(
"'((?:mailto:)?[a-zA-Z0-9_-]+@[a-zA-Z0-9_-]+\.[a-zA-Z]{2,3})'e",
"encode_email('\\1')",
$html);
// Convert every character of $txt into a numeric HTML entity
function encode_email($txt) {
$new = '';
$n = strlen($txt);
for($i = 0; $i<$n; $i++) {
$a = substr($txt, $i, 1);
$new .= sprintf('&#%s;', ord($a));
}
return $new;
}

The script will look for anything that looks like an email address, with or without a leading mailto:, and convert it into numeric entities. Browsers will decode the link while spam bots will not.

If you retrieve the email address from a db you can call encode_email directly as well. And as always a callback function with ob_start will work great for any post processing.
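
For PHP versions without the /e modifier, here is a minimal sketch of the same idea built on preg_replace_callback. The wrapper name encode_emails_in_html is made up for illustration, and the address pattern is the simple one from above, so it makes no attempt to cover every valid address.

```php
<?php
// Turn every character of $txt into a numeric HTML entity.
function encode_email($txt)
{
    $new = '';
    $n = strlen($txt);
    for ($i = 0; $i < $n; $i++) {
        $new .= '&#' . ord($txt[$i]) . ';';
    }
    return $new;
}

// Hypothetical wrapper: encode everything in $html that looks like an
// email address, with or without a leading "mailto:".
function encode_emails_in_html($html)
{
    return preg_replace_callback(
        '/(?:mailto:)?[a-zA-Z0-9_-]+@[a-zA-Z0-9_-]+\.[a-zA-Z]{2,3}/',
        function ($m) {
            return encode_email($m[0]);
        },
        $html
    );
}
```

For a value that comes straight from the db, calling encode_email($row['email']) directly is enough (the column name is only an example). For whole pages, the wrapper can be handed to ob_start as the callback.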

This method should be ok in the EU as well where you are required to put a working and easy to find and use email address on your site. I believe that JavaScript solutions will not suffice since you need JavaScript enabled to view the email address.

Andreas


If you like Perl better there is a Perl version in the Bag-O-Tricks for Perl [webmasterworld.com]

Birdman




msg:1296727
 10:45 pm on Mar 21, 2003 (gmt 0)

Image upload and Thumbnail Generator

This thread is going in my bookmarks for sure. Here is my contribution; feel free to scrutinize it, as I am sure there are more elegant ways to code some parts.

This script works with a MySQL database that contains info for the pics. You need to create a table named pics.


CREATE TABLE `pics` (
`pid_id` VARCHAR NOT NULL ,
`pic_name` VARCHAR( 150 ) ,
`descrip` TEXT,
PRIMARY KEY ( `pid_id` )
)

The Form

<form enctype="multipart/form-data" action="loademup.php" method="post">
<input type="hidden" name="MAX_FILE_SIZE" value="300000">
<input name="userfile[]" type="file" /><br />
<input name="userfile[]" type="file" /><br />
<input name="userfile[]" type="file" /><br />
<input name="userfile[]" type="file" /><br />
<input name="userfile[]" type="file" /><br />
<input name="userfile[]" type="file" /><br />
<input type="submit" />
</form>

loademup.php

<html>
<head><title>Enter Name and Descriptions of Pics</title></head>
<body>
<h1>Enter Name and Descriptions of Pics</h1>
<form action="view.php" method="post">

<?php
$dbh = mysql_connect("localhost", "username", "password") or die('I cannot connect to the database.');
mysql_select_db("db");

$uploaddir = '/home/username/public_html/images/';
// Read the uploads from $_FILES instead of relying on register_globals
$tot = count($_FILES['userfile']['name']);
$num = 0;
for($q=0;$q<$tot;$q++){

// Skip file fields that were left empty
if ($_FILES['userfile']['name'][$q] == "") continue;

// The next id is the highest existing pic_id plus one (1 for an empty table)
$new_id = 1;
$sql = "SELECT pic_id FROM pics ORDER BY pic_id DESC LIMIT 1";
$result = mysql_query($sql);

while($i = mysql_fetch_array($result)){
$new_id = $i['pic_id'] + 1;
}
$new_pic = "$new_id.jpg";

if (move_uploaded_file($_FILES['userfile']['tmp_name'][$q], $uploaddir . $new_pic)) {
$sql = "INSERT INTO `pics` ( `pic_id` , `pic_name` , `descrip` ) VALUES ('$new_id', NULL , NULL)";
mysql_query($sql);
} else {
print "<strong>" . htmlspecialchars($_FILES['userfile']['name'][$q]) . "</strong> did not upload!";
continue; // nothing to make a thumbnail from
}
// Count only the pics that were actually stored
$num = $num + 1;

$new_thumb = "thumbs/$new_pic";
$sourcefile = "$uploaddir$new_pic";
// getimagesize() returns the width at index 0 and the height at index 1
$picsize = getimagesize($sourcefile);
$source_x = $picsize[0];
$source_y = $picsize[1];

// Landscape pics get a 200x150 thumbnail, portrait pics 150x200
if ($source_x > $source_y){
$dest_x = 200;
$dest_y = 150;
} else {
$dest_x = 150;
$dest_y = 200;
}

$targetfile = "$uploaddir$new_thumb";
$jpegqual = 75;
$source_id = imagecreatefromjpeg($sourcefile);
$target_id = imagecreatetruecolor($dest_x, $dest_y);
$target_pic = imagecopyresized($target_id,$source_id,0,0,0,0,$dest_x,$dest_y,$source_x,$source_y);
imagejpeg($target_id,$targetfile,$jpegqual);
?>
<div style="clear: both;">
<a href="/images/<?=$new_pic?>">
<img src="/images/<?=$new_thumb?>" style="float: left" />
</a>
<strong><?=htmlspecialchars($_FILES['userfile']['name'][$q])?></strong><br /><br />
<strong>Name:</strong><br />
<input type="text" name="pic_name[<?=$num?>]" /><br /><br />
<strong>Description:</strong><br />
<textarea name="descrip[<?=$num?>]"></textarea>
<input type="hidden" name="pic_id[<?=$num?>]" value="<?=$new_id?>" />
</div>
<?php
}
?>
<input type="submit" value="Click to Save Descriptions" />
</form>
</body>
</html>
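
One possible refinement, offered only as a sketch: the fixed 200x150 and 150x200 sizes in loademup.php stretch any picture that is not 4:3. A small helper could compute the thumbnail size from the source dimensions instead. The name thumb_size and the 200 pixel limit are assumptions for illustration, not part of the script above.

```php
<?php
// Hypothetical helper: fit a picture inside a $max x $max box while
// keeping its aspect ratio. Returns array(width, height), or false when
// the source dimensions are unusable.
function thumb_size($source_x, $source_y, $max = 200)
{
    if ($source_x < 1 || $source_y < 1) {
        return false;
    }
    // Scale so that the longer side ends up exactly $max pixels long.
    $scale = $max / max($source_x, $source_y);
    return array(
        max(1, (int) round($source_x * $scale)),
        max(1, (int) round($source_y * $scale)),
    );
}
```

In loademup.php the if/else block that sets $dest_x and $dest_y could then be replaced with list($dest_x, $dest_y) = thumb_size($source_x, $source_y); note that this sketch also enlarges pictures smaller than the limit.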

view.php


<html>
<head><title>View Pics and Descriptions</title></head>
<body>
<h1>View Pics and Descriptions</h1>

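<?php
$dbh = mysql_connect("localhost", "username", "password") or die('I cannot connect to the database.');
mysql_select_db("db");

// Read the submitted arrays from $_POST instead of relying on register_globals
$pic_id = $_POST['pic_id'];
$pic_name = $_POST['pic_name'];
$descrip = $_POST['descrip'];
$num = 0;
$tot = count($pic_id);

for ($q=0;$q<$tot;$q++){
$num = $num + 1;
// Force the id to an integer and escape the text fields before building the query
$pic_id[$num] = (int) $pic_id[$num];
$sql = "UPDATE pics SET descrip = '" . mysql_real_escape_string($descrip[$num]) . "' , pic_name = '" . mysql_real_escape_string($pic_name[$num]) . "' WHERE pic_id = " . $pic_id[$num];
mysql_query($sql);
?>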
<div style="clear: both;">
<a href="/images/<?php echo "$pic_id[$num].jpg"?>">
<img src="/images/thumbs/<?php echo "$pic_id[$num].jpg"?>" style="float: left" />
</a>
<h2><?=htmlspecialchars($pic_name[$num])?></h2>
<p><?=htmlspecialchars($descrip[$num])?></p>
</div>
<?php
}
mysql_close();
?>
</body>
</html>

[edited by: jatar_k at 9:23 pm (utc) on July 21, 2004]
[edit reason] small syntax change [/edit]

Birdman




msg:1296728
 11:45 am on Mar 22, 2003 (gmt 0)

A couple of notes about the script above:

1) The MySQL table should have read like this:

CREATE TABLE `pics` (
`pic_id` INT NOT NULL ,
`pic_name` VARCHAR( 150 ) ,
`descrip` TEXT,
PRIMARY KEY ( `pic_id` )
)

2) To use this script, the GD image library must have been installed with PHP on your server. Try running phpinfo() or just run the whole script to see if you get any errors.

3) Also, you may have to replace a function in loademup.php, depending on the version of GD you have.

Replace imagecreatetruecolor() with imagecreate() if you have problems with the original script.
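
Notes 2 and 3 can also be checked in code rather than by trial and error. A minimal sketch, assuming a wrapper named create_canvas (the name is made up for illustration):

```php
<?php
// Hypothetical wrapper: return a blank GD canvas, preferring true color
// when the installed GD provides it, or false when GD is missing.
function create_canvas($width, $height)
{
    if (!extension_loaded('gd')) {
        return false; // GD is not installed with PHP on this server
    }
    return function_exists('imagecreatetruecolor')
        ? imagecreatetruecolor($width, $height)
        : imagecreate($width, $height);
}
```

In loademup.php the line that sets $target_id would then read $target_id = create_canvas($dest_x, $dest_y);.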

[edited by: jatar_k at 5:56 pm (utc) on Aug. 23, 2003]
[edit reason] fixed [/edit]
