
PHP Server Side Scripting Forum

    
Caching data for PHP script; MySQL or Flat-File

ocon - msg:4513518 - 9:39 pm on Oct 29, 2012 (gmt 0)

I have a script that receives a user's coordinates from URL parameters. As a fail-safe, if no valid parameters are passed, I use an API service to convert the user's IP address to a latitude and longitude.

if (!$coords) {
    // Fall back to the geoPlugin web API, which returns a serialized PHP array
    $geo = unserialize(file_get_contents("http://www.geoplugin.net/php.gp?ip=".$_SERVER["REMOTE_ADDR"]));
    $coords = array($geo["geoplugin_latitude"], $geo["geoplugin_longitude"]);
}

The free API service works great, and I love that I do not have to maintain a large IP-to-geo database. However, it makes me a little uncomfortable: it slows the script down, and it feels abusive to send a lot of traffic to the service.

To address this I am looking at caching the results. If the IP address is in the cache, the script will use the corresponding latitude and longitude. If it is not, the script will use the free API service to convert the IP address to coordinates, add the result to the cache, and continue with the newly found latitude and longitude. I could create a cron job to purge the cache at set intervals to keep the data fresh.

I would like recommendations on how I could create this cache, starting with whether it should be a MySQL or a flat-file database. My website's users will come from all over the world, so there could be many IP addresses referenced. Also, speed is very important for the script.

 

ocon - msg:4513570 - 12:32 am on Oct 30, 2012 (gmt 0)

I was originally thinking of a flat file as a text file that I would have to parse. On reflection, I realized a PHP file where the data is saved as variables might be better.

Am I moving in the right direction, or is this a fool's errand and I should just start coding against a MySQL database? I would love some feedback.

if (!$coords) {

    // Cache key: a variable name derived from the IP, e.g. "_3232235777"
    $ip = sprintf("_%u", ip2long($_SERVER["REMOTE_ADDR"]));

    // Load the cache if it exists, otherwise create it with a bare "<?php" header
    if (file_exists("ipDatabase.php")) include("ipDatabase.php");
    else file_put_contents("ipDatabase.php", "<?php\r\n\r\n");

    // Variable variable: $$ip resolves to e.g. $_3232235777 from the include
    if (isset($$ip)) {
        $coords = $$ip;
    } else {
        $geo = unserialize(file_get_contents("http://www.geoplugin.net/php.gp?ip=".$_SERVER["REMOTE_ADDR"]));
        $coords = array($geo["geoplugin_latitude"], $geo["geoplugin_longitude"]);
        // Append the new entry as a PHP assignment for the next include
        file_put_contents("ipDatabase.php", "\$".$ip." = array(".$geo["geoplugin_latitude"].",".$geo["geoplugin_longitude"].");// ".$_SERVER["REMOTE_ADDR"]."\r\n", FILE_APPEND);
    }
}

Dinkar - msg:4513680 - 9:00 am on Oct 30, 2012 (gmt 0)

IMO, save the coordinates in one text file per IP, named IP.txt; example: 120.45.65.215.txt

ocon - msg:4513750 - 12:37 pm on Oct 30, 2012 (gmt 0)

Dinkar, I think storing the data in separate files is an excellent idea. Is there a reason why you would suggest text files over a PHP file?

Dinkar - msg:4513787 - 2:04 pm on Oct 30, 2012 (gmt 0)

I like to follow the KISS approach.

If there is NO PHP code, then I don't see any valid reason to use a PHP file. You need to store data in text format, so it should be a text file. Since it's a database file for storing data, you may choose either no extension or a .db extension; it really doesn't make any difference.

Also, it's better to use a separate text file for each IP, or one per class C block, like 120.45.65.db covering the range 120.45.65.0 to 120.45.65.255 (I think this would be much better, but I'm not so sure).

HTH.

Dinkar - msg:4513789 - 2:10 pm on Oct 30, 2012 (gmt 0)


Oops! I forgot to write the reason for storing the data in separate files. :)

If you store all the data in one single text file, then within a few days (depending on your site traffic) that file will get much bigger, and it will take more time and more hard disk activity to access your data. I am sure you, and maybe your host too, won't like that.

jadebox - msg:4513904 - 8:05 pm on Oct 30, 2012 (gmt 0)

You might consider using the memcache or memcached module in PHP.
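A minimal sketch of that idea with the Memcached extension, assuming a memcached daemon on localhost:11211 (the "geo:" key prefix and the 30-day expiry are arbitrary choices):

// Sketch only: assumes a memcached daemon on localhost:11211;
// the "geo:" key prefix and 30-day expiry are arbitrary choices.
$mc = new Memcached();
$mc->addServer("127.0.0.1", 11211);

$ip = $_SERVER["REMOTE_ADDR"];
$coords = $mc->get("geo:".$ip);

if ($coords === false) {               // cache miss
    $geo = unserialize(file_get_contents("http://www.geoplugin.net/php.gp?ip=".$ip));
    $coords = array($geo["geoplugin_latitude"], $geo["geoplugin_longitude"]);
    $mc->set("geo:".$ip, $coords, 30*86400);   // expire after 30 days
}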

vincevincevince - msg:4514473 - 2:36 am on Nov 1, 2012 (gmt 0)

Definitely use MySQL, but please convert the IP addresses with ip2long() before storing them, and index the column properly.

Put a column for when you created the record, and set up a cron job to prune records older than 30 days; or alternatively, check the record age when you look it up, fall back to your API where it is more than 30 days old, and then update the record and its added time.

Files are OK for fewer than 1k records or so, but past that they will be awful to manage and increasingly slow.
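A rough sketch of that scheme using PDO, purely for illustration; the table name, columns, and credentials are placeholders:

// Sketch only: table/column names and credentials are placeholders.
$pdo = new PDO("mysql:host=localhost;dbname=geocache", "user", "pass");
$pdo->exec("CREATE TABLE IF NOT EXISTS ip_geo (
    ip INT UNSIGNED NOT NULL PRIMARY KEY,
    lat FLOAT NOT NULL,
    lng FLOAT NOT NULL,
    added TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
)");

// ip2long() plus "%u" keeps the value positive on 32-bit builds
$ip = sprintf("%u", ip2long($_SERVER["REMOTE_ADDR"]));

// Only accept rows added in the last 30 days
$stmt = $pdo->prepare("SELECT lat, lng FROM ip_geo WHERE ip = ? AND added > NOW() - INTERVAL 30 DAY");
$stmt->execute(array($ip));
$coords = $stmt->fetch(PDO::FETCH_NUM);

if (!$coords) {                        // miss or stale: hit the API and refresh
    $geo = unserialize(file_get_contents("http://www.geoplugin.net/php.gp?ip=".$_SERVER["REMOTE_ADDR"]));
    $coords = array($geo["geoplugin_latitude"], $geo["geoplugin_longitude"]);
    // REPLACE refreshes a stale row and resets the added timestamp
    $ins = $pdo->prepare("REPLACE INTO ip_geo (ip, lat, lng) VALUES (?, ?, ?)");
    $ins->execute(array($ip, $coords[0], $coords[1]));
}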

brotherhood of LAN - msg:4514502 - 4:54 am on Nov 1, 2012 (gmt 0)

Yes, I'd go with MySQL/memory over files. Disk seeks are kind of sacred and you want to avoid them.

If you have the memory, go with a MySQL MEMORY table: a 4-byte INT for the IP and two FLOATs for the coordinates. Optionally add a 'time added' field for refreshing data. Make the IP column the primary key. Lookups will be very fast, as the PK is just a hash of the IP value and it's all in memory. Bear in mind a MEMORY table will be wiped if MySQL or the server restarts.
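As a sketch, that table might be created like this (connection details and names are placeholders; MEMORY tables use HASH indexes by default, which is what makes the primary-key lookup a hash lookup):

// Sketch only: a MEMORY table as described above (names are placeholders).
$pdo = new PDO("mysql:host=localhost;dbname=geocache", "user", "pass");
$pdo->exec("CREATE TABLE IF NOT EXISTS ip_geo_mem (
    ip INT UNSIGNED NOT NULL PRIMARY KEY,   -- 4-byte INT from ip2long()
    lat FLOAT NOT NULL,
    lng FLOAT NOT NULL,
    added INT UNSIGNED NOT NULL             -- unix time, for refreshing
) ENGINE=MEMORY");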

There's a good argument for using a different storage engine, as the hits for any particular IP will be quite sparse. MySQL can be quite clever about what it keeps on disk and what it keeps in memory. At the very least, using a DB keeps the data in one or two files.

incrediBILL - msg:4514509 - 5:23 am on Nov 1, 2012 (gmt 0)

"Disk seeks are kind of sacred and you want to avoid them"


I've done it both ways, and the disk seeks can actually be worse in a big database, much worse, unless it's used often and heavily cached. With individual flat files the cost isn't so much the seeking, thanks to the OS cache; it's the linear searching to traverse the OS file structure, and the access-rights checking, that burns time.

If you have a very busy site with lots of traffic, like I've seen, the churn on a live IP-tracking database file under heavy load could cause it to crash, which doesn't happen with individual flat files. The SQL version can be more elegant and efficient, but it can also clog up RAM and bring the site to a crawl under the right conditions, though the site would probably be unusable under those same conditions anyway.

If you go the flat-file route, don't put all your files in one directory, as more than 4K files in a single directory can get kind of slow. I'd try splitting them up into 256 directories based on the first octet of the IP address. Since the OS caches the directory structure of the disk, accessing those files should be pretty fast if they're frequently used.
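A minimal sketch of that layout, assuming a hypothetical "geocache" base directory:

// Sketch only: spread cache files across at most 256 directories,
// keyed on the first octet; "geocache" is a placeholder base dir.
$octets = explode(".", $_SERVER["REMOTE_ADDR"]);
$dir = "geocache/".$octets[0];
if (!is_dir($dir)) mkdir($dir, 0755, true);
$file = $dir."/".$_SERVER["REMOTE_ADDR"].".txt";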

brotherhood of LAN - msg:4514513 - 5:44 am on Nov 1, 2012 (gmt 0)

Seems like it'd be a hassle to refresh data with a directory/file setup... and each 'record' will typically occupy 4 KB (a full filesystem block) instead of about 30 bytes or so in an indexed DB.

Partitioning can help on really big tables. I'd probably go with a partitioned InnoDB table. InnoDB doesn't rely on the OS cache the way MyISAM does, and it caches the most frequently accessed records in memory via the buffer pool.
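A sketch of that variant, with placeholder names and an arbitrary partition count; KEY partitioning hashes the primary key across the partitions:

// Sketch only: partitioned InnoDB table (names/partition count are placeholders).
$pdo = new PDO("mysql:host=localhost;dbname=geocache", "user", "pass");
$pdo->exec("CREATE TABLE IF NOT EXISTS ip_geo_part (
    ip INT UNSIGNED NOT NULL PRIMARY KEY,
    lat FLOAT NOT NULL,
    lng FLOAT NOT NULL,
    added TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB
PARTITION BY KEY (ip) PARTITIONS 16");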

IP ranges would certainly help keep the amount of cached data to a minimum, but I guess the external API won't supply that.

vincevincevince - msg:4514515 - 5:51 am on Nov 1, 2012 (gmt 0)

"IP ranges"

You should be fairly safe cutting off the last block of the IP (explode the text and cut off the 4th block). In almost all circumstances the whole block of 256 addresses will be in the same place.
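As a sketch, the truncation could look like this:

// Sketch only: key the cache on the /24 block instead of the full IP
$blocks = explode(".", $_SERVER["REMOTE_ADDR"]);
array_pop($blocks);                   // cut off the 4th block
$key = implode(".", $blocks);         // e.g. "120.45.65"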

incrediBILL - msg:4514516 - 6:20 am on Nov 1, 2012 (gmt 0)

Using individual flat files for IP tracking is essentially how PHP session files work, and nobody has ever claimed those were slow or bogged down the site, have they?

ocon - msg:4514833 - 10:33 pm on Nov 1, 2012 (gmt 0)

Thank you everyone for your feedback.

Since I last checked the boards I've come around to the idea of using individual .txt files instead of large PHP files. It made me think about the problem differently and led me to the following script:

// Files live three directory levels deep, keyed by the first three
// octets, e.g. 120/45/65/120.45.65.215.txt
$folders = explode(".", $_SERVER["REMOTE_ADDR"]);
$file = $folders[0]."/".$folders[1]."/".$folders[2]."/".$_SERVER["REMOTE_ADDR"].".txt";

// Cache hit: use the file if it's under 30 days old, otherwise expire it
if (!$coords && file_exists($file)) {
    if (time() - filemtime($file) < 30*86400) $coords = explode(",", file_get_contents($file));
    else unlink($file);
}

// Cache miss: build the directory chain, query the API, store the result
if (!$coords) {
    if (!is_dir($folders[0])) mkdir($folders[0]);
    if (!is_dir($folders[0]."/".$folders[1])) mkdir($folders[0]."/".$folders[1]);
    if (!is_dir($folders[0]."/".$folders[1]."/".$folders[2])) mkdir($folders[0]."/".$folders[1]."/".$folders[2]);
    $geo = unserialize(file_get_contents("http://www.geoplugin.net/php.gp?ip=".$_SERVER["REMOTE_ADDR"]));
    file_put_contents($file, $geo["geoplugin_latitude"].",".$geo["geoplugin_longitude"]);
    $coords = array($geo["geoplugin_latitude"], $geo["geoplugin_longitude"]);
}

The script creates an individual .txt file for each IP address, with only the coordinates saved (no formatting overhead). The files are stored in sub-folders three levels deep, organized by octet. The file's modification time is checked to make sure it is less than 30 days old, and the file is refreshed if it's older.

I see there is a big push for MySQL, but I'm wondering if this new approach mitigates those concerns. The file sizes are very efficient, and the files are organized so that there are never more than 256 items in a directory. The information is self-updating without needing a cron job. My biggest remaining concerns are reducing the three statements needed to create the sub-folders, and handling IPv6 (if the API eventually supports it). Would MySQL still be the better answer?
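One note on the three-statement concern: mkdir() accepts a recursive flag that creates any missing parent directories in a single call, so a sketch like this could replace them:

// Sketch: mkdir()'s third parameter ($recursive = true) builds the
// whole chain of missing parent directories in one call.
$dir = $folders[0]."/".$folders[1]."/".$folders[2];
if (!is_dir($dir)) mkdir($dir, 0755, true);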

I use a well-established cloud hosting solution, and unfortunately I don't have luxuries like memcached available.

vince: That's a great observation! If I find that's the case with my recorded IP addresses, I will need to incorporate that.

incrediBILL - msg:4514863 - 11:54 pm on Nov 1, 2012 (gmt 0)

If this is simply tracking coordinates per visitor, why not just save the coordinates in a session variable and let PHP worry about all those gory details?

Especially since one person using a service with an IP pool could generate multiple IPs; PHP sessions are already smart enough to handle that situation, if I'm not mistaken, and you would just need to update the coordinates when the IP changes.

Only a couple of lines of code, and none of the file creation or clean-up; it's all done for you.
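A minimal sketch of the session approach (the session key names are arbitrary):

// Sketch only: cache the coordinates in the visitor's session and
// refresh whenever the IP changes mid-session.
session_start();

if (!isset($_SESSION["geo_ip"]) || $_SESSION["geo_ip"] !== $_SERVER["REMOTE_ADDR"]) {
    $geo = unserialize(file_get_contents("http://www.geoplugin.net/php.gp?ip=".$_SERVER["REMOTE_ADDR"]));
    $_SESSION["coords"] = array($geo["geoplugin_latitude"], $geo["geoplugin_longitude"]);
    $_SESSION["geo_ip"] = $_SERVER["REMOTE_ADDR"];
}
$coords = $_SESSION["coords"];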
