flock(2) or not? Please comment on this project I'm working on
Share your opinions & experience if you think this is a good idea
Hi webmasters. I already know how to use flock(), but I would deeply appreciate your comments and experiences on how much trust you put in this function before implementing a project. I've created online apps in Perl using flat-file databases with no problems. Multiuser apps need file locking to keep two or more people from writing to the same file at once, which causes data loss or data corruption. Yes, I know databases handle this, but I'm still going to use Perl + flat files.
My way of implementing it: whenever I use a file I first lock it, then read it or write to it, and then unlock it. In my tests with flock(2), if a user process tries to read or write an already locked file, it has to wait for the other process to close (and so unlock) the file before it can proceed. This happens automatically: the user never sees a message or has to try again. The browser just waits a few milliseconds and then receives the data, so it is transparent to the end user.
I had data loss and data corruption before using flock, but not anymore, so for now I'm happy with it. Then why do I ask? I need your comments:
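A minimal sketch of that lock / read / write / unlock cycle, assuming the whole file fits in memory. The helper name update_locked and its callback interface are made up for illustration; only open, flock, seek, truncate, print and close are real Perl built-ins.

```perl
use strict;
use warnings;
use Fcntl qw(:flock SEEK_SET);

# Hypothetical helper: lock a flat file, let a callback edit its lines,
# then write the lines back while still holding the lock.
sub update_locked {
    my ($path, $edit) = @_;
    open(my $fh, '+<', $path) or die "Cannot open $path: $!";
    flock($fh, LOCK_EX)       or die "Cannot lock $path: $!";    # waits until the lock is free
    my @lines = <$fh>;                                            # read under the lock
    $edit->(\@lines);                                             # change the data in memory
    seek($fh, 0, SEEK_SET)    or die "Cannot seek $path: $!";
    truncate($fh, 0)          or die "Cannot truncate $path: $!";
    print {$fh} @lines        or die "Cannot write $path: $!";
    close($fh)                or die "Cannot close $path: $!";    # flushes, then releases the lock
    return scalar @lines;
}
```

A call might look like update_locked($file, sub { push @{$_[0]}, "new record\n" }), where $file is whatever data file the app uses.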
- Please share your general comments and experiences using flock on important data projects.
- How much do you trust flock(2) ?
- Would you use flock in the kind of project I'm working on?
- Have you experienced data loss even using flock() ? **
* Using flock to lock files means using it every time you access the file, in every routine. In my experience and tests, that applies to reads as well as writes. Please correct me if I'm wrong.
The situation: a table of widgets (a flat-file database of approximately 2,000 lines with 30 fields each) that will be read and updated many times a day by many online users. The kind of data has me a bit worried, which is why I ask. For my peace of mind, I've solved this before by using one flat file per record (2,000 files), so the chances of two users accessing the same file at once are much lower than with all the data in one file. But I don't want the project to grow to that many data files.
I could do it again (many files) if there's no other way, but a single file would be more appropriate and convenient for this project (I don't know how much less safe it would be in terms of flock). That's why I need your comments: the data is related to money.
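On the point about locking reads too: flock is advisory, so it only holds back routines that also call flock. A reading routine would typically take a shared lock, roughly as in this sketch. The helper name read_locked is hypothetical.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Hypothetical reader: take a shared lock so the read waits for any
# writer that currently holds an exclusive lock on the same file.
sub read_locked {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    flock($fh, LOCK_SH)      or die "Cannot lock $path: $!";   # several readers can share LOCK_SH
    my @lines = <$fh>;
    close($fh);                                                 # closing releases the lock
    chomp @lines;
    return @lines;
}
```

A routine that opens the file without calling flock is not stopped by either kind of lock, which is why every access path has to cooperate.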
Thanks in advance.
I haven't experienced any serious problems with flock (that is, problems that couldn't have been avoided by thinking it through), but I don't see the point in storing mission-critical data in a flat file. A database would be much better: row-level locks, easy handling, indexes, etc. Why avoid that just because you can?
Apart from that, I recommend taking a look at File Locking Tricks and Traps [perl.plover.com], which will give you some ideas about potential problems and their solutions.
Thanks, very interesting info, already bookmarked. I agree with you on the DB, but the thing is this app will have constant online backups from server to server (perhaps hourly, for every account) for transparent switching from one to the other in case of failure. Tricky, but I came up with an idea to solve it.
We had similar experiences to yours using flock right here in the forum software. Once we moved to flock2, things settled down.
If you are just appending data, we use stock flock, because we want thread files to be readable regardless of whether a new update has come in.
> How much do you trust flock(2) ?
The one outstanding issue is when you get a lot of system load and your blocking flock slows the rest of the running threads down. On a desktop that is not a problem. However, in an online situation, it leads to people pressing F5 to solve a nonexistent problem and compounding the server load.
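One way to avoid waiting indefinitely is a non-blocking lock attempt with a few retries. This is only a sketch: the helper name lock_with_retry and the retry counts are assumptions, not what the forum software does. With LOCK_NB, flock returns false immediately instead of waiting when the file is already locked.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use Time::HiRes qw(sleep);

# Hypothetical helper: try for the lock a few times, then give up
# instead of blocking for as long as the file stays locked.
sub lock_with_retry {
    my ($fh, $tries, $delay) = @_;
    for my $attempt (1 .. $tries) {
        return 1 if flock($fh, LOCK_EX | LOCK_NB);   # false right away if the file is locked
        sleep($delay);                                # fractional seconds via Time::HiRes
    }
    return 0;                                         # caller decides what to tell the user
}
```

When it returns 0, the script can send a short "busy, please retry" response rather than leaving the browser hanging.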
> Have you experienced data loss even using flock() ? **
> that will be read and updated many times a day by many online users.
How is the data structured? If it uses fixed-width fields (or can be made to), then how about using direct disk seeks to read/write data? It would be much faster in most cases. Depending on the data format, you could probably get away without using flock2 at all. You could be writing to record 500 while reading from record 1201 with no access conflict or race conditions.
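A minimal sketch of the direct-seek idea, assuming every record is padded to exactly 64 bytes including its newline. The record length and the helper names read_record and write_record are made up for illustration. Record n then starts at byte n * 64.

```perl
use strict;
use warnings;

my $RECORD_LEN = 64;   # assumed: every record is exactly 64 bytes, newline included

# Hypothetical helper: read record number $n (0-based) by seeking straight to it.
sub read_record {
    my ($fh, $n) = @_;
    sysseek($fh, $n * $RECORD_LEN, 0) or die "Cannot seek: $!";
    my $got = sysread($fh, my $buf, $RECORD_LEN);
    die "Short read on record $n" unless defined $got && $got == $RECORD_LEN;
    return $buf;
}

# Hypothetical helper: overwrite record number $n in place.
sub write_record {
    my ($fh, $n, $data) = @_;
    my $rec = sprintf("%-*s\n", $RECORD_LEN - 1, $data);   # pad with spaces, end with newline
    die "Record too long" if length($rec) != $RECORD_LEN;
    sysseek($fh, $n * $RECORD_LEN, 0) or die "Cannot seek: $!";
    my $wrote = syswrite($fh, $rec, $RECORD_LEN);
    die "Short write on record $n" unless defined $wrote && $wrote == $RECORD_LEN;
    return 1;
}
```

This covers reads and writes that touch different records. Two processes doing a read-then-write on the same record could still interleave, and the sketch does nothing about that case.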
Below is our code that updates pageviews per thread that executes on every page view. Each thread on WebmasterWorld consists of a descriptor line with subject, poster, and overhead data (including page views). The first dozen fields of the descriptor are fixed width fields. The rest of the flat file consists of one line per message in the thread.
open(FILE, "+<", $threaddat) or die "Cannot open $threaddat: $!"; # open the thread file (first line is header/descriptor)
sysseek(FILE, 56, 0) or die "Cannot seek: $!"; # seek to byte 56 - our pageview field
sysread(FILE, $pageviews, 12); # read the 12-char pageview field into $pageviews
$pageviews++; # add a pageview
$pageviews = sprintf("%12d", $pageviews); # pad field back out to 12 chars with leading spaces
sysseek(FILE, 56, 0) or die "Cannot seek: $!"; # seek back to byte 56 - our pageview field
syswrite(FILE, $pageviews, 12); # write back our pageview field
close(FILE); # done with the thread file
> the above is formatted using a new, beta, undocumented UBB code.
Start the block with [syntax=perl] and end it with [/syntax]
Thanks Brett. Flock2 has worked fine in my work and "stress" tests, but just as you mention, too many reads and writes might cause a slowdown due to the accumulated delays. That's still fine with me.
One thing I forgot to mention, regarding the sequence found in several docs on the web:
- do whatever and write
- close (automatically unlocks the file and flushes the buffer)
The "do whatever" step caused problems when the script still had something like $c++; or any other instruction besides the write to the file. The only way I managed to get it to work was to "do the whatever" before opening and locking the file. So I ended up with flock - write - close. I tried it on 3 servers and got the same result on each.
Your post gives me more ideas and stuff to work on, besides clearing my mind on how reliable flock is in real-life tests.
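A rough sketch of that flock - write - close order, with the other work done before the file is opened so the lock is held only for the write itself. It assumes the prepared data does not depend on the file's current contents (an append, for example). The helper name append_prepared and the pipe delimiter are made up for illustration.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Hypothetical helper: prepare the record first, then open, lock,
# write and close with nothing else in between.
sub append_prepared {
    my ($path, @fields) = @_;
    my $line = join('|', @fields) . "\n";                   # the "do whatever" part, done up front
    open(my $fh, '>>', $path) or die "Cannot open $path: $!";
    flock($fh, LOCK_EX)       or die "Cannot lock $path: $!";
    print {$fh} $line         or die "Cannot write $path: $!";
    close($fh)                or die "Cannot close $path: $!";   # flushes and unlocks
    return $line;
}
```

If the new data has to be computed from what is already in the file, the read would need to happen under the same lock as the write.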
Thanks for the info.