Depending on how you're writing to the file, it could happen when a link is clicked twice in a short span of time (extremely short, but if you're getting tens of thousands of hits, it's likely to happen eventually). I would look into prewritten logging systems, that way you wouldn't have to worry about this.
At the risk of sounding like a newsgroup: if it is written in Perl, this is covered in perlfaq5 under incrementing the number in a file (yeah, there is no way you would know that if you hadn't read them).
Search for file locking in whatever language you are using. What is happening (or what I am guessing is happening) is that two people hit the script at the same time. Rough overview of why, although it is highly dependent on how the particular script works:
Instance 1 -- opens file and reads value and increments counter
Instance 1 -- erases file/opens for writing, whatever
Instance 2 -- opens file and reads value. Problem, Instance 1 has cleared the file, but hasn't written the new value in. So Instance 2 is reading and incrementing a blank value.
Instance 1 -- writes incremented counter out
Instance 2 -- writes out new value (which is reset to zero or one).
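The lost-update sequence above can be sketched in code. This is Python rather than the thread's Perl, and the filename is illustrative, but the pattern is the same naive read-increment-write that any language exhibits without locking:

```python
# Naive counter update -- the racy pattern described above.
# If two requests run this concurrently, one increment is lost, or the
# count resets if a second reader arrives after the writer has truncated
# the file but before it has written the new value.

def bump_counter(path="counter.txt"):  # path is illustrative
    try:
        with open(path) as f:
            count = int(f.read() or 0)
    except FileNotFoundError:
        count = 0
    # <-- another instance can read the stale (or empty) file right here
    with open(path, "w") as f:  # opening with "w" truncates immediately!
        f.write(str(count + 1))
    return count + 1
```

A single process calling this works fine; the bug only shows up when two instances interleave between the read and the write.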
As for the suggestion to use MySQL instead of a flat file: if all you are doing is holding a click count for a page, it would be pointless and rather inefficient to use MySQL. After all, consider what MySQL is going to do: it's going to read the value from the table file(s), increment it, and write it back out. Wait a minute, you are already doing that, and without the MySQL overhead.
Last edited by SimplyDiff; 04-25-2004 at 04:44 PM.
flock should be safe enough for a script that doesn't attract 100k hits a day. But instead of starting over from zero (reading a pre-written, post-cleared file), it usually undercounts hits: if the lock is only taken after the file is read and before it is written, there is still a short window in which another script can read the same file and use the old value. You also must ensure that every script that might write to the file uses flock, otherwise flock will have no effect.
My resource-consuming approach is to create an xxx.tmp file before the target file is read, and delete it right away after the file is written. When another request wants to read/write the file while xxx.tmp exists, it sleeps for some time (0.1 to 1 sec, at random) waiting for xxx.tmp to disappear, then continues. This keeps the file from being read/written while some other script is reading/writing/doing calculations based on the data in the file.
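A sketch of that lockfile approach, in Python for illustration (the filenames and timeout are assumptions). Note that the exists-then-create step is deliberately left as described in the post, which is exactly the weakness pointed out in the reply below it:

```python
import os
import random
import time

LOCKFILE = "counter.txt.tmp"  # marker filename is illustrative

def acquire_tmp_lock(lockfile=LOCKFILE, timeout=10.0):
    """Wait until the .tmp marker is absent, then create it.

    NOTE: the exists-check and the create are two separate steps, so
    two processes can both see the marker absent and both 'acquire'
    the lock -- this check-then-act gap is itself a race condition.
    """
    deadline = time.time() + timeout
    while os.path.exists(lockfile):              # check
        if time.time() > deadline:
            raise TimeoutError("gave up waiting for the lock")
        time.sleep(random.uniform(0.1, 1.0))     # random back-off, as described
    open(lockfile, "w").close()                  # create -- race window here

def release_tmp_lock(lockfile=LOCKFILE):
    os.remove(lockfile)
```

On POSIX systems the gap can be closed by creating the marker atomically with `os.open(lockfile, os.O_CREAT | os.O_EXCL)`, but as written above the scheme matches the post: it narrows the window without eliminating it.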
lwknet, checking for the file and then creating it is a race condition in itself, so you still end up with a problem in a high-hit script.
it usually undercounts hits because the lock is taken after the file is read and before it is written
That shouldn't happen with a properly sequenced script. The file shouldn't be opened for reading and for writing at separate times; it should be opened for read and write at the same time:
open file for r+w
flock the file (exclusive lock)
read the current value
move pointer/file handle back to beginning of file
write the incremented value and close (releasing the lock)
Everything is done in a single open. Since nothing is read or written before the file is locked, there is no problem with concurrent accesses. If it is already locked, flock is going to block until the lock is released on the close or unlock.
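That open-once, lock, read, rewind, write sequence looks like this in Python (the thread's scripts are Perl, where `open`/`flock`/`seek` work the same way; the filename is illustrative and `fcntl.flock` is POSIX-only):

```python
import fcntl

def bump_counter(path="counter.txt"):  # path is illustrative
    # "a+" opens for read and write, creates the file if missing,
    # and -- unlike "w" -- does NOT truncate it. One open() does it all.
    with open(path, "a+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # blocks until we hold the lock
        f.seek(0)                      # "a+" positions at EOF; rewind to read
        count = int(f.read() or 0)
        f.seek(0)                      # move pointer back to the beginning
        f.truncate()                   # discard the old value before rewriting
        f.write(str(count + 1))
        # the lock is released when the file is closed
    return count + 1
```

Because nothing is read or written before the lock is held, a second instance simply blocks at the `flock` call until the first one closes the file, so no increment is ever lost.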