Hello all,

I am in a bind with Apache's multi-process limit. Let me explain what I am doing. There's a website which has the career details of all football players since the beginning of professional football. It has a simple web form which lets you look up a player's profile by entering his name or his 7-digit numeric ID number (on that website).

One of my clients wants a list of all the players with a certain "flag" in their profile. So I created an automatic form submission and HTML parsing script to get the details of all the players with that "flag" in their profile. Without going into too much detail: after applying a few pattern rules to the ID number, the number of possible ID numbers comes to about 1 million (instead of the full 10^7 = 10,000,000; each of the 7 digits can be any of {0,1,2,3,4,5,6,7,8,9}, so the raw total is 10*10*10*10*10*10*10).

Therefore, to completely automate this process, I wrote a script which generates an ID number, submits the form with that ID number, and parses the resulting HTML profile for the "flag". If the script finds a hit on the flag, it stores all the fields of that player in a database. The script works absolutely fine, but the speed I was getting was about one check per second, which means I would have to leave it running for about 11.5 days (1 million checks at one per second).
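
To make the per-ID check concrete, here is a rough Python sketch of the loop described above. It is not my actual script: `fetch_profile`, `save_player`, and `FLAG_MARKER` are placeholder names made up for illustration, and the form submission and database write are passed in as functions so that nothing about the real site is assumed.

```python
from typing import Callable, Iterable

# Hypothetical marker text; the real "flag" is not shown in this post.
FLAG_MARKER = "FLAG"


def format_id(n: int) -> str:
    """Zero-pad a number to the 7-digit ID format the form expects."""
    return f"{n:07d}"


def check_ids(
    ids: Iterable[int],
    fetch_profile: Callable[[str], str],      # placeholder: submits the form, returns HTML
    save_player: Callable[[str, str], None],  # placeholder: stores a hit in the database
) -> int:
    """Check each candidate ID and store the profiles that contain the flag."""
    hits = 0
    for n in ids:
        player_id = format_id(n)
        html = fetch_profile(player_id)
        if FLAG_MARKER in html:
            save_player(player_id, html)
            hits += 1
    return hits
```

With one request per second inside `fetch_profile`, this loop is what takes the full run into days.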

So I came up with the idea of dividing the checks into ten parts, and I created a separate script for each part. The first script checks the first 100 thousand combinations, the second checks the next 100 thousand, and so on.
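
The split can be sketched like this in Python. This is only an illustration, assuming the roughly 1 million candidate IDs sit in one ordered list so that each script gets a contiguous block of positions in that list; `split_range` is a name I made up here.

```python
from typing import List


def split_range(total: int, parts: int) -> List[range]:
    """Split positions 0..total-1 into `parts` contiguous, near-equal ranges."""
    base, extra = divmod(total, parts)
    ranges = []
    start = 0
    for i in range(parts):
        # The first `extra` parts take one additional item when total
        # does not divide evenly.
        size = base + (1 if i < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


chunks = split_range(1_000_000, 10)
print(chunks[0])   # range(0, 100000)
print(chunks[-1])  # range(900000, 1000000)
```

Each script then walks only its own range, so no ID is checked twice and none is skipped.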

The problem is that I am able to get only two of these scripts running at the same time, so it would still take me more than 5 days to get all the results. The rest of the scripts just sit in the server's backlog. This is definitely due to a limit on how many processes Apache will handle at once. The server I am using to run the scripts and the target web server both run Apache2. I am sure it's not a problem with the receiving server; it has to be my own Apache web server, the one running the scripts. I have tried mpm_winnt (on a Windows server) as well as the prefork and worker modules (on a Linux server) without any luck. Have any of you ever faced the same situation?
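
For reference, the time estimates in this post work out as below. This is back-of-the-envelope arithmetic only, and it assumes every script running in parallel sustains the same one check per second, which may not hold in practice.

```python
SECONDS_PER_DAY = 86_400


def estimated_days(total_checks: int, checks_per_second: float, concurrent_scripts: int) -> float:
    """Days needed if every concurrent script sustains the same check rate."""
    seconds = total_checks / (checks_per_second * concurrent_scripts)
    return seconds / SECONDS_PER_DAY


for n in (1, 2, 10):
    print(f"{n} script(s): {estimated_days(1_000_000, 1, n):.2f} days")
# 1 script(s): 11.57 days
# 2 script(s): 5.79 days
# 10 script(s): 1.16 days
```

So getting all ten scripts to run together would bring the job down to a little over one day.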

Please, guys, help me out here.

Best,
Tony Miller

PS: For those concerned about the legitimacy of this work, rest assured, this is absolutely legit. There's nothing in the website's use policy which restricts somebody from doing this. Moreover, my client hired me to do this only because the website owners were not able to hand over the data he required. They gave the stupid reason that they are helpless in providing the data because they don't have a system in place which would allow them to do a search restriction!