I have many PHP scripts that crawl sites and extract a lot of data. Sometimes a run takes hours to finish (1-3 hrs depending on the site).
What I would like to know is: on what basis would this count as a violation of the rules?
The code has a loop that iterates many times, and each iteration may take several minutes. After each iteration I have included the line usleep(15); intending to sleep for 15 ms. (Note: usleep() takes microseconds, so usleep(15) actually pauses for only 15 µs; usleep(15000) gives 15 ms.) Will that do any good in freeing the CPU for 15 ms before going back to the intensive crawling?
Currently my site has no problems, but I want to be sure, because I need to crawl many more sites, and this time I'll make the process more automated and less manual (e.g. input of categories, etc.).
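The loop in question looks roughly like this. This is a sketch only: fetchPage() and extractData() are hypothetical placeholders for the real crawling and parsing code, stubbed here so the snippet is self-contained.

```php
<?php

// Hypothetical stand-in for the HTTP fetch (real code would use
// cURL or file_get_contents()).
function fetchPage(string $url): string
{
    return "<html>data from $url</html>";
}

// Hypothetical stand-in for the CPU-intensive extraction step.
function extractData(string $html): int
{
    return strlen($html);
}

function crawlCategory(array $urls): array
{
    $results = [];
    foreach ($urls as $url) {
        $html      = fetchPage($url);
        $results[] = extractData($html);
        // usleep() takes MICROseconds: 15000 us = 15 ms.
        usleep(15000);
    }
    return $results;
}
```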
It would be better to monitor which resources are actually in use while your script runs.
Even if your script runs for hours, that doesn't mean it is resource intensive. As you describe it, you are fetching data from websites, and making TCP/IP connections is a really slow process.
While you're waiting for data from one site and not processing anything, your script shouldn't slow anything down.
If fetching the raw data is what takes the longest in your script, you shouldn't worry about resources. If most of the time is spent processing the received data, however, then you should consider calling usleep() in that section, or otherwise limiting the CPU your script can consume.
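One way to do that, sketched here as an assumption rather than a prescribed method: pause briefly every N records inside the CPU-bound processing loop so other processes get scheduled. processRecords() and the batch/pause values are illustrative, not taken from the question.

```php
<?php

// Throttle a CPU-bound loop: sleep $pauseUs microseconds after
// every $batch records processed.
function processRecords(array $records, int $batch = 1000, int $pauseUs = 10000): int
{
    $done = 0;
    foreach ($records as $record) {
        // ... the CPU-heavy work on $record would go here ...
        $done++;
        if ($done % $batch === 0) {
            usleep($pauseUs); // yield the CPU for ~10 ms per 1000 records
        }
    }
    return $done;
}
```

On Unix hosts that allow it, you can also lower the script's scheduling priority with proc_nice(10), which tells the OS to favor other processes without changing your code's structure.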