
Spider.php is killed at the command line


bforsyth
12-02-2004, 01:59 PM
Hi - I have done a pretty thorough search of the forums and can't find anything that relates to my problem.

I have a site http://www.globalwaterintel.com running phpdig (1.8.3). So far it has been great - thanks to all those who had a hand in creating it and those who monitor these forums. This site has approximately 1,200 pages on it and is expanding at the rate of about 50 pages/month. I have set up a page with links to every page on the site (http://www.globalwaterintel.com/list.php) and point the spider to that page.

When I try and run the spider from the command line, it runs for a bit over a minute and then the process is killed. It doesn't even get through the part where it prints the +++++++ 's.

The site is on shared hosting, so I am working on the assumption that the script is being terminated for hogging too many resources (memory or CPU), although they have yet to confirm this.

I am able to index via the web interface, but it is slow and I would really like to automate the indexing via cron. If it does turn out that the script is being killed because of resource issues, is there any way that I might be able to get around it by introducing some kind of sleep() to pause indexing to free up resources?

I guess the other idea is to split the pages that are indexed into smaller chunks of, say, 200 pages and index them separately?

Any ideas greatly appreciated!
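
For what it's worth, the chunking idea above could be sketched in shell roughly like this. It is only a sketch under assumptions: it supposes the site's URLs are available as a plain-text file with one URL per line (the thread only mentions list.php, an HTML page), and the function names and the SPIDER_CMD variable are made up for illustration. Whether spider.php can take a chunk file as its argument is not confirmed anywhere in this thread, so SPIDER_CMD defaults to echo as a dry run.

```shell
#!/bin/sh
# Split a plain-text URL list (one URL per line) into files of at most
# chunk_size lines each. Output files are named <prefix>aa, <prefix>ab, ...
# (split_urls is a hypothetical helper name, not part of phpdig.)
split_urls() {
    list_file=$1
    chunk_size=$2
    prefix=$3
    split -l "$chunk_size" "$list_file" "$prefix"
}

# Hand each chunk file to a command, pausing between chunks.
# SPIDER_CMD is a hypothetical variable; it defaults to echo, so by
# default this only prints the chunk file names (a dry run).
index_chunks() {
    prefix=$1
    pause_seconds=$2
    for chunk in "$prefix"*; do
        ${SPIDER_CMD:-echo} "$chunk"
        sleep "$pause_seconds"
    done
}
```

With 1,200 URLs and a chunk size of 200 this would produce six chunk files. The pause between chunks plays the role of the sleep() idea, with the same caveat that each chunk may still use a lot of CPU while it runs.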

vinyl-junkie
12-02-2004, 06:06 PM
I've had the same problem myself and haven't been able to get any answers either here in the forum or from my provider. :(

bforsyth
12-05-2004, 01:09 PM
OK - here is what the 3rd level support at my host says:

The script was being killed because it was using up too much CPU time. The
maximum amount of CPU time a process can use is 20%. This script was
regularly using 80-90% of the cpu cycles on this machine, which is
unacceptable in a shared hosting environment.

One alternative may be to run the script with a different niceness value.
This can be done using:

nice --adjust=19 /usr/bin/php4 -f spider.php http://www.globalwaterintel.com/list.php

Adjust can be any value between 0 (normal priority) and 19 (as nice as
possible).

If you just place lots of sleeps in the code, then what may happen is that
the program uses no CPU time, then uses a large amount for a short burst.
If the process monitor happens to see it during a short burst of high
activity, then it may still kill it.
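
The niceness suggestion above could be wrapped in a small shell function so the same call works from cron. This is a minimal sketch, not something the host supplied: run_niced is a made-up name, and it uses the portable `-n` form of nice rather than the long option quoted above. Keep in mind that nice only lowers scheduling priority; nothing in this thread confirms that this is enough to stay under the host's 20% limit.

```shell
#!/bin/sh
# Run any command at the lowest scheduling priority (niceness 19).
# run_niced is a hypothetical helper name used only for this sketch.
run_niced() {
    nice -n 19 "$@"
}

# Example invocation, left commented out so the sketch has no side effects.
# The php4 path and the URL are the ones quoted in the host's reply:
# run_niced /usr/bin/php4 -f spider.php http://www.globalwaterintel.com/list.php
```

Running `run_niced nice` (nice with no arguments prints the current niceness) is a quick way to check that the priority change took effect.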

The thing that I don't understand is, why does the browser version run OK? Surely it would use more resources than running it from the shell, as it is having to output HTML - which I assume is buffered.