Old 12-02-2004, 01:59 PM   #1
bforsyth
Green Mole
 
Join Date: Jun 2004
Posts: 22
Spider.php is killed at the command line

Hi - I have done a pretty thorough search of the forums and can't find anything that relates to my problem.

I have a site, http://www.globalwaterintel.com, running phpdig (1.8.3). So far it has been great - thanks to all those who had a hand in creating it and those who monitor these forums. The site has approximately 1,200 pages and is expanding at a rate of about 50 pages/month. I have set up a page with links to every page on the site (http://www.globalwaterintel.com/list.php) and point the spider at that page.

When I try to run the spider from the command line, it runs for a bit over a minute and then the process is killed. It doesn't even get through the part where it prints the +++++++'s.

The site is on shared hosting, so I am working on the assumption that the script is being terminated for hogging too many resources (memory or CPU), although the host has yet to confirm this.

I am able to index via the web interface, but it is slow and I would really like to automate the indexing via cron. If it does turn out that the script is being killed because of resource issues, is there any way I might be able to get around it by introducing some kind of sleep() to pause indexing and free up resources?
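
Something like this is roughly what I have in mind - just a sketch, not PhpDig's actual code; the URL list and the per-page work below are stand-ins for whatever spider.php really does:

Code:
<?php
// Rough sketch of the throttling idea - NOT PhpDig's actual code.
// The point is the sleep() between pages, which spreads the work out
// so the average CPU usage stays low.

$urls = array(
    'http://www.globalwaterintel.com/list.php',
    // ... the rest of the pages to index
);

$delay_seconds = 2; // increase until the host's process monitor is happy

foreach ($urls as $url) {
    $html = @file_get_contents($url); // stand-in for one page's indexing work
    if ($html === false) {
        $html = '';
    }
    echo "indexed " . $url . " (" . strlen($html) . " bytes)\n";
    sleep($delay_seconds); // give the CPU back before the next page
}
?>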

I guess the other idea is to split the pages that are indexed into smaller chunks of, say, 200 pages and index them separately?

Any ideas greatly appreciated!
Old 12-02-2004, 06:06 PM   #2
vinyl-junkie
Purple Mole
 
Join Date: Jan 2004
Posts: 694
I've had the same problem myself and haven't been able to get any answers either here in the forum or from my provider.
Old 12-05-2004, 01:09 PM   #3
bforsyth
Green Mole
 
Join Date: Jun 2004
Posts: 22
OK - here is what the 3rd level support at my host says:

The script was being killed because it was using up too much CPU time. The
maximum amount of CPU time a process can use is 20%. This script was
regularly using 80-90% of the cpu cycles on this machine, which is
unacceptable in a shared hosting environment.

One alternative may be to run the script with a different niceness value.
This can be done using:

nice --adjust=19 /usr/bin/php4 -f spider.php http://www.globalwaterintel.com/list.php

The adjustment can be any value between 0 (normal priority) and 19 (as nice as
possible).

If you just place lots of sleeps in the code, then what may happen is that
the program uses no CPU time, then uses a large amount for a short burst.
If the process monitor happens to see it during a short burst of high
activity, then it may still kill it.
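
If the nice approach works, my plan would be to drop something like this into cron to automate the indexing (the php4 binary and spider.php paths below are guesses - they would need adjusting for the actual install):

Code:
# run the spider once a week at 3am, at the lowest CPU priority
0 3 * * 0  nice -n 19 /usr/bin/php4 -f /path/to/phpdig/admin/spider.php http://www.globalwaterintel.com/list.php > /dev/null 2>&1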


The thing that I don't understand is why the browser version runs OK. Surely it would use more resources than running from the shell, as it has to generate HTML output - which I assume is buffered.