bforsyth
12-02-2004, 01:59 PM
Hi - I have done a pretty thorough search of the forums and can't find anything that relates to my problem.
I have a site http://www.globalwaterintel.com running phpdig (1.8.3). So far it has been great - thanks to all those who had a hand in creating it and those who monitor these forums. The site has approximately 1,200 pages and is expanding at a rate of about 50 pages/month. I have set up a page with links to every page on the site (http://www.globalwaterintel.com/list.php) and point the spider to that page.
When I try and run the spider from the command line, it runs for a bit over a minute and then the process is killed. It doesn't even get through the part where it prints the +++++++ 's.
The site is on shared hosting, so I am working on the assumption that the script is being terminated for hogging too many resources (memory or CPU), although the host has yet to confirm this.
I am able to index via the web interface, but it is slow and I would really like to automate the indexing via cron. If it does turn out that the script is being killed because of resource issues, is there any way I could get around it by introducing some kind of sleep() to pause indexing and free up resources?
I guess the other idea is to split the pages that are indexed into smaller chunks of, say, 200 pages and index them separately?
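To make the chunking idea concrete, here is a rough sketch of what I have in mind. It is in Python rather than PHP, and it only does the splitting; it does not call phpdig at all. The helper name `chunk_urls` and the example page URLs are made up for illustration.

```python
from typing import Iterator, List


def chunk_urls(urls: List[str], size: int = 200) -> Iterator[List[str]]:
    """Yield successive batches of at most `size` URLs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


if __name__ == "__main__":
    # Hypothetical page URLs standing in for the ~1,200 links on list.php.
    pages = [f"http://www.globalwaterintel.com/page{i}.php" for i in range(1200)]
    batches = list(chunk_urls(pages))
    print(len(batches), "batches; first has", len(batches[0]), "URLs")
```

Each batch could then be written out as its own list and indexed in a separate cron run, so that no single run stays alive long enough to be killed.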
Any ideas greatly appreciated!