#1
Orange Mole
Join Date: Jan 2005
Posts: 31
Cronjob problem
Hello,
I have a web site I wish to index with this great tool: www.john-howe.com. No problem with the web interface at a depth of 5000, except it stops after 30 minutes (once after 1 hour 23 minutes). I read some posts in this forum and tried to run it from a cron job. I tried my crontab on the host where john-howe.com lives; no way, it won't work... ;o( I also tried from my personal web site, www.metadelic.com, in my cPanel, with this syntax: Code:
php -f http://www.john-howe.com/admin/spider.php http://www.john-howe.com
which only returns: Code:
No input file specified
Then I tried: Code:
wget http://www.john-howe.com/admin/spider.php http://www.john-howe.com
which produces: Code:
Subject: Cron <metadeco@server857> wget http://www.john-howe.com/admin/spider.php http://www.john-howe.com
--13:22:00--  http://www.john-howe.com/admin/spider.php
           => `spider.php.3'
Resolving www.john-howe.com... done.
Connecting to www.john-howe.com[213.131.229.3]:80... connected.
HTTP request sent, awaiting response... 404 Not Found
13:22:01 ERROR 404: Not Found.

--13:22:01--  http://www.john-howe.com/
           => `index.html.6'
Connecting to www.john-howe.com[213.131.229.3]:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
    0K .......... .........                    54.50 KB/s
13:22:01 (54.50 KB/s) - `index.html.6' saved [19587]

FINISHED --13:22:01--
Downloaded: 19,587 bytes in 1 files
So how can I index the whole site? Any suggestions? A lot of thanks for your help and time,
Dominique Javet
#2
Orange Mole
Join Date: Jan 2005
Posts: 31
And I forgot... it saves nothing into the DB!
Regards, DOM
#3
Orange Mole
Join Date: Jan 2005
Posts: 31
Stop indexing in web interface
Hello,
I've installed the latest version of PhpDig and all is OK. I can index the part of my web site where PhpDig is installed, but when I try to index the whole site, after a certain time (randomly?) it stops indexing and leaves the site locked. I have safe_mode off and I don't think it's the timeout. I also have a few problems with cron jobs (they don't work), but I wish to make the first index from the web interface, starting from the root (my site has about 1500 pages, static + dynamic). -> http://www.phpdig.net/forum/showthread.php?t=1706 Do you experience problems like this? What can I do? Should I index my site part by part, or can I start indexing the whole site and let spider.php run all night via the web interface? How do you proceed? Regards, Dom
#4
Purple Mole
Join Date: Jan 2004
Posts: 694
Is the following URL where your spider.php file is? Code:
http://www.john-howe.com/admin/spider.php
If so, then relative to the web root the file is at: Code:
/[PHPDIG-directory]/admin/spider.php
which on the filesystem is something like: Code:
/home/username/public_html/[PHPDIG-directory]/admin/spider.php
so from shell or cron, run the script using its absolute filesystem path: Code:
php -f /home/username/public_html/[PHPDIG-directory]/admin/spider.php http://www.john-howe.com/
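For reference, a crontab entry along these lines would schedule that command; this is a sketch, and the username, PhpDig directory name, schedule, and log path are all assumptions:

```shell
# Edit the crontab on the host where PhpDig is installed: crontab -e
# Run the spider every night at 02:30 and append its output to a log.
# min hour day month weekday  command   (all paths below are hypothetical)
30 2 * * * /usr/bin/php -f /home/username/public_html/phpdig/admin/spider.php http://www.john-howe.com/ >> /home/username/spider.log 2>&1
```

Appending to a log file makes it much easier to see later whether a run finished or was cut short.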
#5
Orange Mole
Join Date: Jan 2005
Posts: 31
Thanks for your reply.
I tried this one from an external server: Code:
wget http://www.john-howe.com/search/admin/spider.php http://www.john-howe.com
which gives: Code:
Subject: Cron <metadeco@server857> wget http://www.john-howe.com/search/admin/spider.php http://www.john-howe.com
--08:55:00--  http://www.john-howe.com/search/admin/spider.php
           => `spider.php.5'
Resolving www.john-howe.com... done.
Connecting to www.john-howe.com[213.131.229.3]:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
    0K                                          15.52 KB/s
08:55:01 (15.52 KB/s) - `spider.php.5' saved [747]

--08:55:01--  http://www.john-howe.com/
           => `index.html.10'
Connecting to www.john-howe.com[213.131.229.3]:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
    0K .......... .........                    81.64 KB/s
08:55:02 (81.64 KB/s) - `index.html.10' saved [19563]

FINISHED --08:55:02--
Downloaded: 20,310 bytes in 2 files
My unprotected admin: http://www.john-howe.com/search/admin/ Why does it stop after a while when I index from the root? I also tried these internally, via my admin web panel, with no result: Code:
/usr/bin/php -f /home/www/web330/html/search/admin/temp/spider.php forceall http://www.john-howe.com/search/admin/cronfile.txt >> spider.log
/usr/bin/php -f /home/www/web330/html/search/admin/spider.php http://www.john-howe.com >> spider.log
/usr/bin/php5 -f /home/www/web330/html/search/admin/spider.php http://www.john-howe.com
I appreciate your help and time. Regards, Dominique
#6
Head Mole
Join Date: May 2003
Posts: 2,539
As for the spider stopping prematurely: packets get lost, connections drop, browsers or servers time out, or the host may kill the process; take your pick. As for setting up a cron job or running PhpDig from a shell, see section 7 of the updated documentation.
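To narrow down which of those causes is killing a run, one approach (a sketch, not from the PhpDig documentation; the actual spider command is replaced by a placeholder here) is to wrap the long-running process in a small logging script launched via cron or nohup:

```shell
#!/bin/sh
# Logging wrapper sketch: replace the `sleep 1` placeholder with the real
# spider command, e.g. /usr/bin/php -f /path/to/admin/spider.php <URL>
# (hypothetical path), so start/stop times and the exit status get recorded.
LOG=spider.log
echo "spider started: $(date)" >> "$LOG"
sleep 1  # placeholder for the long-running spider process
echo "spider exited with status $?: $(date)" >> "$LOG"
cat "$LOG"
```

If the log shows a start line with no matching exit line, the process was killed externally (host resource limits, dropped connection); a nonzero recorded exit status points at the script itself.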
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.
#7
Orange Mole
Join Date: Jan 2005
Posts: 31
Thanks a lot! It seems to work.
It has been indexing via cron for 2 hours and is still going. But I must note that without replacing the relative paths written in config.php it did not work (for me); I had to replace them in all the script pages, and then, and only then, it worked... I run the cron job on the same host as my site; from another, physically separate web cron server it does not work. I must find out why... BTW, it is now working and indexing my site. Thanks a lot for your explanation; now it's clear with your updated documentation. All my best regards, Dominique PS: Great software and great job, thanks! Greetings from Switzerland.
#8
Head Mole
Join Date: May 2003
Posts: 2,539
Glad it's working for you, but you don't have to change all the files; just set ABSOLUTE_SCRIPT_PATH in the config file.
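The code block in this post did not survive the page conversion, but based on the reply it presumably set the constant in question. A sketch, not the original code; the value shown is the installation path Dominique posted earlier and is an assumption for any other host:

```php
// Set once in the PhpDig config file instead of editing every script.
// Hypothetical value; consult the PhpDig documentation for the exact form.
define('ABSOLUTE_SCRIPT_PATH', '/home/www/web330/html/search');
```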
#9
Orange Mole
Join Date: Jan 2005
Posts: 31
I did that, but afterwards the web interface shows a blank white screen... and after replacing the relative paths with absolute ones again, everything works well.
Hmmm... Dom
#10
Head Mole
Join Date: May 2003
Posts: 2,539
What version of PhpDig are you using?
#11
Orange Mole
Join Date: Jan 2005
Posts: 31
Hello,
The latest one, 1.8.6 on Linux. What I notice too is that my cron job works only with forceall! When I use all, or my domain, to update it, it doesn't work... Regards, Dom
#12
Head Mole
Join Date: May 2003
Posts: 2,539
That doesn't make sense. Read the documentation in toto and see if it doesn't help.
#13
Orange Mole
Join Date: Jan 2005
Posts: 31
I read the documentation very carefully and I understand it, but that's the truth.
Maybe it depends on the ISP, I don't know... I tried with the wget command from an external site, and that doesn't work. Why does it work for somebody else and not for me? I don't know. I'm still trying and testing. Dom
#14
Head Mole
Join Date: May 2003
Posts: 2,539
What is your LIMIT_DAYS set to in the config file?
#15
Orange Mole
Join Date: Jan 2005
Posts: 31
Code:
define('SEARCH_DEFAULT_LIMIT',10);  //results per page
define('SPIDER_MAX_LIMIT',2000);    //max recurse levels in spider
define('RESPIDER_LIMIT',5);         //recurse respider limit for update
define('LINKS_MAX_LIMIT',20);       //max links per each level
define('RELINKS_LIMIT',5);          //recurse links limit for an update
//for limit to directory, URL format must either have file at end or ending slash at end
//e.g., http://www.domain.com/dirs/ (WITH ending slash) or http://www.domain.com/dirs/dirs/index.php
define('LIMIT_TO_DIRECTORY',false); //limit index to given (sub)directory, no sub dirs of dirs are indexed
define('LIMIT_DAYS',0);             //default days before reindex a page
define('SMALL_WORDS_SIZE',2);       //words to not index - must be 2 or more
define('MAX_WORDS_SIZE',30);        //max word size
Dom