Cronjob problem
Hello,
I have a web site I wish to index with this great tool: www.john-howe.com. No problem with the web interface with a depth of 5000, except it stops after 30 minutes (once after 1 hour 23 minutes). I read some posts in this forum and tried to run it from a cron job. I tried my crontab on the host where john-howe.com lives; no way, it won't work... ;o( I also tried from my personal web site, www.metadelic.com, in my cPanel with this syntax:
Code:
php -f http://www.john-howe.com/admin/spider.php http://www.john-howe.com

That only gives:
Code:
No input file specified

I also tried:
Code:
wget http://www.john-howe.com/admin/spider.php http://www.john-howe.com

and the cron mail shows nothing but:
Code:
Subject: Cron <metadeco@server857> wget http://www.john-howe.com/admin/spider.php http://www.john-howe.com

So how can I index the whole site? Any suggestions? A lot of thanks for your help and time, Dominique Javet
And I forgot... it saves nothing into the DB!
Regards, DOM
Stop indexing in web interface
Hello,
I've installed the latest version of PhpDig and all is OK. I can index part of my web site (where PhpDig is installed), but when I try to index the whole site, after a certain time (randomly?) it stops indexing and keeps the site locked. I have safe_mode off and I don't think it's the timeout. I also have a problem with the cron job (it doesn't work), but I wish to make the first index from the web interface and from the root (my site has about 1500 pages, static + dynamic). -> http://www.phpdig.net/forum/showthread.php?t=1706 Do you experience problems like this? What can I do? Should I index my site part by part, or can I tell it to index the whole site and let spider.php run all night via the web interface? How do you proceed? Regards, Dom
Is the following URL where your spider.php file is?
Code:
http://www.john-howe.com/admin/spider.php

If so, note that the php command needs the filesystem path to spider.php, not a URL. The path has the form:
Code:
/[PHPDIG-directory]/admin/spider.php

which on a typical host expands to something like:
Code:
/home/username/public_html/[PHPDIG-directory]/admin/spider.php

so the command to try is:
Code:
php -f /home/username/public_html/[PHPDIG-directory]/admin/spider.php http://www.john-howe.com/
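To run that on a schedule, a crontab entry along these lines might work (just a sketch: the 3:00 a.m. schedule, the /usr/bin/php location, and the log path are placeholders to adjust for your host):
Code:
# run the PhpDig spider every night at 3:00 a.m. and append all output to a log
0 3 * * * /usr/bin/php -f /home/username/public_html/[PHPDIG-directory]/admin/spider.php http://www.john-howe.com/ >> /home/username/spider.log 2>&1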
Thanks for your reply.
I tried this one from an external server:
Code:
wget http://www.john-howe.com/search/admin/spider.php http://www.john-howe.com

and the cron mail shows nothing but:
Code:
Subject: Cron <metadeco@server857> wget http://www.john-howe.com/search/admin/spider.php http://www.john-howe.com

My unprotected admin: http://www.john-howe.com/search/admin/ Why does indexing from the root stop after a while? :bang: I also tried this internally via my admin web panel, with no result :angry: :
Code:
/usr/bin/php -f /home/www/web330/html/search/admin/temp/spider.php forceall http://www.john-howe.com/search/admin/cronfile.txt >> spider.log

I appreciate your help and time. Regards, Dominique
As for the spider stopping prematurely: packets get lost, connections drop, browsers or servers time out, hosts may kill the process, take your pick. As for setting a cron or running PhpDig from shell, see section 7 of the updated documentation.
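If you run the spider from a shell rather than through a browser, something like nohup keeps it going after you log out; a sketch, with placeholder paths:
Code:
# keep the spider running after the shell session ends, logging all output
nohup /usr/bin/php -f /home/username/public_html/[PHPDIG-directory]/admin/spider.php http://www.john-howe.com/ > spider.log 2>&1 &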
Thanks a lot! It seems to work :banana:
It has been indexing via cron for two hours now and it's still going. But I must note that without replacing the relative paths written in config.php it was not working (for me); I had to replace them in all the script pages, and then and only then did it work... I use the cron job on the same host where my site is. From another, physically distant web cron server, it does not work. I must find out why... BTW, it's now working and indexing my site. Thanks a lot for your explanation; it's all clear now with your updated documentation. All my best regards, Dominique PS: Great software, great job, thanks! Greetings from Switzerland.
Glad it's working for you, but you don't have to change all the files, just set ABSOLUTE_SCRIPT_PATH in the config file.
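Something along these lines, with the path adjusted to your own install (the value shown here is only a placeholder):
PHP Code:
define('ABSOLUTE_SCRIPT_PATH','/home/username/public_html/[PHPDIG-directory]/'); // filesystem path to the PhpDig directory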
I did that, but afterwards the web interface shows me a blank white screen... and then, after replacing the relative paths with absolute ones, all is working well again.
Hmmm... Dom
What version of PhpDig are you using?
Hello,
The latest one, 1.8.6, on Linux. What I notice too is that my cron job only works with forceall! When I use all, or my domain, to update, it does not work... Regards, Dom
That doesn't make sense. Read the documentation in toto and see if it doesn't help.
I understand, and I read the documentation very carefully, but that's the truth :o
Maybe it depends on the ISP, I don't know... I also tried the wget command from an external site, and that doesn't work. Why does it work for some people and not for me? I don't know. I'm still trying and testing. Dom
What is your LIMIT_DAYS set to in the config file?
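The reason I ask: if the update logic works the way I am assuming here, 'all' skips pages that were indexed within the last LIMIT_DAYS days, while 'forceall' reindexes them regardless, so a setting like the following could make 'all' appear to do nothing:
PHP Code:
define('LIMIT_DAYS',7); // with this, 'all' would skip pages indexed in the past 7 days; 'forceall' would not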
PHP Code:
define('SEARCH_DEFAULT_LIMIT',10);  //results per page
define('SPIDER_MAX_LIMIT',2000);    //max recurse levels in spider
define('RESPIDER_LIMIT',5);         //recurse respider limit for update
define('LINKS_MAX_LIMIT',20);       //max links per each level
define('RELINKS_LIMIT',5);          //recurse links limit for an update
//for limit to directory, URL format must either have file at end or ending slash at end
//e.g., http://www.domain.com/dirs/ (WITH ending slash) or http://www.domain.com/dirs/dirs/index.php
define('LIMIT_TO_DIRECTORY',false); //limit index to given (sub)directory, no sub dirs of dirs are indexed
define('LIMIT_DAYS',0);             //default days before reindex a page
define('SMALL_WORDS_SIZE',2);       //words to not index - must be 2 or more
define('MAX_WORDS_SIZE',30);        //max word size

Dom