0 links found
Hi, I applied the patch from the http://www.phpdig.net/showthread.php?threadid=573 thread, and I'm still getting 0 links found. Here is the stdout from the command line.

%php -f spider.php forceall
47472: old priority 0, new priority 18
Spidering in progress...
-----------------------------
SITE : http://maggiv8.funpic.de/
Exclude paths :
- @NONE@
No link in temporary table
links found : 0
-----------------------------
SITE : http://rbhs.ath.cx/
Exclude paths :
- @NONE@
No link in temporary table
links found : 0
-----------------------------
SITE : http://localhost/
Exclude paths :
- @NONE@
No link in temporary table
links found : 0
Optimizing tables...
Indexing complete !
%

I'm running FreeBSD 4.9 with Apache/1.3.29 (Unix) and PHP/4.3.4. Any suggestions?
Hi xibalba, and welcome to PhpDig.net!
Perhaps something in this thread might help. Below is output using search depth one:

SITE : http://maggiv8.funpic.de/
Exclude paths :
- @NONE@
1:http://maggiv8.funpic.de/ (time : 00:00:15)
+ + +
level 1...
2:http://maggiv8.funpic.de/www/ (time : 00:00:27)
3:http://maggiv8.funpic.de/search.php (time : 00:00:33)
4:http://maggiv8.funpic.de/phpinfo.php (time : 00:00:41)
No link in temporary table
--------------------------------------------------------------------------------
links found : 4
http://maggiv8.funpic.de/
http://maggiv8.funpic.de/www/
http://maggiv8.funpic.de/search.php
http://maggiv8.funpic.de/phpinfo.php
Optimizing tables...
Indexing complete !

SITE : http://rbhs.ath.cx/
Exclude paths :
- @NONE@
1:http://rbhs.ath.cx/ (time : 00:00:09)
+ + + + +
level 1...
2:http://rbhs.ath.cx/uebimiau/ (time : 00:00:23)
3:http://rbhs.ath.cx/webalizer/ (time : 00:00:29)
4:http://rbhs.ath.cx/moregroupware/ (time : 00:00:35)
5:http://rbhs.ath.cx/phpMyAdmin/ (time : 00:00:41)
6:http://rbhs.ath.cx/phpSysInfo/ (time : 00:00:49)
No link in temporary table
--------------------------------------------------------------------------------
links found : 6
http://rbhs.ath.cx/
http://rbhs.ath.cx/uebimiau/
http://rbhs.ath.cx/webalizer/
http://rbhs.ath.cx/moregroupware/
http://rbhs.ath.cx/phpMyAdmin/
http://rbhs.ath.cx/phpSysInfo/
Optimizing tables...
Indexing complete !
search depth
Should I be careful with how high I set the search depth?

Even with the search depth set to one for both freebsd.org and rbhs.ath.cx, I get the following output.

%php -f spider.php forceall
47723: old priority 0, new priority 18
Spidering in progress...
-----------------------------
SITE : http://rbhs.ath.cx/
Exclude paths :
- @NONE@
No link in temporary table
links found : 0
-----------------------------
SITE : http://freebsd.org/
Exclude paths :
- @NONE@
No link in temporary table
links found : 0
Optimizing tables...
Indexing complete !
%

Perhaps something is wrong in my configuration. I read over the other thread you linked me to and couldn't find anything in there that would seem to have fixed this problem.

Weird... it seems to crawl correctly if I add a URI via the command line:

%php -f spider.php http://maggiv8.funpic.de/
47732: old priority 0, new priority 18
Spidering in progress...
-----------------------------
SITE : http://maggiv8.funpic.de/
Exclude paths :
- @NONE@
+1:http://maggiv8.funpic.de/ (time : 00:00:07)
+ + +
level 1...
+2:http://maggiv8.funpic.de/phpinfo.php (time : 00:00:28)
+3:http://maggiv8.funpic.de/search.php (time : 00:00:35)
+4:http://maggiv8.funpic.de/www/ (time : 00:00:40)
+ + + + + + + + + + + +
level 2...
.....
Hi. The forceall option is meant to try and force the reindex of sites already indexed regardless of the default days before reindex. If the sites haven't been previously indexed, forceall won't index them.
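For anyone hitting the same thing, here is a minimal sketch of the order of operations this implies: pass each new URL to spider.php explicitly first, and only use forceall afterwards to re-index sites that are already known. The function name index_then_force and the SPIDER_CMD variable are hypothetical helpers added for illustration, not part of PhpDig. The only spider.php invocations used are the two already shown in this thread.

```shell
#!/bin/sh
# Hypothetical wrapper: SPIDER_CMD defaults to the command used in this thread.
# Set SPIDER_CMD="echo php -f spider.php" for a dry run that only prints commands.
SPIDER_CMD="${SPIDER_CMD:-php -f spider.php}"

index_then_force() {
    # First pass: give each URL explicitly so sites not yet indexed get crawled.
    for url in "$@"; do
        # SPIDER_CMD is left unquoted on purpose so it splits into command + args.
        $SPIDER_CMD "$url" || return 1
    done
    # Later pass: forceall re-indexes sites that have already been indexed.
    $SPIDER_CMD forceall
}
```

Called as index_then_force http://maggiv8.funpic.de/ http://rbhs.ath.cx/ from the directory holding spider.php, this runs one crawl per URL and then a single forceall pass.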
Thanks for the help, Charter. On a tangent, is it possible to set up PhpDig in a distributed fashion?
Say I want to crawl a huge domain, www.example.com, with multiple machines crawling that domain. Is there currently a way to set PhpDig up like this?
Hi. Some users have run spider.php on different (sub)domains at the same time using the same database tables without incident. However, PhpDig doesn't specifically account for multithreading issues.
If you want to try running PhpDig in a distributed fashion on the same domain, perhaps set the following in the config.php file, where X is one or two: PHP Code:
Code:
prompt> php -f spider.php http://www.domain.com/dir1/ &
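The background-process idea above could be wrapped roughly as follows. This is only a sketch: the function name spider_in_parallel and the SPIDER_CMD variable are hypothetical, any directory other than dir1 is an assumed example, and the config.php setting mentioned above is not reproduced here. Since PhpDig doesn't specifically account for concurrent spiders, treat this as an experiment rather than a supported setup.

```shell
#!/bin/sh
# Hypothetical wrapper: SPIDER_CMD defaults to the command used in this thread.
# Set SPIDER_CMD="echo php -f spider.php" for a dry run that only prints commands.
SPIDER_CMD="${SPIDER_CMD:-php -f spider.php}"

spider_in_parallel() {
    # Start one spider per start URL in the background, as with the trailing "&".
    for url in "$@"; do
        $SPIDER_CMD "$url" &
    done
    # Block until every background spider has exited.
    wait
}
```

For example, spider_in_parallel http://www.domain.com/dir1/ http://www.domain.com/dir2/ would start two spiders at once, where dir2 is a made-up second directory. Output from the spiders will interleave, so redirecting each one to its own log file may be easier to read.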