PhpDig.net

Old 03-09-2004, 10:00 AM   #1
xibalba
Green Mole
 
Join Date: Mar 2004
Posts: 9
0 links found

Hi, I applied the patch from the http://www.phpdig.net/showthread.php?threadid=573 thread, and I'm still getting 0 links found. Here is the stdout from the command line.

%php -f spider.php forceall
47472: old priority 0, new priority 18

Spidering in progress...
-----------------------------
SITE : http://maggiv8.funpic.de/
Exclude paths :
- @NONE@
No link in temporary table
links found : 0

-----------------------------
SITE : http://rbhs.ath.cx/
Exclude paths :
- @NONE@
No link in temporary table
links found : 0

-----------------------------
SITE : http://localhost/
Exclude paths :
- @NONE@
No link in temporary table
links found : 0
Optimizing tables...
Indexing complete !
%

Running FreeBSD 4.9 with Apache/1.3.29 (Unix) and PHP/4.3.4.

Any suggestions?
Old 03-09-2004, 11:15 AM   #2
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi xibalba, and welcome to PhpDig.net!

Perhaps something in this thread might help.

Below is output using search depth one:

SITE : http://maggiv8.funpic.de/
Exclude paths :
- @NONE@
1:http://maggiv8.funpic.de/
(time : 00:00:15)
+ + +
level 1...
2:http://maggiv8.funpic.de/www/
(time : 00:00:27)

3:http://maggiv8.funpic.de/search.php
(time : 00:00:33)

4:http://maggiv8.funpic.de/phpinfo.php
(time : 00:00:41)

No link in temporary table

--------------------------------------------------------------------------------

links found : 4
http://maggiv8.funpic.de/
http://maggiv8.funpic.de/www/
http://maggiv8.funpic.de/search.php
http://maggiv8.funpic.de/phpinfo.php
Optimizing tables...
Indexing complete !


SITE : http://rbhs.ath.cx/
Exclude paths :
- @NONE@
1:http://rbhs.ath.cx/
(time : 00:00:09)
+ + + + +
level 1...
2:http://rbhs.ath.cx/uebimiau/
(time : 00:00:23)

3:http://rbhs.ath.cx/webalizer/
(time : 00:00:29)

4:http://rbhs.ath.cx/moregroupware/
(time : 00:00:35)

5:http://rbhs.ath.cx/phpMyAdmin/
(time : 00:00:41)

6:http://rbhs.ath.cx/phpSysInfo/
(time : 00:00:49)

No link in temporary table

--------------------------------------------------------------------------------

links found : 6
http://rbhs.ath.cx/
http://rbhs.ath.cx/uebimiau/
http://rbhs.ath.cx/webalizer/
http://rbhs.ath.cx/moregroupware/
http://rbhs.ath.cx/phpMyAdmin/
http://rbhs.ath.cx/phpSysInfo/
Optimizing tables...
Indexing complete !
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.
Old 03-09-2004, 12:23 PM   #3
xibalba
Green Mole
 
Join Date: Mar 2004
Posts: 9
search depth

Should I be careful with how high I set the search depth?

Even with the search depth set to one for both freebsd.org and rbhs.ath.cx, I get the following output.

%php -f spider.php forceall
47723: old priority 0, new priority 18

Spidering in progress...
-----------------------------
SITE : http://rbhs.ath.cx/
Exclude paths :
- @NONE@
No link in temporary table
links found : 0

-----------------------------
SITE : http://freebsd.org/
Exclude paths :
- @NONE@
No link in temporary table
links found : 0
Optimizing tables...
Indexing complete !
%

Perhaps something is wrong in my configuration.
I read over the other thread you linked me to and couldn't find anything in there that seemed to fix this problem.


Weird... it seems to crawl correctly if I pass a URI on the command line:
%php -f spider.php http://maggiv8.funpic.de/
47732: old priority 0, new priority 18

Spidering in progress...
-----------------------------
SITE : http://maggiv8.funpic.de/
Exclude paths :
- @NONE@
+1:http://maggiv8.funpic.de/
(time : 00:00:07)
+ + +
level 1...
+2:http://maggiv8.funpic.de/phpinfo.php
(time : 00:00:28)

+3:http://maggiv8.funpic.de/search.php
(time : 00:00:35)
+4:http://maggiv8.funpic.de/www/
(time : 00:00:40)
+ + + + + + + + + + + +
level 2...
.....

Last edited by xibalba; 03-09-2004 at 12:29 PM.
Old 03-09-2004, 12:39 PM   #4
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. The forceall option is meant to force a reindex of sites that are already indexed, regardless of the default number of days before reindex. If a site hasn't been indexed before, forceall won't index it.
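
As a rough sketch of the order of operations this implies: index each site once by passing its URL, then use forceall on later runs. The function name index_then_force and the SPIDER_CMD variable are made up for illustration and are not part of PhpDig. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical helper, not part of PhpDig: prints (or runs) the spider
# commands in the order described above.
# SPIDER_CMD is an assumed variable; it defaults to "echo php -f spider.php"
# so the commands are only printed, not executed.
SPIDER_CMD="${SPIDER_CMD:-echo php -f spider.php}"

index_then_force() {
    # First pass: index each site by passing its URL explicitly.
    for url in "$@"; do
        $SPIDER_CMD "$url"
    done
    # Later passes: forceall reindexes sites that are already in the index.
    $SPIDER_CMD forceall
}

index_then_force http://maggiv8.funpic.de/ http://rbhs.ath.cx/
```

To run the spider for real instead of printing, something like SPIDER_CMD='php -f spider.php' in the environment should work, assuming the script is started from the directory that holds spider.php.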
Old 03-09-2004, 12:50 PM   #5
xibalba
Green Mole
 
Join Date: Mar 2004
Posts: 9
Thanks for the help, Charter. On a tangent, is it possible to set up PhpDig in a distributed fashion?

Say I want to crawl a huge domain, www.example.com, with multiple machines crawling that domain. Is there currently a way to set PhpDig up in this style?
Old 03-09-2004, 01:45 PM   #6
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. Some users have run spider.php on different (sub)domains at the same time using the same database tables without incident. However, PhpDig doesn't specifically account for multithreading issues.

If you want to try running PhpDig in a distributed fashion on the same domain, perhaps set the following in the config.php file, where X is one or two:
PHP Code:
define('SPIDER_MAX_LIMIT',X);         //max recurse levels in spider
define('SPIDER_DEFAULT_LIMIT',X);     //default value
define('RESPIDER_LIMIT',X);           //recurse limit for update
define('LIMIT_DAYS',0);               //default days before reindex a page 
and try entering the site at different spots, for example:
Code:
prompt> php -f spider.php http://www.domain.com/dir1/ &
prompt> php -f spider.php http://www.domain.com/dir2/ &
The & backgrounds the process and returns you to the shell prompt.
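
A minimal sketch of the same idea as a script, assuming a POSIX shell: start one spider per entry point in the background, then wait for all of them to finish. The function name spider_entry_points and the SPIDER_CMD variable are hypothetical and not part of PhpDig; the directory URLs are the placeholders from the example above. By default it only prints the commands.

```shell
#!/bin/sh
# Hypothetical wrapper, not part of PhpDig.
# SPIDER_CMD is an assumed variable; the default only prints each command.
SPIDER_CMD="${SPIDER_CMD:-echo php -f spider.php}"

spider_entry_points() {
    for url in "$@"; do
        # & backgrounds each spider so the entry points are crawled in parallel.
        $SPIDER_CMD "$url" &
    done
    # Block until every background spider has exited.
    wait
}

spider_entry_points http://www.domain.com/dir1/ http://www.domain.com/dir2/
```

Because the spiders run in parallel, their output can arrive in any order. The earlier caveat still applies: PhpDig doesn't specifically account for several spiders sharing the same tables, so this is something to try rather than a supported setup.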