PhpDig.net

Problem Spidering (http://www.phpdig.net/forum/showthread.php?t=2215)

Trallis 10-30-2005 06:01 AM

Problem Spidering
 
I cannot index any sites with my install of PhpDig. I am running v1.8.8 RC1 on a Windows box with Apache. Directory permissions are already set correctly, and I have verified that allow_url_fopen is enabled.

I am trying to index: http://www.noland.com/noland/index.php

When the spider starts, it seems to pull the parent directory www.noland.com (which is unavailable to the web, as it redirects to www.noland.com/noland).

When I try to spider an external site such as www.mtslink.com, it does not work either.

Here is the output that I get:

Spidering in progress... [Stop spider]

--------------------------------------------------------------------------------
SITE : http://www.noland.com/
Exclude paths :
- Admin/
- auctiondata/
- calendar/
- cgi-local/
- enoland/
- itemmaint/
- mail/
- msds/
- nol****nline/
- nolandtest/
- obis/
- Orders/
- phpinc/
- squidalizer/
- Stylesheets/
- test/
- webmail/
- webalizer/
- squidalizer-detail/

Wait...
1:http://www.noland.com/noland/
(time : 00:00:05)
No link in temporary table

--------------------------------------------------------------------------------

links found : 1
http://www.noland.com/noland/
Optimizing tables...
Indexing complete !

Charter 11-01-2005 02:52 PM

Are your site and the PhpDig install on a server that uses load balancing?

Trallis 11-01-2005 05:25 PM

No... I was able to get it to spider individual pages just fine by playing with the config, but it doesn't seem to want to follow any links no matter what I try.

Charter 11-01-2005 05:40 PM

Try setting PHPDIG_IN_DOMAIN to true and LIMIT_TO_DIRECTORY to false, both in the config file. Then, from the admin panel, use a large search depth, set links per to zero, and choose the no option. You can increase the search depth beyond twenty by editing SPIDER_MAX_LIMIT in the config file.
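
For readers following along, the config changes suggested above might look roughly like the fragment below. This is only a sketch: it assumes the PhpDig config file sets these options with PHP define() calls, and the value 50 for SPIDER_MAX_LIMIT is an arbitrary illustration, not a recommendation from the thread. Check the names against your own config file before editing.

```php
<?php
// Sketch only: assumes the PhpDig config file uses define() for these options.
define('PHPDIG_IN_DOMAIN', true);     // set to true, as suggested in the post above
define('LIMIT_TO_DIRECTORY', false);  // set to false, as suggested in the post above
define('SPIDER_MAX_LIMIT', 50);       // illustrative value; anything above 20 raises the search depth cap
```

If the file already defines these constants, change the existing lines rather than adding new ones, since PHP will not redefine a constant that is already set.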

Trallis 11-01-2005 05:46 PM

Done
 
Ok, I verified those 2 settings and I'm still able to get a single page indexed, but it will not follow any of the links. I'd be happy to provide you with the login information (via e-mail) if you think that would help to diagnose the problem.

Thanks for your help.

John

Charter 11-01-2005 05:49 PM

Your install gives:
Code:

Spidering in progress... [Stop spider]
SITE : http://www.mtslink.com/
Exclude paths :
- @NONE@

Wait...
1:http://www.mtslink.com/
(time : 00:00:36)
+ + + + + + + + + +
No link in temporary table
links found : 1
http://www.mtslink.com/
Optimizing tables...
Indexing complete ! [Back] to admin interface.

My install gives:
Code:

Spidering in progress... [Stop spider]
SITE : http://www.mtslink.com/
Exclude paths :
- @NONE@

Wait...
1:http://www.mtslink.com/
(time : 00:00:12)
+ + + + + + + + + +
level 1...

Wait...
2:http://www.mtslink.com/pricing.php
(time : 00:00:29)
+ + + + + +

Wait...
3:http://www.mtslink.com/medicalintranet.php
(time : 00:00:39)
+

Wait...
4:http://www.mtslink.com/contact.php
(time : 00:00:47)

Wait...
5:http://www.mtslink.com/ann.php
(time : 00:00:56)
+

And so forth...

Hmm, what version of PHP are you using?

Charter 11-02-2005 07:58 AM

Set your PHP display_errors to on and keep error_reporting(E_ALL); in the config file. With display_errors set to off, error_reporting does not show anything onscreen. If you don't want to change this in PHP directly, try putting the following in an .htaccess file in the main PhpDig directory and then run an index:
Code:

PHP_VALUE display_errors 1

Also, your PHP info page says that your MySQL Client API version is 4.0.25, but PhpDig 1.8.8 RC1 requires MySQL 4.1.7 or later. The PhpDig 1.8.8 RC1 requirements are listed here. Sometimes the API version reported by PHP is not the 'real' server version (see here as to the probable reason), so run the following MySQL query:
Code:

SELECT VERSION();
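
Once the query returns a version string, it can be compared against the 4.1.7 minimum mentioned above. The sketch below is one way to do that with PHP's built-in version_compare(); the helper names numeric_version and meets_minimum are hypothetical and not part of PhpDig, and the scalar type hints assume PHP 7 or later. Build suffixes such as "-nt" are stripped first, because version_compare() would otherwise rank "4.1.7-nt" below "4.1.7".

```php
<?php
// Hypothetical helpers for illustration; not part of PhpDig.

// Keep only the leading dotted number, e.g. "4.1.7-nt-log" becomes "4.1.7".
function numeric_version(string $raw): string
{
    return preg_match('/^\d+(?:\.\d+)*/', $raw, $m) ? $m[0] : '0';
}

// True when the reported version is at least the required minimum.
function meets_minimum(string $raw, string $minimum): bool
{
    return version_compare(numeric_version($raw), $minimum, '>=');
}

// 4.0.25 is the client API version quoted in the thread; 4.1.7 is the stated minimum.
var_dump(meets_minimum('4.0.25', '4.1.7'));    // bool(false)
var_dump(meets_minimum('4.1.7-nt', '4.1.7'));  // bool(true)
```

Paste the string returned by SELECT VERSION(); in place of the sample values to check your own server.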

