Old 03-23-2004, 01:18 PM   #1
mih
Green Mole
 
Join Date: Mar 2004
Location: New York
Posts: 3
Indexing problem: PhpDig will not spider all of the site

I installed PhpDig version 1.8.0 successfully.

-------------------------------------
In config.php:

define('SPIDER_MAX_LIMIT',900);
define('SPIDER_DEFAULT_LIMIT',900);
define('RESPIDER_LIMIT',900);

define('LIMIT_DAYS',0);


// if set to true, full path to external binary required
define('PHPDIG_INDEX_MSWORD',true);
define('PHPDIG_PARSE_MSWORD','/usr/ports/textproc/catdoc');
define('PHPDIG_OPTION_MSWORD','-s 8859-1');

define('PHPDIG_INDEX_PDF',true);
define('PHPDIG_PARSE_PDF','/usr/ports/print/pstotext');
define('PHPDIG_OPTION_PDF','-cork');
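
Since the comment in the config says a full path to the external binary is required, here is a minimal sketch of how the two parser paths above could be checked. The helper name `check_parser_path` is hypothetical and not part of PhpDig. It only reports whether each path exists, is a directory, or is an executable file, and makes no claim about how PhpDig itself calls these programs.

```php
<?php
// Hypothetical helper, not part of PhpDig: reports whether a configured
// parser path points at an executable file rather than a directory.
function check_parser_path($path)
{
    if (!file_exists($path)) {
        return 'missing';
    }
    if (is_dir($path)) {
        return 'directory';
    }
    return is_executable($path) ? 'ok' : 'not executable';
}

// The two values copied from the config.php excerpt above.
$parsers = array(
    'PHPDIG_PARSE_MSWORD' => '/usr/ports/textproc/catdoc',
    'PHPDIG_PARSE_PDF'    => '/usr/ports/print/pstotext',
);

// Print one status line per configured parser.
foreach ($parsers as $name => $path) {
    echo $name . ' (' . $path . '): ' . check_parser_path($path) . "\n";
}
```

Run from the command line on the server, this prints one status line per parser. Anything other than 'ok' would suggest the path does not point at a runnable binary.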
----------------------------------------------------------
Server information as follows:

Platform: FreeBSD 4.8-RELEASE #0
Web Server version: Apache/1.3.29 (Unix)
PHP 4.3.4
MySQL 4.0.13
PERL v5.8.0 built for i386-freebsd
-----------------------------------------
Only 1 TLD
----------------------
I have tried to re-create the index more than once and get very similar results every time.
----------------------------------------------------------
I have created a page that includes a link to every page and file that I want to index, and I still cannot get it to spider the whole site.
------------------------------------------------------------
It will not spider the whole site. In some directories it only indexes the first 13 files, while in others it indexed the first 27. It also indexes only HTML files, even though it is supposed to index 'doc' and 'pdf' files as well.

What am I doing wrong?

The more urgent problem is that it does not spider the whole site. Some directories contain more than 100 files, and some files are very large (over 1 MB). Some of the PDF files contain only graphics and are as large as 40 MB.

Please help and thank you in advance.
mh