pstotext problem


DoWn
04-09-2004, 07:04 AM
Hi. Again I have a problem trying to index PDF files.

First, the environment:

Debian Linux running Apache 1.3.26 and PHP 4.1.2.

PhpDig 1.8.0

pstotext installed successfully.

In console mode, pstotext runs very well:

The command 'pstotext file.pdf' displays the text contained in the PDF on the screen.

I also redirected the output of pstotext to a text file successfully.

PhpDig config:

define('PHPDIG_INDEX_PDF',true);
define('PHPDIG_PARSE_PDF','/usr/bin/pstotext');
define('PHPDIG_OPTION_PDF','-cork');

I verified (twice) that pstotext is in the /usr/bin/ directory.

The trouble is the following:

PhpDig seems to read the PDF files correctly but doesn't index them at all.

Please help.

Charter
04-09-2004, 08:24 AM
Hi. Are the directories to pstotext and the pstotext file itself set to 755 permissions?
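One way to check this for every directory on the path at once is a small shell sketch like the following. It is a hypothetical helper, not part of PhpDig or pstotext, and it assumes GNU coreutils `stat -c '%a'` (as on Debian). It only inspects the "others" execute bit, which a web server running as a different user (e.g. www-data, assumed here) needs in order to traverse each directory and run the binary.

```shell
# Hypothetical helper (not part of PhpDig): walk from an absolute path up
# to "/" and report whether each component has the "others" execute bit
# set, as with 755 (rwxr-xr-x). Assumes GNU coreutils `stat -c '%a'`.
# Returns 0 if all components pass, 1 if any fails, 2 on usage/stat error.
check_world_exec() {
    case "$1" in
        /*) ;;
        *) echo "check_world_exec: need an absolute path" >&2; return 2 ;;
    esac
    p=$1
    rc=0
    while :; do
        # Octal permission bits, e.g. "755"; the last digit is "others".
        mode=$(stat -c '%a' "$p") || return 2
        other=${mode#"${mode%?}"}
        case $other in
            1|3|5|7) echo "ok  $mode $p" ;;
            *)       echo "NO  $mode $p"; rc=1 ;;
        esac
        [ "$p" = "/" ] && break
        p=$(dirname "$p")
    done
    return $rc
}
```

For example, `check_world_exec /usr/bin/pstotext` prints one line per path component (the file first, then each parent up to /) and returns 1 if any of them lacks the bit.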

DoWn
04-10-2004, 02:14 AM
Hi.

Thank you for answering so quickly.

The directories and the pstotext file itself are set to 755 permissions (rwxr-xr-x).

PhpDig reads the PDF files but doesn't index them.

:(

Charter
04-10-2004, 01:44 PM
Hi. Maybe something in this (http://www.phpdig.net/showthread.php?threadid=799) thread will help.

DoWn
04-13-2004, 12:19 AM
Hi.

Thank you for your help.

I patched spider.php and robot_functions.php, and it seems to be working now.

PhpDig now indexes some of my PDFs.

I still have some problems when trying to index a directory containing only PDF files, but I'm still looking into it.

Thank you again :)

Charter
04-13-2004, 07:32 PM
>> I still have some problems when trying to index a directory containing only pdf files, but i'm still searching.

Hi. Are there links to all these PDF files? As PhpDig follows links, it won't index a standalone directory of files. Also, it seems some PDF files just take too much memory. See this (http://www.phpdig.net/showthread.php?threadid=534) thread for more details.
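If the PDFs sit in a directory with no page linking to them, one workaround is to generate a plain HTML page that lists them and start the spider from that page. The shell sketch below is a hypothetical helper: the function name and the output file name pdf-index.html are assumptions, not PhpDig features. It also assumes file names without quotes, `<`, `>` or `&`, and it only percent-encodes spaces.

```shell
# Hypothetical helper (not part of PhpDig): write DIR/pdf-index.html with
# one link per PDF in DIR, so a link-following spider has a page to start
# from. Assumes file names without quotes, "<", ">" or "&"; spaces in the
# href are percent-encoded as %20.
make_pdf_index() {
    dir=$1
    out="$dir/pdf-index.html"
    {
        echo '<html><head><title>PDF index</title></head><body><ul>'
        for f in "$dir"/*.pdf "$dir"/*.PDF; do
            [ -f "$f" ] || continue    # skip a glob pattern that matched nothing
            name=$(basename "$f")
            href=$(printf '%s' "$name" | sed 's/ /%20/g')
            printf '<li><a href="%s">%s</a></li>\n' "$href" "$name"
        done
        echo '</ul></body></html>'
    } > "$out"
}
```

For example, after running `make_pdf_index /var/www/docs` (an example path), you would point PhpDig at the corresponding URL of pdf-index.html so it can follow the links to each PDF. This does nothing for PDFs that exhaust memory, which the linked thread covers.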