spider hangs on indexing pdf (pstotext)
sushie
06-07-2005, 10:39 AM
hi there,
i'm trying to use phpdig for the first time...
i've read a lot of threads about problems with pstotext and tried several hints, but still can't get it to work...
my system:
------------------------
-FreeBSD 4.10
-PHP Version 4.3.1
-PHPDIG_VERSION 1.8.7
------------------------
from command line pstotext seems to work correctly (it outputs the file content on STDOUT as expected)
the paths in config.php are ok:
------------------------
define('PHPDIG_INDEX_PDF',true);
define('PHPDIG_PARSE_PDF','/usr/local/bin/pstotext');
define('PHPDIG_OPTION_PDF','-cork');
------------------------
i tweaked spider.php and robot_functions.php as mentioned somewhere. these are the outputs:
------------------------
Is result test http an array: 1
What is result test http status: PDF
Is result test an array: 1
What is result test status: PDF
Use is executable is set to: 1
Index the pdf is set to: 1
Parse the pdf is set to: /usr/local/bin/pstotext
Does parse pdf exist: 1
Is parse pdf executable: 1
------------------------
... just after printing that, the spider hangs without any error message...
can anyone help?
Charter
06-07-2005, 11:29 AM
Okay, that all looks good, so remove the code to print those outputs, and instead, in robot_functions.php find:
$command = PHPDIG_PARSE_PDF.' '.PHPDIG_OPTION_PDF.' '.$tempfile2;
And replace with:
$command = PHPDIG_PARSE_PDF.' '.PHPDIG_OPTION_PDF.' '.$tempfile2.' 2>&1';
Then try an index of a PDF file and see what prints onscreen.
Also, if the PDFs were not from dvips, then try the following:
define('PHPDIG_OPTION_PDF','');
And of course, since output is STDOUT, use the following:
define('PHPDIG_PDF_EXTENSION','');
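The reason for the `2>&1` suffix can be shown with a small sketch: PHP's exec() collects only standard output, so anything the tool writes to stderr is lost unless the shell redirects it. The helper name `run_and_capture` is hypothetical and not part of PhpDig, and the demo command merely imitates a failing tool.

```php
<?php
// Hypothetical helper, not part of PhpDig: runs a shell command and
// returns its combined stdout/stderr lines plus the exit status.
function run_and_capture($command)
{
    $output = array();
    $status = 0;
    // Appending 2>&1 makes the shell send stderr to stdout,
    // so exec() collects error messages as well.
    exec($command . ' 2>&1', $output, $status);
    return array('output' => $output, 'status' => $status);
}

// Demo: a command that writes only to stderr and exits with status 3.
$result = run_and_capture('sh -c \'echo "gs: not found" >&2; exit 3\'');
print_r($result['output']);
echo 'Return value is: ' . $result['status'] . "\n";
```

Without the redirect, the output array would stay empty and only the return value would hint at the failure.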
sushie
06-10-2005, 07:31 AM
hi charter,
thanks for your reply!
i tried your advice, but without success... the spider still hangs on indexing the pdf.
this is the last thing the spider prints out:
----------
Is result test http an array: 1
What is result test http status: PDF
----------
these are my settings:
----------
define('PHPDIG_INDEX_PDF',true);
define('PHPDIG_PARSE_PDF','/usr/local/bin/pstotext');
define('PHPDIG_OPTION_PDF','');
define('PHPDIG_PDF_EXTENSION','');
----------
i saw that the permissions along the path '/usr/local/bin/pstotext' are all set to 755, except the file itself, which has 555 ... could that be a problem?
since i am not the administrator of the server (it's a commercial provider), i'm not able to change any of the file permissions...
*thanks for further support!
Charter
06-10-2005, 10:11 AM
As you cannot change permission on pstotext, see if your host will change the permission or try pdftotext instead. There are instructions for pdftotext here (http://www.phpdig.net/forum/faq.php?faq=phpdig_ext_bin#faq_phpdig_pdftotext).
sushie
06-13-2005, 12:31 PM
hi charter,
there was a problem with 'allow_url_fopen'. now it still doesn't index the pdf, but the spider doesn't hang anymore (still trying with 'pstotext') ... here's the output:
-----------------------
Is result test http an array: 1
What is result test http status: PDF
Is result test an array: 1
What is result test status: PDF
Use is executable is set to: 1
Index the pdf is set to: 1
Parse the pdf is set to: /usr/local/bin/pstotext
Does parse pdf exist: 1
Is parse pdf executable: 1
Command is: /usr/local/bin/pstotext ../admin/temp/66912182.tmp 2>&1
Result contains: Array ( [0] => gs: not found )
Return value is: 3
-----------------------
what does 'gs: not found' mean?
*thanks for your support
(... i'm now going to try 'pdftotext')
sushie
06-13-2005, 12:49 PM
hi again,
with 'pdftotext' it doesn't work either (i use the linux binary on the FreeBSD host...)
config:
---------------------------
define('PHPDIG_INDEX_PDF',true);
define('PHPDIG_PARSE_PDF','/home/local/bin/pdftotext');
define('PHPDIG_OPTION_PDF','');
define('PHPDIG_PDF_EXTENSION','.txt');
---------------------------
output:
---------------------------
Is result test http an array: 1
What is result test http status: PDF
Is result test an array: 1
What is result test status: PDF
Use is executable is set to: 1
Index the pdf is set to: 1
Parse the pdf is set to: /home/local/bin/pdftotext
Does parse pdf exist: 1
Is parse pdf executable: 1
Command is: /home/ekifch/bin/pdftotext ../admin/temp/89121942.tmp 2>&1
Result contains: Array ( [0] => Abort trap )
Return value is: 134
---------------------------
*any idea?
Charter
06-13-2005, 06:57 PM
> Result contains: Array ( [0] => gs: not found )
That probably means that Ghostscript cannot be found.
> Result contains: Array ( [0] => Abort trap )
That might be a memory issue. Try pdftotext on a small PDF file.
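To read the two return values more closely, here is a rough sketch. It relies on the common shell convention that a status above 128 means "killed by signal (status - 128)", so 134 would correspond to signal 6 (SIGABRT), which fits the "Abort trap" message. The helper name `describe_exit_status` is hypothetical, and the last lines only assume a POSIX shell with `command -v`.

```php
<?php
// Hypothetical helper: gives a rough reading of an exit status from exec().
function describe_exit_status($status)
{
    if ($status === 0) {
        return 'success';
    }
    if ($status === 127) {
        return 'command not found by the shell';
    }
    if ($status > 128) {
        // Shells conventionally report 128 + signal number.
        return 'terminated by signal ' . ($status - 128);
    }
    return 'tool-specific error code ' . $status;
}

echo describe_exit_status(134), "\n";
echo describe_exit_status(3), "\n";

// Check which PATH the web server's PHP sees, and whether gs is on it.
echo 'PATH seen by PHP: ' . getenv('PATH') . "\n";
exec('command -v gs 2>&1', $lines, $found);
echo $found === 0 ? "gs found at {$lines[0]}\n" : "gs is not on this PATH\n";
```

Running the PATH check from the web server rather than from a login shell matters, because the two environments can differ.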
sushie
06-15-2005, 05:57 AM
thanks to your support, some help from my server admin and many hours of searching for a solution, i finally got it to work!
the problem was that 'pstotext' could not find the Ghostscript executable ('gs') when run from a php script through the web server, apparently because its directory was not in the PATH there.
i had to add "export PATH=$PATH:my_path_to_lib; " to the exec command in 'robot_functions.php'...
here's the full change-instruction in case anyone runs into the same problem:
in config.inc (somewhere near 'EXTERNAL TOOLS SETUP') add:
define('PHPDIG_PATH_TO_BIN','/usr/local/bin');
in robot_functions.php (near line #1089) find:
if ($usetool) {
rename($tempfile1,$tempfile2);
exec($command,$result,$retval);
and replace with:
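if ($usetool) {
// start with an empty prefix so $setpath is always defined
$setpath = '';
if (PHPDIG_PATH_TO_BIN)
// single quotes, so the shell (not PHP) expands $PATH
$setpath = 'export PATH=$PATH:'.PHPDIG_PATH_TO_BIN.'; ';
rename($tempfile1,$tempfile2);
exec($setpath.$command,$result,$retval);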
maybe that helps someone
*cheers*
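For anyone adapting this fix, the same idea can be sketched as a standalone function. This is only an illustration under assumptions, not PhpDig code: the name `build_pdf_command` and its parameters are hypothetical, and escapeshellarg() is added here to quote the path and the temp file name.

```php
<?php
// Hypothetical helper, not PhpDig code: builds the shell command with an
// optional PATH extension so tools like pstotext can locate gs.
function build_pdf_command($tool, $options, $file, $extraPath = '')
{
    $prefix = '';
    if ($extraPath !== '') {
        // Single quotes keep $PATH literal so the shell expands it, not PHP.
        $prefix = 'export PATH=$PATH:' . escapeshellarg($extraPath) . '; ';
    }
    $command = $tool;
    if ($options !== '') {
        $command .= ' ' . $options;
    }
    // 2>&1 keeps error messages visible in the exec() output array.
    return $prefix . $command . ' ' . escapeshellarg($file) . ' 2>&1';
}

$cmd = build_pdf_command(
    '/usr/local/bin/pstotext',
    '',
    '../admin/temp/66912182.tmp',
    '/usr/local/bin'
);
echo $cmd, "\n";
```

The returned string can be passed to exec() in the same way as the `$setpath.$command` combination above.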