spider hangs on indexing pdf (pstotext)
hi there,
i'm trying to use phpdig for the first time... i read a lot of threads about problems with pstotext and tried several hints, but still can't get it to work...

my system:
------------------------
-FreeBSD 4.10
-PHP Version 4.3.1
-PHPDIG_VERSION 1.8.7
------------------------

from the command line, pstotext seems to work correctly (it outputs the file content on STDOUT as expected).

the paths in config.php are ok:
------------------------
define('PHPDIG_INDEX_PDF',true);
define('PHPDIG_PARSE_PDF','/usr/local/bin/pstotext');
define('PHPDIG_OPTION_PDF','-cork');
------------------------

i tweaked spider.php and robot_functions.php as mentioned somewhere. these are the outputs:
------------------------
Is result test http an array: 1
What is result test http status: PDF
Is result test an array: 1
What is result test status: PDF
Use is executable is set to: 1
Index the pdf is set to: 1
Parse the pdf is set to: /usr/local/bin/pstotext
Does parse pdf exist: 1
Is parse pdf executable: 1
------------------------

... just after printing that, the spider hangs without any error message...

can anyone help?
Okay, that all looks good, so remove the code that prints those outputs, and instead, in robot_functions.php find:
Code:
$command = PHPDIG_PARSE_PDF.' '.PHPDIG_OPTION_PDF.' '.$tempfile2;
and replace it with:
Code:
$command = PHPDIG_PARSE_PDF.' '.PHPDIG_OPTION_PDF.' '.$tempfile2.' 2>&1';
Also, if the PDFs were not from dvips, then try the following:
Code:
define('PHPDIG_OPTION_PDF','');
Code:
define('PHPDIG_PDF_EXTENSION','');
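For context on the `2>&1` suggestion: PHP's `exec()` collects only what the command writes to standard output, so without the redirect an error message from the converter goes to stderr and never shows up in the spider's output. The following is a minimal sketch of that idea, assuming a POSIX shell behind `exec()`. The helper name `run_and_capture` is made up for illustration and is not part of PhpDig.

```php
<?php
// Hypothetical helper, not part of PhpDig: runs a shell command and
// returns everything it printed together with its exit status.
function run_and_capture($command)
{
    $output = array();
    $status = 0;
    // "2>&1" merges stderr into stdout, so error messages from the
    // tool end up in $output instead of being lost.
    exec($command . ' 2>&1', $output, $status);
    return array('output' => $output, 'status' => $status);
}

// Demo: a command that writes only to stderr and exits with status 3.
$result = run_and_capture('sh -c "echo oops >&2; exit 3"');
print_r($result['output']);
echo 'Return value is: ' . $result['status'] . "\n";
```

Printing both the captured lines and the exit status makes a silent failure visible, which is usually the first thing to establish when a converter works from the command line but not from the spider.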
still hanging!
hi charter,
thanks for your reply! i tried your advice, but without success... the spider still hangs on indexing the pdf.

this is the last thing the spider prints out:
----------
Is result test http an array: 1
What is result test http status: PDF
----------

these are my settings:
----------
define('PHPDIG_INDEX_PDF',true);
define('PHPDIG_PARSE_PDF','/usr/local/bin/pstotext');
define('PHPDIG_OPTION_PDF','');
define('PHPDIG_PDF_EXTENSION','');
----------

i saw that the file permissions along '/usr/local/bin/pstotext' are all set to 755, except for the file itself, which has 555... could that be a problem? since i am not the administrator of the server (it's a commercial provider), i'm not able to change any of the file permissions...

*thanks for further support!
As you cannot change permission on pstotext, see if your host will change the permission or try pdftotext instead. There are instructions for pdftotext here.
hi charter,
there was a problem with 'allow_url_fopen'. now it still doesn't index the pdf, but the spider doesn't hang anymore (still trying with 'pstotext')... here's the output:
-----------------------
Is result test http an array: 1
What is result test http status: PDF
Is result test an array: 1
What is result test status: PDF
Use is executable is set to: 1
Index the pdf is set to: 1
Parse the pdf is set to: /usr/local/bin/pstotext
Does parse pdf exist: 1
Is parse pdf executable: 1
Command is: /usr/local/bin/pstotext ../admin/temp/66912182.tmp 2>&1
Result contains: Array ( [0] => gs: not found )
Return value is: 3
-----------------------

what does 'gs: not found' mean?

*thanks for your support (... i'm now going to try 'pdftotext')
pdftotext error
hi again,
with 'pdftotext' it doesn't work either (i use the linux binary on the FreeBSD host...)

config:
---------------------------
define('PHPDIG_INDEX_PDF',true);
define('PHPDIG_PARSE_PDF','/home/local/bin/pdftotext');
define('PHPDIG_OPTION_PDF','');
define('PHPDIG_PDF_EXTENSION','.txt');
---------------------------

output:
---------------------------
Is result test http an array: 1
What is result test http status: PDF
Is result test an array: 1
What is result test status: PDF
Use is executable is set to: 1
Index the pdf is set to: 1
Parse the pdf is set to: /home/local/bin/pdftotext
Does parse pdf exist: 1
Is parse pdf executable: 1
Command is: /home/ekifch/bin/pdftotext ../admin/temp/89121942.tmp 2>&1
Result contains: Array ( [0] => Abort trap )
Return value is: 134
---------------------------

*any idea?
> Result contains: Array ( [0] => gs: not found )
That probably means that Ghostscript cannot be found.
> Result contains: Array ( [0] => Abort trap )
That might be a memory issue. Try pdftotext on a small PDF file.
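Both guesses can be checked from inside PHP. A web server process often runs with a shorter PATH than a login shell, which would explain a tool working from the command line but failing under the spider. The sketch below, which assumes a POSIX `sh` behind `exec()`, prints the PATH that PHP sees and looks up `gs` in it. It also decodes the return value using the shell convention that a status above 128 means the process was killed by a signal (status minus 128); 134 therefore corresponds to signal 6, SIGABRT, which matches the "Abort trap" message. The helper names `find_tool` and `describe_status` are hypothetical and not part of PhpDig.

```php
<?php
// Hypothetical diagnostic helpers; not part of PhpDig.

// Returns the full path of $tool as seen by the shell PHP uses,
// or null if the tool is not on that shell's PATH.
function find_tool($tool)
{
    $output = array();
    $status = 0;
    exec('command -v ' . escapeshellarg($tool) . ' 2>/dev/null', $output, $status);
    return ($status === 0 && count($output) > 0) ? $output[0] : null;
}

// Shells report "killed by signal N" as exit status 128 + N.
function describe_status($status)
{
    if ($status > 128) {
        return 'killed by signal ' . ($status - 128);
    }
    return 'exited with status ' . $status;
}

echo 'PATH seen by PHP: ' . getenv('PATH') . "\n";
$gs = find_tool('gs');
echo 'gs: ' . ($gs === null ? 'not found' : $gs) . "\n";
echo describe_status(134) . "\n"; // signal 6 is SIGABRT ("Abort trap")
echo describe_status(3) . "\n";
```

Running this once from the command line and once through the web server shows whether the two environments differ.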
yeah it works now
thanks to your support, some help from my server admin and lots of hours searching for a solution, i finally got it to work!
the problem was that somehow 'pstotext' did not find the 'ghostscript' library when run from a php script through the web server. i had to add "export PATH=$PATH:my_path_to_lib; " to the exec command in 'robot_functions.php'...

here are the full change instructions in case anyone runs into the same problem:

in config.inc (somewhere near 'EXTERNAL TOOLS SETUP') add:
PHP Code:
PHP Code:
PHP Code:
*cheers*
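The contents of the "PHP Code:" blocks in the post above were not preserved. What follows is not the poster's code, only a sketch of what the described change could look like: a path setting in the configuration, and a command builder that prefixes `export PATH=...; ` to the converter call. The constant name `PDF_TOOL_EXTRA_PATH`, the directory `/path/to/ghostscript/bin`, and the function `build_pdf_command` are all placeholders invented for illustration. It also quotes `$PATH` and the file name for the shell, which the original description does not mention.

```php
<?php
// Sketch only: the constant name and directory are placeholders,
// not values from the original post or from PhpDig itself.
define('PDF_TOOL_EXTRA_PATH', '/path/to/ghostscript/bin');

// Builds the shell command for the PDF converter, extending PATH first
// so that the converter can locate helper programs such as gs.
function build_pdf_command($tool, $options, $file, $extraPath)
{
    $prefix = '';
    if ($extraPath !== '') {
        // Single quotes: $PATH is expanded by the shell, not by PHP.
        $prefix = 'export PATH="$PATH":' . escapeshellarg($extraPath) . '; ';
    }
    $parts = array($tool);
    if ($options !== '') {
        $parts[] = $options;
    }
    $parts[] = escapeshellarg($file);
    // Keep "2>&1" so that errors from the tool stay visible.
    return $prefix . implode(' ', $parts) . ' 2>&1';
}

echo build_pdf_command(
    '/usr/local/bin/pstotext',
    '',
    '../admin/temp/66912182.tmp',
    PDF_TOOL_EXTRA_PATH
) . "\n";
```

Extending PATH inside the command string only affects that one shell invocation, so it works on shared hosting where the web server's environment cannot be changed.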