
sushie 06-07-2005 10:39 AM

spider hangs on indexing pdf (pstotext)
 
hi there,

i'm trying to use phpdig for the first time...

i read a lot of threads about problems with pstotext and tried several hints, but still can't get it to work...

my system:
------------------------
-FreeBSD 4.10
-PHP Version 4.3.1
-PHPDIG_VERSION 1.8.7
------------------------

from command line pstotext seems to work correctly (it outputs the file content on STDOUT as expected)

the paths in config.php are ok:
------------------------
define('PHPDIG_INDEX_PDF',true);
define('PHPDIG_PARSE_PDF','/usr/local/bin/pstotext');
define('PHPDIG_OPTION_PDF','-cork');
------------------------

i tweaked spider.php and robot_functions.php as mentioned somewhere. these are the outputs:
------------------------
Is result test http an array: 1
What is result test http status: PDF

Is result test an array: 1
What is result test status: PDF
Use is executable is set to: 1
Index the pdf is set to: 1
Parse the pdf is set to: /usr/local/bin/pstotext
Does parse pdf exist: 1
Is parse pdf executable: 1
------------------------

... just after printing that, the spider hangs without any error message...

can anyone help?

Charter 06-07-2005 11:29 AM

Okay, that all looks good, so remove the code to print those outputs, and instead, in robot_functions.php find:
Code:

$command = PHPDIG_PARSE_PDF.' '.PHPDIG_OPTION_PDF.' '.$tempfile2;
And replace with:
Code:

$command = PHPDIG_PARSE_PDF.' '.PHPDIG_OPTION_PDF.' '.$tempfile2.' 2>&1';
Then try an index of a PDF file and see what prints onscreen.

Also, if the PDFs were not from dvips, then try the following:
Code:

define('PHPDIG_OPTION_PDF','');
And of course, since output is STDOUT, use the following:
Code:

define('PHPDIG_PDF_EXTENSION','');
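
To illustrate why the 2>&1 suffix matters: PHP's exec() only collects what a command writes to STDOUT, so an error written to STDERR stays invisible unless it is redirected. A minimal sketch follows; the helper name run_with_stderr and the command no_such_tool_xyz are made up for the example and are not part of PhpDig.

```php
<?php
// Hypothetical helper (not part of PhpDig): run a command and capture
// both STDOUT and STDERR, so that error messages show up in $output.
function run_with_stderr($command)
{
    $output = array();
    $status = 0;
    // "2>&1" sends STDERR to the same place as STDOUT, which is what exec() collects.
    exec($command . ' 2>&1', $output, $status);
    return array('output' => $output, 'status' => $status);
}

// A command that does not exist: the shell's complaint ends up in 'output'
// and 'status' is non-zero.
$r = run_with_stderr('no_such_tool_xyz');
print_r($r);
```

Without the redirect, 'output' would be an empty array here and only the non-zero status would hint that something went wrong.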

sushie 06-10-2005 07:31 AM

still hanging!
 
hi charter,
thanks for your reply!

i tried your advice, but without success... the spider still hangs on indexing the pdf.

this is the last thing the spider prints out:
----------
Is result test http an array: 1
What is result test http status: PDF
----------

these are my settings:
----------
define('PHPDIG_INDEX_PDF',true);
define('PHPDIG_PARSE_PDF','/usr/local/bin/pstotext');
define('PHPDIG_OPTION_PDF','');
define('PHPDIG_PDF_EXTENSION','');
----------

i saw that the permissions along the path '/usr/local/bin/pstotext' are all set to 755, except the file itself, which has 555 ... could that be a problem?

since i am not the administrator of the server (it's a commercial provider), i'm not able to change any of the file permissions...

*thanks for further support!

Charter 06-10-2005 10:11 AM

As you cannot change permission on pstotext, see if your host will change the permission or try pdftotext instead. There are instructions for pdftotext here.

sushie 06-13-2005 12:31 PM

hi charter,

there was a problem with 'allow_url_fopen'. now it still doesn't index the pdf, but the spider doesn't hang anymore (still trying with 'pstotext') ... here's the output:

-----------------------
Is result test http an array: 1
What is result test http status: PDF

Is result test an array: 1
What is result test status: PDF
Use is executable is set to: 1
Index the pdf is set to: 1
Parse the pdf is set to: /usr/local/bin/pstotext
Does parse pdf exist: 1
Is parse pdf executable: 1

Command is: /usr/local/bin/pstotext ../admin/temp/66912182.tmp 2>&1
Result contains: Array ( [0] => gs: not found )
Return value is: 3
-----------------------

what does 'gs: not found' mean?

*thanks for your support

(... i'm now going to try 'pdftotext')

sushie 06-13-2005 12:49 PM

pdftotext error
 
hi again,

it doesn't work with 'pdftotext' either (i'm using the linux binary on the FreeBSD host...)

config:
---------------------------
define('PHPDIG_INDEX_PDF',true);
define('PHPDIG_PARSE_PDF','/home/local/bin/pdftotext');
define('PHPDIG_OPTION_PDF','');
define('PHPDIG_PDF_EXTENSION','.txt');
---------------------------

output:
---------------------------
Is result test http an array: 1
What is result test http status: PDF

Is result test an array: 1
What is result test status: PDF
Use is executable is set to: 1
Index the pdf is set to: 1
Parse the pdf is set to: /home/local/bin/pdftotext
Does parse pdf exist: 1
Is parse pdf executable: 1

Command is: /home/ekifch/bin/pdftotext ../admin/temp/89121942.tmp 2>&1
Result contains: Array ( [0] => Abort trap )
Return value is: 134
---------------------------

*any idea?

Charter 06-13-2005 06:57 PM

> Result contains: Array ( [0] => gs: not found )

That probably means that Ghostscript cannot be found.

> Result contains: Array ( [0] => Abort trap )

That might be a memory issue. Try pdftotext on a small PDF file.
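
One way to check the "gs: not found" case from inside PHP is to look at the PATH the web server process actually sees, which can differ from the PATH of a login shell. The sketch below is only an illustration: find_in_path is a hypothetical helper, not a PhpDig function, and it assumes a Unix-style, colon-separated PATH.

```php
<?php
// Hypothetical helper: look for an executable in a colon-separated PATH string.
// Returns the full path of the first match, or false if nothing is found.
function find_in_path($name, $path)
{
    foreach (explode(':', $path) as $dir) {
        if ($dir === '') {
            continue; // skip empty PATH entries
        }
        $candidate = rtrim($dir, '/') . '/' . $name;
        if (is_file($candidate) && is_executable($candidate)) {
            return $candidate;
        }
    }
    return false;
}

// getenv() returns false when PATH is not set, hence the cast to string.
$path = (string) getenv('PATH');
echo "PATH seen by PHP: ", $path, "\n";

$gs = find_in_path('gs', $path);
echo $gs === false ? "gs not found on this PATH\n" : "gs found at $gs\n";
```

Run through the web server rather than from the command line, this shows whether the directory containing gs is missing from the PATH that exec() inherits.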

sushie 06-15-2005 05:57 AM

yeah it works now
 
thanks to your support, some help from my server-admin and lots of hours searching for a solution, i finally got it to work!

the problem was that 'pstotext' somehow could not find ghostscript ('gs') when run from a php script through the web server.

i had to add "export PATH=$PATH:my_path_to_lib; " to the exec command in 'robot_functions.php'...

here's the full change-instruction in case anyone runs into the same problem:

in config.php (somewhere near 'EXTERNAL TOOLS SETUP') add:
PHP Code:

define('PHPDIG_PATH_TO_BIN','/usr/local/bin'); 

in robot_functions.php (near line #1089) find:
PHP Code:

if ($usetool) {
  rename($tempfile1,$tempfile2);
  exec($command,$result,$retval);

and replace with:
PHP Code:

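if ($usetool) {
  // default to no prefix so $setpath is always defined
  $setpath = '';
  if (defined('PHPDIG_PATH_TO_BIN') && PHPDIG_PATH_TO_BIN) {
    // single quotes keep $PATH literal, so the shell (not PHP) expands it
    $setpath = 'export PATH=$PATH:'.PHPDIG_PATH_TO_BIN.'; ';
  }
  rename($tempfile1,$tempfile2);
  exec($setpath.$command,$result,$retval);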

maybe that helps someone
*cheers*
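
A note on quoting when building a PATH prefix like the one in this thread: inside a double-quoted PHP string, $PATH is treated as a PHP variable (normally undefined, so it expands to nothing), whereas in a single-quoted string it reaches the shell unchanged and the shell expands it. A minimal sketch, assuming a Unix shell; build_path_prefix is a hypothetical helper, not part of PhpDig.

```php
<?php
// Hypothetical helper: build a shell prefix that appends one directory to PATH.
// Returns an empty string when no directory is given.
function build_path_prefix($extraDir)
{
    if ($extraDir === '' || $extraDir === null) {
        return '';
    }
    // Single quotes: PHP leaves $PATH alone, the shell expands it at run time.
    // escapeshellarg() quotes the directory in case it contains spaces.
    return 'export PATH=$PATH:' . escapeshellarg($extraDir) . '; ';
}

$prefix = build_path_prefix('/usr/local/bin');
echo $prefix, "\n";

// Show the PATH a command would see when run with this prefix.
$out = array();
exec($prefix . 'echo "$PATH"', $out);
echo $out[0], "\n";
```

The second line printed is the existing PATH with /usr/local/bin appended, which is what a tool such as pstotext would inherit when started with this prefix.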


All times are GMT -8.
