PhpDig.net

06-07-2005, 10:39 AM   #1
sushie
spider hangs on indexing pdf (pstotext)

hi there,

i'm trying to use phpdig for the first time...

i read a lot of threads about problems with pstotext and tried several hints, but still can't get it to work...

my system:
------------------------
-FreeBSD 4.10
-PHP Version 4.3.1
-PHPDIG_VERSION 1.8.7
------------------------

from command line pstotext seems to work correctly (it outputs the file content on STDOUT as expected)

the paths in config.php are ok:
------------------------
define('PHPDIG_INDEX_PDF',true);
define('PHPDIG_PARSE_PDF','/usr/local/bin/pstotext');
define('PHPDIG_OPTION_PDF','-cork');
------------------------

i tweaked spider.php and robot_functions.php to print debug output, as mentioned in another thread. these are the outputs:
------------------------
Is result test http an array: 1
What is result test http status: PDF

Is result test an array: 1
What is result test status: PDF
Use is executable is set to: 1
Index the pdf is set to: 1
Parse the pdf is set to: /usr/local/bin/pstotext
Does parse pdf exist: 1
Is parse pdf executable: 1
------------------------

... just after printing that, the spider hangs without any error message...

can anyone help?
06-07-2005, 11:29 AM   #2
Charter
Okay, that all looks good, so remove the code to print those outputs, and instead, in robot_functions.php find:
Code:
$command = PHPDIG_PARSE_PDF.' '.PHPDIG_OPTION_PDF.' '.$tempfile2;
And replace with:
Code:
$command = PHPDIG_PARSE_PDF.' '.PHPDIG_OPTION_PDF.' '.$tempfile2.' 2>&1';
Then try an index of a PDF file and see what prints onscreen.

Also, if the PDFs were not from dvips, then try the following:
Code:
define('PHPDIG_OPTION_PDF','');
And of course, since pstotext writes its output to STDOUT, leave the extension empty:
Code:
define('PHPDIG_PDF_EXTENSION','');
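
For anyone wondering what the ' 2>&1' suffix buys you: exec() only captures STDOUT, so an error message from the external tool normally never reaches the page. Here is a minimal standalone sketch of the idea. It is not PhpDig code; 'run_with_stderr' is a hypothetical helper name, and the 'sh -c' command is just a stand-in for a tool that fails with a message on STDERR.

```php
<?php
// Hypothetical helper, not part of PhpDig: runs a shell command and
// merges STDERR into the captured output so error messages become visible.
function run_with_stderr($command) {
    $output = array();
    $status = 0;
    // ' 2>&1' redirects STDERR (fd 2) into STDOUT (fd 1), which exec() captures.
    exec($command . ' 2>&1', $output, $status);
    return array($output, $status);
}

// Stand-in for a failing tool: writes "oops" to STDERR and exits with code 3.
list($lines, $status) = run_with_stderr("sh -c 'echo oops >&2; exit 3'");

echo "Result contains: ";
print_r($lines);
echo "Return value is: $status\n";
```

Without the redirect, $lines would come back empty and only the return value would hint that something went wrong.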
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.
06-10-2005, 07:31 AM   #3
sushie
still hanging!

hi charter,
thanks for your reply!

i tried your advice, but without success... the spider still hangs on indexing the pdf.

this is the last the spider prints out:
----------
Is result test http an array: 1
What is result test http status: PDF
----------

these are my settings:
----------
define('PHPDIG_INDEX_PDF',true);
define('PHPDIG_PARSE_PDF','/usr/local/bin/pstotext');
define('PHPDIG_OPTION_PDF','');
define('PHPDIG_PDF_EXTENSION','');
----------

i saw that the permissions along the path '/usr/local/bin/pstotext' are all set to 755, except for the file itself, which has 555 ... could that be a problem?

since i am not the administrator of the server (it's a commercial provider), i'm not able to change any of the file permissions...

*thanks for further support!
06-10-2005, 10:11 AM   #4
Charter
As you cannot change permission on pstotext, see if your host will change the permission or try pdftotext instead. There are instructions for pdftotext here.
06-13-2005, 12:31 PM   #5
sushie

hi charter,

there was a problem with 'allow_url_fopen'. now it still doesn't index the pdf, but the spider doesn't hang anymore (still trying with 'pstotext') ... here's the output:

-----------------------
Is result test http an array: 1
What is result test http status: PDF

Is result test an array: 1
What is result test status: PDF
Use is executable is set to: 1
Index the pdf is set to: 1
Parse the pdf is set to: /usr/local/bin/pstotext
Does parse pdf exist: 1
Is parse pdf executable: 1

Command is: /usr/local/bin/pstotext ../admin/temp/66912182.tmp 2>&1
Result contains: Array ( [0] => gs: not found )
Return value is: 3
-----------------------

what does 'gs: not found' mean?

*thanks for your support

(... i'm now going to try 'pdftotext')
06-13-2005, 12:49 PM   #6
sushie
pdftotext error

hi again,

it doesn't work with 'pdftotext' either (i use the linux binary on the FreeBSD host...)

config:
---------------------------
define('PHPDIG_INDEX_PDF',true);
define('PHPDIG_PARSE_PDF','/home/local/bin/pdftotext');
define('PHPDIG_OPTION_PDF','');
define('PHPDIG_PDF_EXTENSION','.txt');
---------------------------

output:
---------------------------
Is result test http an array: 1
What is result test http status: PDF

Is result test an array: 1
What is result test status: PDF
Use is executable is set to: 1
Index the pdf is set to: 1
Parse the pdf is set to: /home/local/bin/pdftotext
Does parse pdf exist: 1
Is parse pdf executable: 1

Command is: /home/ekifch/bin/pdftotext ../admin/temp/89121942.tmp 2>&1
Result contains: Array ( [0] => Abort trap )
Return value is: 134
---------------------------

*any idea?
06-13-2005, 06:57 PM   #7
Charter
> Result contains: Array ( [0] => gs: not found )

That probably means that Ghostscript cannot be found.
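
To check this from the web server's side, a small diagnostic script can help, because the PATH that PHP's exec() hands to the shell is often shorter than the one you get in a login shell. The sketch below is not part of PhpDig: 'find_on_path' is a hypothetical helper, and it assumes the shell behind exec() supports the POSIX 'command -v' builtin.

```php
<?php
// Hypothetical diagnostic, not part of PhpDig: asks the shell that exec()
// uses whether a tool can be resolved on its PATH.
function find_on_path($tool) {
    $output = array();
    $status = 0;
    // 'command -v' prints the resolved location and exits non-zero if not found.
    exec('command -v ' . escapeshellarg($tool) . ' 2>&1', $output, $status);
    return ($status === 0 && count($output) > 0) ? $output[0] : null;
}

// Single quotes keep $PATH literal so the shell, not PHP, expands it.
$path = array();
exec('echo "$PATH"', $path);
echo 'PATH seen by exec(): ' . (isset($path[0]) ? $path[0] : '') . "\n";

$gs = find_on_path('gs');
echo 'gs resolves to: ' . ($gs === null ? '(not found)' : $gs) . "\n";
```

Run it through the web server rather than from the command line; if 'gs' shows as not found there but works in your shell, the two environments have different PATH settings.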

> Result contains: Array ( [0] => Abort trap )

That might be a memory issue. Try pdftotext on a small PDF file.
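
As a reading aid for the "Return value" lines: by shell convention, a value above 128 means the process was killed by a signal, and the signal number is the value minus 128. So 134 is signal 6 (SIGABRT), which matches the "Abort trap" message, while the 3 from pstotext is an ordinary error exit code. A minimal sketch, with 'describe_exit_status' as a hypothetical helper that is not part of PhpDig:

```php
<?php
// Hypothetical helper, not part of PhpDig: interprets the return value
// that exec() reports for a command run through the shell.
function describe_exit_status($status) {
    if ($status > 128) {
        // Shell convention: 128 + N means the process died from signal N.
        return 'killed by signal ' . ($status - 128);
    }
    return $status === 0 ? 'success' : 'exited with error code ' . $status;
}

echo describe_exit_status(134) . "\n"; // the pdftotext case: signal 6 is SIGABRT
echo describe_exit_status(3) . "\n";   // the pstotext case: plain error code
```

This only tells you how the tool died, not why; the cause still has to come from the tool's own output.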
06-15-2005, 05:57 AM   #8
sushie
yeah it works now

thanks to your support, some help from my server admin and lots of hours searching for a solution, i finally got it to work!

the problem was that 'pstotext' somehow did not find the ghostscript executable ('gs') when run from a php script through the web server.

i had to add "export PATH=$PATH:my_path_to_lib; " to the exec command in 'robot_functions.php'...

here's the full change-instruction in case anyone runs into the same problem:

in config.php (somewhere near 'EXTERNAL TOOLS SETUP') add:
PHP Code:
define('PHPDIG_PATH_TO_BIN','/usr/local/bin'); 
in robot_functions.php (near line #1089) find:
PHP Code:
if ($usetool) {
  rename($tempfile1,$tempfile2);
  exec($command,$result,$retval);
and replace with:
PHP Code:
if ($usetool) {
  $setpath = '';
  if (PHPDIG_PATH_TO_BIN) {
    // single quotes so that the shell, not PHP, expands $PATH
    $setpath = 'export PATH=$PATH:'.PHPDIG_PATH_TO_BIN.'; ';
  }
  rename($tempfile1,$tempfile2);
  exec($setpath.$command,$result,$retval);
maybe that helps someone
*cheers*
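
For readers who want to try the PATH trick outside PhpDig, here is a minimal standalone sketch of the same idea. 'with_extra_path' is a hypothetical helper and '/opt/demo/bin' is a made-up directory; neither comes from PhpDig. One thing to watch: inside a double-quoted PHP string, $PATH is treated as a PHP variable, so the sketch uses single quotes to pass it through to the shell untouched.

```php
<?php
// Hypothetical helper, not part of PhpDig: prefixes a shell command so that
// an extra directory is appended to PATH for that command only.
function with_extra_path($command, $extraDir) {
    if ($extraDir === '' || $extraDir === false) {
        return $command; // nothing to add, run the command unchanged
    }
    // Single quotes keep $PATH literal so the shell, not PHP, expands it.
    return 'PATH="$PATH":' . escapeshellarg($extraDir) . '; export PATH; ' . $command;
}

// Demo: show the PATH a child command would see ('/opt/demo/bin' is made up).
$out = array();
exec(with_extra_path('echo "$PATH"', '/opt/demo/bin'), $out);
echo $out[0] . "\n";
```

Appending rather than replacing keeps the directories the web server already had, so other tools the command relies on still resolve.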
All times are GMT -8.