Thread: PDF indexing
12-07-2003, 12:52 PM #11
lelandv
Green Mole
 
Join Date: Dec 2003
Posts: 11
Quote:
Originally posted by Charter
Hi. Just "pdf2txt myfile.pdf" with no .cgi or .pl extension? How does it know to treat it as a perl program?

Try using pdftohtml in define('PHPDIG_PARSE_PDF','/usr/local/bin/pdftohtml'); because I'm thinking PhpDig should clean the results of tags.
The permissions on the wrapper are 0755 (executable), and its first line is #!/usr/bin/perl, which tells the system to run it with Perl.
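A quick way to see the shebang mechanism on its own is a throwaway script with no extension. This is a minimal sketch: `/bin/sh` stands in for `/usr/bin/perl` so it runs even where Perl is not installed, and the name `demo-wrapper` is hypothetical.

```shell
#!/bin/sh
# Sketch: a script with no .pl/.cgi extension still runs under the right
# interpreter, because the kernel reads its first line (#!interpreter) once
# the execute bit is set. /bin/sh stands in for /usr/bin/perl here;
# "demo-wrapper" is a hypothetical name.

make_wrapper() {
    # $1 = path of the script to create
    cat > "$1" <<'EOF'
#!/bin/sh
echo "wrapper called with: $*"
EOF
    chmod 0755 "$1"   # rwxr-xr-x, the same mode as the real wrapper
}

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT   # clean up the temporary directory on exit

make_wrapper "$dir/demo-wrapper"
"$dir/demo-wrapper" myfile.pdf   # prints: wrapper called with: myfile.pdf
```

Swapping the first line for `#!/usr/bin/perl` and the body for Perl code behaves the same way, provided Perl really lives at that path.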

For example, it works when run directly from the command line:

leland@taranta:~/public_html/pdftest> /usr/local/bin/pdf2txt InstrumentPilot39.pdf

Engine Management
1
Intelligence Reports
2
Bashing the Beam
6
European Flight Planning
8
Dew Point Review
10
PPL/IR Europe Web Site
12
14
Bert Maes and I attended the engine efficiency and many others. It was very

<snip>

---
Having said that, I've just added a small hook to the wrapper to detect whether it is being called at all, and it looks like the spider isn't even attempting to use it.
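A hook of this kind can be as small as one logged line per invocation. The real hook lives in the Perl wrapper and isn't shown here, so this is only a minimal shell sketch; the log path and the function name `log_call` are hypothetical.

```shell
#!/bin/sh
# Sketch of a "was I called?" hook for an external-converter wrapper.
# Hypothetical: the log location (overridable via HOOK_LOG) and the name
# log_call. The wrapper's real conversion work would follow the hook.

HOOK_LOG=${HOOK_LOG:-/tmp/pdf2txt-hook.log}

log_call() {
    # Append one line per invocation: timestamp, parent PID, the user the
    # process runs as, and the arguments it received.
    printf '%s ppid=%s user=%s args=%s\n' \
        "$(date '+%Y-%m-%d %H:%M:%S')" "$PPID" "$(id -un)" "$*" >> "$HOOK_LOG"
}

# Call the hook first, before any conversion work, so even a run that
# fails later still leaves a trace.
log_call "$@"
```

Recording the user alongside the arguments makes it easy to tell a run from your own shell apart from one started by the spider.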

This is despite the following settings in config.php:
define('PHPDIG_INDEX_PDF',true);
define('PHPDIG_PARSE_PDF','/usr/local/bin/pdf2txt');
define('PHPDIG_OPTION_PDF','');

The external programs are called with exec(), aren't they? If so, the spider should at least trip the hook, but it looks as if it isn't even getting that far.
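One way to narrow this down is to rule out problems on the program side first: the configured file has to exist, be executable, and name an interpreter that exists. On Unix, PHP's exec() hands its command string to /bin/sh, so if PhpDig does use exec(), these are conditions its call depends on. A minimal sketch; `check_external` is a hypothetical helper, not part of PhpDig.

```shell
#!/bin/sh
# Sketch: sanity-check the program named in PHPDIG_PARSE_PDF.
# check_external is a hypothetical helper, not part of PhpDig.

check_external() {
    prog=$1
    [ -f "$prog" ] || { echo "missing: $prog"; return 1; }
    [ -x "$prog" ] || { echo "not executable: $prog"; return 1; }
    first=$(head -n 1 "$prog")
    case $first in
        '#!'*)
            # Strip "#!" (plus optional spaces) and any interpreter arguments,
            # e.g. "#!/usr/bin/perl -w" -> "/usr/bin/perl".
            interp=$(printf '%s\n' "$first" | sed 's/^#! *//; s/ .*//')
            [ -x "$interp" ] || { echo "interpreter not executable: $interp"; return 1; }
            ;;
    esac
    echo "ok: $prog"
}
```

For example, `check_external /usr/local/bin/pdf2txt`. If the spider runs under a different account than your login shell, running the check as that account is more telling.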


L.