Indexing PDFs doesn't really work


N100101
06-18-2004, 10:40 AM
OS: Linux
PHP Version 4.3.2


**********************************
Spidering in progress...

SITE : http://localhost/
Exclude paths :
- @NONE@

Is result test an array: 1
What is result test status: PDF
Use is executable is set to: 1
Index the pdf is set to: 1
Parse the pdf is set to: /usr/local/bin/pdftotext
Does parse pdf exist: 1
Is parse pdf executable: 1

Command is: /usr/local/bin/pdftotext ../admin/temp/7672.tmp
Result contains: Array ( )
Return value is: 3

1:http://localhost/pub/info/info_st.pdf
(time : 00:00:05)
No link in temporary table

links found : 1
http://localhost/pub/info/info_st.pdf
Optimizing tables...
Indexing complete
**********************************

Indexing via terminal works without any problems.

Any hints?

Thanks in advance.

Charter
06-18-2004, 01:46 PM
Hi. In robot_functions.php try changing:

$command = PHPDIG_PARSE_PDF.' '.PHPDIG_OPTION_PDF.' '.$tempfile2;

to the following:

$command = PHPDIG_PARSE_PDF.' '.PHPDIG_OPTION_PDF.' '.$tempfile2.' 2>&1';

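and see whether it echoes the problem. The 2>&1 sends pdftotext's error messages to standard output, so they should show up in the "Result contains" array.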
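
A minimal shell sketch of why the redirection helps. PHP's exec() runs the command through a shell and collects only standard output, so anything the program writes to standard error is lost unless it is redirected. The function fake_tool below is a hypothetical stand-in for a failing pdftotext run; it is not part of PhpDig or xpdf, and its error text is made up for illustration.

```shell
# Hypothetical stand-in for a failing pdftotext run (an assumption for this
# sketch): it writes its message to stderr and exits with status 3.
fake_tool() {
    echo "Error: something went wrong" >&2
    return 3
}

# Without redirection, command substitution captures stdout only.
# The error goes to the terminal instead, so the variable stays empty.
without=$(fake_tool) || true

# With 2>&1, stderr is merged into stdout and ends up in the variable.
status=0
with=$(fake_tool 2>&1) || status=$?

echo "without redirect: '$without'"
echo "with redirect:    '$with'"
echo "exit status:      $status"
```

The empty first result corresponds to the empty "Array ( )" in the spider log, while the exit status is available either way.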

N100101
06-18-2004, 03:32 PM
Here is the result:

Command is: /usr/local/bin/pdftotext ../admin/temp/5952.tmp 2>&1
Result contains: Array ( [0] => Error: Bad annotation action [1] => Error: Copying of text from this document is not allowed. )
Return value is: 3

Hm, what does this mean?

N100101
06-18-2004, 04:59 PM
Arrgh, of course: that PDF does not allow its text to be copied...

I have tested it with another PDF and it works!

Thanks a lot.
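
For anyone who lands here with the same "Return value is: 3": the sketch below maps pdftotext exit statuses to the meanings listed in the pdftotext man page (xpdf and poppler), as far as I know them. The function name explain_pdftotext_status is made up for this example, and the list is worth checking against the man page of the installed version.

```shell
# Map a pdftotext exit status to a short explanation.
# Codes as listed in the pdftotext man page; verify against your version.
explain_pdftotext_status() {
    case "$1" in
        0)  echo "no error" ;;
        1)  echo "error opening the PDF file" ;;
        2)  echo "error opening the output file" ;;
        3)  echo "error related to PDF permissions" ;;
        99) echo "other error" ;;
        *)  echo "unknown status $1" ;;
    esac
}

# The status reported in the spider log above.
explain_pdftotext_status 3
```

Status 3 matches the "Copying of text from this document is not allowed" message: the PDF's permission settings forbid text extraction, so pdftotext refuses to produce output.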