Thread: PDF indexing
12-07-2003, 10:05 AM   #1
lelandv
Green Mole
 
Join Date: Dec 2003
Posts: 11
Quote:
Originally posted by Charter
Hi. Delete anything in the temp directory, and then try setting the following in the config file:

define('PHPDIG_PDF_EXTENSION','.txt');
Hi. I have a similar problem to the other poster's. The difference here is that the debug test does successfully detect that it's a PDF file; it creates the temporary file and then promptly deletes it again.

I have added the define above, as suggested for the previous problem, but the contents of the PDF are still not indexed. I'm using "pdftohtml" with a wrapper that removes all HTML formatting, so the result is PDF -> TEXT. (Syntax: pdf2txt file.pdf, which writes plain text to STDOUT.)
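
For illustration only, a wrapper of the kind described above might look roughly like the sketch below. This is not the actual script in use: the function names strip_html and pdf2txt are hypothetical, and the pdftohtml options (-q, -i, -noframes, -stdout) are assumed to be available in the installed version. The tag stripping is deliberately naive and only handles tags that open and close on the same line.

```shell
#!/bin/sh
# Hypothetical sketch of a pdf2txt-style wrapper; not the poster's real script.

# Read HTML on stdin, drop the tags, and decode a few common entities.
strip_html() {
    sed -e 's/<[^>]*>//g' \
        -e 's/&nbsp;/ /g' \
        -e 's/&lt;/</g' \
        -e 's/&gt;/>/g' \
        -e 's/&amp;/\&/g'
}

# Convert one PDF to plain text on STDOUT.
# Assumes pdftohtml is on the PATH and supports these options.
pdf2txt() {
    pdftohtml -q -i -noframes -stdout "$1" | strip_html
}

# Only run the conversion when a file argument is given.
if [ "$#" -gt 0 ]; then
    pdf2txt "$1"
fi
```

Saved as pdf2txt and made executable, it would be called as "pdf2txt file.pdf". Note that the text goes to STDOUT and no output file is created on disk.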

As a result, the database contains no trace of the PDF's contents, so they are not indexed; only the filename itself is stored (which is not really what we want here).

Any help would be appreciated.



Leland