Thread: indexing pdf
02-16-2004, 01:37 PM   #3
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. From http://research.compaq.com/SRC/virtu.../pstotext.html ...

pstotext is a program that works with Ghostscript (version 3.33 or later) to extract plain text from PostScript and PDF files (you should have Ghostscript 3.51 or later for PDF).

PHP versions 4.2.2 and 4.2.3 seem to have an issue running pdftotext through exec, as in this thread, but I am not sure whether pstotext would have the same problem.
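
If it helps, here is a rough sketch of calling pstotext as an external program and capturing what it writes to standard output. It is in Python purely for illustration, not the PHP exec call discussed above. The function name extract_text and the fall-back-to-None behaviour are my own assumptions; the only thing taken from pstotext itself is that it accepts a file name and prints the extracted text.

```python
import shutil
import subprocess
from typing import Optional


def extract_text(path: str, tool: str = "pstotext") -> Optional[str]:
    """Run an external text extractor on `path` and return its output.

    Hypothetical helper: returns None when the tool is not installed
    or exits with an error, so the caller can skip the file.
    """
    exe = shutil.which(tool)  # look the program up on PATH
    if exe is None:
        return None

    # Pass arguments as a list so no shell is involved.
    result = subprocess.run(
        [exe, path],
        capture_output=True,
        text=True,
        errors="replace",  # tolerate odd bytes in the extracted text
    )
    if result.returncode != 0:
        return None
    return result.stdout
```

Checking for the program first and testing the return code makes a failed call visible instead of silently indexing an empty document.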

The PHP strip_tags function was replaced with a regular expression in version 1.6.3. Version 1.8.0 should not index HTML tags.
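
To give an idea of what stripping tags with a regular expression looks like, here is a minimal sketch in Python. The pattern and the strip_tags_regex name are assumptions for illustration only; this is not the expression actually used in 1.6.3 or 1.8.0.

```python
import re

# Anything from "<" up to the next ">" is treated as a tag (assumed pattern).
TAG_RE = re.compile(r"<[^>]*>")
WHITESPACE_RE = re.compile(r"\s+")


def strip_tags_regex(html: str) -> str:
    """Replace tags with spaces, then collapse runs of whitespace."""
    without_tags = TAG_RE.sub(" ", html)
    return WHITESPACE_RE.sub(" ", without_tags).strip()


print(strip_tags_regex("<p>Hello <b>world</b></p>"))  # prints: Hello world
```

Replacing each tag with a space rather than nothing keeps words on either side of a tag from running together in the index. A simple pattern like this does not handle a ">" inside an attribute value or the contents of script blocks.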