No.. I'm using actually pdftohtml since the pdftotext and xpdf doesn't support encrypted PDF's. The only utility that I have available to do this is pdftohtml which creates an output in HTML (to STDOUT). I therefore use a wrapper around it and call the binary from the wrapper. The wrapper removes the HTML tags leaving only the plain text... just a simple 4-line perl script:
#!/usr/bin/perl
$filename = shift;
$output = `/usr/local/bin/pdftohtml -i -stdout -noframes $filename`;
$output =~ s/<.*>//g;
print $output;
As a result, to get the text out of the pdf, simply "pdf2txt myfile.pdf" at the command line, and it outputs the text to STDOUT.
Noted on the freshmeat site... guess I should have waited a day before downloading it then
Really need to sort out this PDF indexing issue though... it's annoying and I really need for the search engine to be able to search based on the contents of a PDF file... there are several other spiders/search-engines available in Php, BUT none of them can do as comprehensive indexing and searching as phpdig...
Leland