PhpDig.net - View Single Post - Junk in keywords table

Bege · 04-08-2004, 11:52 PM

I have PhpDig 1.8.0 installed on Red Hat Linux.
When I index my PDF files, I get lots of junk in the keywords table. PhpDig finds the file OK, but none of the indexed keywords are usable. Below is a snippet of the data. I have tried the sample PDF from this site and read through most of the forums, with no luck. I am using pdftotext to create the plain-text file. It doesn't support STDOUT, but it does create a .txt file that I can open to confirm that the PDF was parsed correctly. I have also included a snippet of my config.php. It almost looks like the encoding is being read wrong. Does anyone have any ideas? Thanks much,
Bege

+----------------+
| keyword |
+----------------+
| 6aeyqo,n |
| E#b5 |
| k��de�� |
| 3Iqha:cp |
+----------------+

define('PHPDIG_INDEX_PDF',true);
define('PHPDIG_PARSE_PDF','/usr/local/bin/pdftotext');
define('PHPDIG_OPTION_PDF','');
define('PHPDIG_PDF_EXTENSION','txt');
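
One way to narrow this down outside PhpDig is to capture the converter's output directly and check whether it is text at all. The sketch below is only a rough diagnostic: the three helper functions are hypothetical and not part of PhpDig. It also assumes the installed pdftotext accepts "-" as the output file name to write to stdout, which the xpdf man page documents but which is worth confirming against the local build.

```php
<?php
// Sketch only: these helpers are hypothetical and not part of PhpDig.

// True if the data starts with the PDF file signature, i.e. it is the
// raw PDF rather than extracted text.
function looks_like_raw_pdf($data)
{
    return strncmp($data, '%PDF-', 5) === 0;
}

// Fraction of bytes outside tab, LF, CR and printable ASCII (0x20-0x7E).
// Accented or UTF-8 text also raises this number, so treat it as a hint.
function junk_ratio($text)
{
    $len = strlen($text);
    if ($len === 0) {
        return 0.0;
    }
    $junk = preg_match_all('/[^\x09\x0A\x0D\x20-\x7E]/', $text, $matches);
    return $junk / $len;
}

// Run pdftotext with "-" as the output file so the text goes to stdout.
// Assumption: the installed pdftotext supports "-" for stdout.
function pdf_to_text($binary, $pdfPath)
{
    $cmd = escapeshellarg($binary) . ' ' . escapeshellarg($pdfPath) . ' -';
    $out = shell_exec($cmd);
    return is_string($out) ? $out : '';
}
```

Calling pdf_to_text('/usr/local/bin/pdftotext', $somePdf) and passing the result to looks_like_raw_pdf() and junk_ratio() gives a quick read. Empty output suggests nothing reached stdout, and a leading %PDF- signature or a high ratio suggests binary data rather than extracted text is being indexed.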
