PhpDig.net

Bege 04-08-2004 11:52 PM

Junk in keywords table - Indexing PDF
 
I have PhpDig 1.8.0 installed on Red Hat Linux.
When I index my PDF files, I get lots of junk in the keywords table. It finds the file OK, but I don't get anything of any value. Below is a snippet of some of the data. I have tried the sample PDF from this site with no luck, and I have read through most of the forums without finding an answer. I am using pdftotext to create my plain-text file. It doesn't support STDOUT, but it does create a .txt file that I can open to see that it parsed the file correctly. I have also included a snippet of my config.php. It almost looks like it is getting the encoding wrong. Does anyone have any ideas? Thanks much,
Bege

+----------------+
| keyword        |
+----------------+
| 6aeyqo,n       |
| E#b5           |
| k��de��        |
| 3Iqha:cp       |
+----------------+

define('PHPDIG_INDEX_PDF',true);
define('PHPDIG_PARSE_PDF','/usr/local/bin/pdftotext');
define('PHPDIG_OPTION_PDF','');
define('PHPDIG_PDF_EXTENSION','txt');

Bege 04-09-2004 12:13 AM

More Research
 
I have looked at the temp files in the text_content directory, and all of the junk that I am getting in the database is in those files. How are those files getting created? When I run pdftotext in bash, everything works just fine. What is the difference?

Charter 04-09-2004 07:15 AM

Hi. Don't forget the period...
PHP Code:

define('PHPDIG_PDF_EXTENSION','.txt'); 

Try man pdftotext and look for the -enc option to set the encoding if needed.
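
For reference, here is a rough sketch of how the PDF block of config.php might look with both suggestions applied. It assumes that PhpDig passes PHPDIG_OPTION_PDF to pdftotext unchanged on the command line, and that Latin1 is the encoding your index expects. Both are assumptions, so check man pdftotext for the encoding names your build accepts.

```php
<?php
// Sketch only: the -enc value below is an assumption, not a known-good setting.
define('PHPDIG_INDEX_PDF',true);
define('PHPDIG_PARSE_PDF','/usr/local/bin/pdftotext');
define('PHPDIG_OPTION_PDF','-enc Latin1');  // assumed to be handed to pdftotext as-is
define('PHPDIG_PDF_EXTENSION','.txt');      // note the leading period
```

If the keywords still come out garbled after re-indexing, compare the temp file in text_content with the output of the same pdftotext command run by hand.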

