PhpDig.net

04-08-2004, 11:52 PM   #1
Bege
Green Mole
 
Join Date: Nov 2003
Posts: 3
Junk in keywords table - Indexing PDF

I have 1.8.0 installed on Redhat Linux.
When I index my PDF files I get lots of junk in the keywords table. PhpDig finds the files OK, but nothing of value ends up indexed. Below is a snippet of the data. I have tried the sample PDF from this site with no luck, and I have read through most of the forums without finding an answer. I am using pdftotext to create the plain-text file. It doesn't support STDOUT, but it does create a .txt file that I can open to confirm that the PDF was parsed correctly. I have also included a snippet of my config.php. It almost looks like the encoding is wrong. Does anyone have any ideas? Thanks much,
Bege

+----------------+
| keyword        |
+----------------+
| 6aeyqo,n       |
| E#b5           |
| k��de��        |
| 3Iqha:cp       |
+----------------+

define('PHPDIG_INDEX_PDF',true);
define('PHPDIG_PARSE_PDF','/usr/local/bin/pdftotext');
define('PHPDIG_OPTION_PDF','');
define('PHPDIG_PDF_EXTENSION','txt');
04-09-2004, 12:13 AM   #2
Bege
More Research

I have looked at the temp files in the text_content dir, and all of the junk that I am getting in the database is in those files. How are they created? When I run pdftotext from bash everything works fine, so what is different when the indexer runs it?
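One way to narrow down the difference between the bash run and the indexer run is to launch the same converter command from PHP and check whether the file it produces is mostly printable text. The following is only a rough sketch and is not PhpDig's own code: the function names (build_pdf_command, printable_ratio, check_conversion) are hypothetical, and it assumes the converter is called as "binary [options] input.pdf output.txt".

```php
<?php
// Hypothetical helpers, not part of PhpDig.

// Build the shell command for a converter invoked as:
//   <binary> [options] <input.pdf> <output.txt>
function build_pdf_command(string $binary, string $options, string $pdf, string $txt): string
{
    $parts = [escapeshellarg($binary)];
    if (trim($options) !== '') {
        $parts[] = trim($options); // passed through as-is; only use trusted values
    }
    $parts[] = escapeshellarg($pdf);
    $parts[] = escapeshellarg($txt);
    return implode(' ', $parts);
}

// Fraction of bytes that are printable ASCII, tab, CR or LF.
function printable_ratio(string $data): float
{
    $len = strlen($data);
    if ($len === 0) {
        return 0.0;
    }
    return preg_match_all('/[\x20-\x7E\t\r\n]/', $data) / $len;
}

// Run the converter the way a PHP script would and report on the result.
function check_conversion(string $binary, string $options, string $pdf, string $txt): void
{
    $cmd = build_pdf_command($binary, $options, $pdf, $txt);
    exec($cmd . ' 2>&1', $output, $status);
    echo "command: $cmd\nexit status: $status\n";
    if (!is_readable($txt)) {
        echo "no readable output at $txt\n";
        return;
    }
    printf("printable ratio: %.2f\n", printable_ratio(file_get_contents($txt)));
}
```

Calling check_conversion() with the pdftotext path from config.php, a test PDF and an output path of your choosing, under the same account that runs the indexer, shows whether the junk comes from the conversion step or from a later step reading the wrong file. The ratio is a crude heuristic: accented Latin1 or UTF-8 text scores below 1.0, while raw PDF stream data usually scores far lower. As a side note, the xpdf man page for pdftotext states that giving "-" as the text file name sends the output to stdout, which may be worth checking on your build.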
04-09-2004, 07:15 AM   #3
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. Don't forget the period in the extension:
PHP Code:
define('PHPDIG_PDF_EXTENSION','.txt');
Also try man pdftotext and look for the -enc option to set the output encoding if needed.
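Combined with the settings from the first post, the PDF section of config.php might then look like the sketch below. Only the '.txt' value comes from the reply above. Putting -enc into PHPDIG_OPTION_PDF is an assumption about where extra converter flags belong, and Latin1 is purely an example value, so check man pdftotext for the encoding names your build accepts.

```php
<?php
define('PHPDIG_INDEX_PDF', true);
define('PHPDIG_PARSE_PDF', '/usr/local/bin/pdftotext');
// Assumption: extra pdftotext flags go here; '-enc Latin1' is an example only.
define('PHPDIG_OPTION_PDF', '-enc Latin1');
// Leading period included, as pointed out in the reply.
define('PHPDIG_PDF_EXTENSION', '.txt');
```

If the keywords are clean after fixing the extension alone, the option can be left as an empty string, as in the original config.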
All times are GMT -8.
Copyright © 2001 - 2005, ThinkDing LLC. All Rights Reserved.