04-08-2004, 11:52 PM   #1
Green Mole
Join Date: Nov 2003
Posts: 3
Junk in keywords table - Indexing PDF

I have PhpDig 1.8.0 installed on Red Hat Linux.
When I index my PDF files, I get lots of junk in the keywords table. It finds the file OK, but nothing of any value ends up in the index. Below is a snippet of some of the data. I have tried the sample PDF from this site with no luck, and I have read through most of the forums without finding an answer. I am using pdftotext to create my plain-text file. It doesn't support STDOUT, but it does create a .txt file that I can open, and I can see that it parsed the file correctly. I have also included a little snippet of my config.php. It almost looks like it is getting the encoding wrong. Does anyone have any ideas? Thanks much,

| keyword |
| 6aeyqo,n |
| E#b5 |
| k��de�� |
| 3Iqha:cp |

Bege
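
A note on the STDOUT point in the post above: the pdftotext builds I am aware of (xpdf and poppler) treat a text-file argument of - as "write to standard output", so a small wrapper may be enough if the indexer expects the converter to print the text. The sketch below assumes that behavior. The function name pdf_to_stdout is made up for illustration, and man pdftotext on your own system is the authority.

```shell
#!/bin/sh
# Hypothetical wrapper: print the text of a PDF on standard output.
# Assumes this pdftotext build accepts "-" as the output file name,
# meaning "write to stdout" (check `man pdftotext` to confirm).
pdf_to_stdout() {
    # $1 = path to the PDF file
    if [ -z "$1" ]; then
        echo "usage: pdf_to_stdout file.pdf" >&2
        return 2
    fi
    pdftotext "$1" -
}
```

Running pdf_to_stdout on a sample PDF and piping the result through head is a quick way to see whether readable text comes out before involving the indexer at all.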
04-09-2004, 12:13 AM   #2
Green Mole
Join Date: Nov 2003
Posts: 3
More Research

I have looked at the temp files in the text_content dir, and all of the junk that I am getting in the database is in these files. How are these files getting created? When I run pdftotext in bash everything works just fine, so what is the difference?
Bege
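
One way to narrow this down is to check whether a temp file holds converted text at all. Keywords like the ones shown earlier are what one would expect if raw, compressed PDF bytes were being indexed, though that is only a guess from the symptoms. The sketch below tests whether a file starts with the %PDF signature that PDF files begin with. The function name is hypothetical and not part of PhpDig.

```shell
#!/bin/sh
# Hypothetical check: does a file begin with the PDF signature "%PDF"?
# If a temp file that should hold plain text passes this test, the
# converter output was probably never captured.
looks_like_raw_pdf() {
    # $1 = path to the file to inspect
    [ -r "$1" ] || return 2
    # Read the first four bytes of the file.
    magic=$(head -c 4 "$1")
    [ "$magic" = "%PDF" ]
}
```

The function returns 0 for a file that is still a PDF, 1 for anything else, and 2 if the file cannot be read. If the same conversion gives clean text in bash but not when the spider runs, differences such as the PATH or the user account the web server runs under are worth checking too.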
04-09-2004, 07:15 AM   #3
Head Mole
Join Date: May 2003
Posts: 2,539
Hi. Don't forget the period...
PHP Code:
Try man pdftotext and look for the -enc option to set the output encoding if needed.
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.
Charter
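
As an illustration of the -enc suggestion, the sketch below passes an encoding name to pdftotext and sends the text to standard output. Latin1 is only a placeholder default here. Which encoding the indexer actually expects is an assumption to verify, and the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: convert a PDF to text in a chosen output encoding.
# -enc is the pdftotext option for the output text encoding; "Latin1" and
# "UTF-8" are names commonly accepted, but check `man pdftotext` locally.
pdf_to_text_enc() {
    # $1 = path to the PDF, $2 = encoding name (defaults to Latin1)
    enc=${2:-Latin1}
    pdftotext -enc "$enc" "$1" -
}
```

Comparing the output for two encodings on the same PDF, for example Latin1 and UTF-8, should show quickly whether accented characters are the part that gets mangled.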