PhpDig.net

09-29-2003, 11:32 AM   #1
chazter
Green Mole
 
Join Date: Sep 2003
Posts: 8
Search PDF files

I'm a newbie at this and I am following the instructions in the documentation, but there is one part that I am not clear on:
Quote:
3.3. File types which can be indexed
PhpDig indexes HTML and text files by itself.
PhpDig could index PDF, MS-Word and MS-Excel files if you install external binaries on the spidering machines to this purpose.
I don't understand the sentence "PhpDig could index PDF, MS-Word and MS-Excel files if you install external binaries on the spidering machines to this purpose."

I can access my files when I FTP to the directory that my web host gives me, but I am not sure how to add external binaries.

All my PDF files are in a specific directory, but how does somebody search a particular PDF file?

If anyone can give me clarification or instructions on how to do this, I would really appreciate it.

Thanks in advance.
09-29-2003, 04:54 PM   #2
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. External binaries are programs, which your host may or may not have installed, that convert PDF/DOC/XLS files to text files.

Here is a short list of such external binaries and their uses:

Code:
name         purpose
-----------------------------------
catdoc       convert DOC to TXT
pstotext     convert PS/PDF to TXT
pdftotext    convert PDF to TXT
xls2csv      convert XLS to CSV
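To make the list concrete, here is a minimal sketch of how a file extension could be matched to one of these converters. It is a hypothetical lookup written for illustration, not PhpDig's own code, and picking pdftotext rather than pstotext for .pdf files is an arbitrary choice.

```python
import os

# Extension -> converter name, following the list above.
# Hypothetical mapping for illustration; pstotext could also handle .pdf.
CONVERTER_BY_EXT = {
    ".doc": "catdoc",
    ".ps": "pstotext",
    ".pdf": "pdftotext",
    ".xls": "xls2csv",
}

def converter_for(filename):
    """Return the converter name for a file, or None if none is listed."""
    ext = os.path.splitext(filename)[1].lower()
    return CONVERTER_BY_EXT.get(ext)

if __name__ == "__main__":
    for name in ("manual.PDF", "report.xls", "index.html"):
        print(name, "->", converter_for(name))
```

HTML and plain text files return None here because they need no external converter.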
If you know, or can find, the path to such external binaries from your host, then just use that path in the appropriate definition in the config file.

If your host doesn't have such external binaries, or you cannot find the path, then you could FTP them to one of your directories, and then include that path in the appropriate definition in the config file.
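If you can run scripts on the host, one way to find those paths is to search the PATH for each binary. The sketch below assumes Python 3 is available on the server, which may not be true on shared hosting; the function name is made up for this example.

```python
import shutil

# Converter names taken from the list above.
CONVERTERS = ["catdoc", "pstotext", "pdftotext", "xls2csv"]

def locate_converters(names=CONVERTERS):
    """Return {name: path or None} for each converter found on PATH."""
    return {name: shutil.which(name) for name in names}

if __name__ == "__main__":
    for name, path in locate_converters().items():
        print(f"{name:<12} {path or 'not found on PATH'}")
```

Any path it prints is the value to put in the config file. A binary reported as not found may still be installed somewhere outside the PATH, so asking the host remains the reliable route.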

Depending on the type of output that the external binaries produce, you may find this thread useful. Also, this thread may be useful.
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.
09-29-2003, 06:08 PM   #3
chazter
Green Mole
 
Join Date: Sep 2003
Posts: 8
Thanks for the reply. A couple of follow-up questions.

1. I am having a hard time contacting and getting answers from my ISP. Where do I get the binary "pdftotext"?

2. Once I get it, what do I do with it? Do I create a directory called PDFTOTEXT in my website root directory and put the file there?

3. Once I put it there, do I run anything? I assume I would have to configure my config file to point to that path.

Sorry for asking these questions if they seem obvious.

Thanks again in advance.
10-01-2003, 06:44 PM   #4
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
1. To download the pdftotext binary, search Google for the build that matches your host's operating system.

2. You can place the binary pdftotext file wherever you'd like.

3. If you download the binary pdftotext, then it's ready to use, so just put the path to it in the config file.
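Before putting the path in the config file, it can help to confirm that the uploaded binary actually runs. The sketch below is one way to check, assuming Python 3.7 or later is available on the host; the function names are made up for this example. It relies on pdftotext's convention that "-" as the output file sends the text to standard output. Binaries uploaded over FTP often arrive without the execute bit, in which case a chmod from your FTP client is needed first.

```python
import os
import subprocess
import sys

def is_usable(binary_path):
    """True if the path is a regular file that the current user may execute."""
    return os.path.isfile(binary_path) and os.access(binary_path, os.X_OK)

def build_command(binary_path, pdf_path):
    """pdftotext writes a .txt file by default; '-' sends text to stdout."""
    return [binary_path, pdf_path, "-"]

def pdf_to_text(binary_path, pdf_path):
    """Run the converter on one PDF and return the extracted text."""
    if not is_usable(binary_path):
        raise FileNotFoundError(f"{binary_path} is missing or not executable")
    done = subprocess.run(build_command(binary_path, pdf_path),
                          capture_output=True, check=True)
    return done.stdout.decode("utf-8", errors="replace")

if __name__ == "__main__":
    # Usage: python check_pdftotext.py <path-to-pdftotext> <some.pdf>
    if len(sys.argv) == 3:
        print(pdf_to_text(sys.argv[1], sys.argv[2])[:500])
```

If this prints readable text from one of your PDFs, the same binary path should work in the config file.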

This thread may also be useful.
10-02-2003, 06:47 AM   #5
chazter
Green Mole
 
Join Date: Sep 2003
Posts: 8
Thanks Charter,

I talked to my ISP and found out that they had the external binaries installed. I configured the path as you suggested, and the links were helpful too.

Thanks for the other suggestion regarding Google as others may encounter the problem in the future.

Have a great day.

