PhpDig.net

PhpDig.net (http://www.phpdig.net/forum/index.php)
-   External Binaries (http://www.phpdig.net/forum/forumdisplay.php?f=36)
-   -   Search PDF files (http://www.phpdig.net/forum/showthread.php?t=106)

chazter 09-29-2003 11:32 AM

Search PDF files
 
I'm a newbie at this and I am following the instructions in the documentation, but there is one part that I am not clear on.
Quote:

3.3. File types which can be indexed
PhpDig indexes HTML and text files by itself.
PhpDig could index PDF, MS-Word and MS-Excel files if you install external binaries on the spidering machines to this purpose.

I don't understand the part that says "PhpDig could index PDF, MS-Word and MS-Excel files if you install external binaries on the spidering machines to this purpose."

I can access my files when I FTP to a directory that my web host gives me, but I am not sure how I would add external binaries.

All my PDF files are in a specific directory, but how does somebody search within a particular PDF file?

If anyone can give me clarification or instructions on how to do this, I would really appreciate it.

Thanks in advance.

Charter 09-29-2003 04:54 PM

Hi. External binaries are certain programs that your host may, or may not, have to convert PDF/DOC/XLS files to text files.

Here is a short list of such external binaries and their uses:

Code:

name        purpose
-----------------------------------
catdoc      convert DOC to TXT
pstotext    convert PS/PDF to TXT
pdftotext   convert PDF to TXT
xls2csv     convert XLS to CSV

If you know, or can find, the path to such external binaries on your host, then just use that path in the appropriate definition in the config file.

If your host doesn't have such external binaries, or you cannot find the path, then you could FTP them to one of your directories, and then include that path in the appropriate definition in the config file.

Depending on the type of output that the external binaries produce, you may find this thread useful. Also, this thread may be useful.
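
For anyone who has shell access and Python on the host (an assumption; many shared hosts offer neither), here is a rough sketch of one way to look for the binaries listed above on the PATH. The function name find_converters is made up for this example and is not part of PhpDig.

```python
import shutil

# Converter names taken from the list earlier in this post.
CONVERTERS = ["catdoc", "pstotext", "pdftotext", "xls2csv"]


def find_converters(names, search_path=None):
    """Map each converter name to the path found for it, or None.

    search_path is an optional PATH-style string; when it is None,
    the PATH of the current environment is searched.
    """
    return {name: shutil.which(name, path=search_path) for name in names}


if __name__ == "__main__":
    for name, location in find_converters(CONVERTERS).items():
        print(f"{name:<10} {location or 'not found on PATH'}")
```

Any path it reports is the value that would go into the matching definition in the config file. A result of "not found" only means the binary is not on the PATH of the account running the script; the host may still have it installed somewhere else.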

chazter 09-29-2003 06:08 PM

Thanks for the reply. A couple of follow-up questions.

1. I am having a hard time contacting and getting answers from my ISP. Where do I get the binary "pdftotext"?

2. Once I get it, what do I do with it? Do I create a directory called PDFTOTEXT in my website root directory and put the file there?

3. Once I put it there, do I run anything? I assume I would have to configure my config file to point to that path.

Sorry for asking these questions if they seem obvious.

Thanks again in advance.

Charter 10-01-2003 06:44 PM

1. To download the pdftotext binary, search Google for the build you need.

2. You can place the binary pdftotext file wherever you'd like.

3. If you download the binary pdftotext, then it's ready to use, so just put the path to it in the config file.

This thread may also be useful.
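
Before pointing the config file at a downloaded binary, it can help to confirm that the binary actually runs on the host. Below is a minimal sketch, again assuming shell access and Python are available. It relies on pdftotext writing to standard output when the output file is given as "-". The function names and the paths in the usage line are placeholders for illustration, not anything PhpDig defines.

```python
import subprocess
import sys


def build_command(binary_path, pdf_path):
    """Build the argument list for one pdftotext run.

    Passing "-" as the output file asks pdftotext to write the
    extracted text to standard output instead of a .txt file.
    """
    return [binary_path, pdf_path, "-"]


def pdf_to_text(binary_path, pdf_path, timeout=60):
    """Run the converter and return the extracted text.

    Raises FileNotFoundError if binary_path does not exist and
    subprocess.CalledProcessError if the converter exits with an error.
    """
    result = subprocess.run(
        build_command(binary_path, pdf_path),
        capture_output=True,
        timeout=timeout,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


if __name__ == "__main__":
    if len(sys.argv) == 3:
        # First 500 characters are enough to see whether text comes out.
        print(pdf_to_text(sys.argv[1], sys.argv[2])[:500])
    else:
        print("usage: python check_pdf.py /path/to/pdftotext /path/to/file.pdf")
```

If this prints readable text, the same binary path should be usable in the config file. If it raises an error, the download is probably the wrong build for the host, or the file is missing execute permission.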

chazter 10-02-2003 06:47 AM

Thanks Charter,

I talked to my ISP and found out that they had the external binaries installed. I did configure the path as you suggested and the links were helpful too.

Thanks also for the suggestion about Google, since others may run into the same problem in the future.

Have a great day.

