PhpDig.net


philippeguerind 02-20-2004 03:46 PM

indexing pdf
 
Hi from France,
Please excuse my English. I can't get PhpDig to index PDF files.
I put the following lines in the config.php file:

define('PHPDIG_INDEX_PDF',true);
define('PHPDIG_PARSE_PDF','./pdftotext');
define('PHPDIG_OPTION_PDF','');
define('PHPDIG_PDF_EXTENSION','.txt');

pdftotext.exe is located at the root. Indexing works perfectly with HTML files, and even with ASCII files, but not with PDF files. My web site is hosted on a Lycos server.
I uploaded pdftotext.exe to the root, then set its permissions to 755. When I run PhpDig from the administration panel and ask it to dig a PDF file, giving the full path, I get a green sign in front of it indicating that the file is indexed. But when I search for any word inside the PDF file, I get no results.
What could I try? I had been reading this forum for weeks before posting. Now I have run out of ideas.
Thanks for helping a novice.
Philippe.

tomas 02-20-2004 04:04 PM

hi philippe,

is your server running on windows or unix/linux ?

tomas

philippeguerind 02-20-2004 04:21 PM

Thanks. Lycos servers run Unix. I use PhpDig version 1.6.4.
Philippe

Charter 02-20-2004 04:27 PM

Hi. Does Lycos allow commands such as exec to run on its servers?
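
(Editorial sketch, not from the thread.) One way to answer this question without waiting for the host's support is to look at PHP's `disable_functions` setting from a small script uploaded to the server. The helper name `is_function_usable` is hypothetical and not part of PhpDig, and the syntax assumes PHP 7.1 or later, which is newer than the PHP versions current when this thread was written. Hosts can block `exec` in other ways too, so a positive result here is only a hint.

```php
<?php
// Hypothetical helper (not part of PhpDig): returns true when $name exists
// and is not listed in the comma-separated disable_functions setting.
function is_function_usable(string $name, ?string $disabledList = null): bool
{
    if ($disabledList === null) {
        // Read the live setting when no list is supplied.
        $disabledList = (string) ini_get('disable_functions');
    }
    // Normalise the list: split on commas, trim spaces, lowercase.
    $disabled = array_map('strtolower', array_map('trim', explode(',', $disabledList)));
    return function_exists($name) && !in_array(strtolower($name), $disabled, true);
}

echo is_function_usable('exec')
    ? "exec looks available\n"
    : "exec appears to be disabled\n";
```

The second parameter exists only so the check can be tried against a sample list; in normal use it is left out.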

philippeguerind 02-20-2004 04:36 PM

Hi, I don't know. I just asked their support service by posting a thread. I'm waiting for the answer ...
Philippe

philippeguerind 02-20-2004 04:49 PM

If running exec is not allowed, is there any way I could run pdftotext on my PC from a shell instead?
Philippe

Charter 02-20-2004 05:38 PM

Hi. Perhaps check at the following link for a version that would work with your PC:

http://www.foolabs.com/xpdf/download.html

tomas 02-20-2004 05:56 PM

hello,

philippe - try this setting:
define('PHPDIG_PDF_EXTENSION','');

run spider and take a look into text_content directory -
are there temp-files? are they empty?

after this test reset to:
define('PHPDIG_PDF_EXTENSION','.txt');

what is your servers php-version?

tomas
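
(Editorial sketch, not from the thread.) The check tomas describes, looking for temp files in text_content and seeing whether they are empty, can also be done from a script when there is no shell access. The helper name `list_file_sizes` and the directory location used below are assumptions; adjust the path to wherever your PhpDig install keeps text_content.

```php
<?php
// Hypothetical helper: maps each regular file in $dir to its size in bytes.
function list_file_sizes(string $dir): array
{
    $sizes = [];
    if (!is_dir($dir)) {
        return $sizes; // nothing to report for a missing directory
    }
    foreach (scandir($dir) ?: [] as $entry) {
        $path = $dir . DIRECTORY_SEPARATOR . $entry;
        if (is_file($path)) {
            $sizes[$entry] = filesize($path);
        }
    }
    return $sizes;
}

// Assumed location, relative to this script.
$dir = __DIR__ . '/text_content';
foreach (list_file_sizes($dir) as $name => $bytes) {
    echo $name, ': ', $bytes, ' bytes', $bytes === 0 ? ' (empty)' : '', "\n";
}
```

Files that show up with 0 bytes would suggest the converter ran but produced no text; no files at all would suggest it never ran.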

Charter 02-20-2004 06:02 PM

OT: Thanks tomas for helping! :D

tomas 02-20-2004 06:24 PM

hello again philippe,

in your first post you wrote "pdftotext.exe" -
it seems that you installed the dos-version on a unix-server?

the unix download is:
http://www.foolabs.com/xpdf/download.html
x86, Linux (glibc 2.2, statically linked to Motif, t1lib, and FreeType 2):
xpdf-3.00-linux.tar.gz (4544077 bytes)

tomas
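
(Editorial sketch, not from the thread.) Whether an uploaded binary is the DOS/Windows build or a Linux build can be told from its first bytes: Windows and DOS executables start with `MZ`, and Linux ELF binaries start with the byte 0x7F followed by `ELF`. The helper name `detect_binary_format` and the `./pdftotext` path are assumptions for illustration.

```php
<?php
// Hypothetical helper: classifies a file by the first bytes of its header.
function detect_binary_format(string $header): string
{
    if (strncmp($header, "\x7fELF", 4) === 0) {
        return 'elf';          // Unix/Linux executable
    }
    if (strncmp($header, "MZ", 2) === 0) {
        return 'windows-exe';  // DOS/Windows executable
    }
    return 'unknown';
}

// Assumed path to the uploaded binary; read only its first four bytes.
$path = './pdftotext';
if (is_readable($path)) {
    echo detect_binary_format((string) file_get_contents($path, false, null, 0, 4)), "\n";
}
```

A result of `windows-exe` on a Unix host would match the problem described in this post.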

philippeguerind 02-21-2004 05:58 AM

I wasn't using the Unix version of pdftotext. Now I am.
Is the line below correct, given that www is my root? It still doesn't work, but I'll keep trying ...

define('PHPDIG_PARSE_PDF','./usr/local/bin/pdftotext');
Philippe

tomas 02-21-2004 10:50 AM

hi philippe,

i don't think so -
please try this:
1) upload: pdftotext binary into the same folder where phpdig is
2) set: 755 permissions for pdftotext and admin/temp
3) set: define('PHPDIG_PARSE_PDF','/path/to/pdftotext');
4) set: define('PHPDIG_PDF_EXTENSION','');

run spider and take a look into text_content directory -
are there temp-files? are they empty?

kind regards
tomas
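
(Editorial sketch, not from the thread.) Before running the spider again, steps 1 to 3 can be checked by hand: confirm the configured path points at an executable file, then run pdftotext once on a sample PDF. This assumes `exec` is allowed on the host. The helper names, the sample PDF path, and the output location are hypothetical, and `/path/to/pdftotext` is the placeholder from the post, to be replaced with the real absolute path. PhpDig itself may invoke pdftotext differently; this only uses the basic `pdftotext PDF-file text-file` form.

```php
<?php
// Hypothetical helpers for a manual test; not part of PhpDig.

// Returns a list of problems found with the configured binary path.
function check_binary(string $path): array
{
    $problems = [];
    if (!is_file($path)) {
        $problems[] = "not a file: $path";
    } elseif (!is_executable($path)) {
        $problems[] = "not executable (try chmod 755): $path";
    }
    return $problems;
}

// Builds "pdftotext <pdf> <txt>" with each part shell-escaped.
function build_pdftotext_command(string $binary, string $pdf, string $txt): string
{
    return escapeshellarg($binary) . ' ' . escapeshellarg($pdf) . ' ' . escapeshellarg($txt);
}

$binary = '/path/to/pdftotext';   // placeholder from the post
$pdf    = '/path/to/sample.pdf';  // hypothetical test file
$txt    = sys_get_temp_dir() . '/sample.txt';

$problems = check_binary($binary);
if ($problems) {
    echo implode("\n", $problems), "\n";
} else {
    exec(build_pdftotext_command($binary, $pdf, $txt), $output, $status);
    echo "exit status: $status, output size: ",
        is_file($txt) ? filesize($txt) : 0, " bytes\n";
}
```

A non-zero exit status or a 0-byte output file points at the binary or the PDF rather than at PhpDig's configuration.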


All times are GMT -8.