
cannot index pdf's


tdsongo
09-14-2004, 12:31 PM
I am working on a Windows Server 2003 box with PHP 4.3.8 running in ISAPI mode on IIS. I have PhpDig installed and indexing everything but PDFs. I have run through the checklist and can run Ghostscript at the command line on the server. I have even added the pstotext files that come with GSview (which seems to be the only way to get it now) to the Ghostscript bin folder. I can even extract text from a PDF through GSview, which uses pstotext directly.

Question 1. Which executable am I supposed to be running against: pstotxt3.exe or gswin32c.exe? Answering this would seem to take care of the PHPDIG_PARSE_PDF parameter.

Question 2. Which extension are we chasing? I know we start with a PDF, but as I understand it PhpDig is after the text file, so I would think the extension would be .txt. However, Ghostscript converts to PS files and then pstotext converts those to .txt.

Here is the script output.
Is result test an array: 1
What is result test status: PDF
Use is executable is set to: 1
Index the pdf is set to: 1
Parse the pdf is set to: D:\gs\gswin32c.exe
Does parse pdf exist: 1

Command is: D:\gs\gswin32c.exe -cork ../admin/temp/96216472.tmp
Result contains: Array (
[0] => AFPL Ghostscript 8.14 (2004-02-20)
[1] => Copyright (C) 2004 artofcode LLC, Benicia, CA. All rights reserved.
[2] => This software comes with NO WARRANTY: see the file PUBLIC for details.
[3] => Error: /undefined in ..
[4] => Operand stack:
[5] =>
[6] => Execution stack:
[7] => %interp_exit .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval-- --nostringval-- --nostringval-- false 1 %stopped_push .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval--
[8] => Dictionary stack:
[9] => --dict:1109/1686(ro)(G)-- --dict:0/20(G)-- --dict:70/200(L)--
[10] => Current allocation mode is local
[11] => Last OS error: No such file or directory
)
Return value is: 1

4:http://www.duesc.org/prof_development/lpdc/lpdc_handbook.pdf

Any thoughts?

Charter
09-14-2004, 01:14 PM
Hi. Try these settings and give it a whirl.

define('PHPDIG_INDEX_PDF',true); // obviously we need true here
define('PHPDIG_PARSE_PDF','C:\\ADD_PATH_TO\\pstotxt3.exe'); // add path
define('PHPDIG_OPTION_PDF',''); // two single quotes, no space between

define('PHPDIG_PDF_EXTENSION','.txt'); // don't forget the period in .txt
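
For anyone comparing the two setups, here is a minimal sketch of how the three values seem to combine. It assumes PhpDig simply joins the parser path, the option string, and the temp file name with spaces, which is what the "Command is:" line in the debug output suggests. The helper name build_pdf_command is hypothetical and is not part of PhpDig.

```php
<?php
// Hypothetical helper, NOT PhpDig's actual code. It joins the pieces the way
// the "Command is:" line in the debug output appears to.
function build_pdf_command($parser, $option, $tempFile)
{
    $parts = array($parser);
    // Skip an empty option so the command does not contain a double space.
    if ($option !== '') {
        $parts[] = $option;
    }
    $parts[] = $tempFile;
    return implode(' ', $parts);
}

// Values from the failing run: the parser is Ghostscript and the option is -cork.
echo build_pdf_command('D:\\gs\\gswin32c.exe', '-cork', '../admin/temp/96216472.tmp'), "\n";

// Values from the suggested settings (path placeholder left as posted).
echo build_pdf_command('C:\\ADD_PATH_TO\\pstotxt3.exe', '', '../admin/temp/96216472.tmp'), "\n";
```

The first line reproduces the command from the debug output. As far as I can tell, -cork is a pstotext option rather than a Ghostscript one, so handing it to gswin32c.exe may be what triggers the "/undefined in .." error; treat that as a guess, not a confirmed diagnosis.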