PhpDig.net

05-26-2004, 09:07 PM   #1
freak
Green Mole
Join Date: May 2004
Posts: 1

problem with pdftotext

Hello there,

I'm having problems indexing PDFs. I have already read most of the posts in here and couldn't find where the problem is.

I'm using Apache 2.0.45 + PHP 4.3.6 running on Windows 2000 SP4. I just downloaded xpdf-3.00-win32 and extracted the pdftotext.exe file.

This is my config file:

define('USE_IS_EXECUTABLE_COMMAND','0');
...
define('PHPDIG_INDEX_PDF',true);
define('PHPDIG_PARSE_PDF','C:\\Apache Group\\Apache2\\htdocs\\phpdig\\bin\\pdftotext.exe');
define('PHPDIG_OPTION_PDF','');
...
define('PHPDIG_PDF_EXTENSION','.txt');

And this is what I get when I try to index a local site with one page that has only one link to a PDF file.

I added the extra code for debugging...

--------------------------------------------------------------------------------
SITE : http://ivan02/
Exclude paths :
- @NONE@


Is result test an array: 1
What is result test status: HTML
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: C:\Apache Group\Apache2\htdocs\phpdig\bin\pdftotext.exe
Does parse pdf exist: 1
1:http://ivan02/test/
(time : 00:00:06)
+
level 1...


Is result test an array: 1
What is result test status: PDF
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: C:\Apache Group\Apache2\htdocs\phpdig\bin\pdftotext.exe
Does parse pdf exist: 1

Command is :C:\Apache Group\Apache2\htdocs\phpdig\bin\pdftotext.exe ../admin/temp/61216332.tmp

Result contains: Array ( )
Return value is: 1

2:http://ivan02/test/proy01.pdf
(time : 00:00:16)

No link in temporary table

--------------------------------------------------------------------------------

links found : 2
http://ivan02/test/
http://ivan02/test/proy01.pdf
Optimizing tables...
Indexing complete !
--------------------------------------------------------------------------------

The spider finds the PDF file but doesn't extract any content from it. Also, there is no mark before link number 2: neither a "good mark" nor a "bad mark".

Could somebody please help me? I just don't know what's going on here.

Thanks!

PS: Please excuse my english!
__________________
FrEAk YoUR MiNd!

Last edited by freak; 05-26-2004 at 09:13 PM.
06-02-2004, 06:20 AM   #2
Charter
Head Mole
Join Date: May 2003
Posts: 2,539

Hi. Perhaps the space in "Apache Group" is causing the command not to execute correctly. Try renaming Apache Group to ApacheGroup, or try quoting the path in the PHPDIG_PARSE_PDF constant.
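
For the quoting option, here is a minimal sketch of what the config line might look like. It assumes PhpDig hands the constant to the shell unchanged, which the "Command is :" line in the debug output suggests but does not prove. The second line is the rename alternative, and its ApacheGroup path is hypothetical: it only works after the folder has actually been renamed and Apache reconfigured to match.

```php
<?php
// Option 1 (assumption: the constant is passed to the shell as-is):
// wrap the whole path in double quotes so the space in "Apache Group"
// does not split the command into two words.
define('PHPDIG_PARSE_PDF','"C:\\Apache Group\\Apache2\\htdocs\\phpdig\\bin\\pdftotext.exe"');

// Option 2 (hypothetical path, valid only after renaming the folder):
// no space in the path, so no quoting is needed.
// define('PHPDIG_PARSE_PDF','C:\\ApacheGroup\\Apache2\\htdocs\\phpdig\\bin\\pdftotext.exe');
```

Only one of the two defines should be active at a time. If the quoted form makes the "Does parse pdf exist" check come back empty, that would suggest the path is also being tested as a plain file name, in which case the rename is probably the safer route.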
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.
Copyright © 2001 - 2005, ThinkDing LLC. All Rights Reserved.