PhpDig.net

PhpDig.net > PhpDig Forums > External Binaries

07-13-2006, 05:57 PM   #1
JonnyNoog
Green Mole
 
Join Date: Oct 2005
Posts: 12
pdftotext issue

Hi,

I am trying to get pdftotext to work with phpdig. I have followed the instructions in the sticky at the top of the forum section and the output I am getting is this:


Is result test http an array: 1
What is result test http status: HTML

Is result test an array: 1
What is result test status: HTML
Use is executable is set to: 1
Index the pdf is set to: 1
Parse the pdf is set to: D:\Internet\WWWROOT\anmc\Xpdf\pdftotext.exe
Does parse pdf exist: 1


This is all I get, whether I try to re-index the whole site or only a sub-section of it. As you can see from the path, I am unfortunately forced to install on a Windows box.

I have tried out pdftotext via the command line and it appears to work... It makes a text file in the Xpdf dir that contains the expected text from the PDF I gave it.

I've searched the forum repeatedly, but nothing I have found yet has solved my problem. Any help would be greatly appreciated.
07-13-2006, 06:22 PM   #2
sandychan
Green Mole
 
Join Date: Jul 2006
Posts: 9
May I know your system configuration?
07-13-2006, 07:49 PM   #3
JonnyNoog
Green Mole
 
Join Date: Oct 2005
Posts: 12
IIS 5 with PHP version 4.3.1 (CGI, I think)

MySQL 3.23.52

Not sure what else is relevant...?
07-13-2006, 08:05 PM   #4
JonnyNoog
Green Mole
 
Join Date: Oct 2005
Posts: 12
Erm... PhpDig v.1.8.8, that's probably relevant, hey.

Is there a way to edit posts on this forum, by the way? I can't seem to see the option... Or am I just having a blonde day?
07-14-2006, 01:21 AM   #5
JonnyNoog
Green Mole
 
Join Date: Oct 2005
Posts: 12
Well, after much stuffing around, I have now installed PHP 5, because, as I found out, the is_executable() function is not available on Windows in PHP 4 (it only took me about 4 hours to work that all out!). I am now getting the output below:


Is result test http an array: 1
What is result test http status: HTML

Is result test an array: 1
What is result test status: HTML
Use is executable is set to: 1
Index the pdf is set to: 1
Parse the pdf is set to: D:\Internet\WWWROOT\anmc\Xpdf\pdftotext.exe
Does parse pdf exist: 1
Is parse pdf executable: 1


Still no PDF indexing action to be seen. Any help is much appreciated; I think I'm now going to get as far away from the computer as possible before I smash it with a hammer.
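For anyone following along, the two checks behind "Does parse pdf exist" and "Is parse pdf executable" can be reproduced in a small standalone script. This is a minimal sketch, not PhpDig's own code; check_tool() is a hypothetical helper name, and the path is simply the one from this thread.

```php
<?php
// Minimal sketch of the two checks behind the debug lines
// "Does parse pdf exist" and "Is parse pdf executable".
// check_tool() is a hypothetical helper, not part of PhpDig.
function check_tool($path)
{
    return array(
        'exists'     => file_exists($path),
        // On Windows, is_executable() is only available from PHP 5.0.0 on.
        'executable' => is_executable($path),
    );
}

// Path taken from this thread; adjust it for your own install.
$status = check_tool('D:\\Internet\\WWWROOT\\anmc\\Xpdf\\pdftotext.exe');
echo "Does parse pdf exist: " . (int) $status['exists'] . "\n";
echo "Is parse pdf executable: " . (int) $status['executable'] . "\n";
```

If both print 1 (as in the output above), the binary itself is reachable and the problem lies elsewhere.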
07-14-2006, 09:21 PM   #6
JonnyNoog
Green Mole
 
Join Date: Oct 2005
Posts: 12
So coming back to my problem with fresh eyes, it looks like the extra lines in robot_functions.php:

Code:
echo "<br>Command is: " . $command . "<br>";
echo "Result contains: ";
print_r($result);
echo "<br>Return value is: " . $retval . "<br><br>";
are not being run... which leads me to think that the switch statement in robot_functions.php (switch ($result_test['status'])) is not running and setting $usetool to true.
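To illustrate the reasoning above, here is a hypothetical sketch of that kind of dispatch. It is not the actual switch in robot_functions.php; choose_tool() is a made-up name, and only $result_test['status'] and $usetool are borrowed from the debug output. It shows why a status of 'HTML' never hands the file to pdftotext.

```php
<?php
// Hypothetical sketch of content-type dispatch. This is NOT PhpDig's actual
// code; it only mirrors the names in the debug output ($result_test['status'],
// $usetool) to show why an 'HTML' status never reaches the PDF branch.
function choose_tool($result_test, $pdf_parser)
{
    $usetool = false;
    $tool = null;
    switch ($result_test['status']) {
        case 'PDF':
            // Only a PDF status hands the file to the external binary.
            $usetool = true;
            $tool = $pdf_parser;
            break;
        default:
            // 'HTML' (and anything else) is parsed without an external tool.
            break;
    }
    return array($usetool, $tool);
}

list($usetool, $tool) = choose_tool(array('status' => 'HTML'), 'pdftotext.exe');
var_dump($usetool); // bool(false): the PDF branch is skipped
```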

Any help, any help at all, would be greatly appreciated at this point. If I can't get PDF indexing working with phpdig, I'll be forced to use some other search engine, and I really like phpdig! I'm really not any kind of PHP guru, and I suspect my problem stems from the fact that I am forced to set up phpdig and pdftotext on a Windows system with IIS... Perhaps it is some kind of permission problem between the pdftotext executable and the PHP exec() function, I don't know...
07-14-2006, 11:40 PM   #7
JonnyNoog
Green Mole
 
Join Date: Oct 2005
Posts: 12
Well... following on down the path from my last post, $result_test['status'] was not being set to 'PDF', so the switch statement was not in turn running the 'PDF' case. So I wanted to see what would happen if I told phpdig to index the full address of a particular PDF:

Code:
Is result test http an array: 1
What is result test http status: PDF

Is result test an array: 1
What is result test status: PDF
Use is executable is set to: 1
Index the pdf is set to: 1
Parse the pdf is set to: D:\Internet\WWWROOT\anmc\Xpdf\pdftotext.exe
Does parse pdf exist: 1
Is parse pdf executable: 1

Command is: D:\Internet\WWWROOT\anmc\Xpdf\pdftotext.exe ../admin/temp/67264762.tmp 2>&1
Result contains: Array ( )
Return value is: 0

5:http://XXX/docs/Modified_Form_A_0607.pdf
(time : 00:00:11)
Success! It indexed the PDF. So it now seems that pdftotext was in fact working the whole time, and the problem was that phpdig just wasn't finding the PDF files to index in the first place, because I hadn't set it to look for enough links on each level... I think.
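For reference, the "Command is / Result contains / Return value is" lines come from running the converter through PHP's exec(), which fills an array with the command's output lines and sets its return code. A minimal sketch, assuming a hypothetical run_tool() helper (not PhpDig's code) and a binary path without spaces:

```php
<?php
// Minimal sketch of running an external converter the way the debug output
// reports it: build a command, capture output lines and the return code.
// run_tool() is a hypothetical helper, not PhpDig's code.
function run_tool($binary, $input_file)
{
    // Quote the file argument; the binary path is assumed to contain no spaces.
    // "2>&1" folds stderr into $result so error messages become visible.
    $command = $binary . ' ' . escapeshellarg($input_file) . ' 2>&1';
    $result = array();
    $retval = null;
    exec($command, $result, $retval);
    return array($command, $result, $retval);
}

list($command, $result, $retval) = run_tool('pdftotext', 'sample.pdf');
echo "Command is: " . $command . "\n";
echo "Result contains: ";
print_r($result);
echo "Return value is: " . $retval . "\n";
```

When pdftotext is given only an input file, it writes the text to a .txt file rather than to standard output (per the Xpdf documentation, "-" as the output file sends it to stdout), so an empty Result array with a return value of 0, as in the output above, is what a successful run looks like.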

But all's well that ends well I guess. Too bad I can't rename this thread to the jonny vs. jonny thread...

How are you doing now, jonny?

I'm doing well, thanks, jonny...

That's great to hear, jonny. Take care.
Copyright © 2001 - 2005, ThinkDing LLC. All Rights Reserved.