pdftotext issue
Hi,
I am trying to get pdftotext to work with phpdig. I have followed the instructions in the sticky at the top of the forum section, and the output I am getting is this:
Is result test http an array: 1
What is result test http status: HTML
Is result test an array: 1
What is result test status: HTML
Use is executable is set to: 1
Index the pdf is set to: 1
Parse the pdf is set to: D:\Internet\WWWROOT\anmc\Xpdf\pdftotext.exe
Does parse pdf exist: 1
This is all I get whether I try to re-index the whole site or only a sub-section of it. As you can see from the path, I am unfortunately forced to install on a Windows box. I have tried pdftotext from the command line and it appears to work: it creates a text file in the Xpdf dir that contains the expected text from the PDF I gave it.
I've searched the forum repeatedly, but nothing I have found so far has solved my problem. Any help would be greatly appreciated. :)
May I know your system configuration?
IIS 5 with PHP Version 4.3.1 (CGI, I think)
MySQL 3.23.52
Not sure what else is relevant...?
Erm... PhpDig v.1.8.8, that's probably relevant, hey :).
Is there a way to edit posts on this forum, by the way? I can't seem to see the option... Or am I just having a blonde day?
Well, after much stuffing around, I have now installed PHP 5. As I found out, the is_executable() function is not available on Windows under PHP 4 (it only took me about 4 hours to work that out! :what: ). So I am now getting the output below:
Is result test http an array: 1
What is result test http status: HTML
Is result test an array: 1
What is result test status: HTML
Use is executable is set to: 1
Index the pdf is set to: 1
Parse the pdf is set to: D:\Internet\WWWROOT\anmc\Xpdf\pdftotext.exe
Does parse pdf exist: 1
Is parse pdf executable: 1
Still no PDF indexing action to be seen. Any help much appreciated. I think I'm now going to get as far away from the computer as possible before I smash it with a hammer. :angry:
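For anyone retracing these checks, the "Does parse pdf exist" and "Is parse pdf executable" lines can be reproduced outside PhpDig with a few lines of PHP. This is only a minimal sketch: check_converter() is a made-up helper name, not a PhpDig function, and the path is simply the one quoted in the output above.

```php
<?php
// Hypothetical helper (not part of PhpDig): reports whether the PHP
// process can see a converter binary and considers it executable.
function check_converter($path)
{
    return array(
        'exists'     => file_exists($path),
        'is_file'    => is_file($path),
        // On Windows, is_executable() is only available from PHP 5.0.0.
        'executable' => is_executable($path),
    );
}

// Path taken from the post above; adjust it for your own install.
$report = check_converter('D:\\Internet\\WWWROOT\\anmc\\Xpdf\\pdftotext.exe');
foreach ($report as $check => $ok) {
    echo $check . ': ' . ($ok ? '1' : '0') . "\n";
}
```

If all three come back as 1, as they did here, the binary itself is probably not the problem and the next thing to look at is whether it actually gets called.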
So coming back to my problem with fresh eyes, it looks like the extra lines in robot_functions.php:
Code:
echo "<br>Command is: " . $command . "<br>";
Any help, any help at all, would be greatly appreciated at this point. If I can't get PDF indexing working with phpdig then I'll be forced to use some other search engine, and I really like phpdig! :yes:
I'm really not any kind of PHP guru, and I have a suspicion that my problem stems from the fact that I am forced to set phpdig and pdftotext up on a Windows system with IIS... Perhaps some kind of permission problem with the pdftotext executable and the PHP exec() function, I don't know... :bang:
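One way to test the exec() permission theory is to run pdftotext from a standalone script and look at the exit status and any error text, rather than only echoing the command. The sketch below is an assumption-laden example, not PhpDig code: run_and_report() is a hypothetical helper, test.pdf is a placeholder file name, and the trailing "-" relies on pdftotext sending the text to stdout when the output file is given as "-".

```php
<?php
// Hypothetical helper (not part of PhpDig): runs a command through exec()
// and returns what is needed to see why it failed.
function run_and_report($command)
{
    $output = array();
    $status = -1;
    // 2>&1 folds error messages into the captured output
    // (understood by both cmd.exe and Unix shells).
    exec($command . ' 2>&1', $output, $status);
    return array(
        'command' => $command,
        'status'  => $status,   // 0 normally means success
        'output'  => $output,   // every line the command printed
    );
}

// Both paths are examples only; test.pdf is a placeholder name.
$exe = 'D:\\Internet\\WWWROOT\\anmc\\Xpdf\\pdftotext.exe';
$pdf = 'D:\\Internet\\WWWROOT\\anmc\\test.pdf';
$result = run_and_report($exe . ' ' . escapeshellarg($pdf) . ' -');

echo 'Command is: ' . $result['command'] . "\n";
echo 'Exit status: ' . $result['status'] . "\n";
echo implode("\n", $result['output']) . "\n";
```

A non-zero exit status with an "access denied" style message would point towards permissions for the account the web server runs PHP under, while a zero status with the PDF text printed would suggest exec() is fine and the problem lies earlier, before the command is ever built.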
Well... Following on down the path from my last post, $result_test['status'] was not being set to 'PDF', so the switch statement was never running the 'PDF' case. So I wanted to see what would happen if I told phpdig to index the full address of a particular PDF.
Is result test http an array: 1
What is result test http status: PDF
Code:
Is result test an array: 1
But all's well that ends well, I guess. Too bad I can't rename this thread to the jonny vs. jonny thread... How are you doing now, jonny? I'm doing well thanks, jonny... That's great to hear, jonny. Take care.