PhpDig.net

Old 02-12-2004, 10:23 AM   #1
jayhawk
Green Mole
 
Join Date: Feb 2004
Posts: 13
Not indexing pdf files

I am using pdftotext to index my pdf files. It converts the pdf to a txt file. I can do this successfully from the command prompt. However, when I try to index my site with phpdig it does not index the txt file. I have the following set in my config.php file:

define('PHPDIG_PDF_EXTENSION','.txt');

define('PHPDIG_INDEX_PDF',true);
define('PHPDIG_PARSE_PDF','F:/internet/search/pdftotext/pdftotext');
define('PHPDIG_OPTION_PDF','');

Any suggestions?
Old 02-13-2004, 07:13 AM   #2
jayhawk
Green Mole
 
Join Date: Feb 2004
Posts: 13
Any ideas? Anyone?

Old 02-13-2004, 09:57 AM   #3
tomas
Orange Mole
 
Join Date: Feb 2004
Posts: 47
Hi jayhawk,

define('PHPDIG_INDEX_PDF',true);
define('PHPDIG_PARSE_PDF','/path/to/your/pdftotext');
define('PHPDIG_OPTION_PDF','');

//---------EXTERNAL TOOLS EXTENSIONS
define('PHPDIG_PDF_EXTENSION','.txt');

These settings should work. Please make sure that here:
define('PHPDIG_OPTION_PDF','');
after the comma there are two single quotes!

hope this helps :-)
tomas
Old 02-13-2004, 10:50 AM   #4
jayhawk
Green Mole
 
Join Date: Feb 2004
Posts: 13
They are two single quotes. I have been trying to track down the problem. One thing to note is that when I index the site it lists the url for the pdf, but it does not have a green checkmark next to it. Does that provide any clues to the problem I am having?
Old 02-14-2004, 11:43 AM   #5
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. Perhaps check that the permissions are 755 for the directories to pdftotext and also for the pdftotext file.
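If the permissions look right, it may also help to confirm that PHP itself can launch the converter, since a command that works at the prompt can still fail under the web server's account. The sketch below is not PhpDig's own code: `build_pdf_command` is a hypothetical helper, and the converter and PDF paths are placeholder values. It only assembles the command line; the `exec()` call is left commented out so it can be enabled once the paths are real.

```php
<?php
// Hypothetical helper, not part of PhpDig: joins the converter path, optional
// flags, and the PDF path into one shell command line.
function build_pdf_command($binary, $options, $pdfPath)
{
    $parts = array(escapeshellarg($binary));
    if ($options !== '') {
        $parts[] = $options; // flags are passed through unquoted
    }
    $parts[] = escapeshellarg($pdfPath);
    return implode(' ', $parts);
}

// Example values only; replace with your own converter and a local PDF copy.
$command = build_pdf_command(
    'F:/internet/search/pdftotext/pdftotext',
    '',
    'F:/temp/sample.pdf'
);
echo $command . "\n";

// exec() fills $output with the printed lines and $status with the exit code.
// Uncomment to actually launch the converter:
// exec($command . ' 2>&1', $output, $status);
// echo "exit status: $status\n" . implode("\n", $output) . "\n";
```

A non-zero exit status, or an error message in the captured output, would point at the launch itself rather than at the PhpDig settings.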
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.
Old 02-16-2004, 12:58 PM   #6
jayhawk
Green Mole
 
Join Date: Feb 2004
Posts: 13
Permissions are full control (just to see if I can get it to work). Still no luck.
Old 02-16-2004, 01:48 PM   #7
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. What version of PHP are you running? Perhaps you are experiencing the same problem as in this thread.
Old 02-16-2004, 02:09 PM   #8
jayhawk
Green Mole
 
Join Date: Feb 2004
Posts: 13
I'm running PHP version 4.3.4.
Old 02-16-2004, 02:18 PM   #9
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. Try echoing out the statements, as was done in this thread. What do you get?
Old 02-16-2004, 02:43 PM   #10
jayhawk
Green Mole
 
Join Date: Feb 2004
Posts: 13
Here is what I get:


Is result test http an array: 1
What is result test http status: HTML



Is result test an array: 1
What is result test status: HTML
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: F:\dhi-internet\search\Ghostgum\pstotxt\pstotxt3
Does parse pdf exist:
1:http://dhi-internet/
(time : 00:00:08)
+ + + + +
2: http://dhi-internet/ Was recently indexed
(time : 00:00:14)

3: http://dhi-internet/ Was recently indexed
(time : 00:00:19)

4: http://dhi-internet/ Was recently indexed
(time : 00:00:24)

level 1...


Is result test http an array: 1
What is result test http status: HTML



Is result test an array: 1
What is result test status: HTML
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: F:\dhi-internet\search\Ghostgum\pstotxt\pstotxt3
Does parse pdf exist:
5:http://dhi-internet/index.php?=PHPB8...9-4C7B08C10000
(time : 00:00:35)


Is result test http an array: 1
What is result test http status: PDF



Is result test an array: 1
What is result test status: PDF
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: F:\dhi-internet\search\Ghostgum\pstotxt\pstotxt3
Does parse pdf exist:
6:http://dhi-internet/test/acobook.pdf
(time : 00:00:40)


Is result test http an array: 1
What is result test http status: PDF



Is result test an array: 1
What is result test status: PDF
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: F:\dhi-internet\search\Ghostgum\pstotxt\pstotxt3
Does parse pdf exist:
7:http://dhi-internet/docs/seanresume0204.pdf
(time : 00:00:45)


Is result test http an array: 1
What is result test http status: HTML



Is result test an array: 1
What is result test status: HTML
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: F:\dhi-internet\search\Ghostgum\pstotxt\pstotxt3
Does parse pdf exist:
8:http://dhi-internet/test/regs.html
(time : 00:00:53)



Is result test http an array: 1
What is result test http status: HTML



Is result test an array: 1
What is result test status: HTML
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: F:\dhi-internet\search\Ghostgum\pstotxt\pstotxt3
Does parse pdf exist:
9:http://dhi-internet/test/dhi.html
(time : 00:01:00)

No link in temporary table
Old 02-16-2004, 03:16 PM   #11
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. It looks like the following is not returning a value:
PHP Code:
echo "Does parse pdf exist: " . file_exists(PHPDIG_PARSE_PDF) . "<br>";
Try setting different paths in the following code, run it from the browser, and then use the path that produces "Does parse pdf exist: 1" onscreen.
PHP Code:
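<?php
// Path to test; in a double-quoted string, \\ stands for one backslash.
$filename = "F:\\dhi-internet\\search\\Ghostgum\\pstotxt\\pstotxt3";
// file_exists() returns true (printed as 1) or false (printed as nothing).
echo "Does parse pdf exist: " . file_exists($filename);
?>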
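One likely reason the debug line ends in nothing: `file_exists()` returns a boolean, and PHP prints `false` as an empty string, so a missing file shows up as a blank rather than a 0. A small variation that labels each result explicitly is sketched here. The candidate paths are examples rather than known-good locations, and `describe_path` is a hypothetical helper name.

```php
<?php
// Candidate locations are examples only; substitute the paths you want to test.
$candidates = array(
    'F:/dhi-internet/search/pdftotext/pdftotext',
    'F:/dhi-internet/search/pdftotext/pdftotext.exe',
);

// Hypothetical helper: returns a label so that a missing file is reported
// as "missing" instead of an empty string.
function describe_path($path)
{
    return $path . ' => ' . (file_exists($path) ? 'exists' : 'missing');
}

foreach ($candidates as $path) {
    echo describe_path($path) . "\n";
}
```

Whichever candidate is reported as existing is the value to try in PHPDIG_PARSE_PDF.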
Old 02-17-2004, 06:23 AM   #12
jayhawk
Green Mole
 
Join Date: Feb 2004
Posts: 13
I fixed the path problem, but it is still not indexing the pdfs. Here is what it displays:

SITE : http://dhi-internet/
Exclude paths :
- @NONE@


Is result test http an array: 1
What is result test http status: HTML



Is result test an array: 1
What is result test status: HTML
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: F:\dhi-internet\search\pdftotext\pdftotext.exe
Does parse pdf exist: 1
1:http://dhi-internet/
(time : 00:00:08)
+ + + + + Error: Couldn't open file '.txt' Error: Couldn't open file '.txt'
level 1...


Is result test http an array: 1
What is result test http status: HTML



Is result test an array: 1
What is result test status: HTML
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: F:\dhi-internet\search\pdftotext\pdftotext.exe
Does parse pdf exist: 1
2:http://dhi-internet/index.php?=PHPB8...9-4C7B08C10000
(time : 00:00:20)


Is result test http an array: 1
What is result test http status: PDF



Is result test an array: 1
What is result test status: PDF
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: F:\dhi-internet\search\pdftotext\pdftotext.exe
Does parse pdf exist: 1
Hello PDFRValue 13:http://dhi-internet/docs/seanresume0204.pdf
(time : 00:00:25)


Is result test http an array: 1
What is result test http status: PDF



Is result test an array: 1
What is result test status: PDF
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: F:\dhi-internet\search\pdftotext\pdftotext.exe
Does parse pdf exist: 1
Hello PDFRValue 14:http://dhi-internet/test/acobook.pdf
(time : 00:00:59)


Is result test http an array: 1
What is result test http status: HTML



Is result test an array: 1
What is result test status: HTML
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: F:\dhi-internet\search\pdftotext\pdftotext.exe
Does parse pdf exist: 1
5:http://dhi-internet/test/regs.html
(time : 00:01:07)



Is result test http an array: 1
What is result test http status: HTML



Is result test an array: 1
What is result test status: HTML
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: F:\dhi-internet\search\pdftotext\pdftotext.exe
Does parse pdf exist: 1
6:http://dhi-internet/test/dhi.html
(time : 00:01:14)

No link in temporary table
Old 02-17-2004, 11:10 AM   #13
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. Do you have the following in the config file?

define('PHPDIG_INDEX_PDF',true);
define('PHPDIG_PARSE_PDF','F:\\dhi-internet\\search\\pdftotext\\pdftotext');
define('PHPDIG_OPTION_PDF','');
define('PHPDIG_PDF_EXTENSION','.txt');
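A note on the doubled backslashes, sketched under the assumption that the path above is the right one for your machine: inside single quotes PHP turns each `\\` into one literal backslash, so the constant ends up holding an ordinary Windows path. PHP's file functions on Windows generally accept forward slashes as well, which sidesteps the escaping question, though a shell command may still want backslashes. `to_forward_slashes` is a hypothetical helper used only for the comparison.

```php
<?php
// Single-quoted string: each \\ becomes one backslash in the stored value.
$escaped = 'F:\\dhi-internet\\search\\pdftotext\\pdftotext';
echo $escaped . "\n"; // prints F:\dhi-internet\search\pdftotext\pdftotext

// Hypothetical helper: rewrite backslashes as forward slashes.
function to_forward_slashes($path)
{
    return str_replace('\\', '/', $path);
}

echo to_forward_slashes($escaped) . "\n"; // prints F:/dhi-internet/search/pdftotext/pdftotext
```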
Old 02-18-2004, 05:13 AM   #14
jayhawk
Green Mole
 
Join Date: Feb 2004
Posts: 13
I got this to work finally. Thanks for all of your help!
All times are GMT -8.