PhpDig.net


jayhawk 02-12-2004 10:23 AM

Not indexing pdf files
 
I am using pdftotext to index my pdf files. It converts the pdf to a txt file. I can do this successfully from the command prompt. However, when I try to index my site with phpdig it does not index the txt file. I have the following set in my config.php file:

define('PHPDIG_PDF_EXTENSION','.txt');

define('PHPDIG_INDEX_PDF',true);
define('PHPDIG_PARSE_PDF','F:/internet/search/pdftotext/pdftotext');
define('PHPDIG_OPTION_PDF','');

Any suggestions?
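
Since the conversion works from the command prompt but not from the indexer, it may help to run the same conversion from PHP itself, under the web server's account. The following is only a minimal sketch: build_pdftotext_command() is a hypothetical helper (not part of PhpDig), test.pdf and test.txt are made-up file names, and nothing here is meant to mirror how PhpDig calls the tool internally. The only thing assumed about pdftotext is its usual "input PDF, then output text file" argument order.

```php
<?php
// Hypothetical helper for illustration; not part of PhpDig.
function build_pdftotext_command($binary, $pdf, $txt)
{
    // pdftotext takes the input PDF followed by the output text file.
    return escapeshellarg($binary) . ' ' . escapeshellarg($pdf) . ' ' . escapeshellarg($txt);
}

// Binary path as posted above; the PDF and text file names are assumptions.
$binary = 'F:/internet/search/pdftotext/pdftotext';
$pdf    = 'test.pdf';
$txt    = 'test.txt';

$command = build_pdftotext_command($binary, $pdf, $txt);
echo $command . "\n";

// Only attempt the conversion when both files are actually there.
if (file_exists($binary) && file_exists($pdf)) {
    exec($command, $output, $status);
    echo "exit status: " . $status . "\n";
}
```

If the script prints the command but never reports an exit status, PHP cannot see the binary or the PDF at the given paths, which is worth ruling out before looking at the indexer.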

jayhawk 02-13-2004 07:13 AM

Any ideas? Anyone?

:bang:

tomas 02-13-2004 09:57 AM

hi jayhawk,

define('PHPDIG_INDEX_PDF',true);
define('PHPDIG_PARSE_PDF','/path/to/your/pdftotext');
define('PHPDIG_OPTION_PDF','');

//---------EXTERNAL TOOLS EXTENSIONS
define('PHPDIG_PDF_EXTENSION','.txt');

these settings should work - please make sure that here:
define('PHPDIG_OPTION_PDF','');
after the comma there are two single quotes!

hope this helps :-)
tomas

jayhawk 02-13-2004 10:50 AM

They are two single quotes. I have been trying to track down the problem. One thing to note is that when I index the site it lists the url for the pdf, but it does not have a green checkmark next to it. Does that provide any clues to the problem I am having?

Charter 02-14-2004 11:43 AM

Hi. Perhaps check that the permissions are 755 for the directories to pdftotext and also for the pdftotext file.
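
One way to see what PHP itself thinks of the binary is a small check script like the one below. This is a sketch, not PhpDig code: describe_binary() is a hypothetical helper and the path is the placeholder from the config example earlier in the thread. Note also that, according to the PHP manual, is_executable() only became available on Windows in PHP 5.0.0, so on older Windows setups that line may not tell you much.

```php
<?php
// Hypothetical helper for illustration; not part of PhpDig.
function describe_binary($path)
{
    return array(
        'exists'     => file_exists($path),
        'readable'   => is_readable($path),
        // May not be meaningful on Windows before PHP 5.0.0.
        'executable' => is_executable($path),
    );
}

// Placeholder path; replace it with the real location of pdftotext.
$path = '/path/to/your/pdftotext';

foreach (describe_binary($path) as $check => $ok) {
    echo $check . ': ' . ($ok ? 'yes' : 'no') . "\n";
}
```

Run it from the browser rather than the shell, so the checks are made with the same account the indexer uses.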

jayhawk 02-16-2004 12:58 PM

Permissions are full control (just to see if I can get it to work). Still no luck.

Charter 02-16-2004 01:48 PM

Hi. What version of PHP are you running? Perhaps you are experiencing the same problem as in this thread.

jayhawk 02-16-2004 02:09 PM

I'm running PHP version 4.3.4.

Charter 02-16-2004 02:18 PM

Hi. Try echoing out the statements as was done in this thread. What do you get?

jayhawk 02-16-2004 02:43 PM

Here is what I get:


Is result test http an array: 1
What is result test http status: HTML



Is result test an array: 1
What is result test status: HTML
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: F:\dhi-internet\search\Ghostgum\pstotxt\pstotxt3
Does parse pdf exist:
1:http://dhi-internet/
(time : 00:00:08)
+ + + + +
2: http://dhi-internet/ Was recently indexed
(time : 00:00:14)

3: http://dhi-internet/ Was recently indexed
(time : 00:00:19)

4: http://dhi-internet/ Was recently indexed
(time : 00:00:24)

level 1...


Is result test http an array: 1
What is result test http status: HTML



Is result test an array: 1
What is result test status: HTML
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: F:\dhi-internet\search\Ghostgum\pstotxt\pstotxt3
Does parse pdf exist:
5:http://dhi-internet/index.php?=PHPB8...9-4C7B08C10000
(time : 00:00:35)


Is result test http an array: 1
What is result test http status: PDF



Is result test an array: 1
What is result test status: PDF
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: F:\dhi-internet\search\Ghostgum\pstotxt\pstotxt3
Does parse pdf exist:
6:http://dhi-internet/test/acobook.pdf
(time : 00:00:40)


Is result test http an array: 1
What is result test http status: PDF



Is result test an array: 1
What is result test status: PDF
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: F:\dhi-internet\search\Ghostgum\pstotxt\pstotxt3
Does parse pdf exist:
7:http://dhi-internet/docs/seanresume0204.pdf
(time : 00:00:45)


Is result test http an array: 1
What is result test http status: HTML



Is result test an array: 1
What is result test status: HTML
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: F:\dhi-internet\search\Ghostgum\pstotxt\pstotxt3
Does parse pdf exist:
8:http://dhi-internet/test/regs.html
(time : 00:00:53)



Is result test http an array: 1
What is result test http status: HTML



Is result test an array: 1
What is result test status: HTML
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: F:\dhi-internet\search\Ghostgum\pstotxt\pstotxt3
Does parse pdf exist:
9:http://dhi-internet/test/dhi.html
(time : 00:01:00)

No link in temporary table

Charter 02-16-2004 03:16 PM

Hi. It looks like the following is not returning a value:
PHP Code:

echo "Does parse pdf exist: " . file_exists(PHPDIG_PARSE_PDF) . "<br>";

Try setting different paths in the following code, run it from the browser, and then use the path that produces "Does parse pdf exist: 1" onscreen.
PHP Code:

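<?php
// Path to the external converter; each backslash is escaped inside double quotes.
$filename = "F:\\dhi-internet\\search\\Ghostgum\\pstotxt\\pstotxt3";
// file_exists() prints as 1 when the path is found and as nothing when it is not.
echo "Does parse pdf exist: " . file_exists($filename);
?>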
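
Instead of editing and reloading the script for each path, several candidates can be tried in one go. A minimal sketch, assuming the candidate list below (the paths are spellings that appear in this thread, with and without the .exe suffix); first_existing_path() is a hypothetical helper, not a PhpDig function.

```php
<?php
// Hypothetical helper for illustration; not part of PhpDig.
// Returns the first candidate that file_exists() accepts, or null if none do.
function first_existing_path($candidates)
{
    foreach ($candidates as $candidate) {
        if (file_exists($candidate)) {
            return $candidate;
        }
    }
    return null;
}

// Candidate spellings of the converter path; adjust to your own install.
$candidates = array(
    'F:/internet/search/pdftotext/pdftotext',
    'F:\\dhi-internet\\search\\pdftotext\\pdftotext',
    'F:\\dhi-internet\\search\\pdftotext\\pdftotext.exe',
);

$found = first_existing_path($candidates);
echo "Does parse pdf exist: " . ($found === null ? 'no candidate found' : $found) . "\n";
```

Whichever path is printed is the one to put in PHPDIG_PARSE_PDF.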


jayhawk 02-17-2004 06:23 AM

I fixed the path problem, but it is still not indexing the pdfs. Here is what it displays:

SITE : http://dhi-internet/
Exclude paths :
- @NONE@


Is result test http an array: 1
What is result test http status: HTML



Is result test an array: 1
What is result test status: HTML
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: F:\dhi-internet\search\pdftotext\pdftotext.exe
Does parse pdf exist: 1
1:http://dhi-internet/
(time : 00:00:08)
+ + + + + Error: Couldn't open file '.txt' Error: Couldn't open file '.txt'
level 1...


Is result test http an array: 1
What is result test http status: HTML



Is result test an array: 1
What is result test status: HTML
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: F:\dhi-internet\search\pdftotext\pdftotext.exe
Does parse pdf exist: 1
2:http://dhi-internet/index.php?=PHPB8...9-4C7B08C10000
(time : 00:00:20)


Is result test http an array: 1
What is result test http status: PDF



Is result test an array: 1
What is result test status: PDF
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: F:\dhi-internet\search\pdftotext\pdftotext.exe
Does parse pdf exist: 1
Hello PDFRValue 13:http://dhi-internet/docs/seanresume0204.pdf
(time : 00:00:25)


Is result test http an array: 1
What is result test http status: PDF



Is result test an array: 1
What is result test status: PDF
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: F:\dhi-internet\search\pdftotext\pdftotext.exe
Does parse pdf exist: 1
Hello PDFRValue 14:http://dhi-internet/test/acobook.pdf
(time : 00:00:59)


Is result test http an array: 1
What is result test http status: HTML



Is result test an array: 1
What is result test status: HTML
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: F:\dhi-internet\search\pdftotext\pdftotext.exe
Does parse pdf exist: 1
5:http://dhi-internet/test/regs.html
(time : 00:01:07)



Is result test http an array: 1
What is result test http status: HTML



Is result test an array: 1
What is result test status: HTML
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: F:\dhi-internet\search\pdftotext\pdftotext.exe
Does parse pdf exist: 1
6:http://dhi-internet/test/dhi.html
(time : 00:01:14)

No link in temporary table

Charter 02-17-2004 11:10 AM

Hi. Do you have the following in the config file?

define('PHPDIG_INDEX_PDF',true);
define('PHPDIG_PARSE_PDF','F:\\dhi-internet\\search\\pdftotext\\pdftotext');
define('PHPDIG_OPTION_PDF','');
define('PHPDIG_PDF_EXTENSION','.txt');
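
A side note on the backslashes, since the thread mixes several spellings of the same path: PHP treats backslashes differently in single-quoted and double-quoted strings. The sketch below only illustrates that language rule. It is not a claim about what fixed the problem here, and C:\temp is a made-up path chosen because it contains \t.

```php
<?php
// Single quotes: \\ collapses to one backslash, other backslashes stay literal.
$single_doubled = 'F:\\dhi-internet\\search\\pdftotext\\pdftotext';
$single_plain   = 'F:\dhi-internet\search\pdftotext\pdftotext';
var_dump($single_doubled === $single_plain); // both spell the same path

// Double quotes: sequences such as \t are interpreted (here as a tab),
// so a hypothetical path like C:\temp silently changes.
var_dump("C:\temp" === 'C:\temp');

echo $single_doubled . "\n";
```

The first comparison should come out true and the second false, which is why single quotes with doubled backslashes (or forward slashes) are the safer way to write Windows paths in config.php.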

jayhawk 02-18-2004 05:13 AM

I got this to work finally. Thanks for all of your help!:D


All times are GMT -8.