freak
05-26-2004, 09:07 PM
Hello there,
I'm having problems indexing PDFs. I've already read most of the posts in here and couldn't find where the problem is. :confused:
I'm using Apache 2.0.45 + PHP 4.3.6 running on Windows 2000 SP4. I just downloaded xpdf-3.00-win32 and extracted the pdftotext.exe file.
This is my config file:
define('USE_IS_EXECUTABLE_COMMAND','0');
...
define('PHPDIG_INDEX_PDF',true);
define('PHPDIG_PARSE_PDF','C:\\Apache Group\\Apache2\\htdocs\\phpdig\\bin\\pdftotext.exe');
define('PHPDIG_OPTION_PDF','');
...
define('PHPDIG_PDF_EXTENSION','.txt');
And this is what I got when I tried to index a local site with one page that contains only one link to a PDF file.
I added the extra code for debugging...
--------------------------------------------------------------------------------
SITE : http://ivan02/
Exclude paths :
- @NONE@
Is result test an array: 1
What is result test status: HTML
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: C:\Apache Group\Apache2\htdocs\phpdig\bin\pdftotext.exe
Does parse pdf exist: 1
1:http://ivan02/test/
(time : 00:00:06)
+
level 1...
Is result test an array: 1
What is result test status: PDF
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: C:\Apache Group\Apache2\htdocs\phpdig\bin\pdftotext.exe
Does parse pdf exist: 1
Command is :C:\Apache Group\Apache2\htdocs\phpdig\bin\pdftotext.exe ../admin/temp/61216332.tmp
Result contains: Array ( )
Return value is: 1
2:http://ivan02/test/proy01.pdf
(time : 00:00:16)
No link in temporary table
--------------------------------------------------------------------------------
links found : 2
http://ivan02/test/
http://ivan02/test/proy01.pdf
Optimizing tables...
Indexing complete !
--------------------------------------------------------------------------------
The spider finds the PDF file but doesn't extract any content from it. Also, there is no mark before link number 2: neither a "good mark" nor a "bad mark".
Could somebody please help me? I just don't know what's going on here.
Thanks!
PS: Please excuse my English!