Phantom
07-24-2005, 03:47 AM
Hi, I'm using PhpDig v.1.8.7
Indexing PDFs via document-specific URLs in the admin command-line interface works fine.
The problem is very similar to this one (http://www.phpdig.net/forum/showthread.php?t=1860)
I checked your external binaries checklist (http://www.phpdig.net/forum/showthread.php?t=799)
Everything is as you suggest, except that I'm running PHP 4.2.3.
For PHP 4.2.3 you link to a post on this topic (http://www.phpdig.net/showthread.php?threadid=570), but the link doesn't work.
I've added the debug changes you suggested to robot_functions.php and spider.php, and have included the output below for a page that links to many PDF documents. It's as if the crawler doesn't find the PDF files that are linked from each of the pages.
phpdigTestUrl(http://www.nhs.vic.edu.au/system/style.css) Parse content-type header : text : css
phpdigTestUrl(http://www.nhs.vic.edu.au/system/printer.php?id=38) Parse content-type header : text : html
+
phpdigTestUrl(http://www.nhs.vic.edu.au/index.php?id=40) Parse content-type header : text : html
Is result test http an array: 1
What is result test http status: HTML
Is result test an array: 1
What is result test status: HTML
Use is executable is set to: 1
Index the pdf is set to: 1
Parse the pdf is set to: c:\newnhsweb\system\cms\phpdig\pdftotext\pdftotext.exe
Does parse pdf exist: 1
Is parse pdf executable: 1
42:http://www.nhs.vic.edu.au/index.php?id=40
(time : 00:04:52)
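To rule out the page markup itself, a check along these lines could list the PDF links a crawler would be expected to see on a page. This is only a minimal Python sketch and not PhpDig's own link extraction; the names `PdfLinkCollector` and `find_pdf_links` and the sample HTML (including the `docs/newsletter.pdf` path) are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PdfLinkCollector(HTMLParser):
    """Collects absolute URLs of <a href> targets whose path ends in .pdf."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdf_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                absolute = urljoin(self.base_url, value)
                # Compare only the path, so query strings are ignored.
                if urlparse(absolute).path.lower().endswith(".pdf"):
                    self.pdf_links.append(absolute)


def find_pdf_links(html, base_url):
    """Return the PDF links found in an HTML string (hypothetical helper)."""
    collector = PdfLinkCollector(base_url)
    collector.feed(html)
    return collector.pdf_links


if __name__ == "__main__":
    # Hypothetical sample markup, not taken from the real site.
    sample = (
        '<a href="docs/newsletter.pdf">Newsletter</a> '
        '<a href="index.php?id=41">Next</a>'
    )
    for link in find_pdf_links(sample, "http://www.nhs.vic.edu.au/index.php?id=40"):
        print(link)
```

If a check like this finds the PDF links but the spider log never shows a phpdigTestUrl line for them, the links are probably being dropped somewhere between link extraction and the content-type test rather than being absent from the page.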
thanks.