07-24-2005, 03:47 AM   #1
Phantom
Join Date: Jul 2005
Location: Melbourne, Australia
Posts: 2
Problem with PDF indexing

Hi, I'm using PhpDig v.1.8.7

Indexing PDFs works fine when I give the document-specific URLs directly in the admin command-line interface.

My problem is very similar to this one.

I checked your external binaries checklist.

Everything is as you suggest, except that I'm running PHP 4.2.3.
For PHP 4.2.3 you link to a post on this topic, but the link doesn't work.

I've added the debug changes you suggested to robot_functions.php and spider.php, and I've included below a section of the output for a page that links to many PDF documents. It's as if the crawler doesn't find the PDF files that are linked from each of the pages.

Quote:
phpdigTestUrl(http://www.nhs.vic.edu.au/system/style.css) Parse content-type header : text : css

phpdigTestUrl(http://www.nhs.vic.edu.au/system/printer.php?id=38) Parse content-type header : text : html
+

phpdigTestUrl(http://www.nhs.vic.edu.au/index.php?id=40) Parse content-type header : text : html


Is result test http an array: 1
What is result test http status: HTML

Is result test an array: 1
What is result test status: HTML
Use is executable is set to: 1
Index the pdf is set to: 1
Parse the pdf is set to: c:\newnhsweb\system\cms\phpdig\pdftotext\pdftotext.exe
Does parse pdf exist: 1
Is parse pdf executable: 1
42:http://www.nhs.vic.edu.au/index.php?id=40
(time : 00:04:52)
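
To rule out the pages themselves, here is a minimal sketch (plain Python, run outside PhpDig and not part of its code) that lists the PDF links found in a page's HTML. The names `PdfLinkCollector` and `find_pdf_links` and the sample markup are made up for illustration. It only shows which anchors a crawler would have to pick up, not how PhpDig's spider actually extracts links.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PdfLinkCollector(HTMLParser):
    """Collect <a href> targets whose URL path ends in .pdf (case-insensitive)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdf_links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                absolute = urljoin(self.base_url, value)
                # Compare the path only, so query strings do not hide the extension.
                if urlparse(absolute).path.lower().endswith(".pdf"):
                    self.pdf_links.append(absolute)


def find_pdf_links(html, base_url):
    """Return absolute URLs of all PDF links found in the given HTML text."""
    collector = PdfLinkCollector(base_url)
    collector.feed(html)
    return collector.pdf_links


if __name__ == "__main__":
    # Hypothetical markup; paste the real page source here instead.
    sample = (
        '<a href="docs/newsletter.PDF">Newsletter</a> '
        '<a href="index.php?id=40">Home</a>'
    )
    for link in find_pdf_links(sample, "http://www.nhs.vic.edu.au/index.php?id=40"):
        print(link)
```

If this lists the PDF links for a saved copy of the page but the spider output never shows a phpdigTestUrl line for them, the links are in the HTML and the problem would seem to be on the crawler side.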

Thanks.