Problem with PDF indexing
Hi, I'm using PhpDig v.1.8.7
Indexing PDFs works fine when I enter their specific URLs in the admin or command-line interface. My problem is very similar to this one. I checked your external binaries checklist, and everything is as you suggest, except that I'm running PHP 4.2.3. For PHP 4.2.3 you link to a post on this topic, but the link doesn't work. I've added the debug changes you suggested to robot_functions.php and spider.php, and have included a section of the output below for a page that links to many PDF documents. It's as if the crawler doesn't find the PDF files that are linked from each of the pages. Quote:
Thanks.
Hi. For broken links like http://www.phpdig.net/showthread.php?threadid=570, try adding forum/ to the path, as in http://www.phpdig.net/forum/showthread.php?threadid=570. The forum moved from the main directory to the forum subdirectory, but not all links were updated.
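If you have a lot of these old links saved, the fix can be applied mechanically. A minimal Python sketch, assuming that only showthread.php links on phpdig.net or www.phpdig.net need the forum/ prefix; the function name fix_forum_link and the host list are hypothetical:

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: only thread links on these hosts moved into /forum/.
PHPDIG_HOSTS = {"phpdig.net", "www.phpdig.net"}


def fix_forum_link(url: str) -> str:
    """Insert the forum/ directory into an old phpdig.net showthread link."""
    parts = urlsplit(url)
    if parts.netloc.lower() in PHPDIG_HOSTS and parts.path == "/showthread.php":
        # Move the script into the forum subdirectory; the query string is kept.
        parts = parts._replace(path="/forum/showthread.php")
    return urlunsplit(parts)


if __name__ == "__main__":
    print(fix_forum_link("http://www.phpdig.net/showthread.php?threadid=570"))
```

Links that already contain forum/, or that point at other sites, are returned unchanged.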
To try to index the PDFs linked from http://www.nhs.vic.edu.au/index.php?id=40, open the config file and set LIMIT_TO_DIRECTORY to false and PHPDIG_IN_DOMAIN to true. Then paste the link into the text box in the PhpDig admin panel, set the search depth to a large number, set "links per" to zero, and choose the no option.
Fixed via correct href path
Problem fixed...
It turns out that the hyperlinks in the page to the PDFs I was trying to index were invalid. However, every browser known to man seemed to compensate for the invalid path, so I never picked up the error (until now). The PhpDig crawler didn't compensate for it (no surprise, really). The incorrect relative path from the root level was:
../content/docs/newsletter/newsletter501.pdf
The relative path from the root level should have been either:
./content/docs/newsletter/newsletter501.pdf
or
content/docs/newsletter/newsletter501.pdf
Thanks for your help. Now on to MS Word documents... :)
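Browsers cope with such links because standard relative-reference resolution (RFC 3986) discards any ".." segment that would climb above the root, so ../content/... on a page at the site root resolves to the same URL as content/.... A crawler that does not apply that rule (as PhpDig apparently did not here) ends up with a bad URL. A minimal Python sketch for spotting such links before a crawl, assuming you already have each page URL and its href values; climbs_above_root is a hypothetical helper, and the standard-library urljoin (Python 3.5+) is used only to show what a browser-style resolver produces:

```python
from urllib.parse import urljoin, urlsplit


def climbs_above_root(page_url: str, href: str) -> bool:
    """Return True if href, resolved against page_url, uses more '..'
    segments than there are directories to climb out of."""
    href_path = urlsplit(href).path
    if href_path.startswith("/"):
        # Absolute path: resolution starts at the site root.
        segments = href_path.split("/")
    else:
        # Relative path: start from the page's directory (drop the file name).
        page_dir = urlsplit(page_url).path.rsplit("/", 1)[0]
        segments = page_dir.split("/") + href_path.split("/")
    depth = 0
    for seg in segments:
        if seg in ("", "."):
            continue  # empty and '.' segments do not change the depth
        if seg == "..":
            if depth == 0:
                return True  # would step above the site root
            depth -= 1
        else:
            depth += 1
    return False


if __name__ == "__main__":
    page = "http://www.nhs.vic.edu.au/index.php?id=40"
    for href in ("../content/docs/newsletter/newsletter501.pdf",
                 "./content/docs/newsletter/newsletter501.pdf",
                 "content/docs/newsletter/newsletter501.pdf"):
        print(href, climbs_above_root(page, href), urljoin(page, href))
```

All three hrefs resolve to the same URL under urljoin, but only the first is flagged, which matches the link that the crawler could not follow.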
Copyright © 2001 - 2005, ThinkDing LLC. All Rights Reserved.