Does PHPDig ignore <base href...?
I have used the BASE HREF directive in my dynamic site so that pages that appear to be in subfolders (but actually aren't) can point to external images and CSS files correctly.
This is correct as far as HTML goes and gives no trouble in any browser I have tested. However, PHPDig seems to ignore this setting. If a page that appears to be in a folder called 'news' links to index.html in the root, the link will read href='index.html' instead of href='../index.html'. The base href tag tells the browser to resolve any relative URLs from the root rather than from the current folder (which in my case doesn't exist). The result is that PHPDig finds multiple copies of each page: it thinks that index.html is inside the 'news' folder and thus spiders a complete duplicate of the whole site. Up till now I have been using exclusions to get round this, but this requires a lot of manual fiddling every time the site is changed. Is there a solution, or is it a bug in PHPDig? |
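To make the duplication described above concrete, here is a minimal sketch of how the same relative link resolves with and without the base href. It is written in Python purely for illustration and is not PhpDig code; the domain and page path are hypothetical.

```python
from urllib.parse import urljoin

# Hypothetical URLs, chosen only to mirror the 'news' example above.
page_url = "http://www.example.com/news/story.html"  # page that appears to be in /news/
base_href = "http://www.example.com/"                # value of the page's <base href>
link = "index.html"                                  # relative link found in the page

# A spider that ignores <base> resolves the link against the page's own URL.
without_base = urljoin(page_url, link)

# A browser resolves the link against the declared base instead.
with_base = urljoin(base_href, link)

print(without_base)  # http://www.example.com/news/index.html
print(with_base)     # http://www.example.com/index.html
```

The first result is the phantom copy that gets spidered; the second is the page the link actually points to.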
Hi. PhpDig looks for links that match the following regex and then processes those links via the phpdigRewriteUrl function.
PHP Code:
Code:
<HTML> |
Wouldn't it be fairly simple to check the <head> for the existence of a BASE tag and prefix any relative URLs with that instead of the current path?
Is it worth posting this to the suggestions forum? |
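The idea suggested above (look for a BASE tag, then resolve relative links against it rather than against the current path) could be sketched roughly as follows. This is a Python illustration only, not the PhpDig implementation; the class name `LinkCollector`, the function `resolve_links`, and the example URLs are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the first <base href> and every <a href> in a page."""

    def __init__(self):
        super().__init__()
        self.base_href = None
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <BASE HREF=...> matches.
        attrs = dict(attrs)
        if tag == "base" and self.base_href is None and attrs.get("href"):
            self.base_href = attrs["href"]
        elif tag == "a" and attrs.get("href"):
            self.hrefs.append(attrs["href"])


def resolve_links(page_url, html):
    """Return absolute URLs for all links, honouring <base href> if present."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    # Fall back to the page's own URL when no base is declared.
    base = urljoin(page_url, parser.base_href) if parser.base_href else page_url
    return [urljoin(base, href) for href in parser.hrefs]


html = (
    '<html><head><BASE HREF="http://www.example.com/"></head>'
    '<body><a href="index.html">Home</a></body></html>'
)
links = resolve_links("http://www.example.com/news/story.html", html)
print(links)  # ['http://www.example.com/index.html']
```

Only the first BASE tag is used, which matches how browsers treat a page with more than one.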
Hi. In robot_functions.php is a function called phpdigExplore.
In this function, replace the following: PHP Code:
PHP Code:
Code:
<HTML> Code:
<BASE HREF="http://www.domain.com/file.html"> |
Fantastic! Thanks...
|
PHPDig rocks
I looked forever for an open source site search written in PHP and tried several without much success. PHPDig has worked well, but I was having problems similar to the one mentioned above. With Charter's modified statement, I seem to be complaint free.
Nice program and stellar support. Thanks to all! |