PhpDig.net

09-30-2004, 10:34 AM   #16
Charter
Head Mole
Join Date: May 2003
Posts: 2,539
Three things I can think of...

1) The links may not match the regex for links. Search for ([a-z]{3,5}://) in the robot_functions.php file to find the two regexes used for links.

2) Some of the pages you are trying to crawl are encoded as windows-1251, but the search results appear to be using iso-8859-1 instead.

3) Some of the pages use a large number of HTML entities instead of an encoding. PhpDig currently supports windows-1251 for Cyrillic.
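
To check the first two points by hand, something like the following may help. It is only a rough sketch, assuming PHP 7+ with the iconv extension: the helper names are made up for illustration and are not part of PhpDig, and the pattern is just the ([a-z]{3,5}://) fragment quoted above, anchored at the start of the link, not the full regexes in robot_functions.php.

```php
<?php
// Hypothetical helper, not part of PhpDig: tests whether a link starts with
// a scheme of 3 to 5 lowercase letters followed by "://". This is only the
// fragment quoted above; anchoring it at the start is an assumption.
function has_scheme_prefix(string $link): bool
{
    return preg_match('~^[a-z]{3,5}://~', $link) === 1;
}

// Hypothetical helper: converts windows-1251 bytes to UTF-8 so you can see
// whether a fetched page really is windows-1251 (readable Cyrillic) or not.
function cp1251_to_utf8(string $bytes): string
{
    $out = iconv('WINDOWS-1251', 'UTF-8', $bytes);
    if ($out === false) {
        throw new RuntimeException('Input could not be converted from windows-1251');
    }
    return $out;
}

var_dump(has_scheme_prefix('http://example.com/page.html')); // bool(true)
var_dump(has_scheme_prefix('javascript:open()'));            // bool(false)

// Two windows-1251 bytes (0xCF, 0xF0) become two Cyrillic letters in UTF-8.
echo cp1251_to_utf8("\xCF\xF0"), "\n";

// For point 3: numeric HTML entities can be decoded to real characters.
echo html_entity_decode('&#1055;&#1088;', ENT_QUOTES, 'UTF-8'), "\n";
```

If a link fails the first check, compare it against the actual patterns in robot_functions.php before changing anything, since those may accept more than this fragment does.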
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.
09-30-2004, 12:18 PM   #17
Fking
Green Mole
Join Date: Sep 2004
Posts: 22
I also think the problem is related to the pages' encoding.

What can I do to make them spiderable?
09-30-2004, 12:36 PM   #18
Charter
Head Mole
Join Date: May 2003
Posts: 2,539
Use links that match the regex, encode the pages as windows-1251, and set define('PHPDIG_ENCODING','windows-1251'); in the config file.
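
As a rough illustration of the last two steps, here is a minimal sketch, assuming PHP 7+ with the iconv extension. The define line is the one given above; the to_windows_1251 helper and its default source encoding of UTF-8 are assumptions made up for this example, not part of PhpDig.

```php
<?php
// From the post above: this line goes in the PhpDig config file.
define('PHPDIG_ENCODING', 'windows-1251');

// Hypothetical helper, not part of PhpDig: re-encodes page content to
// windows-1251. The source encoding (UTF-8 by default) is an assumption;
// pass whatever encoding your pages are actually saved in.
function to_windows_1251(string $html, string $from = 'UTF-8'): string
{
    $out = iconv($from, 'WINDOWS-1251', $html);
    if ($out === false) {
        // iconv fails on characters that have no windows-1251 equivalent.
        throw new RuntimeException("Could not convert from $from to windows-1251");
    }
    return $out;
}

// One Cyrillic letter in UTF-8 (two bytes) becomes a single windows-1251 byte.
echo bin2hex(to_windows_1251("\u{041F}")), "\n";
```

After re-encoding, it is probably worth making sure the charset each page declares also says windows-1251, so that what is declared and what is served agree.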
All times are GMT -8.


Powered by vBulletin® Version 3.7.3
Copyright ©2000 - 2024, Jelsoft Enterprises Ltd.
Copyright © 2001 - 2005, ThinkDing LLC. All Rights Reserved.