PhpDig.net


WebSpider 02-07-2005 07:01 AM

Spider indexes cgi pages but not its links!?
 
Hi!

When I run the spider on a site, www.domain.com, that hosts several pages of the form www.domain.com/cgi-bin/whatever... or cgi-bin.domain.com/whatever..., I can't find those links in the database or in the search results. Yet when I check the most common keywords, cgi-bin.domain.com comes up in first place.

What's the deal?

How do I make it add the cgi-*.* links to the database for that particular domain?


Also, is there any difference between indexing http://www.domain.com and http://domain.com? Will I get duplicate pages in the db?

Thanks!

Charter 02-07-2005 09:59 AM

The links domain.com, www.domain.com, and sub.domain.com are considered different. Try setting PHPDIG_IN_DOMAIN to true in the config file.
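To illustrate why those three hosts count as different, here is a minimal Python sketch of the comparison. It is not PhpDig's code: the function names, the in_domain flag (a loose stand-in for PHPDIG_IN_DOMAIN), and the base_domain parameter are all assumptions made for the example.

```python
from urllib.parse import urlsplit


def host_of(url):
    """Return the lowercase hostname of a URL, or '' if it has none."""
    return (urlsplit(url).hostname or "").lower()


def same_site(url_a, url_b, in_domain=False, base_domain="domain.com"):
    """Decide whether two URLs belong to the same site.

    in_domain=False: hosts must match exactly, so www.domain.com,
    domain.com and cgi-bin.domain.com are three different sites.
    in_domain=True: any host equal to base_domain or ending in
    '.' + base_domain is treated as part of the same site.
    Both parameters are hypothetical, chosen for illustration only.
    """
    host_a, host_b = host_of(url_a), host_of(url_b)
    if not in_domain:
        return host_a == host_b

    def belongs(host):
        return host == base_domain or host.endswith("." + base_domain)

    return belongs(host_a) and belongs(host_b)
```

With the flag off, same_site("http://www.domain.com/", "http://cgi-bin.domain.com/whatever") is False, so a spider started on www.domain.com would skip the cgi-bin host. With in_domain=True the same call returns True.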

WebSpider 02-08-2005 02:18 PM

But then, why don't the search results include any cgi.domain.com result, even though it gets indexed as a keyword? How do I remove those false keywords and rerun the spider so that it picks up cgi. as a URL and not as a keyword?

Charter 02-08-2005 06:04 PM

Stored keywords and indexed links are two different things. If the text cgi.domain.com appears in a page, it is stored as a keyword, regardless of whether the link cgi.domain.com is actually indexed. If you are using PhpDig v.1.8.7, and don't want the text cgi.domain.com stored as a keyword, then edit BANNED in the config file.
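The difference between keywords and links can be sketched in Python as follows. This is an illustration under stated assumptions, not how PhpDig is implemented: the PageScanner class, the scan function, and the banned_pattern parameter (a rough stand-in for a keyword filter such as BANNED) are hypothetical names.

```python
import re
from html.parser import HTMLParser


class PageScanner(HTMLParser):
    """Collect link targets and text keywords separately."""

    def __init__(self):
        super().__init__()
        self.links = []      # href values of <a> tags
        self.keywords = []   # words found in the visible text

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        # Keep dots and hyphens inside a token so that a host name
        # written in the text survives as a single keyword.
        tokens = re.findall(r"[a-z0-9][a-z0-9.\-]*", data.lower())
        self.keywords.extend(t.rstrip(".-") for t in tokens)


def scan(html, banned_pattern=None):
    """Return (links, keywords); drop keywords matching banned_pattern."""
    scanner = PageScanner()
    scanner.feed(html)
    scanner.close()
    keywords = scanner.keywords
    if banned_pattern:
        keywords = [w for w in keywords if not re.search(banned_pattern, w)]
    return scanner.links, keywords
```

For a page whose text mentions cgi.domain.com but whose only anchor points to www.domain.com, scan returns cgi.domain.com among the keywords and nowhere among the links. Passing banned_pattern=r"^cgi\." removes it from the keywords as well.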


All times are GMT -8.
