
Spider indexes cgi pages but not its links!?


WebSpider
02-07-2005, 07:01 AM
Hi!

When I run the spider on a site www.domain.com that hosts several pages of the form www.domain.com/cgi-bin/whatever... or cgi-bin.domain.com/whatever..., I can't find those links in the database or in the search results. Yet when I check the most common keywords, cgi-bin.domain.com comes up in first place.

What's the deal?

How do I make it add the cgi-*.* links to the database for that particular domain?


Also, is there any difference between indexing http://www.domain.com and http://domain.com? Will I get duplicate pages in the db?

Thanks!

Charter
02-07-2005, 09:59 AM
The links domain.com, www.domain.com, and sub.domain.com are considered different. Try setting PHPDIG_IN_DOMAIN to true in the config file.
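To illustrate the idea, here is a rough Python sketch of host comparison with and without an "in domain" switch. It is not PhpDig's code: the function names and the `in_domain` flag are made up for the example, and the two-label domain rule is a naive assumption that breaks on suffixes like co.uk.

```python
from urllib.parse import urlsplit


def host_of(url):
    """Return the lowercase host part of a URL ('' if there is none)."""
    return (urlsplit(url).hostname or "").lower()


def naive_domain(host):
    """Keep only the last two labels of a host name.

    Naive assumption for illustration only: wrong for suffixes such as co.uk.
    """
    return ".".join(host.split(".")[-2:])


def same_site(url_a, url_b, in_domain=False):
    """Decide whether two URLs belong to the same site.

    in_domain=False: hosts must match exactly, so www.domain.com and
    cgi-bin.domain.com count as different sites.
    in_domain=True: hosts only need to share the same (naive) domain.
    """
    a, b = host_of(url_a), host_of(url_b)
    if in_domain:
        return naive_domain(a) == naive_domain(b)
    return a == b
```

With strict matching, `same_site("http://www.domain.com/", "http://cgi-bin.domain.com/whatever")` is false, so a spider working this way would skip the cgi-bin link. With `in_domain=True` the two hosts share domain.com and the link would be followed.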

WebSpider
02-08-2005, 02:18 PM
But then why do the search results not include any cgi.domain.com pages, even though it gets indexed as a keyword? How do I remove those false keywords and rerun the spider so it picks up cgi. as a URL and not as a keyword?

Charter
02-08-2005, 06:04 PM
Stored keywords and indexed links are two different things. If the text cgi.domain.com appears in a page, it is stored as a keyword, regardless of whether the link cgi.domain.com is actually indexed. If you are using PhpDig v.1.8.7, and don't want the text cgi.domain.com stored as a keyword, then edit BANNED in the config file.
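
As a rough illustration of why the text shows up as a keyword even when the link is never indexed, here is a small Python sketch. It is not how PhpDig is implemented, and the pattern below is a hypothetical example, not the default value of BANNED; check your own config file for the real format.

```python
import re

# Hypothetical banned-keyword pattern for this example only
# (not PhpDig's default BANNED value).
BANNED_PATTERN = re.compile(r"^cgi(-bin)?\.domain\.com$", re.IGNORECASE)


def extract_keywords(text):
    """Split page text into lowercase tokens.

    Dots and hyphens inside a token are kept, so a host name that appears
    in the page text becomes a single keyword.
    """
    return re.findall(r"[a-z0-9]+(?:[.-][a-z0-9]+)*", text.lower())


def filter_keywords(keywords):
    """Drop every keyword that matches the banned pattern."""
    return [k for k in keywords if not BANNED_PATTERN.search(k)]


page_text = "Visit cgi-bin.domain.com for the guestbook"
keywords = extract_keywords(page_text)   # host name appears as a keyword
kept = filter_keywords(keywords)         # host name removed by the pattern
```

Keyword extraction only looks at the page text, so the host name is stored whether or not the spider ever follows a link to it. Filtering it out is a separate step from getting the link itself indexed.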