View Full Version : Indexing duplicate descriptions and keywords causing false search results

04-19-2004, 08:53 PM
I am working on a search engine for a niche market, so I am indexing multiple Web sites. Unfortunately, some Web developers use the same description and keywords on every page of their site. This causes the search engine to return false results: a page may match the search criteria via its description and/or keywords while containing no content relevant to the search. The only exception is the site's default page, where the description and keywords describe the site as a whole rather than an individual page.

When indexing a site, I would like the spider to compare the description and keywords on the default page against those on a second page. If they are the same, the spider should not index the description or keywords on any page other than the default page.
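A minimal sketch of that comparison in Python (not PhpDig code; the helper names and regex-based meta extraction are purely illustrative assumptions):

```python
import re

def extract_meta(html):
    """Pull the description and keywords meta tags out of raw HTML.

    Returns a (description, keywords) tuple, lowercased and
    whitespace-normalized so trivially different copies still match.
    """
    meta = {"description": "", "keywords": ""}
    pattern = re.compile(
        r'<meta\s+name=["\'](description|keywords)["\']\s+'
        r'content=["\'](.*?)["\']',
        re.IGNORECASE | re.DOTALL,
    )
    for name, content in pattern.findall(html):
        meta[name.lower()] = " ".join(content.lower().split())
    return meta["description"], meta["keywords"]

def should_index_meta(default_page_html, second_page_html):
    """Return False when both pages carry identical description and
    keywords, i.e. the tags look like site-wide boilerplate."""
    return extract_meta(default_page_html) != extract_meta(second_page_html)
```

In practice the spider already parses each page, so the real check would reuse whatever meta values it has extracted rather than re-scanning the HTML.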

In addition to greatly increasing relevancy, this would also decrease the size of the database somewhat and allow the search engine to return results a bit faster.

This issue has caused me to stop all indexing as the spider is retrieving so much useless content due to the poor design of so many Web sites.

Any advice would be greatly appreciated.

04-20-2004, 05:25 AM
Hi. I'm not sure comparing meta description and keyword tags across pages within a site would be an efficacious process. Rather, it might be better to just exclude such tag information as shown in this (http://www.phpdig.net/showthread.php?threadid=555) thread.

04-23-2004, 12:31 PM
I am indexing multiple Web sites that other Web developers have created. Where a developer has provided a description and/or keywords on each page that relate to that page, I want to include them in the index. It is only when a site's keywords or descriptions do not relate to the current page that I want to skip indexing them.

Sometimes the keywords and description are of value; sometimes they are not. I would like to implement logic that evaluates when to index this information and when not to, so the value of the indexed information will be greatly improved.

05-04-2004, 09:27 AM
Once the spider makes an initial check, it could set a field in the sites table and then include or exclude the description and/or keywords each time it indexes the site, based on the value of that field. It could also re-check the setting every so many indexes of the site. Ignoring the description and keywords for all sites means losing their value where they are legitimate; not ignoring them when they are duplicated throughout a site means users get bogus search results. So there needs to be some way to handle this issue.
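One way the flag-and-recheck bookkeeping might look, sketched in Python (the class, the in-memory dict standing in for the sites table, and the recheck interval are all illustrative assumptions, not part of PhpDig):

```python
RECHECK_INTERVAL = 10  # re-test the flag every 10 indexes (arbitrary choice)

class SiteMetaPolicy:
    """Tracks, per site, whether meta description/keywords are
    duplicated site-wide and should therefore be skipped."""

    def __init__(self):
        # site_id -> {"ignore_meta": bool, "index_count": int}
        # In the real engine this would be a column in the sites table.
        self.sites = {}

    def record_check(self, site_id, meta_is_duplicated):
        """Store the result of comparing the default page's meta
        tags against a second page's, resetting the index counter."""
        self.sites[site_id] = {
            "ignore_meta": meta_is_duplicated,
            "index_count": 0,
        }

    def needs_recheck(self, site_id):
        """True for an unknown site, or every RECHECK_INTERVAL indexes."""
        entry = self.sites.get(site_id)
        if entry is None:
            return True
        count = entry["index_count"]
        return count > 0 and count % RECHECK_INTERVAL == 0

    def should_index_meta(self, site_id):
        """Consulted by the spider once per indexing pass of a site."""
        entry = self.sites.get(site_id)
        if entry is None:
            return True  # no data yet: index meta tags by default
        entry["index_count"] += 1
        return not entry["ignore_meta"]
```

The same logic could live entirely in SQL as an extra boolean column plus a counter on the sites table; the class above just makes the decision flow explicit.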