02-02-2005, 02:35 PM   #3
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Q: I see "exclude Paths" when I start the spider. How do I set that, and what does it do?

A: Use a robots.txt file, or exclude content from the admin panel. Either way, the excluded content is kept out of the index.

Q: How can I tell if one of those timeout errors occurs?

A: Check that safe_mode is off, review your server error logs, or ask your host if the process is killed.
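
One way to check the first item yourself is a short PHP script along these lines. This is only a sketch: it assumes your host lets you run your own PHP file, and it simply reads the standard safe_mode and max_execution_time settings with ini_get(). Nothing in it is specific to the spider.

```php
<?php
// Sketch only: print the two PHP settings most relevant to timeouts.
// Run it through the same PHP setup (web or shell) that runs the spider,
// since the command line and the web server can load different php.ini files.

// ini_get() returns the current value as a string, or false if the
// setting does not exist in this PHP build.
$safeMode = ini_get('safe_mode');
$maxTime  = ini_get('max_execution_time');

// With safe_mode on, set_time_limit() has no effect, so a script
// cannot raise its own time limit.
echo 'safe_mode: ', var_export($safeMode, true), "\n";

// A value of 0 means no time limit; otherwise the value is in seconds.
echo 'max_execution_time: ', var_export($maxTime, true), "\n";
```

If safe_mode is reported as on, or the time limit is short, that is worth raising with your host along with the error logs.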

Q: Does it hurt anything if I just stop the spider in the middle of things? Are my pages still correctly processed if I do that?

A: No, it does not hurt anything if you use the stop spider link. Yes, documents that were already indexed remain correctly processed.

Q: I see a list of words that I can "purge". Should I add more words? Does adding words make my database less useful for searches?

A: Add more words if you want. Whether that makes the database less useful depends on the words you add.

Q: How do I get my site reindexed after I have made the first pass?

A: Use the admin panel text box, or run the spider from the shell.

Q: Should I run the delete processes before I restart the indexing?

A: If you want, but it is not necessary.

Q: You mention a parameter that will prevent "early" reindexing. What value is it set to, and how can I change it?

A: It is set to zero. Look for define('LIMIT_DAYS',0); in the config file, or set revisit-after META tags.
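
For example, to hold off reindexing until a week has passed, the line in the config file could be changed as sketched below. The value 7 is only an illustration, and it assumes the setting is counted in days, as the constant's name suggests.

```php
// Config fragment (sketch). The value mentioned above is 0.
// 7 is an example value, assuming the unit is days as the name suggests.
define('LIMIT_DAYS', 7);
```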

Q: On the update page you say depth trumps links. Does that mean values 0, 0 will do my entire site?

A: No. Set LIMIT_TO_DIRECTORY to false, choose no, set a large search depth, and set links per to zero.
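
The first of those settings is a constant. A sketch of the changed line is below, assuming it lives in the same config file as LIMIT_DAYS; the comment describes what the name suggests the setting does, not confirmed behavior. The search depth and links values are not shown here.

```php
// Config fragment (sketch). Assumption based on the constant's name:
// false lets the spider follow links outside the starting directory.
define('LIMIT_TO_DIRECTORY', false);
```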

Q: What does "No link in temporary table" mean? Is this a good thing or a bad thing?

A: The tempspider table is empty. It is good.

Q: What does "Duplicate of an existing document" mean?

A: The document looks like an already indexed document.

Q: Could this "dynamic" behavior I have been seeing be because the pages I have are highly dynamic and constantly change parts of the content, with some factors controlled by a randomly generated value?

A: No, the values in the update sites table are currently being used.

Q: Is the "spider" crawler identifiable? In other words, when it asks for a page, can I detect that it is the spider and not a normal query?

A: Yes, review your server access logs when indexing your site.

Q: Would changing the primary key to be filename + file title be a better index?

A: No, primary keys must be unique.
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.