PhpDig.net

PhpDig Forums > How-to Forum

12-19-2004, 03:27 AM #1
baskamer
Green Mole
Join Date: Feb 2004
Posts: 9
garbage collection

hello all!

I think I asked this before, and I saw at least one other (unanswered) post on this topic, but I never got a satisfying reply. I currently have a working solution, but I don't believe it is very elegant.

If pages are removed from a site, search results can still point to those pages, even after the site is respidered. That is not what we want. A brute-force solution is to truncate most PhpDig tables just before respidering. Although this works, it is not ideal: any search made while respidering is in progress returns incomplete results. This is acceptable for me, because the indexing runs at night and only takes about 2.5 hours (about 1000 pages), but if your site is much bigger (or your ISP slower), this could become a problem.

Now my question: is there a built-in PhpDig way to approach this problem? Is there any situation in which PhpDig removes old spidered pages that should be considered garbage? I imagine it would work like this: PhpDig spiders the complete site every now and then, but only re-indexes the pages that are due according to the configurable 'LIMIT_DAYS' constant. Afterwards, it removes from the indexes all pages that are *not* in the temp table (why is this not a MySQL TEMPORARY table, anyway?).
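The scheme proposed above is essentially mark-and-sweep: during a full crawl, record every URL the spider reaches (the mark), then delete every indexed page that was not reached (the sweep). A minimal sketch under stated assumptions: it uses Python's built-in sqlite3 as a stand-in for MySQL, and the tables `pages`, `page_words` and `seen_urls` plus the functions `create_schema`, `mark_seen` and `sweep_stale_pages` are hypothetical; they do not reflect PhpDig's actual schema or code.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Create a hypothetical, simplified index schema (NOT PhpDig's real tables)."""
    conn.executescript(
        """
        CREATE TABLE pages (id INTEGER PRIMARY KEY, url TEXT UNIQUE NOT NULL);
        CREATE TABLE page_words (page_id INTEGER NOT NULL, word TEXT NOT NULL);
        -- Hypothetical "temp table": URLs reached during the current full crawl.
        CREATE TABLE seen_urls (url TEXT PRIMARY KEY NOT NULL);
        """
    )


def mark_seen(conn: sqlite3.Connection, url: str) -> None:
    """Mark phase: record every URL the spider reaches, even if it skips re-indexing it."""
    conn.execute("INSERT OR IGNORE INTO seen_urls (url) VALUES (?)", (url,))


def sweep_stale_pages(conn: sqlite3.Connection) -> int:
    """Sweep phase: delete indexed pages (and their words) not seen in this crawl.

    Runs as one transaction, so a concurrent reader sees either the old or the
    new index, never a half-swept one. Returns the number of pages removed.
    """
    with conn:  # commits on success, rolls back on an exception
        conn.execute(
            """
            DELETE FROM page_words WHERE page_id IN (
                SELECT id FROM pages WHERE url NOT IN (SELECT url FROM seen_urls)
            )
            """
        )
        cur = conn.execute(
            "DELETE FROM pages WHERE url NOT IN (SELECT url FROM seen_urls)"
        )
        removed = cur.rowcount
        conn.execute("DELETE FROM seen_urls")  # reset the mark set for the next crawl
    return removed


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    conn.executemany("INSERT INTO pages (id, url) VALUES (?, ?)",
                     [(1, "/index.html"), (2, "/old.html")])
    conn.executemany("INSERT INTO page_words (page_id, word) VALUES (?, ?)",
                     [(1, "home"), (2, "obsolete")])
    # The new crawl only reaches /index.html; /old.html was removed from the site.
    mark_seen(conn, "/index.html")
    print(sweep_stale_pages(conn))
```

Two design points: the mark set has to include pages the spider reaches but does not re-index (for example because of LIMIT_DAYS), or unchanged pages would be swept; and the sweep should only run after a crawl that finished successfully, because a crawl that aborted early (site down, timeout) would leave most URLs unmarked and the sweep would empty the index.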

This is just my idea of how it could work, and it is probably at odds with the PhpDig way of doing things, but I would really like someone to explain what the *best practice* for this is...


thanks

bas
12-19-2004, 09:28 AM #2
Charter
Head Mole
Join Date: May 2003
Posts: 2,539
Do you use the delete icon? Do you run the clean options? Maybe you want to edit the phpdigDelSpiderRow function, or the if-else conditional that leads to it, so that things work the way you want?
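The conditional-delete idea can be sketched in general terms: instead of a separate sweep, delete a page's row at the moment a respider finds it gone. This is only an illustration of the idea, not PhpDig's code; `delete_page` (standing in for what a function like phpdigDelSpiderRow presumably does), `handle_refetch`, the `GONE_STATUSES` rule and the sqlite3 tables are all hypothetical.

```python
import sqlite3

# Hypothetical rule: HTTP statuses treated as "this page is gone for good".
GONE_STATUSES = {404, 410}


def create_schema(conn: sqlite3.Connection) -> None:
    """Create a hypothetical, simplified index schema (NOT PhpDig's real tables)."""
    conn.executescript(
        """
        CREATE TABLE pages (id INTEGER PRIMARY KEY, url TEXT UNIQUE NOT NULL);
        CREATE TABLE page_words (page_id INTEGER NOT NULL, word TEXT NOT NULL);
        """
    )


def delete_page(conn: sqlite3.Connection, url: str) -> bool:
    """Hypothetical delete helper: drop one page and its words in one transaction."""
    with conn:
        row = conn.execute("SELECT id FROM pages WHERE url = ?", (url,)).fetchone()
        if row is None:
            return False  # not in the index; nothing to delete
        conn.execute("DELETE FROM page_words WHERE page_id = ?", (row[0],))
        conn.execute("DELETE FROM pages WHERE id = ?", (row[0],))
    return True


def handle_refetch(conn: sqlite3.Connection, url: str, status: int) -> str:
    """The if-else branch: delete pages that are gone, keep everything else.

    Transient failures (5xx, timeouts) fall into the else branch so that a
    temporary outage does not wipe pages from the index.
    """
    if status in GONE_STATUSES:
        return "deleted" if delete_page(conn, url) else "unknown"
    else:
        return "keep"
```

Compared with a full mark-and-sweep, this only catches pages the spider actually tries to fetch again; a page that is simply no longer linked from anywhere would never be refetched and so never deleted.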
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.


All times are GMT -8.


Copyright © 2001 - 2005, ThinkDing LLC. All Rights Reserved.