PhpDig.net

Old 07-29-2005, 08:23 AM   #1
kzant
Green Mole
 
Join Date: Jan 2005
Posts: 6
Documents disappear

I have two sites on which I'm experiencing the same issue. Both sites contain a large number of PDF and DOC files - one site has 300, the other about 500. Every time a new document is added to the site, I manually add that single document to the index.

However, certain documents stop coming up in searches after a period of time. If I just re-submit the document, it says that it's already there, of course.

But if I delete all documents and rebuild the entire index the documents will show up again. Then they stop being returned on searches after a period of time.

I am at a loss as to why this is happening; any advice is appreciated.
Old 07-29-2005, 08:34 AM   #2
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
What version of PhpDig are you using? Do you use a cron job or the PhpDig admin panel?
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.
Old 07-29-2005, 08:37 AM   #3
kzant
Green Mole
 
Join Date: Jan 2005
Posts: 6
v.1.8.7
Admin panel. I could never get the cron job to work correctly.
Old 07-29-2005, 09:11 AM   #4
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Do you run any of the "cleans" prior to experiencing this issue?
Old 07-29-2005, 09:16 AM   #5
kzant
Green Mole
 
Join Date: Jan 2005
Posts: 6
No - should I?
Old 07-30-2005, 06:51 AM   #6
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
No, you do not have to run the "cleans." What about space; are you running out of space? Could adding a document be wiping out a previous document? And when you reindex, do all documents (new and old) show up in the search results?
Old 07-30-2005, 07:11 AM   #7
kzant
Green Mole
 
Join Date: Jan 2005
Posts: 6
Space isn't an issue. Each document is uniquely named -- I trap for that to ensure nothing is getting wiped out.

The documents still exist in the spider table. The keywords still exist in the keywords table. But the connection between the two disappears from the engine table.

So, when I try to resubmit the document, it says it's already there. But it's not coming up in searches because the keyword connection is gone. If I delete that document from the system (using the admin interface) and re-submit it, then it's fine.

But of course, I can't tell what's been axed and what's okay when I get hollered at, so I wipe out the whole index and re-index everything. That seems to make things better.

Of course, I'd like to preserve the original index. But if there is something going on that precludes that, can you suggest a way I could re-index the site (300/500 docs) without my intervention? Something I could run nightly that wouldn't time out?

I really appreciate any advice. This has happened a few times and I really don't like the testy calls from passive-aggressive clients.
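
One way to tell which documents have lost their keyword links, rather than wiping the whole index, is to look for rows in the spider table that have no matching rows in the engine table. The sketch below shows the idea with a LEFT JOIN. It is only an illustration: it uses an in-memory SQLite database so it can run on its own, whereas PhpDig stores its index in MySQL, and the column names (spider_id, file, key_id, weight) and the sample rows are assumptions that should be checked against the actual tables, including any table prefix.

```python
import sqlite3


def find_orphaned_documents(conn):
    """Return (spider_id, file) for spider rows with no rows in engine.

    Table and column names are assumptions about the PhpDig schema.
    """
    query = """
        SELECT s.spider_id, s.file
        FROM spider AS s
        LEFT JOIN engine AS e ON e.spider_id = s.spider_id
        WHERE e.spider_id IS NULL
        ORDER BY s.spider_id
    """
    return conn.execute(query).fetchall()


def build_demo_db():
    """Create a tiny stand-in index with made-up sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE spider (spider_id INTEGER PRIMARY KEY, file TEXT);
        CREATE TABLE keywords (key_id INTEGER PRIMARY KEY, keyword TEXT);
        CREATE TABLE engine (spider_id INTEGER, key_id INTEGER, weight INTEGER);
        INSERT INTO spider VALUES (1, 'report.pdf');
        INSERT INTO spider VALUES (2, 'minutes.doc');
        INSERT INTO spider VALUES (3, 'budget.pdf');
        INSERT INTO keywords VALUES (10, 'report');
        INSERT INTO keywords VALUES (11, 'budget');
        -- Document 2 has no engine rows, like the documents that vanish.
        INSERT INTO engine VALUES (1, 10, 5);
        INSERT INTO engine VALUES (3, 11, 2);
    """)
    return conn


if __name__ == "__main__":
    for spider_id, name in find_orphaned_documents(build_demo_db()):
        print(spider_id, name)
```

The same SELECT can be run directly against the MySQL database once the names are confirmed. Only the documents it lists would then need to be deleted and re-submitted through the admin interface.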
Old 07-30-2005, 07:26 AM   #8
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Is CONTENT_TEXT set to 1 in the config file? If so, is there ever a case where a TXT file in the TEXT_CONTENT_PATH directory is manually removed? The text files in the TEXT_CONTENT_PATH directory are named spider_id.txt (spider_id is a number from the spider table). For a cron job, follow the documentation, and also make the change shown in this thread.
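
To check whether any of those text files have gone missing, the spider_id values from the spider table can be compared against the files actually present in the TEXT_CONTENT_PATH directory. The following is a minimal sketch of that comparison. The function name and the sample IDs are hypothetical, the directory is a temporary one created for the demonstration, and in practice the ID list would come from a query on the spider table.

```python
import tempfile
from pathlib import Path


def missing_text_files(spider_ids, text_content_path):
    """Return the spider_ids that have no <spider_id>.txt file in the directory."""
    base = Path(text_content_path)
    return [sid for sid in spider_ids if not (base / f"{sid}.txt").is_file()]


if __name__ == "__main__":
    # Demonstration with a throwaway directory standing in for TEXT_CONTENT_PATH.
    with tempfile.TemporaryDirectory() as tmp:
        for sid in (1, 3):
            (Path(tmp) / f"{sid}.txt").write_text("sample content")
        # IDs 1, 2 and 3 stand in for values read from the spider table.
        print(missing_text_files([1, 2, 3], tmp))
```

Any ID the function returns points at a document whose stored text is gone, which would be worth ruling out before looking further at the engine table.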