PhpDig.net


01-17-2004, 09:03 AM   #1
emcclary
Green Mole
 
Join Date: Dec 2003
Posts: 4
Indexing Directories

Hi,

I'm trying to spider my website and need to index a lot of pages (easily over 120,000). Most of the spidering is done, but indexing keeps taking longer and longer. The site is static (a newspaper archive) and is added to daily. The pages are broken down like this:

www.foo.com/years/xxxx/0112/
(xxxx being the year, with one folder per issue named by month and day, e.g. 0112 for January 12th)
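Because every issue lives at a predictable path, the URL for any single issue can be computed from its date. A minimal sketch, assuming the `http://` scheme and the placeholder host `www.foo.com` from the post; `BASE_URL` and `issue_url` are hypothetical names, not part of PhpDig:

```python
from datetime import date

# Hypothetical base URL built from the placeholder in the post (scheme assumed).
BASE_URL = "http://www.foo.com/years"


def issue_url(day: date) -> str:
    """Return the archive URL for one issue: /years/<year>/<MMDD>/."""
    return f"{BASE_URL}/{day.year:04d}/{day.month:02d}{day.day:02d}/"


if __name__ == "__main__":
    print(issue_url(date(2004, 1, 12)))  # http://www.foo.com/years/2004/0112/
```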

I've told PhpDig to spider the site by entering the URL for a year, for example www.foo.com/years/2003/.

The problems are that (a) it always shows up in the control panel as the indexed site www.foo.com, not as the individual years,
and (b) it always wants to look through all the previous years to index them.

Do I have to actually create subdomains (e.g. 1996.foo.com) to have separate directories indexed, or is there some other way?

I basically want to make a static search database and don't need to reindex anything but the current day's additions. Thanks in advance if you have any ideas.

Eric McClary
www.recordernews.com
01-18-2004, 07:39 AM   #2
Charter
Head Mole
 
 
Join Date: May 2003
Posts: 2,539
Hi. What happens when you click a site, click the update button, and then click a green check mark for a specific directory?
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.
01-28-2004, 05:07 PM   #3
emcclary
Green Mole
 
Join Date: Dec 2003
Posts: 4
Sorry about the late reply.

I can't even open that screen; it's too big. My computer runs out of virtual memory (besides taking at least 15 minutes to open).
Like I said, the site is HUUUGGGEEE.

Anyway, I'm playing around with a robots.txt file, but that doesn't seem to work. Even though I told it to exclude all of them, it still seems to look at the ones I already did.

So, a couple of questions:
A) What's the excludes table? Can I put the parts I don't want reindexed in this table?
B) Is there a way to make it not recheck the stuff I've already done?
C) Last but not least, the only solution I can see is multiple installs of PhpDig, each with its own database. Of course I don't like this answer, but if I did this, is there a way to have PhpDig search through all these databases and return one results page?

I know I'm asking a lot, but I'm hoping there is a solution for searching my huge archaic website.

Thanks

Eric McClary
www.recordernews.com
01-28-2004, 05:11 PM   #4
emcclary
Green Mole
 
Join Date: Dec 2003
Posts: 4
Also, a quick note: what if I set every update date field to some time in the far future (like 2080)? Would that make PhpDig skip checking those pages (i.e., does it only look at items based on the current date)?

Thanks Again
Eric

Last edited by emcclary; 01-28-2004 at 05:57 PM.
01-28-2004, 07:12 PM   #5
emcclary
Green Mole
 
Join Date: Dec 2003
Posts: 4
I just tried using a text file (via the command line) and hit the same problem: it updates everything (or at least checks it). I just want to add to the database, not update it.
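If the command-line run is fed a text file, that file can be limited to the issues published since the last run, so old years are never listed at all. A minimal sketch, assuming the file takes one URL per line (check the PhpDig documentation for the exact command-line usage) and reusing the hypothetical `www.foo.com` layout from the first post; `new_issue_urls` and `write_url_list` are illustrative names:

```python
from datetime import date, timedelta
from pathlib import Path

# Hypothetical base URL built from the placeholder in the post (scheme assumed).
BASE_URL = "http://www.foo.com/years"


def issue_url(day: date) -> str:
    """Archive URL for one issue: /years/<year>/<MMDD>/."""
    return f"{BASE_URL}/{day.year:04d}/{day.month:02d}{day.day:02d}/"


def new_issue_urls(last_indexed: date, today: date) -> list[str]:
    """URLs for every day after last_indexed, up to and including today."""
    days = (today - last_indexed).days
    return [issue_url(last_indexed + timedelta(days=n)) for n in range(1, days + 1)]


def write_url_list(path: Path, urls: list[str]) -> None:
    """Write one URL per line (assumed input format for the spider)."""
    path.write_text("".join(url + "\n" for url in urls))
```

For example, `write_url_list(Path("new_issues.txt"), new_issue_urls(date(2004, 1, 26), date(2004, 1, 28)))` lists only the 0127 and 0128 issues. The sketch emits a URL for every calendar day; days without an issue would simply produce URLs that don't exist on the site.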
__________________
Eric McClary
www.recordernews.com
01-29-2004, 06:28 PM   #6
Charter
Head Mole
 
 
Join Date: May 2003
Posts: 2,539
Hi. Perhaps increase LIMIT_DAYS in the config file. Also, you might try version 1.8.0 and a text file via the command line, making sure the tempspider table is empty between runs and that SPIDER_MAX_LIMIT, SPIDER_DEFAULT_LIMIT, and RESPIDER_LIMIT are all set to zero in the config file, so that only the one page (each URL in the file) gets indexed and no links are followed.
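The thread doesn't show PhpDig's code, but going by its name and the suggestion above, LIMIT_DAYS reads as "days before a page is due for reindexing." The sketch below is an assumed model of that decision only, not PhpDig's implementation; `needs_reindex` and the value 7 are hypothetical. Under this model a larger LIMIT_DAYS makes fewer pages due, and a far-future stored date (the 2080 idea) yields a negative age, so the page would never be due. Even so, a spider may still visit each stored URL just to make this check, which matches the "at least checks it" behavior described above.

```python
from datetime import datetime, timedelta

LIMIT_DAYS = 7  # hypothetical value; the real one is set in PhpDig's config file


def needs_reindex(last_indexed: datetime, now: datetime,
                  limit_days: int = LIMIT_DAYS) -> bool:
    """Assumed rule: a page is due only once it is older than limit_days."""
    # A last_indexed date in the future gives a negative age, so never due.
    return now - last_indexed > timedelta(days=limit_days)
```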
Similar Threads
Thread Thread Starter Forum Replies Last Post
Indexing sub directories mlisondra How-to Forum 0 02-22-2008 06:55 AM
Spider Indexing and htaccess directories webmaster_k Troubleshooting 0 10-01-2007 10:50 AM
Indexing new directories bugmenot How-to Forum 1 03-28-2006 03:33 AM
indexing directories iconeweb Troubleshooting 1 12-04-2005 01:27 AM
Not Indexing Sub-Directories jayhawk Troubleshooting 3 02-11-2004 02:41 PM


All times are GMT -8.


Copyright © 2001 - 2005, ThinkDing LLC. All Rights Reserved.