Old 11-01-2005, 03:35 PM   #2
Charter
Quote:
1) I plan to index about 2,500 sites. Do you think that is realistic, or is it not possible with PhpDig?
I have not had 2500 sites indexed at one time, but check out this thread for some numbers.

Quote:
2) If it is possible, how many days would you wait before reindexing a site? I would use 30 days; what about you?
For the online demo, I leave LIMIT_DAYS at zero, but for a 'real' site I think 30 days is fine. As the number of sites grows, you'll of course want to consider what and when to index.
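To make the effect of LIMIT_DAYS concrete, here is a minimal sketch of the kind of check such a setting implies. It is not PhpDig's code; the function name and the exact comparison are assumptions for illustration only.

```python
from datetime import datetime, timedelta


def is_due_for_reindex(last_indexed: datetime, now: datetime, limit_days: int) -> bool:
    """Hypothetical helper: True when at least limit_days have passed since the last index.

    With limit_days = 0 (as on the online demo) every page is always due.
    """
    return now - last_indexed >= timedelta(days=limit_days)


now = datetime(2005, 11, 1)
print(is_due_for_reindex(datetime(2005, 10, 1), now, 30))   # indexed 31 days ago
print(is_due_for_reindex(datetime(2005, 10, 15), now, 30))  # indexed 17 days ago
print(is_due_for_reindex(datetime(2005, 10, 31), now, 0))   # limit of zero
```

With a 30-day limit, only the page last indexed 31 days earlier would be picked up again; with a limit of zero, every page qualifies on every run.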

Quote:
3) To make this realistic, what do you think of these settings?

Search depth: 20
Links per depth level: 50 (is that too many or too few?)
I tried unlimited links, but indexing took too long.
The maximum number of pages found per site is ((depth * links) + 1) when links is greater than zero, so just think about how many pages per site you would like to find, and then set depth and links accordingly.
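As a quick worked example of that formula, the sketch below plugs in the values from the question (depth 20, links 50). The function name is made up for illustration; only the formula itself comes from the answer above.

```python
def max_pages_per_site(depth: int, links: int) -> int:
    """Upper bound from the formula above: (depth * links) + 1, for links > 0."""
    if links <= 0:
        # The formula is only stated for links greater than zero.
        raise ValueError("links must be greater than zero for this formula")
    if depth < 0:
        raise ValueError("depth must not be negative")
    return depth * links + 1


print(max_pages_per_site(20, 50))  # 1001
print(max_pages_per_site(5, 20))   # 101
```

So depth 20 with 50 links allows up to 1,001 pages per site; across roughly 2,500 sites that is an upper bound of about 2.5 million pages, which is worth weighing against how long a full crawl may take.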

Quote:
4) One problem I can't find an answer to: while it is spidering, can I add a new link, or do I have to wait until it stops crawling?
It would be better to wait until the crawling is complete, as PhpDig locks when indexing to let you know it is busy.

Quote:
5) Is it possible to run more than one spider at a time from the shell? What do I have to do?
Having more than one spider at a time would still use the same tables and slow the process down, but there is a thread here about multiple spiders.

Quote:
6) I have a big problem when it spiders forums: it keeps finding 100 links that are already indexed, then one new link, then another 100 already-indexed links, then one new link, and so on. What can I do about this? It wastes a lot of time.
Does 'duplicate of an existing document' appear onscreen? If so, use PHPDIG_SESSID_VAR in the config file, especially for links that contain session IDs.
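To show why session IDs cause this, here is a rough sketch of the general idea of dropping session parameters so that two links to the same page compare equal. It is not how PhpDig implements PHPDIG_SESSID_VAR; the function name, the parameter names in `session_vars`, and the example URLs are all assumptions.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_session_params(url: str, session_vars=("PHPSESSID", "s", "sid")) -> str:
    """Return url with the (assumed) session ID query parameters removed."""
    parts = urlsplit(url)
    # Keep every query parameter except the ones named as session variables.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in session_vars
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


first = strip_session_params("http://example.com/forum/viewtopic.php?t=42&sid=abc123")
second = strip_session_params("http://example.com/forum/viewtopic.php?t=42&sid=zzz999")
print(first)            # http://example.com/forum/viewtopic.php?t=42
print(first == second)  # True
```

Without this kind of normalization, every visit to a forum can produce a fresh set of URLs for pages that are already indexed, which matches the "100 already indexed, one new" pattern described in the question.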

Quote:
7) When using the shell command, are all of the settings in the config file used by the shell spider?
All of the index-related settings in the config file are used when indexing from shell, except for RESPIDER_LIMIT and RELINKS_LIMIT and maybe a couple of others.

BTW, your English is fine.