Force directory to be spidered as new site


lennybruce22000
01-28-2004, 12:55 PM
Hi,

For sites foo.com/french and foo.com/english, I would like the option of searching only the french dir, or only the english one. How can I force phpdig to spider foo.com/english as a separate site_id, so that I can address that site in a search request and get back only results from there? Thanks.

Charter
01-29-2004, 06:43 PM
Hi. You might try setting up foo.com/french and foo.com/english as french.foo.com and english.foo.com.

renehaentjens
01-29-2004, 11:29 PM
Earlier I did experiments and concluded the following:
- if you ask PhpDig to index from some URL such as foo.com/french/index.htm
- if all hyperlinks from that page and the linked ones are limited to foo.com/french (i.e. there is no cross-link to english)
- then PhpDig will only index foo.com/french
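
To make the reasoning above concrete, here is a minimal sketch of a link-following crawl. It is not PhpDig's code: the link graph, the page names, and the crawl() helper are all made up for illustration. It only shows that a crawler which follows hyperlinks from a start page stays inside one branch as long as no page in that branch links out of it.

```python
from collections import deque

# Hypothetical link graph: page URL -> URLs that page links to.
LINKS = {
    "foo.com/french/index.htm": ["foo.com/french/a.htm", "foo.com/french/b.htm"],
    "foo.com/french/a.htm": ["foo.com/french/b.htm"],
    "foo.com/french/b.htm": [],
    "foo.com/english/index.htm": ["foo.com/english/a.htm"],
    "foo.com/english/a.htm": [],
}


def crawl(start, links):
    """Breadth-first walk that follows every hyperlink it finds."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return sorted(seen)


# No cross-link: only the french branch is reached.
french_only = crawl("foo.com/french/index.htm", LINKS)

# One cross-link from french to english pulls the other branch in.
leaky_links = dict(LINKS)
leaky_links["foo.com/french/b.htm"] = ["foo.com/english/index.htm"]
with_cross_link = crawl("foo.com/french/index.htm", leaky_links)

print(french_only)
print(with_cross_link)
```

The second run shows why care with hyperlink targets matters: a single link into the other directory is enough for the whole branch to be picked up.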

Just yesterday I tried again and, for some reason that I do not yet understand, PhpDig found out about "english" while indexing "french". I have to find out why, because the whole philosophy of my site and its indexing depends on the results of my first experiments, i.e. you can indeed index only a single branch of a site, provided that you are careful about where hyperlinks point.

Charter, a few comments would be welcome.

Charter
01-30-2004, 12:01 PM
Hi. Your philosophy sounds fine if everything is separate like you say. One thing to check is that the tempspider table is empty between runs. Maybe there was something 'english' in that table left over from a previous index?

renehaentjens
02-02-2004, 05:02 AM
That was indeed the problem: tempspider was not empty. Now all works fine again. Thanks!
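
For anyone who runs into the same thing, here is a small sketch of why a non-empty pending queue has this effect. It is a simplified model, not PhpDig's implementation: the link graph and the crawl() function are hypothetical, and the pending argument merely stands in for rows left in tempspider by an earlier run.

```python
from collections import deque

# Hypothetical link graph with no cross-links between the two branches.
LINKS = {
    "foo.com/french/index.htm": ["foo.com/french/a.htm"],
    "foo.com/french/a.htm": [],
    "foo.com/english/index.htm": ["foo.com/english/a.htm"],
    "foo.com/english/a.htm": [],
}


def crawl(start, links, pending=()):
    """Crawl from start, also working through any URLs already pending.

    'pending' models entries left in a persistent queue by a previous,
    interrupted or unrelated run.
    """
    queue = deque([start, *pending])
    seen = set(queue)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return sorted(seen)


# Empty queue: only the french branch is indexed.
clean_run = crawl("foo.com/french/index.htm", LINKS)

# One stale 'english' entry in the queue: the english branch comes along.
stale_run = crawl(
    "foo.com/french/index.htm",
    LINKS,
    pending=["foo.com/english/index.htm"],
)

print(clean_run)
print(stale_run)
```

The same persistence is what makes resuming an interrupted run possible, so the queue is useful when kept deliberately and a nuisance when forgotten.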

renehaentjens
02-09-2004, 01:28 AM
Is there a case when tempspider needs to keep data in between runs? If not, might I suggest that it would be cleaned automatically at the start of each run?

In my site I would like to delegate manual indexing and re-indexing to an administrator, so give him/her access to the admin interface, but not necessarily direct access to the DB...

Charter
02-09-2004, 09:36 AM
>> Is there a case when tempspider needs to keep data in between runs? If not, might I suggest that it would be cleaned automatically at the start of each run?

Hi. There may be cases where someone stops the spider and wants to resume, indexing what is stored in the tempspider table.

>> In my site I would like to delegate manual indexing and re-indexing to an administrator, so give him/her access to the admin interface, but not necessarily direct access to the DB...

With PhpDig version 1.8.0 or later, click the delete button in the admin interface without selecting a site to empty the tempspider table.
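
For those who do have direct database access, emptying the table by hand amounts to a single DELETE. The sketch below uses an in-memory SQLite database so that it runs anywhere. Everything about it is an assumption for illustration: PhpDig itself runs on MySQL, the real table may carry a prefix, and the three columns shown here are a simplified stand-in rather than the actual tempspider schema.

```python
import sqlite3


def empty_tempspider(conn):
    """Remove every pending row and return how many rows were removed."""
    (count,) = conn.execute("SELECT COUNT(*) FROM tempspider").fetchone()
    conn.execute("DELETE FROM tempspider")
    conn.commit()
    return count


# Hypothetical, simplified schema; not the real PhpDig table definition.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tempspider (id INTEGER PRIMARY KEY, site_id INTEGER, file TEXT)"
)
conn.executemany(
    "INSERT INTO tempspider (site_id, file) VALUES (?, ?)",
    [(1, "english/index.htm"), (1, "english/a.htm")],
)

removed = empty_tempspider(conn)
remaining = conn.execute("SELECT COUNT(*) FROM tempspider").fetchone()[0]
print(removed, remaining)
```

Counting the rows before deleting gives a quick check that something stale really was sitting in the table.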