#1 |
Green Mole
Join Date: Jan 2004
Posts: 1
|
Force directory to be spidered as new site
Hi,
For sites foo.com/french and foo.com/english, I would like the option of searching only the French directory or only the English one. How can I force PhpDig to spider foo.com/english as a separate site_id, so that I can address that site in a search request and get results only from there? Thanks.
#2 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. You might try setting up foo.com/french and foo.com/english as french.foo.com and english.foo.com.
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension. |
#3 |
Orange Mole
Join Date: Nov 2003
Posts: 69
|
Earlier I did experiments and concluded the following:
- if you ask PhpDig to index from some URL such as foo.com/french/index.htm
- if all hyperlinks from that page and the linked ones are limited to foo.com/french (i.e. there is no cross-link to english)
- then PhpDig will only index foo.com/french
Just yesterday I tried again and, for some reason that I do not yet understand, PhpDig found out about "english" while indexing "french". I have to find out why, because the whole philosophy for my site and its indexing counts on the results of my first experiments, i.e. you can indeed index only a single branch of a site, provided that you are careful with where hyperlinks point to.
Charter, a few comments would be welcome.
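One way to check the "careful with where hyperlinks point to" condition before running the spider is to list the same-host links on a page that leave the branch. The following is a minimal Python sketch, not part of PhpDig: the function name `links_outside_branch` and the sample URLs are hypothetical, and it takes already-extracted `href` values rather than parsing HTML. Links to other hosts are simply skipped here.

```python
from urllib.parse import urljoin, urlparse


def links_outside_branch(page_url, hrefs, branch_prefix):
    """Return same-host absolute URLs from hrefs that fall outside branch_prefix.

    page_url      -- URL of the page the hrefs were taken from
    hrefs         -- iterable of raw href values (relative or absolute)
    branch_prefix -- the branch to stay inside, e.g. "http://foo.com/french"
    """
    branch = urlparse(branch_prefix)
    # Normalise to a trailing slash so "/french-old" does not match "/french".
    branch_path = branch.path if branch.path.endswith("/") else branch.path + "/"

    outside = []
    for href in hrefs:
        absolute = urljoin(page_url, href)  # resolves "../english/..." etc.
        target = urlparse(absolute)
        if target.netloc.lower() != branch.netloc.lower():
            continue  # other hosts are not the concern of this check
        if not (target.path + "/").startswith(branch_path):
            outside.append(absolute)
    return outside
```

For example, with the page `http://foo.com/french/index.htm` and the hrefs `page2.htm` and `../english/index.htm`, only the second one is reported, because it resolves to `http://foo.com/english/index.htm`.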
__________________
René Haentjens, Ghent University |
#4 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. Your philosophy sounds fine if everything is separate like you say. One thing to check is that the tempspider table is empty between runs. Maybe something 'english' was left over in that table from a previous index?
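The tempspider check suggested above could be scripted roughly as follows. This is a sketch under assumptions: PhpDig keeps its tables in MySQL, while the demo here uses Python's built-in `sqlite3` with a stand-in table so that it runs anywhere. The table name `tempspider` (without any prefix an installation may add), the helper names, and the two demo columns are hypothetical.

```python
import sqlite3

# Assumed table name; adjust it if your installation uses a table prefix.
TEMPSPIDER_TABLE = "tempspider"


def _checked(table):
    # Table names cannot be bound as SQL parameters, so only accept
    # plain identifiers before formatting them into the statement.
    if not table.isidentifier():
        raise ValueError(f"unexpected table name: {table!r}")
    return table


def count_leftover_rows(conn, table=TEMPSPIDER_TABLE):
    """Return how many rows are still sitting in the temporary spider table."""
    cur = conn.cursor()
    cur.execute(f"SELECT COUNT(*) FROM {_checked(table)}")
    return cur.fetchone()[0]


def empty_table(conn, table=TEMPSPIDER_TABLE):
    """Delete every row from the temporary spider table."""
    cur = conn.cursor()
    cur.execute(f"DELETE FROM {_checked(table)}")
    conn.commit()


def make_demo_db():
    """Build an in-memory stand-in; the columns are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tempspider (site_id INTEGER, path TEXT)")
    conn.execute("INSERT INTO tempspider VALUES (1, 'english/')")
    conn.commit()
    return conn
```

Calling `count_leftover_rows(conn)` before a run shows whether anything is left over, and `empty_table(conn)` clears it. Emptying the table also discards the state of any interrupted run, so it is only appropriate when nothing needs to be resumed.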
#5 |
Orange Mole
Join Date: Nov 2003
Posts: 69
|
That was indeed the problem: tempspider was not empty. Now all works fine again. Thanks!
#6 |
Orange Mole
Join Date: Nov 2003
Posts: 69
|
Is there a case when tempspider needs to keep data in between runs? If not, might I suggest that it would be cleaned automatically at the start of each run?
In my site I would like to delegate manual indexing and re-indexing to an administrator, so give him/her access to the admin interface, but not necessarily direct access to the DB...
#7 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
>> Is there a case when tempspider needs to keep data in between runs? If not, might I suggest that it would be cleaned automatically at the start of each run?
Hi. There may be cases where someone stops the spider and wants to resume, indexing what is stored in the tempspider table.
>> In my site I would like to delegate manual indexing and re-indexing to an administrator, so give him/her access to the admin interface, but not necessarily direct access to the DB...
With PhpDig version 1.8.0+, click the delete button in the admin interface, without selecting a site, to empty the tempspider table.