PhpDig.net

01-28-2004, 12:55 PM   #1
lennybruce22000
Green Mole
 
Join Date: Jan 2004
Posts: 1
Force directory to be spidered as a new site

Hi,

For sites foo.com/french and foo.com/english, I would like the option of searching only the french directory, or only the english one. How can I force PhpDig to spider foo.com/english under a separate site_id, so that I can address that site in a search request and get back only results from there? Thanks.
01-29-2004, 06:43 PM   #2
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. You might try setting up foo.com/french and foo.com/english as french.foo.com and english.foo.com.
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.
01-29-2004, 11:29 PM   #3
renehaentjens
Orange Mole
 
Join Date: Nov 2003
Posts: 69
Earlier I did experiments and concluded the following:
- if you ask PhpDig to index from some URL such as foo.com/french/index.htm
- if all hyperlinks from that page and the linked ones are limited to foo.com/french (i.e. there is no cross-link to english)
- then PhpDig will only index foo.com/french
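
The observation above can be illustrated with a minimal sketch of a link-following crawl. This is not PhpDig's code: the `crawl` function and the `LINKS` graph are hypothetical, and the pages listed are made up for illustration. It only shows that a crawl started in one branch stays there when no page links out of it.

```python
from collections import deque

# Hypothetical link graph (page -> pages it links to); not a real site.
LINKS = {
    "foo.com/french/index.htm": ["foo.com/french/a.htm", "foo.com/french/b.htm"],
    "foo.com/french/a.htm": ["foo.com/french/index.htm"],
    "foo.com/french/b.htm": [],
    "foo.com/english/index.htm": ["foo.com/english/a.htm"],
    "foo.com/english/a.htm": [],
}

def crawl(start, links):
    """Breadth-first walk: visit every page reachable from `start`."""
    queue = deque([start])
    seen = set()
    while queue:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        # Follow only the hyperlinks found on the current page.
        queue.extend(links.get(url, []))
    return seen

indexed = crawl("foo.com/french/index.htm", LINKS)
print(sorted(indexed))
```

Because nothing under french links to english, the english pages are never reached. A single cross-link, or a URL already waiting in the queue, would be enough to pull them in.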

Just yesterday I tried again and for some reason that I do not yet understand, PhpDig found out about "english" while indexing "french". I have to find out why, because the whole philosophy for my site and its indexing counts on the results of my first experiments, i.e. you can indeed index only a single branch of a site, provided that you are careful with where hyperlinks point to.

Charter, a few comments would be welcome.
__________________
René Haentjens, Ghent University
01-30-2004, 12:01 PM   #4
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. Your philosophy sounds fine if everything is separate like you say. One thing to check is that the tempspider table is empty between runs. Maybe there was something 'english' in that table left over from a previous indexing run?
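
For anyone with database access, a check along these lines could confirm whether the table holds leftovers. This is only a sketch under stated assumptions: PhpDig stores its tables in MySQL, while the example uses an in-memory SQLite database as a stand-in; the three-column schema is simplified and hypothetical, and a real install may use a table prefix. The `pending_urls` helper is made up for illustration.

```python
import sqlite3

def pending_urls(conn, table="tempspider"):
    """Return the rows still queued in the temporary spider table."""
    # The table name is interpolated, so pass only a trusted constant.
    return conn.execute(f"SELECT site_id, path, file FROM {table}").fetchall()

# Stand-in database with a simplified, assumed schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tempspider (site_id INTEGER, path TEXT, file TEXT)")
# Simulate a row left behind by an interrupted earlier run.
conn.execute("INSERT INTO tempspider VALUES (1, 'english/', 'index.htm')")

leftovers = pending_urls(conn)
if leftovers:
    print(f"{len(leftovers)} leftover row(s):", leftovers)
else:
    print("tempspider is empty")
```

Any row reported here would be picked up by the next run, whatever start URL is given.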
02-02-2004, 05:02 AM   #5
renehaentjens
Orange Mole
 
Join Date: Nov 2003
Posts: 69
That was indeed the problem: tempspider was not empty. Now all works fine again. Thanks!
02-09-2004, 01:28 AM   #6
renehaentjens
Orange Mole
 
Join Date: Nov 2003
Posts: 69
Is there a case where tempspider needs to keep data between runs? If not, might I suggest that it be cleaned automatically at the start of each run?

On my site I would like to delegate manual indexing and re-indexing to an administrator, which means giving him/her access to the admin interface, but not necessarily direct access to the DB...
02-09-2004, 09:36 AM   #7
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
>> Is there a case where tempspider needs to keep data between runs? If not, might I suggest that it be cleaned automatically at the start of each run?

Hi. There may be cases where someone stops the spider and wants to resume, indexing what is stored in the tempspider table.

>> On my site I would like to delegate manual indexing and re-indexing to an administrator, which means giving him/her access to the admin interface, but not necessarily direct access to the DB...

With PhpDig version 1.8.0 or later, click the delete button in the admin interface without selecting a site to empty the tempspider table.
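
The trade-off discussed above (clean automatically, yet still allow a stopped spider to resume) could be sketched as follows. This is not how PhpDig behaves: `start_run` and its `resume` flag are hypothetical, the schema is a simplified assumption, and SQLite stands in for the MySQL database PhpDig actually uses.

```python
import sqlite3

def start_run(conn, resume=False):
    """Empty the queue before a fresh run; keep it when resuming.

    Returns the number of rows left in the queue afterwards.
    """
    if not resume:
        conn.execute("DELETE FROM tempspider")
        conn.commit()
    return conn.execute("SELECT COUNT(*) FROM tempspider").fetchone()[0]

# Stand-in database with one row left over from a stopped run.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tempspider (site_id INTEGER, path TEXT, file TEXT)")
conn.execute("INSERT INTO tempspider VALUES (1, 'english/', 'index.htm')")

kept = start_run(conn, resume=True)   # resuming: the queue is preserved
cleared = start_run(conn)             # fresh run: the queue is emptied
print(kept, cleared)
```

Making the clean-up explicit, as the admin delete button does, keeps the resume case working while still letting an administrator without DB access reset the queue.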