PhpDig.net

Old 12-16-2003, 07:28 AM   #1
JÿGius³
Green Mole
 
Join Date: Oct 2003
Posts: 17
pages number limited indexing

Hi people.

When I index a web site, I'd like to limit the maximum number of pages indexed per site.
For example, I would index only 20 pages on site A, 100 on site B, and so on.
This could be useful for limiting the indexing of huge web sites. Do you agree?

Best regards.

JÿGius³
Old 12-16-2003, 09:56 AM   #2
Rolandks
Purple Mole
 
Join Date: Sep 2003
Location: Kassel, Germany
Posts: 119
Sorry, I don't agree with this. What for? If a user searches for a word that appears on page 21, that page is not in the index.

Why would you index only part of a site, limited by page count?

-Roland-
Old 12-16-2003, 10:14 AM   #3
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. I haven't tested the code below, but it should limit the number of links found per page to a maximum of 20, so each indexed page has at most 20 links to follow. This is a per-page rather than per-site adjustment, so if you want a maximum of 100 links for one site, you'll need to adjust the added line and/or set a different search depth level.

In spider.php find the following:
PHP Code:
if (isset($urls) && is_array($urls)) { 
and right after it place the following:
PHP Code:
$my_spider_limit = 20;
if (count($urls) > $my_spider_limit) {
    $urls = array_slice($urls, 0, $my_spider_limit);
}

You might be able to achieve similar results without making the above change by setting the search depth level to one. When the search depth level is one, only the page and links from that page are indexed. Of course this depends on how many links are in the page, so if you use the above code, you should be able to limit the links found on any given page to the first $my_spider_limit links.
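
To illustrate the truncation step on its own, here is a minimal standalone sketch. The helper names limit_links and limit_for_host, the host names, and the default of 50 are hypothetical and not part of PhpDig; the only real call it relies on is PHP's array_slice. It also shows one possible way to look up a different limit per site, which the change above does not do by itself.

```php
<?php
// Hypothetical helper, not part of PhpDig: keep at most $limit links from one page.
function limit_links(array $urls, $limit)
{
    // array_slice() returns the first $limit elements;
    // an array shorter than $limit is returned unchanged.
    return array_slice($urls, 0, $limit);
}

// Hypothetical lookup: per-site limit by host name, falling back to a default.
function limit_for_host(array $site_limits, $host, $default_limit)
{
    return isset($site_limits[$host]) ? $site_limits[$host] : $default_limit;
}

// Assumed example values, chosen only for illustration.
$site_limits = array('site-a.example' => 20, 'site-b.example' => 100);
$default_limit = 50;

$urls = range(1, 30); // stand-in for 30 links found on one page
$kept = limit_links($urls, limit_for_host($site_limits, 'site-a.example', $default_limit));
echo count($kept), "\n"; // 20
```

Keep in mind that this still caps links per page, not pages per site, so the total number of pages indexed also depends on the search depth.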
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.
Old 01-08-2004, 05:56 PM   #4
bloodjelly
Purple Mole
 
Join Date: Dec 2003
Posts: 106
Another thing that seems to be working for me, and that limits the total number of linked pages written to the database, is to find the line:
PHP Code:
while ($level <= $limit) {
(line 250) and change it to
PHP Code:
while ($level <= $limit && count($links_found) <= 200) {
where 200 is the number of links you want written. The site might stay locked this way, though, and you'd need to move the unlock code (line 588).
Old 01-13-2004, 11:17 AM   #5
bloodjelly
Purple Mole
 
Join Date: Dec 2003
Posts: 106
Whoops, this works for finding 200 links in total, but if you want 200 links per site you have to reset the $links_found array. So, after this:
PHP Code:
if (!$n_links && $delay_message) {
    print $delay_message;
}
add this:
PHP Code:
unset($links_found); 
$links_found = array(); 
Also, the site doesn't stay locked; that was just a glitch in my particular case. And the <= should be changed to < or you'll get 201 pages found.
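
The off-by-one is easy to see in a small simulation. This is only a sketch, not PhpDig code: crawl_site is a hypothetical function, and it assumes each loop pass finds exactly one link, whereas the real spider loop can add many links per pass and may overshoot by more than one.

```php
<?php
// Hypothetical simulation: count how many links get collected before the
// loop condition stops the crawl. Each pass "finds" exactly one link.
function crawl_site($max_links, $inclusive)
{
    // Start from an empty array for every site, which is the effect of
    // the unset()/array() reset shown above.
    $links_found = array();
    while ($inclusive ? count($links_found) <= $max_links
                      : count($links_found) < $max_links) {
        $links_found[] = 'page' . count($links_found);
    }
    return count($links_found);
}

echo crawl_site(200, true), "\n";  // 201, the <= comparison lets one extra pass through
echo crawl_site(200, false), "\n"; // 200, the < comparison stops at the limit
```

Because $links_found starts empty on each call, every simulated site gets its own budget of 200 instead of sharing one total.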