pages number limited indexing

12-16-2003, 08:28 AM
Hi people.

When I index a web site, I'd like to limit the maximum number of pages indexed per site.
For example, I would index only 20 pages on site A, 100 on site B, and so on.
This can be useful to limit indexing of huge web sites. Do you agree?

Best regards.


12-16-2003, 10:56 AM
Sorry, I don't agree with this. What for? If a user searches for a word that appears only on page 21, that page won't be in the index.

Why would you index only part of a site, limited by page count?


12-16-2003, 11:14 AM
Hi. I haven't tested the code below, but it should cap the number of links followed from each indexed page at 20. This is a per-page rather than a per-site limit, so if you want a maximum of 100 links for one site, you'll need to adjust the added line and/or set a different search depth level.

In spider.php find the following:

if (isset($urls) && is_array($urls)) {

and right after it place the following:

$my_spider_limit = 20;
if (count($urls) > $my_spider_limit) {
    $urls = array_slice($urls, 0, $my_spider_limit);
}

You might be able to achieve similar results without making the above change by setting the search depth level to one. When the search depth level is one, only the page and the links from that page are indexed. Of course, the result then depends on how many links the page contains. With the above code, each page is limited to its first $my_spider_limit links regardless of how many it has.
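To see the per-page cap in isolation, here is a minimal standalone sketch. The function name cap_links and the example.com URLs are made up for illustration and are not part of spider.php; only the array_slice() call mirrors the change suggested above.

```php
<?php
// Hypothetical helper, NOT part of spider.php: keep only the first
// $limit links found on a single page.
function cap_links(array $urls, int $limit): array
{
    if (count($urls) > $limit) {
        // array_slice(array, offset, length) returns the first $limit entries.
        return array_slice($urls, 0, $limit);
    }
    return $urls;
}

// Example: a page with 25 links (made-up URLs) is cut down to 20.
$urls = [];
for ($i = 1; $i <= 25; $i++) {
    $urls[] = "http://example.com/page$i.html";
}
$capped = cap_links($urls, 20);
echo count($capped), "\n"; // prints 20
```

If a search depth of one really does mean "the start page plus the links on it", then combining it with a cap of 20 would index at most 21 pages per site. At greater depths the cap applies to every page again, so the total can grow much faster.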

01-08-2004, 06:56 PM
Another thing that seems to be working for me, and that limits the total number of linked pages written to the database, is to find the line while($level <= $limit) (line 250) and change it to while($level <= $limit && count($links_found) <= 200), where 200 is the number of links you want written. The site might stay locked this way, though, in which case you'd need to move the unlock code (line 588).

01-13-2004, 12:17 PM
Whoops, this works for finding 200 links in total, but if you want 200 links per site you have to reset the $links_found array. So, after this:

if (!$n_links && $delay_message) {
print $delay_message;}

add this:

unset($links_found);
$links_found = array();

Also, the site doesn't stay locked; that was just a glitch in my particular case. And the <= should be changed to a < or you'll get 201 pages found.
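The off-by-one and the per-site reset can be reproduced outside the spider with a toy loop. This is only a sketch: collect_links, $available and the site names are invented for illustration, and it assumes one link is added per pass, which the real loop in spider.php may not do.

```php
<?php
// Hypothetical stand-in for the spider loop, NOT the real code:
// adds one link per pass until the cap stops it.
function collect_links(int $available, int $cap, bool $strict): array
{
    $links_found = []; // fresh array on every call, i.e. per site
    $next = 0;
    while ($next < $available
        && ($strict ? count($links_found) < $cap
                    : count($links_found) <= $cap)) {
        $links_found[] = "link" . $next;
        $next++;
    }
    return $links_found;
}

echo count(collect_links(500, 200, false)), "\n"; // prints 201 (<= lets one extra through)
echo count(collect_links(500, 200, true)), "\n";  // prints 200

// Resetting per site: each site starts counting from zero again.
$sites = ['A' => 500, 'B' => 50]; // made-up sites and link counts
foreach ($sites as $name => $available) {
    $links_found = collect_links($available, 200, true);
    echo $name, ': ', count($links_found), "\n"; // prints A: 200, then B: 50
}
```

Without the reset, the count carried over from site A would already be at the cap when site B starts, so B would get no links at all.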