02-18-2005, 04:23 PM   #4
Charter
Head Mole
Join Date: May 2003
Posts: 2,539
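This is an improvement to PhpDig v.1.8.8 RC1 that keeps (quasi-)duplicate entries out of the engine table. Tokens that differ only in case or in leading/trailing punctuation are now normalized before they are counted.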

In robot_functions.php find:
Code:
  $key = mb_ereg_replace("^([\x00-\x1f]|[\x21-\x2f]|[\x3a-\x40]|[\x5b-\x60]|[\x7b-\x7f])+","",$key); //off front only
  $key = mb_ereg_replace("([\x00-\x1f]|[\x21-\x2f]|[\x3a-\x40]|[\x5b-\x60]|[\x7b-\x7f])+$","",$key); //off back only
And delete these two lines.

Also, in robot_functions.php find:
Code:
  for ($token = strtok($text2, $separators); $token !== FALSE; $token = strtok($separators)) {
        if (!isset($nbre_mots[$token]))
            { $nbre_mots[$token] = 1; }
        else
            { $nbre_mots[$token]++; }
       $total++;
  }
And replace with:
Code:
  for ($token = strtok($text2, $separators); $token !== FALSE; $token = strtok($separators)) {
        $token = mb_ereg_replace("^([\x00-\x1f]|[\x21-\x2f]|[\x3a-\x40]|[\x5b-\x60]|[\x7b-\x7f])+","",$token); //off front only
        $token = mb_ereg_replace("([\x00-\x1f]|[\x21-\x2f]|[\x3a-\x40]|[\x5b-\x60]|[\x7b-\x7f])+$","",$token); //off back only
        $token = mb_strtolower(trim($token));
        if (mb_strlen($token) > 0) {
          if (!isset($nbre_mots[$token]))
              { $nbre_mots[$token] = 1; }
          else
              { $nbre_mots[$token]++; }
          $total++;
        }
  }
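If you want to check what the new loop does outside PhpDig, here is a rough Python sketch of the same normalization and counting. It is not PhpDig code: the separator set is an assumption (PhpDig's real $separators is not shown in this post), normalize_token and count_words are hypothetical names, and Python's str.lower() may not match mb_strtolower() exactly for every non-ASCII character.

```python
import re

# Byte ranges copied from the PhpDig patterns: ASCII control characters and
# punctuation. Space (\x20), digits, and letters are not in the class, and
# non-ASCII characters are never stripped.
_PUNCT = r"[\x00-\x1f\x21-\x2f\x3a-\x40\x5b-\x60\x7b-\x7f]"
_LEADING = re.compile(r"\A" + _PUNCT + "+")   # "off front only"
_TRAILING = re.compile(_PUNCT + r"+\Z")       # "off back only"

# Assumed separator set (hypothetical); PhpDig's real $separators may differ.
SEPARATORS = " \t\r\n"


def normalize_token(token: str) -> str:
    """Strip leading/trailing punctuation, trim whitespace, lowercase."""
    token = _LEADING.sub("", token)
    token = _TRAILING.sub("", token)
    return token.strip().lower()


def count_words(text: str, separators: str = SEPARATORS) -> tuple[dict[str, int], int]:
    """Return (per-token counts, total) for the non-empty normalized tokens."""
    counts: dict[str, int] = {}
    total = 0
    # re.split can yield empty pieces at the edges (strtok never does); the
    # empty-token check below discards them either way.
    for raw in re.split("[" + re.escape(separators) + "]+", text):
        token = normalize_token(raw)
        if token:  # same role as mb_strlen($token) > 0
            counts[token] = counts.get(token, 0) + 1
            total += 1
    return counts, total


counts, total = count_words("Word, word. (WORD) ... foo!")
print(counts, total)  # {'word': 3, 'foo': 1} 4
```

"Word,", "word." and "(WORD)" all land on the same key, and "..." is dropped entirely because nothing is left after stripping.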
Now run the following queries to merge the duplicate rows already in the engine table, adding your table prefix to engine and engine2 if needed:
Code:
CREATE TABLE engine2 (
   spider_id mediumint(9) DEFAULT '0' NOT NULL,
   key_id mediumint(9) DEFAULT '0' NOT NULL,
   weight smallint(4) DEFAULT '0' NOT NULL,
   KEY key_id (key_id)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE utf8_general_ci;

INSERT INTO engine2 SELECT spider_id,key_id,sum(weight) AS weight FROM engine GROUP BY spider_id,key_id;

DELETE FROM engine;

INSERT INTO engine SELECT spider_id,key_id,weight FROM engine2;

DROP TABLE engine2;
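To see what these queries do to the data, here is a small Python sketch that runs the same INSERT ... SELECT ... GROUP BY sequence against an in-memory SQLite database with made-up rows. SQLite stands in for MySQL here, so the CREATE TABLE is simplified (no ENGINE or CHARSET clause), and merge_duplicates is a hypothetical helper name.

```python
import sqlite3


def merge_duplicates(conn: sqlite3.Connection) -> None:
    """Collapse engine rows that share (spider_id, key_id), summing weight."""
    # Simplified copy of the engine2 definition; SQLite has no ENGINE/CHARSET.
    conn.execute(
        "CREATE TABLE engine2 ("
        " spider_id INTEGER NOT NULL DEFAULT 0,"
        " key_id INTEGER NOT NULL DEFAULT 0,"
        " weight INTEGER NOT NULL DEFAULT 0)"
    )
    conn.execute(
        "INSERT INTO engine2 SELECT spider_id, key_id, SUM(weight) AS weight"
        " FROM engine GROUP BY spider_id, key_id"
    )
    conn.execute("DELETE FROM engine")
    conn.execute("INSERT INTO engine SELECT spider_id, key_id, weight FROM engine2")
    conn.execute("DROP TABLE engine2")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE engine (spider_id INTEGER, key_id INTEGER, weight INTEGER)")
# Made-up sample rows: the pair (1, 10) appears twice.
conn.executemany(
    "INSERT INTO engine VALUES (?, ?, ?)",
    [(1, 10, 2), (1, 10, 3), (1, 11, 1), (2, 10, 4)],
)
merge_duplicates(conn)
print(sorted(conn.execute("SELECT spider_id, key_id, weight FROM engine")))
# [(1, 10, 5), (1, 11, 1), (2, 10, 4)]
```

Because the weights are summed, the total weight per page and keyword stays the same; only the number of rows goes down.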
If you downloaded PhpDig v.1.8.8 RC1 after the date of this post, the code changes are already included in the package.
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.