PDA

View Full Version : Optimization


synnalagma
06-03-2004, 12:36 AM
Hi,

This isn't really a mod request but more like a proposition because a really big part of it is already done.

For my work I needed a search engine and I can't use PHPDig for several reason but the most important one was that I don't have the right to use exec family function and settimelimit.

So I decided to do my own search engine wich can progressivly index content (the website is based on a portal called e-xoops) so there's an admin panel and I can index content when I know there's new content (and not reindexing whole site).

Anyway, my code is based on PHPDig one but there're some big differences.

I found some good optimization, I think, and I find quite normal to give something back. I explain it here so if you want to integrate it, do it. If you want me to do it, there's no problem, I'll do it (but I won't if you don't care so that's why there's this post).

So here's the optimization, the idea is to use only one SQL query to do all the search job and also the ordering job.

First you look for your words in the database (I take exemple for exact search but it's also possible for normal/exact/fuzzy):
you search for words exact :

"SELECT id,word FROM keyword WHERE word='$word'"

You do this for each word separatly and put each results in an array then you do :


$sql="SELECT SQL_BIG_RESULT $tengine.site_id, SUM((CASE ";
foreach($words as $w){
$sql.="WHEN word_id IN(".implode(',',$word_id_array[$w]).") THEN "
if(is_excluded($w)){
$sql.="-500000";
}
else{
$sql.="weight";
}
}
$sql.=")) AS sw"


if you have an "and condition" you add this

$sql.=",BIT_OR( CASE ";
$andnumber=1;
$andtotal=0;
foreach($words as $w){
if(!is_excluded($w)){
$andnumber<<=1;
$total|=$andnumber;
$sql.="WHEN word_id IN(".implode(',',$word_id_array[$w]).") THEN ".$andnumber;
}
}
$sql.=") AS bo FROM engine WHERE word_id IN(".implode(',',all_word_id).") GROUP BY site_id HAVING bo=$total AND sw>0 ORDER BY sw DESC";

The idea is if all words are present then bo must be equal to total. With this you can also have more complex conditions like (word1 AND word2) OR word3 if you generate all valid bo that can satisfy your condition (I'm actually thinking on how to do it).

With this query you will have all your results (valid one) already ordered. This speed up quite well the search, since MySQL is faster on this than php.
On a slow computer difference where 4 seconds (1.5 for mysql and 2.5 for php) to search for 3000 results (on 4000 documents) with old method (my method's not PHPDig's one) and 0.6 seconds with this one. So there's quite a difference


I hope you will understand it, I have a poor english, french would be better, feel free to contact me about any questions and so on... You can reach me synnalagma AT ezwww dot ch