06-04-2004, 09:20 AM   #1
synnalagma
Optimized search (around 60% faster)

Hi,

I have built a search engine for my job (I couldn't use PHPDig because set_time_limit and allow_url_fopen were not available on the server), but I took a lot of ideas from PHPDig.

I found some significant optimizations. The idea is to do all the work in the database, since it is faster than PHP at handling large amounts of data.
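
To illustrate the general idea of letting the database do the work, here is a minimal sketch in Python with SQLite. It is not the code from the attached zip and it does not use PHPDig's real schema: the `hits` table, its columns, and the `search` function are hypothetical names chosen only for this example. The point is that the database sums the weights and sorts the results, so the application only receives the final ranked list.

```python
import sqlite3

def search(conn, words):
    """Return (doc_id, score) rows ranked by score, computed inside the database."""
    if not words:
        return []
    placeholders = ",".join("?" for _ in words)
    sql = (
        "SELECT doc_id, SUM(weight) AS score "
        "FROM hits WHERE word IN (" + placeholders + ") "
        "GROUP BY doc_id ORDER BY score DESC, doc_id ASC"
    )
    return conn.execute(sql, list(words)).fetchall()

def demo():
    # Hypothetical index table: one row per (document, word) with a weight.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hits (doc_id INTEGER, word TEXT, weight INTEGER)")
    conn.executemany(
        "INSERT INTO hits VALUES (?, ?, ?)",
        [(1, "fast", 3), (1, "search", 2), (2, "search", 4), (3, "other", 9)],
    )
    return search(conn, ["fast", "search"])
```

The alternative would be to fetch every matching row and add up the scores in application code, which means moving many more rows out of the database.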

Exact phrase search is not implemented yet, so it just uses the original PHPDig search function.

Some other things change as well, not because of the optimization but because of a different way of thinking about search:

There are different search types (you can specify one as the last parameter of other_phpdigSearch):
- Exact: only exact (identical) words are matched, each counted as 100% of a word
- Normal: same as Exact, plus words that have one or two letters more, counted as 80% of a word
- Fuzzy: same as Normal, plus words that sound like the searched one (soundex), counted as 40% of a word
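
The three search types can be sketched as a weighting function. This is only an illustration in Python, not the code from other_search_function.php: the names `soundex` and `match_weight` are hypothetical, "one or two letters more" is assumed to mean that the indexed word starts with the searched word and is one or two letters longer, and the soundex here is the classic four-character variant, which may differ from what the database computes.

```python
# Letter groups of the classic Soundex algorithm.
_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit

def soundex(word):
    """Classic 4-character Soundex code, or '' if the word has no ASCII letters."""
    letters = [ch for ch in word.lower() if "a" <= ch <= "z"]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        if ch not in "hw":  # h and w do not separate letters with the same code
            prev = code
    return (result + "000")[:4]

def match_weight(query, candidate, mode="normal"):
    """Fraction of a word that `candidate` counts for when searching `query`."""
    q, c = query.lower(), candidate.lower()
    if q == c:
        return 1.0  # Exact match: 100% of a word in every mode
    if mode in ("normal", "fuzzy") and c.startswith(q) and 1 <= len(c) - len(q) <= 2:
        return 0.8  # one or two letters more (assumed to be a suffix)
    if mode == "fuzzy" and soundex(q) and soundex(q) == soundex(c):
        return 0.4  # sounds alike
    return 0.0
```

For example, `match_weight("search", "searches", "normal")` gives 0.8, while the same pair in "exact" mode gives 0.0.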


So here it is for PHPDig. All you need to do is:

- Put other_search.php where your search.php file is (and don't forget to change SEARCH_PAGE in your config file to other_search.php). If you have modified search.php or don't use it, just take a look at other_search.php; only a few lines were added.
- Put other_search_function.php and class.queryparser.php in PHPDig's libs directory.

That's all!

Here are some benchmarks for comparison.

Searching for two words on a site containing 3500 documents; both return around 2100 results:
PHPDig: 0.55 seconds (average)
Other: 0.20 seconds (average)

Here is a more complete example for PHPDig:
Mark Value
All 1.330961943 s.
All Backend 0.423772812 s.
Parsing Strings 0.002659082 s.
Spider Queries 0.221775055 s.
Spider Fills 0.108201981 s.
Reorder Results 0.090033054 s.
All Display 0.907027960 s.
Result Table 0.107897997 s.
Display Queries 0.021100283 s.
Extracts 0.069858074 s.
Final Strings 0.000780106 s.
Logs 0.004404068 s.
Template Parsing 0.783947945 s.

And here is the same for the other one:
Mark Value
All 0.257488012 s.
All Backend 0.104629993 s.
Parsing Strings 0.013200045 s.
Spider Queries 0.091212034 s.
All Display 0.152703047 s.
Result Table 0.089221001 s.
Display Queries 0.010927677 s.
Extracts 0.070713758 s.
Final Strings 0.000638008 s.
Logs 0.000868082 s.
Template Parsing 0.060055017 s.


I don't know why there is a difference in Template Parsing: it is the same code and I didn't touch the PHPDig search file. Maybe it is because of a memory issue.

If you have benchmarks from a bigger site, I would be interested.

The big difference to look at is in Backend (where the search is performed).
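
To put a number on that difference, the timings quoted above can be turned into a percentage of time saved. This is just arithmetic on the figures already in this post; the function name `reduction` is made up for the example.

```python
def reduction(before, after):
    """Percentage of time saved when going from `before` to `after` seconds."""
    return (before - after) / before * 100.0

# (PHPDig, other) timings in seconds, copied from the figures in this post.
timings = {
    "Two-word search (average)": (0.55, 0.20),
    "All": (1.330961943, 0.257488012),
    "All Backend": (0.423772812, 0.104629993),
}

def report():
    return {mark: round(reduction(b, a), 1) for mark, (b, a) in timings.items()}

if __name__ == "__main__":
    for mark, pct in report().items():
        print(f"{mark}: {pct}% less time")
```

On these figures the averaged two-word search takes roughly 64% less time, and the Backend step of the detailed run takes roughly 75% less.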

Ask if you run into any trouble or have suggestions or questions.

Note that more optimization can be done with MySQL versions greater than 4.0, which would speed it up much more.
Attached Files
File Type: zip phpdig_search.zip (8.4 KB, 105 views)

Last edited by synnalagma; 06-04-2004 at 09:27 AM.