Optimized search (around 60% faster)


synnalagma
06-04-2004, 10:20 AM
Hi,

I have built a search engine for my job (I couldn't use PHPDig since set_time_limit and allow_url_fopen were not available on the server), but I took a lot of ideas from PHPDig.

I found some great optimizations to make. The idea is to do all the work in the database, since it is faster than PHP at handling large amounts of data.
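
To illustrate the idea of pushing the work into the database, here is a minimal sketch. It is not the code from other_search_function.php: the `hits` table, its columns and both function names are hypothetical, and it uses Python with SQLite rather than PHP with MySQL just to stay self-contained. The first function fetches every row and sums in application code; the second lets the database group, sum and sort.

```python
import sqlite3

# Hypothetical schema (an assumption, not PHPDig's real tables):
# one row per (document, keyword) pair with an occurrence weight.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hits (doc_id INTEGER, word TEXT, weight INTEGER)")
conn.executemany(
    "INSERT INTO hits VALUES (?, ?, ?)",
    [(1, "search", 3), (1, "engine", 2), (2, "search", 1), (3, "engine", 5)],
)


def rank_in_application(conn, words):
    """Fetch every matching row, then sum the weights in application code."""
    marks = ",".join("?" for _ in words)
    totals = {}
    for doc_id, weight in conn.execute(
        f"SELECT doc_id, weight FROM hits WHERE word IN ({marks})", words
    ):
        totals[doc_id] = totals.get(doc_id, 0) + weight
    # Highest score first, ties broken by document id.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


def rank_in_database(conn, words):
    """Let the database group, sum and sort; only the ranking comes back."""
    marks = ",".join("?" for _ in words)
    query = (
        f"SELECT doc_id, SUM(weight) AS score FROM hits WHERE word IN ({marks}) "
        "GROUP BY doc_id ORDER BY score DESC, doc_id ASC"
    )
    return conn.execute(query, words).fetchall()
```

Both functions return the same ranking; the second one simply transfers far fewer rows to the application when the index is large.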

Exact phrase search is not implemented (yet), so for that it just uses the original PHPDig search function.

Some other things also change, not because of the optimization but because of a different way of thinking about search:

There are different search types (you can specify one as the last parameter of other_phpdigSearch):
- Exact: only exact (identical) words match, and they count as 100% of a word
- Normal: same as Exact, plus words that have one or two letters more count as 80% of a word
- Fuzzy: same as Normal, plus words whose soundex matches the searched word count as 40% of a word
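
The three search types can be sketched roughly as follows. This is only an illustration of the weighting described above, not the actual other_phpdigSearch code: the function names are made up, "one or two letters more" is assumed to mean one or two extra trailing letters, and the soundex here is a plain American Soundex written out by hand.

```python
# Letter groups of the classic American Soundex code.
CODES = {}
for digit, letters in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for letter in letters:
        CODES[letter] = str(digit)


def soundex(word):
    """Return the 4-character Soundex code of a word ('' for no letters)."""
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    result = word[0].upper()
    previous = CODES.get(word[0], "")
    for ch in word[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != previous:
            result += code
        previous = code  # vowels reset the previous code
    return (result + "000")[:4]


def match_weight(query, candidate, mode="normal"):
    """Weight of an indexed word against a query word (hypothetical helper)."""
    query, candidate = query.lower(), candidate.lower()
    if candidate == query:
        return 1.0  # exact match counts as a full word in every mode
    if mode in ("normal", "fuzzy"):
        extra = len(candidate) - len(query)
        # Assumption: the longer word starts with the query word.
        if candidate.startswith(query) and 1 <= extra <= 2:
            return 0.8
    if mode == "fuzzy" and soundex(candidate) == soundex(query):
        return 0.4
    return 0.0
```

For example, with this sketch "searches" scores 0.8 against "search" in Normal mode but 0 in Exact mode, and "rupert" only matches "robert" (at 0.4) in Fuzzy mode.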


So here it is for PHPDig. All you need to do is:

- Put other_search.php where you put your search.php file (and don't forget to change SEARCH_PAGE in your config file to other_search.php). If you have changed search.php or don't use it, just take a look at other_search.php; only a few lines were added.
- Put other_search_function.php and class.queryparser.php in PHPDig's libs directory.

That's all!

Here are some benchmarks for comparison.

Searching for two words on a site containing 3500 documents; both versions return around 2100 results:
PHPDig: 0.55 seconds (average)
Other: 0.20 seconds (average)

Here is a more complete example for PHPDig:
Mark Value
All 1.330961943 s.
All Backend 0.423772812 s.
Parsing Strings 0.002659082 s.
Spider Queries 0.221775055 s.
Spider Fills 0.108201981 s.
Reorder Results 0.090033054 s.
All Display 0.907027960 s.
Result Table 0.107897997 s.
Display Queries 0.021100283 s.
Extracts 0.069858074 s.
Final Strings 0.000780106 s.
Logs 0.004404068 s.
Template Parsing 0.783947945 s.

And here is the same for the other one:
Mark Value
All 0.257488012 s.
All Backend 0.104629993 s.
Parsing Strings 0.013200045 s.
Spider Queries 0.091212034 s.
All Display 0.152703047 s.
Result Table 0.089221001 s.
Display Queries 0.010927677 s.
Extracts 0.070713758 s.
Final Strings 0.000638008 s.
Logs 0.000868082 s.
Template Parsing 0.060055017 s.


I don't know why there is a difference in Template Parsing. The template is the same and I didn't touch the PHPDig search file... Maybe it is a memory issue.

If you have benchmarks from a bigger site, I would be interested ;)

The big difference to look at is in Backend (where the search is performed).
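
As a quick check on those numbers, the time saved can be worked out from the figures quoted above. This is just arithmetic on the timings from this thread; `percent_saved` is an illustrative helper name.

```python
# Timings in seconds, copied from the benchmark figures in this thread.
phpdig = {"All": 1.330961943, "All Backend": 0.423772812, "All Display": 0.907027960}
other = {"All": 0.257488012, "All Backend": 0.104629993, "All Display": 0.152703047}


def percent_saved(before, after):
    """Share of the original time that was saved, in percent."""
    return (before - after) / before * 100.0


for mark in phpdig:
    print(f"{mark}: {percent_saved(phpdig[mark], other[mark]):.1f}% less time")

# The averaged two-word search: 0.55 s versus 0.20 s.
print(f"Average search: {percent_saved(0.55, 0.20):.1f}% less time")
```

The averaged search comes out at roughly 64% less time, and the Backend mark alone at roughly 75%.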

Ask if you run into any trouble or have suggestions or questions.

Note that more optimization is possible with MySQL versions above 4.0, which would speed it up much more.

Rolandks
06-04-2004, 07:16 PM
PHPDig at 0.55 seconds: I think that is fast enough :)

Tell me, how long do you or PHPDig need to spider a site containing 3500 documents? :D I hope someone spends some time finding a great optimization for that :rolleyes:

-roland-

synnalagma
06-05-2004, 03:53 AM
Rolandks wrote: "I think that is fast enough"

I don't think so, since these results were measured on my personal computer and not on the server. So there is only one query at a time and only one site. If you're on a shared server, your friends on it will be happier with this.

If you search for more than two words, the difference will be bigger.

But I agree with you, the spidering process takes too long.

synnalagma
06-06-2004, 02:18 AM
The result numbers are wrong; this is just a small mistake in the code.

Just comment out this line (around line 193):
$n=$n_start;

That will fix the problem.

sktest
06-10-2004, 01:05 AM
Hi,

Thank you, but why haven't you included the NUMBER_OF_RESULTS_PER_SITE function in your new search script?

synnalagma
06-10-2004, 04:40 AM
Hi,

You're right, I totally forgot about NUMBER_OF_RESULTS_PER_SITE since I don't use it. It is now included.
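
For readers who have not used that setting, here is a rough sketch of what a per-site cap does. It assumes NUMBER_OF_RESULTS_PER_SITE limits how many hits from the same site are shown and that a negative value means no limit; the function name and the (site, url) tuples are made up for illustration and are not taken from the script.

```python
def limit_per_site(results, max_per_site):
    """Keep at most max_per_site results per site, preserving rank order.

    results is a list of (site, url) pairs already sorted by relevance.
    A negative max_per_site is treated as "no limit" (an assumption).
    """
    if max_per_site < 0:
        return list(results)
    kept, seen = [], {}
    for site, url in results:
        seen[site] = seen.get(site, 0) + 1
        if seen[site] <= max_per_site:
            kept.append((site, url))
    return kept


ranked = [
    ("a.example", "/one"),
    ("a.example", "/two"),
    ("b.example", "/x"),
    ("a.example", "/three"),
]
print(limit_per_site(ranked, 2))
```

With a cap of 2, the third hit from a.example is dropped while the order of the remaining results is unchanged.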

I've also made some changes:
- You can search with Unix wildcards ( * and ? )
- You can show which words were searched for
- If one word isn't found, it is now ignored (for the AND condition) and another word is proposed

Installation is still the same except for one thing: you can set some parameters in other_search_function.php (at the beginning of the file).
If you want to show which words were searched (but that's more of a debug function), change the define to the number of words to show.

If you want to allow or disallow wildcard search, you can do that there too.
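
One common way to support Unix wildcards on top of a database is to translate them into a LIKE pattern. The sketch below shows that translation; it is an assumption about the general technique, not a copy of what other_search_function.php does, and the `keywords` table and function names are hypothetical. It uses Python with SQLite only to stay self-contained.

```python
import sqlite3


def wildcard_to_like(term, escape="\\"):
    """Translate * and ? into SQL LIKE wildcards, escaping literal % and _."""
    out = []
    for ch in term:
        if ch == "*":
            out.append("%")  # any run of characters
        elif ch == "?":
            out.append("_")  # exactly one character
        elif ch in ("%", "_", escape):
            out.append(escape + ch)  # keep these characters literal
        else:
            out.append(ch)
    return "".join(out)


def find_words(conn, term):
    """Return indexed words matching a wildcard term (hypothetical table)."""
    pattern = wildcard_to_like(term)
    rows = conn.execute(
        "SELECT word FROM keywords WHERE word LIKE ? ESCAPE '\\' ORDER BY word",
        (pattern,),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE keywords (word TEXT)")
conn.executemany(
    "INSERT INTO keywords VALUES (?)",
    [("search",), ("searches",), ("seared",), ("test",), ("text",)],
)
print(find_words(conn, "sear*"))
print(find_words(conn, "te?t"))
```

Passing the pattern as a bound parameter keeps the user's search term out of the SQL text itself.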

PS: charter, can you remove the first version please?

sktest
06-10-2004, 04:47 AM
thx synnalagma