An easy way to boost PhpDig?


tibabs
02-22-2004, 06:31 PM
Hi,

For the past two days I have been trying to dig 20,000 HTML pages into my database, which already contains close to 6,000,000 records in the phpdig_engine table.
Unfortunately, it takes a very long time: digging a single HTML file can take more than 1 minute, whereas it took around 3 seconds two weeks ago!
After investigating my system (XP), my database limits (MySQL), and my folder size (20,000 HTML files), I've found the solution.
I hope it helps somebody else.

Using a time-tracking function, I discovered that the time-consuming code is the "Optimizing" phase of the spider.php file (PhpDig v1.8.0). So just comment out these 4 lines, integrate another optimizing process your own way (every 5000 digs, for example), and enjoy your newly boosted PhpDig.

=== Code to comment out in spider.php
//print "Optimizing tables...".$br;
//@mysql_query("OPTIMIZE TABLE ".PHPDIG_DB_PREFIX."spider",$id_connect);
@mysql_query("OPTIMIZE TABLE ".PHPDIG_DB_PREFIX."engine",$id_connect);
@mysql_query("OPTIMIZE TABLE ".PHPDIG_DB_PREFIX."keywords",$id_connect);


Remarks: I only use PhpDig to insert new HTML files. There are no updates and no deletes, so the original PhpDig optimization phase is less important for me.
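
For readers who want to try the "every 5000 digs" idea, here is a minimal sketch. It is not PhpDig code: the helper name maybe_optimize, the counter, and the callable that stands in for the real database call are all assumptions made for illustration. The table names and the phpdig_ prefix follow the ones mentioned in this thread.

```php
<?php
// Hypothetical helper, not part of PhpDig: counts indexed pages and runs
// the costly OPTIMIZE TABLE statements only once every $interval pages.
function maybe_optimize(int &$digCount, int $interval, callable $runQuery, string $prefix): bool
{
    $digCount++;
    if ($digCount % $interval !== 0) {
        return false; // not due yet, skip the optimization
    }
    foreach (['spider', 'engine', 'keywords'] as $table) {
        $runQuery("OPTIMIZE TABLE " . $prefix . $table);
    }
    return true;
}

// Usage sketch: the closure only records the SQL instead of sending it
// to a database, so the example runs without a MySQL server.
$queries = [];
$run = function (string $sql) use (&$queries): void {
    $queries[] = $sql;
};

$count = 0;
for ($i = 0; $i < 10000; $i++) {
    maybe_optimize($count, 5000, $run, 'phpdig_');
}
```

In a real setup the closure would forward the statement to whatever database function your PhpDig version uses, and the counter would have to be stored somewhere persistent if each dig runs as a separate process.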

Regards.
tibabs.

tibabs
02-22-2004, 06:33 PM
Oops... I forgot to comment out 2 lines in my post.

=== Code to comment out in spider.php
//print "Optimizing tables...".$br;
//@mysql_query("OPTIMIZE TABLE ".PHPDIG_DB_PREFIX."spider",$id_connect);
//@mysql_query("OPTIMIZE TABLE ".PHPDIG_DB_PREFIX."engine",$id_connect);
//@mysql_query("OPTIMIZE TABLE ".PHPDIG_DB_PREFIX."keywords",$id_connect);


Regards,
tibabs.

alivin70
02-25-2004, 06:33 AM
I have the same problem.
I've started indexing many sites and noticed the engine slowing down. My engine table contains half a million records.
Indexing each page takes several seconds (up to 1 min).

But your solution is strange, because optimization is done only once, at the end of site spidering.

Any suggestions?

tibabs
02-25-2004, 05:24 PM
Hi,


In my case I'm only indexing different HTML pages that are not necessarily linked together.
It seems, as you said, that the optimization phase is done only once ... but ...

Here is what I can suggest:
Sol #1) Try commenting out the optimization phase and look at the result ==> 5 minutes
Sol #2) Use the phpdigTimer class to profile the source, or use other profiling functions such as http://www.pear.php.net/package/Benchmark. ==> 1 hour

I think you can discover where the trouble comes from quite quickly (1 hour) and then fix it.
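
If neither profiling tool is at hand, the same kind of measurement can be sketched with plain microtime(). This is not the phpdigTimer API: the helper time_phase and the phase labels are made up for illustration, and the usleep() calls merely simulate work.

```php
<?php
// Hypothetical helper, not the phpdigTimer API: accumulates wall-clock
// time per labelled phase so that the slowest phase stands out.
function time_phase(array &$totals, string $label, callable $work)
{
    $start = microtime(true);
    $result = $work();
    $totals[$label] = ($totals[$label] ?? 0.0) + (microtime(true) - $start);
    return $result;
}

$totals = [];
// Simulated phases: replace the closures with the real spider steps.
time_phase($totals, 'fetch', function () { usleep(1000); });
time_phase($totals, 'optimize', function () { usleep(20000); });

arsort($totals); // sort by elapsed time, slowest phase first
foreach ($totals as $label => $seconds) {
    printf("%-10s %.4f s\n", $label, $seconds);
}
```

Wrapping each stage of the spider this way and printing the totals at the end should show whether the time really goes into the optimization phase or somewhere else.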


Regards,
Thierry

alivin70
02-26-2004, 12:41 AM
I see,
we have different problems.
I have to index large web sites, you have to index many single pages.

Optimization runs at the end of each spidering, even when indexing a single page.

Maybe it's possible to add a flag in config.php to disable automatic optimization and run it manually from the admin page.

If you do that hack, post it here ;)
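
One possible shape for such a flag, as a sketch only: the constant AUTO_OPTIMIZE and the function optimize_statements are hypothetical names, not known to exist in PhpDig's config.php or admin code. The function only builds the statements, so it runs without a database.

```php
<?php
// Hypothetical flag: would live in config.php. Set to false to skip the
// automatic optimization at the end of each spidering run.
define('AUTO_OPTIMIZE', false);

// Returns the OPTIMIZE statements to execute, or an empty list when the
// flag is off and the caller did not force a manual run.
function optimize_statements(string $prefix, bool $force = false): array
{
    if (!AUTO_OPTIMIZE && !$force) {
        return [];
    }
    $statements = [];
    foreach (['spider', 'engine', 'keywords'] as $table) {
        $statements[] = "OPTIMIZE TABLE " . $prefix . $table;
    }
    return $statements;
}

$afterSpidering = optimize_statements('phpdig_');       // flag off: nothing to run
$fromAdminPage  = optimize_statements('phpdig_', true); // manual run: three statements
```

The spider would call the function without the second argument, while a button on the admin page would pass true to force the optimization regardless of the flag.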