An easy way to boost PhpDig?
Hi,
For the past 2 days I have been trying to dig 20,000 HTML pages into my database, which already contains close to 6,000,000 records in the phpdig_engine table. Unfortunately, it takes a very, very long time: digging a single HTML file can take more than 1 minute, whereas it took around 3 seconds 2 weeks ago!

After several investigations of my system (XP), my database limits (MySQL) and my folder size (20,000 HTML files), I've found the solution. I hope it can help somebody else.

Using a time-tracking function, I discovered that the time-consuming code is the "Optimizing phase" of the spider.php file (PhpDig V1.8.0). So just comment out these 4 lines, integrate another optimizing process your own way (every 5000 digs, for example) and enjoy your newly boosted PhpDig.

=== Code to comment in spider.php
//print "Optimizing tables...".$br;
//@mysql_query("OPTIMIZE TABLE ".PHPDIG_DB_PREFIX."spider",$id_connect);
@mysql_query("OPTIMIZE TABLE ".PHPDIG_DB_PREFIX."engine",$id_connect);
@mysql_query("OPTIMIZE TABLE ".PHPDIG_DB_PREFIX."keywords",$id_connect);

Remarks: I'm only using PhpDig for inserting new HTML files. There are no updates and no deletes, so the original PhpDig optimization phase is less important for me.

Regards.
tibabs. |
Oops... I forgot to comment out 2 lines in my post.
=== Code to comment in spider.php
//print "Optimizing tables...".$br;
//@mysql_query("OPTIMIZE TABLE ".PHPDIG_DB_PREFIX."spider",$id_connect);
//@mysql_query("OPTIMIZE TABLE ".PHPDIG_DB_PREFIX."engine",$id_connect);
//@mysql_query("OPTIMIZE TABLE ".PHPDIG_DB_PREFIX."keywords",$id_connect);

Regards,
tibabs. |
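For anyone wondering what "another optimizing process every 5000 digs" could look like, here is a minimal sketch. It is only an illustration, not code from PhpDig: the function names `should_optimize` and `optimize_statements` and the page counter are assumptions. It only decides when to optimize and builds the same three OPTIMIZE TABLE statements quoted above; running them is left to whatever database call your spider.php already uses.

```php
<?php
// Hypothetical helper, not part of PhpDig.
// Returns true when the running count of indexed pages is a positive
// multiple of $interval (5000 by default, as suggested in the post).
function should_optimize(int $pagesIndexed, int $interval = 5000): bool
{
    if ($interval <= 0) {
        return false; // a non-positive interval disables periodic optimization
    }
    return $pagesIndexed > 0 && $pagesIndexed % $interval === 0;
}

// Hypothetical helper, not part of PhpDig.
// Builds the OPTIMIZE TABLE statements for the three tables named in the
// thread (spider, engine, keywords), given the table prefix.
function optimize_statements(string $prefix): array
{
    $statements = [];
    foreach (['spider', 'engine', 'keywords'] as $table) {
        $statements[] = 'OPTIMIZE TABLE ' . $prefix . $table;
    }
    return $statements;
}
```

In the spider you would keep your own counter of dug pages, call `should_optimize($count)` after each page, and when it returns true pass each string from `optimize_statements(PHPDIG_DB_PREFIX)` to the query call that the commented lines used.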
I have the same problem.
I've started indexing many sites and noticed the engine slowing down. My engine table contains half a million records. Indexing each page takes several seconds (up to 1 min). But your solution is strange, because optimization is done only once, at the end of site spidering. Any suggestions? |
Hi,
In my case I'm only indexing different HTML pages that are not necessarily linked together. It seems, as you said, that the optimization phase is done only once... but here is what I can suggest:

Sol #1) Try commenting out the optimization phase and have a look at the result ==> 5 minutes
Sol #2) Use the phpdigTimer class to profile the source, or use other profiling functions such as http://www.pear.php.net/package/Benchmark. ==> 1 hour

I think you can discover quite quickly (1 hour) where the trouble is coming from, and fix it afterwards.

Regards,
Thierry |
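If you don't want to dig into phpdigTimer or the PEAR package, the profiling idea in Sol #2 can be sketched with nothing but `microtime()`. The `PhaseTimer` class below is a hypothetical stand-in written for this post; it is not the phpdigTimer API, and the phase names are whatever labels you choose.

```php
<?php
// Hypothetical stand-in for a profiler; this is NOT the phpdigTimer API.
// It accumulates wall-clock time per named phase using microtime(true).
class PhaseTimer
{
    private $starts = [];
    private $totals = [];

    // Record the start time of a phase.
    public function start(string $phase): void
    {
        $this->starts[$phase] = microtime(true);
    }

    // Add the elapsed time since start() to the phase total.
    public function stop(string $phase): void
    {
        if (!isset($this->starts[$phase])) {
            return; // stop() without a matching start(): ignore
        }
        $elapsed = microtime(true) - $this->starts[$phase];
        $this->totals[$phase] = ($this->totals[$phase] ?? 0.0) + $elapsed;
        unset($this->starts[$phase]);
    }

    // Total seconds per phase, keyed by phase name.
    public function totals(): array
    {
        return $this->totals;
    }

    // Name of the phase with the largest total, or null if nothing was timed.
    public function slowest(): ?string
    {
        if (empty($this->totals)) {
            return null;
        }
        $sorted = $this->totals;
        arsort($sorted); // largest total first, keys preserved
        reset($sorted);
        return key($sorted);
    }
}
```

Wrap each stage of the spider (fetching, parsing, inserting, optimizing) in a `start()`/`stop()` pair with its own label, then print `totals()` or `slowest()` at the end of a run to see where the time goes.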
We have different problems. I have to index large web sites; you have to index many single pages. Optimization is run at the end of each spidering, even when indexing a single page. Maybe it's possible to add a flag in config.php to disable automatic optimization and run it manually from the admin page. If you do that hack, post it here ;) |
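A rough sketch of how such a flag might work, in case it helps whoever attempts the hack. Everything named here is an assumption: `PHPDIG_AUTO_OPTIMIZE` is a made-up constant (I am assuming nothing like it exists in PhpDig 1.8.0), and `tables_to_optimize` is a hypothetical helper. The `$force` argument stands for the manual run from the admin page.

```php
<?php
// Hypothetical flag that would live in config.php; the name is an assumption.
// false = skip the automatic optimization at the end of each spidering.
if (!defined('PHPDIG_AUTO_OPTIMIZE')) {
    define('PHPDIG_AUTO_OPTIMIZE', false);
}

// Hypothetical helper, not part of PhpDig.
// Returns the tables to optimize: all three when automatic optimization is
// enabled or when a manual run is forced, otherwise an empty list.
function tables_to_optimize(bool $autoOptimize, bool $force, string $prefix): array
{
    if (!$autoOptimize && !$force) {
        return [];
    }
    return [$prefix . 'spider', $prefix . 'engine', $prefix . 'keywords'];
}
```

The spider would call `tables_to_optimize(PHPDIG_AUTO_OPTIMIZE, false, PHPDIG_DB_PREFIX)` where the OPTIMIZE lines are today, and an admin page button would call it with `$force` set to true, issuing one OPTIMIZE TABLE per returned name.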