Thread: PhpDig Status?
09-21-2006, 05:43 PM   #4
Dave A
Purple Mole
 
Join Date: Aug 2004
Location: North Island New Zealand
Posts: 170
PHPDIG Options.

Yes, my software was based on PHPDIG. It worked very well indeed, but the sheer size of our index started to cause problems.
When we had indexed around 350,000 pages, the searches started to slow down a bit. We doubled the memory, and that worked for a while.
Then other minor things started to happen: when we had around ten users online searching, some results were a little slow and some were unrelated to the query.
Then we started to work out ways of using an algorithm to filter the results. Using one database didn't appear to allow us to do this, so in the end we did a rewrite and started to use two databases.
Then we changed the server over and brought a second server into play, with users passed to whichever server was least busy to keep the speeds up.
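
As a rough illustration of the "least busy server" idea, here is a minimal sketch. It assumes each server reports how many searches it is currently handling; the `Server` class, its `active_searches` counter and the function names are hypothetical and are not taken from the actual setup described in this post.

```python
from dataclasses import dataclass


@dataclass
class Server:
    # Hypothetical record for one search server.
    name: str
    active_searches: int = 0


def pick_least_busy(servers):
    """Return the server currently handling the fewest searches."""
    if not servers:
        raise ValueError("no servers configured")
    # min() returns the first server on a tie, so list order breaks ties.
    return min(servers, key=lambda s: s.active_searches)


def route_search(servers):
    """Assign one incoming search to the least busy server."""
    server = pick_least_busy(servers)
    server.active_searches += 1  # the caller should decrement when the search ends
    return server
```

A real dispatcher would also need to decrement the counter when a search finishes and to skip servers that are down; both are left out here.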
Now we are running five dedicated servers. We added stemming to its code, and this allowed us to start increasing our index size.
At the moment we have just over one hundred and ten million pages indexed and search times are quite good.
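
For readers unfamiliar with stemming: it maps variants of a word onto a shared root, so the index stores fewer distinct keywords. The sketch below is a deliberately naive suffix stripper written only for illustration. It is not the stemmer used in the code described here, and the suffix list and function names are assumptions.

```python
# Hypothetical suffix list, longest first, so "es" is tried before "s".
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word):
    """Strip one common English suffix, keeping at least three letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def index_terms(words):
    """Collapse a list of words into the set of stems to store in the index."""
    return {naive_stem(w) for w in words}
```

With this sketch, "indexing", "indexed" and "indexes" all end up under the single key "index". A production system would more likely use an established algorithm such as the Porter stemmer.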
Since PhpDig was started and released under the GPL, I feel it wouldn't be fair to start supplying it as a commercial venture. I am also not sure of our position on releasing the code, because it may breach copyright laws and open us up to problems.
It took around five months to code the content-related algorithm that we now use, and it is still being fine-tuned all the time.
We don't have the sheer size to work on a ranking system based on links, so we are happy to continue down the content-related route, which uses a system of word meanings, related words and similar-meaning words.
So if a search is done for, say, "Wool", the software looks at similar meanings like sheep, knitting and rugs, and finds the content that may relate the most to the query.
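
The "Wool" example can be sketched roughly as follows. This is only an illustration of related-word query expansion in general, not the algorithm described in this post: the `RELATED` table, the 0.5 weight for related words and the simple word-count scoring are all assumptions.

```python
# Hypothetical related-word table; a real one would be far larger.
RELATED = {"wool": ["sheep", "knitting", "rugs"]}


def expand_query(term):
    """Return the search term followed by its related words."""
    term = term.lower()
    return [term] + RELATED.get(term, [])


def score(text, term, related_weight=0.5):
    """Count exact matches fully and related-word matches at a reduced weight."""
    words = text.lower().split()
    terms = expand_query(term)
    total = float(words.count(terms[0]))
    for related in terms[1:]:
        total += related_weight * words.count(related)
    return total


def rank(pages, term):
    """Return the URLs of matching pages, best score first."""
    scored = [(score(text, term), url) for url, text in pages.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for s, url in scored if s > 0]
```

Under these assumptions, a page that never mentions wool but talks about sheep still gets a small score, which is the point of looking at related meanings.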
One thing we find quite funny is that our user agent is still set as PHPDIG. We did change it to Linknzbot for a while, but in the end we went back to the original; otherwise a simple search for our user agent's name turned up the stats files of the sites we had indexed, and so folks could see the way we were growing in size.
Some SEO folks thought it funny we were running PHPDIG, and so they decided that we couldn't be a serious search engine! PhpDig was, when first designed, written to be used for a single website or a small intranet.
The fact that we now index sites worldwide didn't make people start to wonder how we could be doing this.
When we first started to use PHPDIG, Charter was brilliant with the support and help that he offered, and I feel that it may have started to cost him money to keep it going and to try to develop it further.

The time he must have spent helping people and adapting the code, I would imagine, started to drain his finances. The small amount he asked for to help him support PHPDIG wasn't enough to pay for the hosting of the forum, never mind his time.
I would like to see PHPDIG continue to be developed. Changes to the webspider enable it to consume less bandwidth and to multitask much better; we can now index up to sixty websites at the same time with no problems.
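
One common way to index many sites at once is a bounded worker pool. The sketch below assumes a thread pool capped at sixty workers; it does not show how the spider described here actually multitasks. The `crawl_site` callable and the `MAX_CONCURRENT_SITES` name are hypothetical, and the crawl function is passed in so the example stays free of network code.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumed cap, matching the "up to sixty websites" figure above.
MAX_CONCURRENT_SITES = 60


def index_sites(sites, crawl_site, max_workers=MAX_CONCURRENT_SITES):
    """Crawl several sites concurrently and return {site: crawl result}."""
    sites = list(sites)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map keeps results in the same order as the input sites.
        results = list(pool.map(crawl_site, sites))
    return dict(zip(sites, results))
```

Threads suit this kind of work because a spider spends most of its time waiting on the network rather than using the CPU.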

I am more than willing to enable our code to be used by others. One idea I did start to consider was that if we had a series of users around the world all using the same system and software, then at some point we could all start to share each other's databases, which would produce a rather large search engine.
So if anyone has any ideas about developing PHPDIG further, then I am all ears.

Heaps of regards
Dave A