
PhpDig Status?


mokele
09-20-2006, 04:47 PM
Is Phpdig still being developed? Are any new versions coming?

Dave A
09-20-2006, 08:27 PM
That is a question I can't answer. I am developing my own running version, and I wonder whether it could still be classed as phpdig. We have two databases, and the software I am using is now very different from what it was when we first started.
I would like to see phpdig developed further, but that would be up to Charter, who presently looks after it.

mokele
09-21-2006, 04:24 PM
Did you base your code off of PHPDig? If so, it would be a variant of PHPDig. PHPDig works well, though some things could be done better and some fixes are needed. I've thought about getting into the code and updating it, but I'm not sure I really have the time. I'm using the software for two client sites. So are you going to put your version out to the public as open source or commercial? I just stumbled across Interactive Tools Website Search Engine, http://www.interactivetools.com/products/searchengine/features.html, for $49.00, and may look into that as another possibility.

Dave A
09-21-2006, 06:43 PM
Yes, my software was based on PHPDIG. It worked very well indeed, but the sheer size of our index started to cause problems.
When we had indexed around 350,000 pages, the searches started to slow down a bit. We doubled the memory, and that worked for a while.
Then other minor things started to happen: when we had around ten users online searching, some results were a little slow and some were unrelated to the query.
Then we started to work out ways of using an algorithm to filter the results. Using one database didn't appear to allow us to do this, so in the end we did a rewrite and started to use two databases.
Then we changed the server over and brought into play a second server which passed users to the least busy server to keep the speeds up.
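As a rough illustration of passing users to the least busy server, here is a minimal Python sketch. It is not the code described in this post: the `Server` class, the `active_searches` load metric and the server names are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str                 # hypothetical server label
    active_searches: int = 0  # assumed load metric: searches in progress


def pick_least_busy(servers):
    """Return the server currently handling the fewest searches."""
    return min(servers, key=lambda s: s.active_searches)


def dispatch(servers):
    """Route one incoming search and record the extra load on that server."""
    server = pick_least_busy(servers)
    server.active_searches += 1
    return server


servers = [Server("search-1", 7), Server("search-2", 3)]
chosen = dispatch(servers)  # picks search-2, the less loaded of the two
```

A real setup would also need to decrement the counter when a search finishes and to cope with a server going down; both are left out here.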
Now we are running five dedicated servers. We added stemming to its code, and this allowed us to start increasing our index size.
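Stemming helps because several word forms collapse into a single index entry. The post does not say which stemmer was used, so the following is only a crude suffix-stripping sketch in Python; the suffix list and the `crude_stem` and `index_terms` names are assumptions, and a production system would more likely use an established algorithm such as Porter's.

```python
SUFFIXES = ("ing", "ed", "es", "s")  # illustrative list, far from complete


def crude_stem(word: str) -> str:
    """Strip one common English suffix, leaving short words untouched."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters so words like "is" are not mangled.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def index_terms(text: str) -> set:
    """Return the set of stemmed terms that would be stored for a text."""
    return {crude_stem(w) for w in text.split()}


terms = index_terms("searching searched searches")  # one entry instead of three
```

Storing one stem in place of every inflected form is what keeps the word table smaller as the number of indexed pages grows.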
At the moment we have just over one hundred and ten million pages indexed and search times are quite good.
Since phpdig was started and released under the GPL, I feel it wouldn't be fair to supply it as a commercial venture. I am also not sure what position we are in to release the code, because it may breach copyright law and open us up to problems.
It took around five months to code the content-related algorithm that we now use, and it is still being fine-tuned all the time.
We don't have the sheer size to work on a ranking system based on links, so we are happy to continue with content-related results, which use a system of word meanings, related words and similar-meaning words.
So if a search is done for, say, "Wool", the software looks at similar meanings like sheep, knitting and rugs, and finds the content that may relate most closely to the query.
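The related-word idea can be sketched in a few lines of Python. This is a toy version under stated assumptions, not the algorithm described above: the `RELATED` table holds only the "Wool" example from this post, and the `related_weight` of 0.5 is an arbitrary value chosen for illustration.

```python
RELATED = {
    "wool": ["sheep", "knitting", "rugs"],  # the example given in the post
}


def expand_query(query: str) -> list:
    """Return the query words followed by their related words, without duplicates."""
    terms = []
    for word in query.lower().split():
        terms.append(word)
        terms.extend(RELATED.get(word, []))
    return list(dict.fromkeys(terms))  # dict keys preserve first-seen order


def score(document: str, query: str, related_weight: float = 0.5) -> float:
    """Count term matches, weighting related words lower than the original words."""
    words = document.lower().split()
    original = set(query.lower().split())
    total = 0.0
    for term in expand_query(query):
        weight = 1.0 if term in original else related_weight
        total += weight * words.count(term)
    return total


expanded = expand_query("Wool")
```

With this scoring, a page that mentions sheep and rugs but never the word "wool" still gets a non-zero score for the query, which is the point of ranking on related content.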
One thing we find quite funny is that our user agent is still set as PHPDIG. For a while we did change it to Linknzbot, but in the end we went back to the original; otherwise a simple search for our user agent's name turned up the sites we had indexed, from their stats files, and so folks could see the way we were growing in size.
Some SEO folks thought it funny that we were running PHPDIG and decided that we couldn't be a serious search engine! PHPDIG, when first designed, was written to be used for a single website or a small intranet.
The fact that we now index sites worldwide didn't make people start to wonder how we could be doing this.
When we first started to use PHPDIG, Charter was brilliant with the support and help that he offered and I feel that it may have started to cost him money to keep it going and to try and develop it further.

The time he must have spent helping people and adapting the code, I would imagine, started to drain his finances. The small amount he asked for to help him support PHPDIG wasn't enough to pay for the hosting of the forum, let alone his time.
I would like to see PHPDIG continue to be developed. Changes to the web spider enable it to consume less bandwidth and to multitask much better; we can now index up to sixty websites at the same time with no problems.
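One common way to index many sites at once is a bounded worker pool. The Python sketch below assumes a thread pool capped at sixty workers, the figure mentioned above; `index_site` is a placeholder that does no real crawling, and the example URLs are made up. The post does not describe how the actual spider multitasks.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_SITES = 60  # concurrency figure taken from the post


def index_site(url: str):
    """Placeholder for crawling one site; a real spider would fetch and parse pages."""
    return url, len(url)  # dummy stand-in for "pages indexed"


def index_all(urls, max_sites: int = MAX_SITES) -> dict:
    """Index every site, with at most max_sites being worked on at the same time."""
    with ThreadPoolExecutor(max_workers=max_sites) as pool:
        return dict(pool.map(index_site, urls))


results = index_all([f"http://site{i}.example/" for i in range(120)])
```

The cap matters more than the pool type: it keeps bandwidth and database load predictable however long the list of sites becomes.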

I am more than willing to let our code be used by others. One idea I did start to consider was that if we had a series of users around the world all using the same system and software, then at some point we could all start to share each other's databases, which would produce a rather large search engine.
So if anyone has any ideas about developing PHPDIG further then I am all ears.

Heaps of regards
Dave A

Dave A
09-22-2006, 12:12 AM
Seems to be similar to Perlfect Search, but it will only search one domain, the one it's installed on.

http://interactivetools.com/iforum/Products_C2/Search_Engine_F8/Searching_on_more_than_one_server._Is_it_possible_P8995/

Re: [Tipking] Searching on more than one server. Is it possible?

--------------------------------------------------------------------------------

Well... since we're on the topic of other search engines...
This one won't to my knowledge search more than one domain but it will search larger sites quite quickly. The interactivetools.com SE works great for up to about 10,000 pages, but starts to bog down after that, so I spent quite some time searching for an alternative. This is the result:
http://www.dropbears.com/cgi-bin/perlfect/search/search.pl?p=1&lang=en&exclude=&penalty=0&include=&q=Picasso
The link to the software site is at the bottom of the page.

joe
10-13-2006, 11:19 PM
The current version is perfect, but advanced technologies are always appreciated.

Dave A
10-13-2006, 11:30 PM
Well, I am busying myself away at writing add-ons to my version. We now use two databases instead of one, and we have automatic optimisation software that keeps the database in the best condition.
Last night it dipped out and went from sixteen gig to four meg in around a second; we had a play with it and bingo, it's working again, start to finish in five minutes.
The trouble is I am not sure of the position we stand in regarding the GPL licence. What we are using now is very different from the code out of the box. We have an algorithm that is changeable.