New Features Inquiry


Charter
10-05-2003, 10:59 AM
Hi. Besides the information posted on these forums, I've been thinking about perhaps integrating my commercial search script, or rather parts of it, into PhpDig. You can demo the commercial search script here (http://www.thinkding.com/software/). It has boolean and phrase searching capabilities, but it is not GPL and it does not index. If I do integrate it into PhpDig, then the integrated parts would then become GPL. Anyway, what I am wondering is if there is anything in particular from the commercial search script that you'd like to see in PhpDig. If nothing, then I'll forget this idea, but if there is something, it'd be good to know in case I decide to undertake the task.
;)

alivin70
10-06-2003, 12:41 AM
Originally posted by Charter
[...] what I am wondering is if there is anything in particular from the commercial search script that you'd like to see in PhpDig. [...]

There are some very interesting features in your search engine.
I have a list of new features we are working on.
For example, "Ad links".
My idea is to integrate PhpDig with a text banner server. We just need to create a "hook" between the words searched by the user and the keywords of the banners.

To allow easy integration with many ad servers, I propose these steps:
1) Allocate space on the right side of the PhpDig results page. This column should collapse (or be absent) if there are no "sponsored links" or the feature is disabled.
2) Create a "hook function" inside PhpDig written for a specific AdServer. This function takes the ads from the AdServer and shows them.
Everything else about ads is then handled by the AdServer: counts, click statistics and so on.

This way it's easy to integrate PhpDig with any AdServer. I have my own, but it is possible to do it with PhpAdsNew or others.
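
The two steps above could look roughly like this. It is a Python sketch for illustration only, not PhpDig code and not the API of any real ad server: `make_keyword_hook`, `render_sponsored_column`, the ad records and the `ads.example` URLs are all made-up names.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical ad record: the keyword a banner is booked on, its text and its URL.
Ad = Dict[str, str]
Hook = Callable[[List[str]], List[Ad]]


def make_keyword_hook(ads: List[Ad]) -> Hook:
    """Build a hook for one imaginary ad server backed by an in-memory list."""
    def hook(search_words: List[str]) -> List[Ad]:
        wanted = {word.lower() for word in search_words}
        return [ad for ad in ads if ad["keyword"].lower() in wanted]
    return hook


def render_sponsored_column(search_words: List[str], hook: Optional[Hook] = None) -> str:
    """Return the text of the right-hand column, or '' so the column can collapse."""
    if hook is None:  # feature disabled
        return ""
    ads = hook(search_words)
    if not ads:  # no sponsored links for this query
        return ""
    lines = ["Sponsored links"]
    lines += [f"{ad['text']} ({ad['url']})" for ad in ads]
    return "\n".join(lines)


# Made-up sample data standing in for a real ad server.
demo_hook = make_keyword_hook([
    {"keyword": "apache", "text": "Apache hosting", "url": "http://ads.example/1"},
    {"keyword": "php", "text": "PHP books", "url": "http://ads.example/2"},
])
print(render_sponsored_column(["Apache", "server"], demo_hook))
```

Supporting another ad server would then mean writing another hook with the same signature; counting and click statistics stay on the ad server side.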

:D

alivin70
10-06-2003, 12:48 AM
Originally posted by Charter
[...] what I am wondering is if there is anything in particular from the commercial search script that you'd like to see in PhpDig. [...]

A feature that I would improve in PhpDig is the calculation of page relevance.
We are studying the algorithms that do that.
If you developed your own algorithm, you have the right skills to help us.

We can work together to define new, more powerful rules to calculate the "ranking" of a page.
We have to develop extensible code so we can add new features as we hack Google's algorithms ;)

Let me know what you think.
Alivin70

Rolandks
10-06-2003, 04:25 AM
Hey,
boolean and phrase searching is a good idea :)

But I think the current ranking is OK, and it is not so important, because my site statistics show that users often search for one or two words. And Google's ranking algorithms are not interesting for ONE website; what ranking would you create for ONE search word?
"Hook function" and ad server - hmm, I don't know who needs this and for what. Does this work internationally (US, Europe, etc.)?

My favorite feature is word suggestions for user errors: "documnetation" must find "documentation", and "Downlaod" must find "Download". It works well; the problems are run-together words: "manageroperating" does not suggest "manager operating" the way Google does.

See my "Test Intelligent Php-Dig Fuzzy" in my signature, or this thread for the full story:
http://www.phpdig.net/showthread.php?s=&threadid=77

I think it would not be difficult to include this as PhpDig results table tags.

Not important :confused:
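
For readers who want to experiment with the suggestion idea, here is a small Python sketch. It is not the code behind the test page mentioned above: it swaps in the standard library's `difflib` similarity matching, and the `suggest` and `split_joined` helpers, the 0.8 cutoff and the tiny vocabulary are assumptions made for the example.

```python
import difflib


def suggest(word, vocabulary, cutoff=0.8):
    """Return the closest indexed word, or None if nothing is similar enough."""
    lowered = {entry.lower(): entry for entry in vocabulary}
    if word.lower() in lowered:
        return None  # spelled like an indexed word, nothing to suggest
    matches = difflib.get_close_matches(word.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[matches[0]] if matches else None


def split_joined(word, vocabulary):
    """Try to split a run-together word into two indexed words."""
    known = {entry.lower() for entry in vocabulary}
    lowered = word.lower()
    for i in range(1, len(lowered)):
        if lowered[:i] in known and lowered[i:] in known:
            return f"{lowered[:i]} {lowered[i:]}"
    return None


# Stand-in for the words found in the search index.
vocab = ["documentation", "Download", "manager", "operating"]

print(suggest("documnetation", vocab))
print(suggest("Downlaod", vocab))
print(split_joined("manageroperating", vocab))
```

The second helper covers the run-together case by trying every split point against the indexed words, which is cheap for single search terms but only handles two-word joins.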

alivin70
10-06-2003, 04:42 AM
Originally posted by Rolandks
[...] "Hook function" and ad server - hmm, I don't know who needs this and for what. Does this work internationally (US, Europe, etc.)?

I'm interested in it, and so is Charter, I guess ;)

Anyway, I'll do that and release it under the GPL for PhpDig :D

Originally posted by Rolandks
My favorite feature is word suggestions for user errors: "documnetation" must find "documentation", and "Downlaod" must find "Download". [...]

That's a great idea and I agree with you.
I've already read the thread and seen your test page.
Thanks for letting me discover the nice SOUNDEX() function and its relatives. I will help develop this feature if I can.

The reason I need certain features is that I'm building not a single-site search engine but a "few sites search engine", where "few" means 10-20, depending on PhpDig's capacity and speed.

Alivin70

PS Please read my post about documentation of the code (http://www.phpdig.net/showthread.php?s=&threadid=130) ASAP, so that no work gets lost.

druesome
10-15-2003, 07:05 AM
Hi All,

I would gladly help in developing an algorithm for PHPDig. I want to find out first though, where in the scripts is the variable $weight being computed? I'm not that satisfied with the current relevance ranking. I want to give more weight/importance to the titles than the text. Thank you.

alivin70
10-15-2003, 08:05 AM
Originally posted by druesome
[...] where in the scripts is the variable $weight being computed? I'm not that satisfied with the current relevance ranking. I want to give more weight/importance to the titles than the text. [...]

Hi drue,
I'm also interested in hacking the page weighting, but I haven't started on it yet.

Maybe the documentation on my website could help you find the relevant piece of code.
Look at this thread (http://www.phpdig.net/showthread.php?s=&threadid=130) for more details.

I think it could be useful to add more parameters to adjust the weight of a result.
I'm not completely sure, but at the moment it seems possible to change the relative weight of a page when the keyword is found in the title. Looking at config.php, I found:
define('TITLE_WEIGHT',3); //relative title weight

We can add weight for meta keywords or for other parameters.

The best thing to do is to put the weighting method in a function or class that can be developed separately by one person or a team. That function could also be easily customized for special purposes.
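
A minimal sketch of such a separate weighting function, written in Python for illustration rather than taken from PhpDig. Only the title weight of 3 comes from the config.php line quoted above; the `Weights` class, the `score` function, the meta keywords weight and the page fields are hypothetical, and how PhpDig really applies TITLE_WEIGHT is not shown in this thread.

```python
from dataclasses import dataclass


@dataclass
class Weights:
    title: int = 3     # same value as TITLE_WEIGHT in config.php
    keywords: int = 2  # hypothetical weight for meta keywords
    body: int = 1      # hypothetical weight for ordinary page text


def score(page, query_words, weights=None):
    """Sum weighted counts of the query words in each field of a page."""
    weights = weights or Weights()
    fields = (
        ("title", weights.title),
        ("keywords", weights.keywords),
        ("body", weights.body),
    )
    total = 0
    for field, weight in fields:
        words = page.get(field, "").lower().split()
        total += weight * sum(words.count(q.lower()) for q in query_words)
    return total


# Made-up page: one hit in the title, one in the body.
page = {"title": "Apache server", "body": "install apache on a server"}
print(score(page, ["apache"]))
```

Because the weights are passed in as one object, a site could customize ranking by supplying its own `Weights` without touching the search code.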

In the future we could think about implementing the simplest of Google's page-ranking ideas, for example the weight associated with links: if page A contains a link named "word" to page B and you search for "word" in Google, you will find page B before A, even if page B doesn't contain the keyword "word".
That's reasonable, and it is the basis of Google's power!
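
The A/B example can be reproduced with a toy ranking in Python. This is only an illustration of link-text weighting, not Google's algorithm and not PhpDig code; the `rank` function, the data layout and the `anchor_weight` of 5 are arbitrary assumptions.

```python
from collections import Counter


def rank(pages, links, word, anchor_weight=5):
    """Order pages for one search word, counting link text pointing at a page.

    pages: {url: page text}
    links: list of (source url, target url, link text)
    """
    word = word.lower()
    # How many links whose text contains the word point at each target page.
    anchor_hits = Counter(
        target for _source, target, text in links if word in text.lower().split()
    )
    scores = {}
    for url, text in pages.items():
        value = text.lower().split().count(word) + anchor_weight * anchor_hits[url]
        if value:
            scores[url] = value
    return sorted(scores, key=lambda url: scores[url], reverse=True)


# Page A mentions "word" and links to B with the text "word"; B never contains it.
pages = {"A": "see the word here", "B": "nothing relevant"}
links = [("A", "B", "word")]
print(rank(pages, links, "word"))
```

With these numbers B scores 5 from the incoming link and A scores 1 from its own text, so B is listed first, as in the example above.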



bye for now
Alivin70

druesome
10-16-2003, 04:16 AM
Hey Alvin,

I think I figured out a hack that gives a higher score to a result if the query terms match the title. I will share it with everyone soon, because it's still kind of sloppy, but it does the job and I'm quite happy with it. What I'll try next is to give each site a pagerank, much like Google's, and to make it have some effect on the search results. Later, and wish me luck.

alivin70
10-16-2003, 04:55 AM
Originally posted by druesome
[...] What I'll try next is to give each site a pagerank, much like Google's, and to make it have some effect on the search results. Later, and wish me luck.

I wish you lots of luck! :)

Anyway, what do you mean by pagerank? A number calculated by the spider (Google style) or assigned by the administrator (dumb but simpler)?

sid
10-16-2003, 09:02 PM
Hi, I'd like to see the Boolean capabilities and the "" phrase search, please.

Can't wait to see the next version of PHPDIG!

Wayne McBryde
10-27-2003, 07:23 PM
I would really like to see an option where you install the software, for those of us who know very little about installing scripts on our servers. Of course this option would not be free, but I would pay a reasonable amount to have you install it.

Thanks

pittster
11-04-2003, 10:31 AM
I'm thinking of adding a feature to log commonly searched keywords and provide a report that could be emailed or viewed online.

This is beneficial to site administrators so they can make commonly searched for items more visible on the site.

If it is already in the works, please let me know.
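
One possible shape for such a log, sketched in Python with an in-memory counter. The `SearchLog` class and its methods are hypothetical; a real version would store the counts in the database and send or display the report text.

```python
from collections import Counter


class SearchLog:
    """Count searched keywords and produce a plain-text report."""

    def __init__(self):
        self.counts = Counter()

    def record(self, query):
        """Log every word of one search query, ignoring case."""
        self.counts.update(query.lower().split())

    def report(self, top=10):
        """Return the most searched keywords, one 'word: count' line each."""
        return "\n".join(f"{word}: {n}" for word, n in self.counts.most_common(top))


log = SearchLog()
log.record("Apache server")
log.record("apache config")
print(log.report(top=3))
```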

drjoju
11-05-2003, 01:38 AM
Hi all!

I think some people are not focusing on the final objective of PhpDig: a search and index engine!!

If you think that this is the most important objective, then the new features must be:

1.- the boolean capabilities and the "" exact phrase.
2.- Add new file types. If necessary.
3.- The Rolandks idea of word suggestion. Good Idea.
4.- Repair bugs and modify the spider to sniff local directories. (It doesn't work for me, or I don't know how to do it.)
5.- Integrate new external engines. wvware for example.
6.- Add a commit hook system to index new files without reindex.

As you can see there is a lot of work.

I know that a registered version exists, but I believe in the GPL and open source.

Best regards!

alivin70
11-05-2003, 01:58 AM
Originally posted by drjoju
Hi all!
[...]
1.- the boolean capabilities and the "" exact phrase.
2.- Add new file types. If necessary.
3.- The Rolandks idea of word suggestion. Good Idea.
4.- Repair bugs and modify the spider to sniff local directories. (It doesn't work for me, or I don't know how to do it.)
5.- Integrate new external engines. wvware for example.
6.- Add a commit hook system to index new files without reindex.
[...]

I agree, 1) is the most important.

4 is easy: if your web server is not public, configure Apache to give web access to the files you need and spider them with PhpDig.
Be careful with permissions; use .htaccess if you want to protect your directories.

3 is a great idea, but quite difficult. I hope Rolandks will give us good news soon.

6 is a feature I proposed, and I'm thinking about how to implement it. I will let you know as soon as I have some news.

2 needs some external parser (like those for PDF or Word files); you can propose some if you know any.

5 I didn't understand what you mean .... :(

drjoju
11-05-2003, 11:13 AM
Hi Alivin70,

With point 5 I meant that there are other engines for parsing files, like wvware.sourceforge.net, which parses doc files.

Best regards.

ZAP
01-19-2004, 05:33 PM
I would definitely update (and make all my wacky custom changes again) if the new version includes phrase indexing. I almost didn't use phpDig originally because it lacked that feature. So many folks are now accustomed to using quotes to find phrases in web search engines that it confuses them when it doesn't work that way (and it doesn't return the results they want).

Charter
01-19-2004, 05:36 PM
Hi. Try the current demo and select 'exact phrase' when searching. For example, try searching on 'apache server' (without the quotes) and see the differences between the options. How does it seem?

ZAP
01-19-2004, 06:09 PM
That's almost exactly what I was looking for. Good work! However I would suggest something a little bit different in the interface.

I personally like the way you set up the interface with AND/Exact Phrase/OR radio buttons, but it's not how other popular search engines do it (Google, etc.). I would prefer that people be able to type in the operators if they want them (including NOT and +/-), but that's not such a big deal to me since most people don't use them (on non-technical sites anyway). However, I do think that phpDig should recognize exact phrase searches by quotation marks rather than a radio button, if it's possible to do so.

There are two reasons I'd prefer it this way:

1. It's the way that other search engines work, and lots of people will just type it that way and never even look at the radio buttons.

2. It allows for multiple exact phrases, as well as mixes between the terms (e.g., "apache server" configuration, or "apache server" "user accounts", or even "apache server" configuration "user accounts" -root, etc.).
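
To make the second point concrete, here is a rough Python sketch of splitting such a query into phrases, plain words and excluded terms. It covers only the parsing step, not how an index would answer phrase searches, and the `parse_query` function and its result layout are assumptions for the example rather than anything from PhpDig.

```python
import re

# An optional +/- sign, then either a "quoted phrase" or a run of non-space characters.
TOKEN = re.compile(r'([+-]?)(?:"([^"]*)"|(\S+))')


def parse_query(query):
    """Split a search string into exact phrases, plain words and excluded terms."""
    parsed = {"phrases": [], "words": [], "excluded": []}
    for sign, phrase, word in TOKEN.findall(query):
        term = phrase or word
        if not term:
            continue  # skip empty quotes
        if sign == "-":
            parsed["excluded"].append(term)
        elif phrase:
            parsed["phrases"].append(phrase)
        else:
            parsed["words"].append(term)  # a leading + is treated like a plain word here
    return parsed


print(parse_query('"apache server" configuration "user accounts" -root'))
```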

I know that this would involve some changes in the flow of your script, but it seems as if the indexing and search code is all there to do it this way now, so you might want to consider it.

Thanks for all the work and effort you put into this!

Charter
01-19-2004, 06:20 PM
Hi. I agree with your points, but the current database structure for PhpDig is not compatible with a quoted phrase such as you suggest and as is already written by me for this (http://www.thinkding.com/software/) script (it's not GNU/GPL and this is not self-promotion - just pointing it out). Certainly, the TD script could be incorporated into PhpDig (currently they will NOT work together) but, given the PhpDig database structure already out there, the exact phrase button is what I am 'up to' so to speak.

Give me time. BUWAHAHAHA! ;)

ZAP
01-19-2004, 07:01 PM
Aha! Well your non-GPL script looks excellent (as well as cheap!), so personally I don't mind you plugging it. In fact, I'd be happy to pay you such a meager amount for all your work and help (though I'm a little concerned about surpassing my host's annoying 20-meg MySQL limit, so I would like to know more before I migrate).

Since the database structure is incompatible with the way I suggested, then the way you are currently modifying phpDig is an acceptable (and welcome) alternative.

Thanks for the quick reply.

Charter
01-19-2004, 07:10 PM
Hey, no problem. I've been thinking about a 'public' search script for a while and was blessed with the opportunity to continue the PhpDig project.

Call me weird, but I am very grateful to Antoine. :)

Anyway, any such 'public' script needs the backend, as these scripts by themselves cannot be the next 'Google' as such, if you know what I mean.