Taking Requests


Charter
05-01-2004, 03:40 PM
Hi. If you really, ReaLLy, REALLY want something in the next release of PhpDig, NOW is the time to make the request, as I'm almost done with the next release. Note: CONSENSUS RULES!!! So the more requests for a particular item, the better the chances of seeing that item in the next release.

gooseman
05-01-2004, 04:59 PM
2 requests:

1 - small fix for the plurality of the errors

"you", are very common words and were ignored.

to

"you", is a very common word and was ignored.

etc...

2 - not sure if it's worth doing, but utilise the Google API for phrase/spelling suggestions. It's not very complicated, and I've used it in other PHP-based applications (using NuSOAP). If enough people ask for it, let me know and I'll help!
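
As a rough illustration of the singular/plural fix in request 1, here is a minimal sketch in Python rather than PhpDig's PHP. The function name `ignored_words_message` and the exact punctuation are assumptions for illustration, not PhpDig code.

```python
def ignored_words_message(words):
    """Build the notice shown for search words that were dropped as too common."""
    if not words:
        return ""
    quoted = ", ".join(f'"{w}"' for w in words)
    if len(words) == 1:
        # Singular form for exactly one ignored word.
        return f"{quoted} is a very common word and was ignored."
    # Plural form for two or more ignored words.
    return f"{quoted} are very common words and were ignored."


print(ignored_words_message(["you"]))
print(ignored_words_message(["you", "the"]))
```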

fredh
05-01-2004, 05:53 PM
Hi,

I would like to see cookies being passed properly in all page requests. Currently, in versions of PHP < 4.3, cookies are only being sent properly in HEAD requests but not in the GET requests.

This is a huge problem when you have a site that depends on a sessionID cookie as an example.

Personally I have already fixed this bug in the versions of PHPDig that I use for my clients' sites.

What I did was use this small library to do it:
http://snoopy.sourceforge.com

It's an extremely simple piece of code and I fixed the PHPDig bug in 6 lines of code (in the robot_functions.php file). Mind you, to properly support authentication and such you will need to add a few more lines :)

I will be more than happy to share my code fixes, simply let me know.

PHPDig rocks!

fredh
05-01-2004, 06:03 PM
Not sure if this is being done yet, but it would be extremely useful to spider a site based on a full url path instead of just a domain name.

PHPDig currently does this, but what it does not do is store the search results based on the starting url. It instead stores them based on domain name.

This is a useful feature when you have a site in multiple languages. Example:

http://www.phpdig.net/en/index.php
http://www.phpdig.net/fr/index.php
http://www.phpdig.net/es/index.php

Those are entry points into the website that set a session language variable. All subsequent pages in the site are then rendered in the proper language.

The only way to support this in phpdig at the moment is to have 3 different installations, which is a pain to maintain and adds unnecessary complexity/bloat to the site's code base.

I'm a linux guy personally, but the best related example that I can think of is Microsoft Indexing Server's concept of a catalog. It supports multiple catalogs, with each catalog having a starting url. You can then write a search form that queries the catalog for results.

I hope this is somewhat clear, if not please let me know and I'll try to explain further :)

Did I mention that PHPDig rocks? Excellent work thus far!
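
The catalog idea described above could be sketched roughly as follows. This is an in-memory illustration in Python, not PhpDig's PHP or its database schema. The class `CatalogIndex` and its methods are hypothetical names. The point is only that pages are stored under the starting url they were spidered from, so the same page url can appear in several catalogs with different content.

```python
from collections import defaultdict


class CatalogIndex:
    """Toy index that keeps spidered pages grouped by their starting url."""

    def __init__(self):
        # start_url -> {page_url: page_text}
        self._pages = defaultdict(dict)

    def add_page(self, start_url, page_url, text):
        """Record a page under the catalog it was reached from."""
        self._pages[start_url][page_url] = text

    def search(self, start_url, word):
        """Return the page urls in one catalog whose text contains the word."""
        word = word.lower()
        pages = self._pages.get(start_url, {})
        return sorted(url for url, text in pages.items()
                      if word in text.lower().split())


index = CatalogIndex()
index.add_page("http://www.phpdig.net/en/index.php",
               "http://www.phpdig.net/about.php", "about this search engine")
index.add_page("http://www.phpdig.net/fr/index.php",
               "http://www.phpdig.net/about.php", "moteur de recherche")
print(index.search("http://www.phpdig.net/en/index.php", "search"))
```

A search form would then pass the starting url (or a catalog id derived from it) along with the query, instead of a domain name.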

bloodjelly
05-01-2004, 06:07 PM
First I want to say on behalf of everyone that you're doing an awesome job, Charter, both on development and support. Thanks also for taking requests! How cool.

Here are mine, in order of preference:

1) The ability to run multiple spider processes from the Admin panel, so that the whole will finish faster.

2) A limit on the total links per site spidered

3) Full URLs stored in the database as typed in when spidering...e.g. "http://www.site.com/folder" instead of "http://www.site.com"

:D

allergie
05-01-2004, 11:09 PM
Hi, yes I really like phpDig: a nice tool, congratulations.

Like many people, I have a directory of websites, and a search engine such as phpDig visits and indexes them. I would like to be able to boost the relevance of one website or another based on MY HUMAN judgement.

I think it would be great to assign keywords to a website and give each keyword a value (1 to 5).

Jtb
05-01-2004, 11:24 PM
Hi,

I really need Unicode-Support.. :)

digirave
05-02-2004, 12:18 AM
my vote is for multibyte/unicode support

thanks for such great software

jannejava
05-02-2004, 01:29 AM
phpDig already rocks but have to agree with the two others above, unicode-support.

phrase/spelling suggestions would be nice too, but not critical.

JÿGius³
05-02-2004, 02:50 AM
Hi all.

I'd like to see in the next release some additions we have made
(me and alinin70); among others, sponsored links (http://coosenza.cosenzainrete.it/ricerca.php?query_string=grafica).

We have added other features that we want to see in the next
release (a complete list is coming soon). First and foremost, integration with the google api using nusoap.

Ciao.

:DJyGius:D

ibrown
05-02-2004, 03:07 AM
> utilise the google api for phrase/spelling suggestions.

As Gooseman, JyGius and JanneJava suggested ... this would be most useful for me, in maintaining the search facilities for the Society of Indexers' mailing list archives.

Also, BloodJelly suggested

> The ability to run multiple spider processes from the
> Admin panel, so that the whole will finish faster.

Yes, please! I have to index 9000 pages on a monthly basis, and no matter how I do it, the indexing process still takes ~30+ hours to do!

Rolandks
05-02-2004, 04:24 AM
Okay, again my plan (posted some months ago) and request for: Intelligent Php-Dig Fuzzy
I have seen in my statistics that users make many typing errors, or search for words that are not found although similar-looking words exist in the index.

My Request is a "Did you mean Tag": See: Google - Did you mean: (http://www.google.com/search?hl=en&lr=&ie=UTF-8&oe=UTF-8&q=frere+dictonaary&btnG=Google+Search)


My plan is:
- ideas to create better results
- create a Tag: phpdig: phonetic
- add phpdig: phonetic to index.php
- add this to the templates as: do you mean: ... (in all languages) and display the top 3 hits if the search result is empty
- add this search to the statistics --- found by phonetic ...

Problems are:
- different letter in front
- special language characters: ü, ö, ä, ß - German "Straße" is in the database, but a search for "Strasse" finds nothing (engl. street).
- a little slow (perhaps create a table field for SOUNDEX and fill it at indexing time, or add a link in the Admin menu).

You can view my source at: Test-Search-Page for (little) Intelligent Php-Dig Fuzzy (http://www.kunstbar.tv/suche/suchp.php)

Try this search: "autentication" "deskription" "hiperlinks" on my Test-page.

Roland
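
The phonetic "did you mean" idea above can be sketched with classic Soundex. This is a minimal Python illustration, not Roland's actual source and not PhpDig code. The helper names `soundex` and `suggest` and the sample vocabulary are assumptions. As noted in the problem list, Soundex keeps the first letter, so a wrong first letter is still not matched.

```python
# Letter groups of classic (American) Soundex.
_GROUPS = (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
           ("L", "4"), ("MN", "5"), ("R", "6"))
_CODES = {ch: digit for letters, digit in _GROUPS for ch in letters}


def soundex(word):
    """Return the 4-character Soundex code of a word ('' if it has no letters)."""
    # str.upper() turns the German sharp s into "SS", so "Straße" and
    # "Strasse" end up with the same code.
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        if ch not in "HW":  # H and W do not separate equal codes
            prev = code
    return (result + "000")[:4]


def suggest(query, vocabulary, limit=3):
    """Return up to `limit` indexed words that sound like the query."""
    code = soundex(query)
    return [w for w in vocabulary if w != query and soundex(w) == code][:limit]


vocabulary = ["authentication", "description", "hyperlinks", "Straße"]
for typo in ("autentication", "deskription", "hiperlinks", "Strasse"):
    print(typo, "->", suggest(typo, vocabulary))
```

Storing the code in its own table field at indexing time, as suggested above, would avoid recomputing it for every indexed word on each search.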

gooseman
05-02-2004, 04:34 AM
I like the fact that the google implementation is tried and tested and very easy to implement.

It will work really well for spelling corrections and suggestions*, but the phrase suggestions (spelling correct, but you'll get more hits with a similar suggested phrase) works on the statistical analysis of searches and common phrases on its indexed pages.

So, in my opinion, for spelling corrections, google API is really good. For phrase suggestions, you'd probably want something more customised to your individual site (or indexed pages).

For now, I'd say go with the google api, as it's there and simple to implement. A more custom solution should be targeted to suggesting phrases based on your actual page data set.

*it's interesting that google still doesn't return results for us/uk spelling crossovers or give you the option for this - organisation and organization return completely different results. You have to search for 'organisations OR organizations' (Then it suggests - did you mean 'organizations OR organizations') - lol

rafarspd
05-02-2004, 05:13 AM
Charter - Thanks for your efforts, much appreciated here.

If you enter more than one search word, can each one be highlighted in a different colour (like Google)?

I do not think it is necessary to have the ability to switch them on and off (like Google).
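
A minimal sketch of per-word highlight colours, written in Python for illustration rather than as a PhpDig template change. The function name `highlight`, the colour list, and the inline `<span>` markup are all assumptions.

```python
import html
import re

# Hypothetical palette; colours are reused if there are more words than entries.
COLOURS = ["#ffff66", "#a0ffff", "#99ff99", "#ff9999", "#ff66ff"]


def highlight(text, words):
    """Wrap each search word found in text in a span with its own colour."""
    words = [w for w in words if w]
    if not words:
        return html.escape(text)
    colour = {w.lower(): COLOURS[i % len(COLOURS)] for i, w in enumerate(words)}
    # Longest words first, so a longer term wins over a shorter one inside it.
    ordered = sorted(words, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(w) for w in ordered), re.IGNORECASE)
    out, pos = [], 0
    for match in pattern.finditer(text):
        # Escape the plain text between matches, then the match itself.
        out.append(html.escape(text[pos:match.start()]))
        shade = colour.get(match.group(0).lower(), COLOURS[0])
        out.append('<span style="background:%s">%s</span>'
                   % (shade, html.escape(match.group(0))))
        pos = match.end()
    out.append(html.escape(text[pos:]))
    return "".join(out)


print(highlight("PhpDig search engine", ["phpdig", "engine"]))
```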

drjohn
05-02-2004, 05:31 AM
I think all the suggestions are good, and I have been happy with phpdig.

Two things I would like to see:

1. meta-data for ranking pages.
2. Ability to crawl Web sites developed in Lotus Notes

-- John Chadwick

majestique
05-02-2004, 10:07 AM
better spidering UI... it seems to be a little buggy, and sometimes when it's actually spidering it doesn't show that work is in progress. some users don't know better and exit it, which causes probs

orbitalz
05-02-2004, 11:57 AM
I love the script,

but can't use it because for some odd reason I can't get phpdig to parse php, cgi pages fully.

It seems to only want to parse:
www.domain.com/page.php
and not:
www.domain.com/page.php?837983*
etc.

so my suggestions are as follows:

1.) better support for fully parsing dynamic links.

2.) like mentioned before, show the full Url in the admin url window.

3.) a bulk spidering option, so we can just copy and paste a bunch of urls to be spidered.

1.a) better support for parsing dynamic pages.

RaGe
05-02-2004, 11:04 PM
As others have said, better bot functions such as:


Spider Functions:
Ability to index exact urls instead of having to index the raw domain. this is a total pain when trying to index pages from a yahoo store or Hometown addy.

Ability to approve and edit listings that are harvested by the spider

Ability to ignore base url, in order to harvest only off domain url links (handy for spidering link farms and DMOZ directories)


Results Display Functions:
Ability to ignore/not harvest page text but only Meta Tags

Ability to assign weights to displayed URL's.

Adult Filter


External Functions:
Ability to import and export XML feeds from other engines such as Google API, SearchFeed and RevenuePilot.

Jig and Alvins Keyword driven ad system, i third the motion to integrate it into the next Dig release..



----------------------------------------------------
Completed Mods:
Visitor tracking system module including IP, delivers stats in many formats.

Who's Online in realtime

Download counter module (we have a downloadable IE toolbar that interfaces with PhPDig, allowing searches from anywhere on the net).

Automated template changer: configurable to change skins automatically at any set time of year, currently set to change skins like Google, Christmas theme, 4th of July theme etc.

Mod WorkBoard:
Integrated SQL functions in Admin, such as Backup/Restore

Integrated External Results function (as stated above, searchfeed, revenuepilot etc)

Multi URL Add in Admin: will allow for text list to be pasted into Admin that the spider will index incrementally (in other words, paste 30 domains into admin and go make a sammich, it'll do the rest)

I never sleep lol

mbruere
05-03-2004, 12:05 AM
Hello !

Here is the place to say : "PhpDig is Wwwooooonnnnddddeeeerrrrrfffffuuuulllllllll !!!!!!!!!!!!!!!!!!"

It will be very nice if the features below will be integrated :

- Unicode support
- Soundex support



Thk !

blackfeather
05-03-2004, 06:39 AM
I also think phpdig is awesome...

One thing i'd like to see are options in the config file to completely ignore meta tags during indexing, and/or during displaying snippet results.

dfisk
05-03-2004, 07:36 AM
Integrated PDF document indexing would be nice.
It is an essential feature for CMS in the US, particularly those in the .edu and .gov domain space.

Shain
05-03-2004, 08:03 AM
XML data import/export should be a good tool.

cbooth7575
05-03-2004, 08:09 AM
I also agree with people that I'm quite impressed by phpDig. Keep up the good work.

I would vote for 2 additions, that many other people have also mentioned:

1. Multiple Indexes for One Domain

this would allow me to index my sites, which often have multiple language versions, without having to create multiple installs. However, some people have suggested sites being structured like this:
http://site.com/en/whatever.php
http://site.com/fr/whatever.php

But keep in mind that it doesn't always work that way... on some of my projects, the URL remains the same, but a variable like $_COOKIE['lang'] is set

2. Unicode Support


thanks, and good luck with the updates.

takpoli
05-04-2004, 06:48 AM
1. Sponsor records support. Search displays results from limited (server option) number of sponsor records first and normal search results follow.

2. Side bar sponsor link for advertisement support from search results of mysql advertisement records.

3. Allows .pdf only index. Crawl through the web, but index .pdf files only.

4. Allows http index from a file that has a list of target urls. Prevent timeout in this page by more status feedback.

5. Allows to crawl a single branch only.


I am new in using this wonderful tool. I hope my thought is not off the wall.

jerrywin5
05-04-2004, 08:12 AM
1. Indexing duplicate descriptions and keywords causing false search results. See thread. (http://www.phpdig.net/showthread.php?threadid=844)

2. Reduce duplicates in keywords table through more intelligent indexing. See thread. (http://www.phpdig.net/showthread.php?threadid=845)

3. Admin approval for spider to index external URLs. See thread. (http://www.phpdig.net/showthread.php?threadid=743)

4. Better support for PHP sessions.

5. Ability to set weight of data in config file. For example:
Title [1-5]
Description [1-5]
Keywords [1-5]
Content [1-5]

6. Soundex support or a similar function

7. Implement the many mods in the mods forum and spread throughout the various forums as either new features or as options that can be turned on or off in the config file. This will make things much easier for people to implement new versions without having to compare each line of code for differences.
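
The per-field weights in item 5 could work roughly like this. It is a minimal Python sketch, not PhpDig's ranking code. The `WEIGHTS` values, the field names, and the simple word-count scoring are assumptions chosen only to show how a 1-5 weight per field would feed into a page score.

```python
# Hypothetical config: weight (1-5) per indexed field.
WEIGHTS = {"title": 5, "description": 3, "keywords": 2, "content": 1}


def score(page, word, weights=WEIGHTS):
    """Sum the occurrences of word in each field, scaled by the field weight."""
    word = word.lower()
    total = 0
    for field, weight in weights.items():
        occurrences = page.get(field, "").lower().split().count(word)
        total += weight * occurrences
    return total


page = {"title": "PhpDig search", "content": "search search engine"}
print(score(page, "search"))  # one title hit (5) plus two content hits (2)
```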

JÿGius³
05-04-2004, 09:15 AM
Hi.

This is a summary of the additions we have made (me and alivin70)

1)search.php
If someone searches using a form, this form
has a static site id <input type="hidden" name="site" value="2">.
2 => www.phpdig.org.
But if you delete www.phpdig.org from the admin area, you have to
update the site's value (in our example 2).
In the form you can put <input type="hidden" name="site" value="phpdig.org">
and phpdigGetSiteidFromUrl() gets the correct site id.

2) url.php -
When a user clicks on a result, url.php logs:
a - the position of the clicked result,
b - the url (redundant),
c - the query,
d - date.
Very useful for statistics.

3) admin/index.php
When you start the spider, you have to select a limit > 0 because with
some sites the option 0 doesn't start the spider.

4) admin/limit_upd.php
Cron management via web. It manages max number of pages per site as well; useful
when you don't want to index all the pages of a web site (oh my God... this site
has thousands of pages).

5) admin/robot_functions.php
Erased a bug when you want to index pdfs. We have added a lot of logging at the end.
Among others, you have the statistics of clicks made by users...

6) admin/spider.php
When you start the spider you can tell it the max number of pages to index.
How long does it take to index a huge web site? Using the admin area you can
see which indexing has been interrupted, and which one has been completed.

7) includes/config.php
You can configure:
a - sponsored links: show it; don't show.
b - cron

8) libs/function_phpdig_form.php
Added form elements.

9) libs/phpdig_functions.php
Added phpdigGetSiteidFromUrl()

10) libs/search_functions.php
- The big HTML page at the end is in a separate file to save
parsing time.
- Sponsored links
- ....

11) libs/time.php
a function for logging the duration of various events

12) templates/phpdig2.html
template modification

13) sql/init.sql
other tables.

14) libs/google.php
If a user searches for something in a given web site and
there are no results... Google, help us pleaz :-)

All our additions should be fully explained... but if you are curious you can understand them through the diffs I sent you. Additional explanations will be coming soon...

Bye bye.

JyGius
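
The site lookup described in item 1 could look roughly like the sketch below. This is a Python illustration of the idea only, not the actual `phpdigGetSiteidFromUrl()` code, which is not shown in this thread. The name `site_id_from_url`, the in-memory `sites` table, and the rule of ignoring a leading "www." are assumptions.

```python
from urllib.parse import urlparse


def _host(value):
    """Normalise a url or bare host name to a lowercase host without 'www.'."""
    parsed = urlparse(value if "://" in value else "http://" + value)
    host = (parsed.hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def site_id_from_url(value, sites):
    """Return the id of the site whose url has the same host, or None."""
    wanted = _host(value)
    for site_id, url in sites.items():
        if _host(url) == wanted:
            return site_id
    return None


# Hypothetical contents of the sites table: id -> site url.
sites = {1: "http://www.example.com/", 2: "http://www.phpdig.org/"}
print(site_id_from_url("phpdig.org", sites))
```

With a lookup like this, the hidden form field can carry the site name instead of a numeric id that changes when the site is deleted and added again.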

Charter
05-04-2004, 10:23 AM
Hi all, and thanks for the suggestions. Thread closed. :)