A few suggestions for a revamp

07-22-2006, 10:07 PM
I would like to suggest we pick this project apart and put it back together again. Here's why: there is too much in one place. Simple enough?

I think this should be in major blocks.

The crawler/Indexer.
The interface for administration/site addition.
The Output.

And before someone goes on to tell me that they already are: they are not.

The crawler/indexer is its own unit, but if I want to build a custom interface or a better interface, I have to hunt down every variable I need to interface with it.

One of the things I would like to recode the indexer for is using memory tables instead of hammering away at my server's hard drives for what is essentially notepad space.

Also, it's silly not to hot-copy a table into RAM for getting results as well. RAM is hundreds of times faster, and this is a basic function of MySQL 4.1 and up. I understand that some people haven't switched yet, but truly, are we all here to keep everyone in the herd or to make things as good as they could be?
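To make the hot-copy idea concrete, here is a minimal sketch. It is not PHPDig code and it does not use MySQL MEMORY tables: it swaps in Python's built-in sqlite3 module, whose in-memory databases stand in for a RAM-backed table. The `keywords` table, its columns, and the function names are all made up for illustration.

```python
import os
import sqlite3
import tempfile


def build_demo_index(disk_path):
    """Create a tiny, hypothetical keyword index on disk (demo data only)."""
    con = sqlite3.connect(disk_path)
    con.execute("CREATE TABLE keywords (word TEXT, url TEXT)")
    con.executemany(
        "INSERT INTO keywords VALUES (?, ?)",
        [
            ("search", "http://example.com/a"),
            ("crawler", "http://example.com/b"),
        ],
    )
    con.commit()
    con.close()


def load_into_memory(disk_path):
    """Copy the whole on-disk database into a fresh in-memory database."""
    disk = sqlite3.connect(disk_path)
    mem = sqlite3.connect(":memory:")
    try:
        disk.backup(mem)  # page-by-page copy from disk into RAM
    finally:
        disk.close()
    return mem


def lookup(mem, word):
    """Answer a query from the in-memory copy only; the disk is not touched."""
    rows = mem.execute(
        "SELECT url FROM keywords WHERE word = ? ORDER BY url", (word,)
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "index.db")
    build_demo_index(path)
    mem = load_into_memory(path)
    print(lookup(mem, "search"))
```

The point of the sketch is only the shape: one bulk read from disk at load time, then every search is served from RAM until the copy is refreshed.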

The basic concept at all times with a search engine should be to avoid calls to the hard drive, as it's the slowest device to communicate with. By making sure these units are modularized properly, an old version and a hot new version can be worked on in separate streams if need be, without affecting the other two-thirds of the total engine, because we will have set STANDARDS on communication of variables.
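As a rough illustration of what "standards on communication" between the blocks could look like, here is a small sketch. Everything in it is hypothetical (the `Indexer` interface, the `IndexedPage` record, the `InMemoryIndexer` stub); none of it comes from the existing code. The idea is just that the admin side talks to a fixed interface instead of reaching into the indexer's variables.

```python
from dataclasses import dataclass
from typing import Iterable, List, Protocol


@dataclass(frozen=True)
class IndexedPage:
    """The agreed-upon record that crosses the module boundary."""
    url: str
    title: str


class Indexer(Protocol):
    """The only surface an admin interface is allowed to depend on."""

    def add_site(self, url: str) -> None: ...

    def pages(self) -> Iterable[IndexedPage]: ...


class InMemoryIndexer:
    """Stand-in indexer; a real one would crawl and parse the site."""

    def __init__(self) -> None:
        self._pages: List[IndexedPage] = []

    def add_site(self, url: str) -> None:
        # No fetching here: the sketch just records a placeholder page.
        self._pages.append(IndexedPage(url=url, title=url))

    def pages(self) -> Iterable[IndexedPage]:
        return list(self._pages)


def admin_add_sites(indexer: Indexer, urls: List[str]) -> int:
    """Admin-side code: uses the interface and nothing else."""
    for url in urls:
        indexer.add_site(url)
    return len(list(indexer.pages()))


if __name__ == "__main__":
    print(admin_add_sites(InMemoryIndexer(), ["http://example.com/"]))
```

With a boundary like this, an old indexer and a rewritten one can both sit behind the same interface, and the admin and output blocks do not need to change when one is swapped for the other.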

Finally, output. Frankly, the output should be standard, super clean XML and nothing else. If you want a template, then use an XML parser to display it in your favorite HTML/PHP page or whatever. It even makes your data portable to other websites without wasting processor, memory, and especially bandwidth.
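A quick sketch of the XML-only output idea, using Python's standard xml.etree.ElementTree. The element names (`results`, `result`, `url`, `title`) are just an assumed layout for the example, not an existing format. The engine emits XML; a separate template step parses it and renders whatever HTML the site wants.

```python
import xml.etree.ElementTree as ET
from html import escape


def results_to_xml(query, results):
    """Engine side: serialize search results as plain XML."""
    root = ET.Element("results", {"query": query, "count": str(len(results))})
    for result in results:
        item = ET.SubElement(root, "result")
        ET.SubElement(item, "url").text = result["url"]
        ET.SubElement(item, "title").text = result["title"]
    return ET.tostring(root, encoding="unicode")


def xml_to_html_list(xml_text):
    """Template side: parse the XML and render it as an HTML list."""
    root = ET.fromstring(xml_text)
    lines = ["<ul>"]
    for item in root.findall("result"):
        url = escape(item.findtext("url", default=""))
        title = escape(item.findtext("title", default=""))
        lines.append(f'  <li><a href="{url}">{title}</a></li>')
    lines.append("</ul>")
    return "\n".join(lines)


if __name__ == "__main__":
    xml_text = results_to_xml(
        "php", [{"url": "http://example.com/", "title": "Example"}]
    )
    print(xml_text)
    print(xml_to_html_list(xml_text))
```

Any other site could consume the same XML with its own parser and its own template, which is the portability argument above.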

Stats are nice but not critical. These can be stored in a memory table and written to a permanent table by cron or some other timed function. Again, the point is to minimize how often we have to take control of the read/write head on a hard drive.
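Here is one way the buffered-stats idea could look, as a sketch only. It uses an in-process counter and sqlite3 in place of a MySQL memory table and a permanent table, and the `query_stats` table and `StatsBuffer` class are invented for the example. Hits accumulate in memory; `flush()` is what a cron job or timer would call.

```python
import sqlite3
from collections import Counter


class StatsBuffer:
    """Count queries in memory and write them out in one batch."""

    def __init__(self, con):
        self._con = con
        self._pending = Counter()
        con.execute(
            "CREATE TABLE IF NOT EXISTS query_stats ("
            "query TEXT PRIMARY KEY, hits INTEGER NOT NULL)"
        )

    def record(self, query):
        # Memory only: no database write happens per search.
        self._pending[query] += 1

    def flush(self):
        """Write all pending counts in a single transaction."""
        flushed = len(self._pending)
        with self._con:  # commits on success, rolls back on error
            for query, hits in self._pending.items():
                cur = self._con.execute(
                    "UPDATE query_stats SET hits = hits + ? WHERE query = ?",
                    (hits, query),
                )
                if cur.rowcount == 0:
                    self._con.execute(
                        "INSERT INTO query_stats (query, hits) VALUES (?, ?)",
                        (query, hits),
                    )
        self._pending.clear()
        return flushed


if __name__ == "__main__":
    stats = StatsBuffer(sqlite3.connect(":memory:"))
    for q in ["php", "php", "mysql"]:
        stats.record(q)
    print(stats.flush())
```

The trade-off is that anything recorded since the last flush is lost if the process dies, which fits the "nice but not critical" status of stats.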

The most important (and impressive) part of the code is the parser and storage unit. But I have code that can fetch pages 50 times faster, and I truly am having grief figuring out where to start pulling this thing apart.

So the question is: does anyone care to raise the level of professionalism of this? Or maybe start a separate project to achieve the basics of search and storage? Maybe look at some of the new features of MySQL 5.0 as a starting platform.

08-08-2006, 04:35 AM
Out of respect for Charter, I have never been supportive of branching this project. However, it looks like things have dried up here due to lack of financial support for the project.

Although I think that PHPDig is great, I do agree with your analysis of the project. Breaking the project into the areas that you suggest is an excellent idea. I can see branching the crawler/indexer and reworking it under the same license. I would like to see commercial projects for the administration and the search engine. Such commercial projects could provide some financial support for the crawler/indexer project and encourage the involvement of new developers in the project.

I think the place to start is with the crawler/indexer. Just breaking it off on its own will get the process started. When people are able to use it on its own to accumulate valuable data, the desire to build commercial search engine modules to access the indexed data will be there and commercial administration modules will start showing up as well.

What do you think about these ideas, Charter?