07-22-2006, 10:07 PM   #1
GunMuse
Green Mole
 
Join Date: Feb 2005
Posts: 24
A few suggestions for a revamp

I would like to suggest we pick this project apart and put it back together again. Here's why: there is too much in one place. Simple enough?

I think this should be in major blocks.

The crawler/Indexer.
The interface for administration/site addition.
The Output.

And before someone tells me that they already are: they are not.

The crawler/indexer is its own unit, but if I want to build a custom or better interface, I have to hunt down every variable I need to interface with it.

One of the things I would like to recode the indexer for is using MEMORY tables instead of hammering away at my server's hard drives for what is essentially scratchpad space.

Also, it's silly not to hot-copy a table into RAM for serving results as well. RAM is hundreds of times faster, and this is a basic function of MySQL 4.1 and up. I understand that some people haven't switched yet, but truly, are we all here to keep everyone in the herd, or to make things as good as they truly could be?
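As a rough sketch of the RAM-copy idea (not the project's actual code), the helper below only builds the SQL you would send to MySQL; it does not connect to anything. The table names `keywords` and `keywords_mem` and the function name are made up for illustration. Keep in mind that MEMORY tables cannot hold BLOB or TEXT columns, lose their contents on a server restart, and that CREATE TABLE ... SELECT does not carry over the source table's indexes.

```python
import re
from typing import List

# Only plain identifiers are accepted, since table names cannot be
# passed as bound parameters and are interpolated into the SQL text.
_IDENT = re.compile(r"^[A-Za-z0-9_]+$")


def memory_copy_statements(source: str, target: str) -> List[str]:
    """Return the statements that rebuild `target` as a RAM copy of `source`.

    Hypothetical helper: it builds SQL strings only and never talks to a server.
    """
    if not (_IDENT.match(source) and _IDENT.match(target)):
        raise ValueError("table names must be plain identifiers")
    return [
        # Throw away the previous RAM copy, if there is one.
        f"DROP TABLE IF EXISTS `{target}`",
        # Create the MEMORY table and fill it from the on-disk table in one go.
        # Indexes are not copied; add them afterwards if lookups need them.
        f"CREATE TABLE `{target}` ENGINE=MEMORY SELECT * FROM `{source}`",
    ]


if __name__ == "__main__":
    for statement in memory_copy_statements("keywords", "keywords_mem"):
        print(statement + ";")
```

Run the two statements after each indexing pass (or from a timed job) and point the search queries at the `_mem` table.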

The basic concept at all times with a search engine should be to avoid calls to the hard drive, as it's the slowest device to communicate with. If these units are modularized properly, an old version and a hot new version can be worked on in separate streams if need be without affecting the other two thirds of the engine, because we will have set STANDARDS for how variables are communicated.
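To make "standards on communication" concrete, here is a minimal sketch of what an agreed contract between the three blocks might look like. Everything in it is an assumption for illustration: the `Page` and `Hit` fields, the `Index` contract, and the toy scoring are not taken from the existing code. It is in Python only to keep it short; the same shape works in any language.

```python
from dataclasses import dataclass
from typing import Iterable, List, Protocol


@dataclass(frozen=True)
class Page:
    """What the crawler hands to the indexer (hypothetical fields)."""
    url: str
    title: str
    text: str


@dataclass(frozen=True)
class Hit:
    """What the index hands to the output layer (hypothetical fields)."""
    url: str
    title: str
    score: float


class Index(Protocol):
    """The agreed contract: any indexer version has to offer these two calls."""
    def add(self, pages: Iterable[Page]) -> None: ...
    def search(self, query: str) -> List[Hit]: ...


class ToyIndex:
    """Stand-in implementation: scores a page by how often the query word occurs."""

    def __init__(self) -> None:
        self._pages: List[Page] = []

    def add(self, pages: Iterable[Page]) -> None:
        self._pages.extend(pages)

    def search(self, query: str) -> List[Hit]:
        word = query.lower()
        hits = [
            Hit(p.url, p.title, float(p.text.lower().split().count(word)))
            for p in self._pages
        ]
        # Drop pages that never mention the word, best score first.
        return sorted((h for h in hits if h.score > 0), key=lambda h: -h.score)


def run_search(index: Index, query: str) -> List[Hit]:
    """The output layer sees only the Index contract, never its internals."""
    return index.search(query)
```

With something like this in place, an old indexer and a new one can be swapped behind `Index` without the admin interface or the output code changing at all.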

Finally, output. Frankly, the output should be standard, super clean XML and nothing else. If you want a template, use an XML parser to display it in your favorite HTML/PHP page or whatever. It even makes your data portable to other websites without wasting processor, memory, and especially bandwidth.
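A small sketch of the XML-only output idea, using Python's standard `xml.etree.ElementTree`. The element names (`results`, `result`, `url`, `title`, `snippet`) are placeholders, not an agreed schema. The first function is what the engine would emit; the second is what a template or another website would do with it.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical per-result fields; the real set would be part of the agreed standard.
FIELDS = ("url", "title", "snippet")


def results_to_xml(query: str, hits: List[Dict[str, str]]) -> str:
    """Serialize search hits to a single XML string (engine side)."""
    root = ET.Element("results", {"query": query, "count": str(len(hits))})
    for hit in hits:
        item = ET.SubElement(root, "result")
        for field in FIELDS:
            # ElementTree escapes &, < and > in the text for us.
            ET.SubElement(item, field).text = hit.get(field, "")
    return ET.tostring(root, encoding="unicode")


def xml_to_rows(xml_text: str) -> List[Dict[str, str]]:
    """Parse the XML back into plain dicts (template or remote-site side)."""
    root = ET.fromstring(xml_text)
    return [
        {child.tag: child.text or "" for child in item}
        for item in root.findall("result")
    ]


if __name__ == "__main__":
    xml_text = results_to_xml(
        "memory tables",
        [{"url": "http://example.com/a?x=1&y=2", "title": "RAM <fast>", "snippet": ""}],
    )
    print(xml_text)
    print(xml_to_rows(xml_text))
```

The display side never touches the database or the engine's variables; it only needs to understand the XML.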

Stats are nice but not critical. These can be stored in a memory table and written to a permanent table by cron or another timed function. Again, the point is to minimize how often we have to take control of the read/write head on a hard drive.
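The buffering pattern for stats can be sketched as follows. This is an illustration only: it counts queries in process memory instead of a MySQL MEMORY table, and it uses Python's bundled `sqlite3` as a stand-in for the permanent table so the example runs on its own. The `QueryStats` class and the `query_log` table are invented names.

```python
import sqlite3
from collections import Counter


class QueryStats:
    """Count queries in RAM and write them to the permanent table in batches."""

    def __init__(self, db: sqlite3.Connection) -> None:
        self._db = db
        self._pending: Counter = Counter()
        db.execute(
            "CREATE TABLE IF NOT EXISTS query_log "
            "(query TEXT PRIMARY KEY, hits INTEGER NOT NULL)"
        )

    def record(self, query: str) -> None:
        """Called on every search; touches memory only, never the database."""
        self._pending[query] += 1

    def flush(self) -> int:
        """Called from cron or a timer; one transaction for the whole batch."""
        rows = list(self._pending.items())
        with self._db:  # commits on success, rolls back on error
            for query, n in rows:
                cur = self._db.execute(
                    "UPDATE query_log SET hits = hits + ? WHERE query = ?",
                    (n, query),
                )
                if cur.rowcount == 0:  # first time this query is seen
                    self._db.execute(
                        "INSERT INTO query_log (query, hits) VALUES (?, ?)",
                        (query, n),
                    )
        self._pending.clear()
        return len(rows)
```

The trade-off is that counts recorded since the last flush are lost if the process dies, which is acceptable for stats that are "nice but not critical".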

The most important (and impressive) part of the code is the parser and storage unit. But I have code that can fetch pages 50 times faster, and I am truly having grief figuring out where to start pulling this thing apart.

So the question is: does anyone care to raise the level of professionalism of this? Or maybe start a separate project to achieve the basics of search and storage? Maybe look at some of the new features of MySQL 5.0 as a starting platform.