Old 06-06-2004, 08:39 AM   #1
sufehmi
Green Mole
 
Join Date: Jun 2004
Posts: 4
Indexing the Internet

I'm a bit concerned about Google's domination of search, so like many others I signed up for Grub.org.

Unfortunately, their server software is not open source, so I looked around for another, similar project.

***** is a good search engine and it's open source, but its developers are not interested in implementing a distributed crawler like Grub's. And I don't know Java.

So I was looking for a good PHP-based search engine, and found PhpDig. I just installed it, and it looks quite good.

I'm very interested in starting a project to index the Internet using PhpDig.
I think we can scale PhpDig for this. For example, we can split the components (indexer, search front-end, database, etc.) across separate physical servers, one or more per component, and MySQL has a clustering feature now.

If anyone else is interested, feel free to join in.

This is the to-do list for this project:

# Purchase a dedicated server for the project
# Get domain names list by signing up [ here ] and [ here ]
(read [ this ] and [ this ] for details)
# Code a job allocator, which will allocate job packages to users. It will assign several domain names (from the list above) to be deep-crawled by users.
# Code a job manager, which will receive submissions from users and merge them into the main index.
# Modify spider.php so that it can request job packages (with user authentication), crawl the domains, and submit the results back securely (running as PHP CGI).
# Create a simple website; with basic stats, user management, and search front-end.
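
To make the job allocator and job manager items a bit more concrete, here is a rough sketch of how the two could fit together. It is in Python only for illustration (the real thing would presumably be PHP alongside PhpDig), it keeps everything in memory instead of MySQL, and every name in it (JobAllocator, JobManager, request_package, submit, package_size) is made up for this example and is not part of PhpDig. The index is simplified to a keyword-to-URLs map, which is an assumption and not PhpDig's actual schema.

```python
from collections import deque
from urllib.parse import urlparse


class JobAllocator:
    """Hands out packages of domain names to users (hypothetical design)."""

    def __init__(self, domains, package_size=3):
        self.pending = deque(domains)   # domains not yet assigned to anyone
        self.package_size = package_size
        self.assigned = {}              # job_id -> (user, [domains])
        self.next_id = 1

    def request_package(self, user):
        """Assign the next few pending domains to `user`.

        Returns (job_id, domains), or None when no domains are left.
        """
        if not self.pending:
            return None
        count = min(self.package_size, len(self.pending))
        batch = [self.pending.popleft() for _ in range(count)]
        job_id = self.next_id
        self.next_id += 1
        self.assigned[job_id] = (user, batch)
        return job_id, batch


class JobManager:
    """Receives crawl results and merges them into the main index."""

    def __init__(self, allocator):
        self.allocator = allocator
        self.index = {}                 # keyword -> set of URLs (simplified)

    def submit(self, user, job_id, results):
        """Merge `results` (keyword -> list of URLs) for a finished job.

        Rejects the submission if the job is unknown or belongs to another
        user, and silently drops URLs outside the assigned domains.
        """
        job = self.allocator.assigned.get(job_id)
        if job is None or job[0] != user:
            return False
        allowed = set(job[1])
        for keyword, urls in results.items():
            for url in urls:
                if urlparse(url).hostname in allowed:
                    self.index.setdefault(keyword, set()).add(url)
        del self.allocator.assigned[job_id]   # job is done
        return True
```

A modified spider would call something like request_package() to get its domains, crawl them, and then call submit() with what it found. Real authentication, handling of jobs that are never returned, and checking that submitted results are honest are all left out here and would need proper design.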

That should be enough to get this project off the ground.

This project will be fully open and strictly non-profit.

Thanks to the PhpDig developers, and here's hoping that this will be useful for everyone as well.



Thanks,
Harry