I'm a bit concerned about Google's domination of search, so like many others I signed up to Grub.org.
Unfortunately, their server software is not open-sourced, so I looked around again to find another similar project.
***** is a good search engine and it's open-source; however, they're not interested in implementing a distributed crawler like Grub, and I don't know Java.
So I was looking for a good PHP-based search engine, and found PhpDig. I just installed it, and it looks quite good.
I'm very interested in starting a project to index the Internet using PhpDig.
I think we can scale PhpDig for this. For example, we can separate the various components (indexer, search front-end, database, etc.) onto their own physical servers, and MySQL has a clustering feature now.
If anyone else is interested, feel free to join in.
This is the to-do list for this project:
# Purchase a dedicated server for the project
# Get a list of domain names by signing up [here] and [here] (read [this] and [this] for details)
# Code a job allocator, which will allocate job packages to users. It will assign several domain names (from the list above) to be deep-crawled by users.
# Code a job manager, which will receive submissions from users and merge them into the main index.
# Modify spider.php to be able to request job packages (with user authentication), crawl the domains, and submit the results back securely (running as PHP CGI).
# Create a simple website with basic stats, user management, and a search front-end.
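
To make the job allocator and job manager items a bit more concrete, here is a rough sketch of how the two could fit together. It is written in Python only to keep the illustration short; the real code would presumably be PHP sitting next to PhpDig. Every name in it (JobAllocator, JobManager, allocate, submit, package_size) is made up for this example and is not part of PhpDig, and the in-memory dictionary is just a stand-in for the main index in MySQL.

```python
from collections import deque
from itertools import count


class JobAllocator:
    """Hands out packages of domain names to users for deep crawling (hypothetical)."""

    def __init__(self, domains, package_size=3):
        self.pending = deque(domains)   # domains not yet assigned to anyone
        self.package_size = package_size
        self.outstanding = {}           # package id -> (user, list of domains)
        self._ids = count(1)            # simple sequential package ids

    def allocate(self, user):
        """Return (package_id, domains) for the user, or None if nothing is left."""
        if not self.pending:
            return None
        n = min(self.package_size, len(self.pending))
        domains = [self.pending.popleft() for _ in range(n)]
        package_id = next(self._ids)
        self.outstanding[package_id] = (user, domains)
        return package_id, domains


class JobManager:
    """Receives crawl results and merges them into the main index (hypothetical)."""

    def __init__(self, allocator):
        self.allocator = allocator
        self.index = {}                 # domain -> list of crawled pages

    def submit(self, user, package_id, results):
        """Merge results for a package; reject it if it was not assigned to this user."""
        entry = self.allocator.outstanding.get(package_id)
        if entry is None or entry[0] != user:
            return False
        _, domains = entry
        # Only domains that were actually in the package are merged.
        for domain in domains:
            self.index.setdefault(domain, []).extend(results.get(domain, []))
        del self.allocator.outstanding[package_id]
        return True
```

In this sketch the modified spider would call something like allocate() to fetch a package, crawl the domains, and then call submit() with its results. User authentication, secure transport, and re-issuing packages that never come back are left out on purpose.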
That should be enough to get this project off the ground.
This project will be fully open and strictly non-profit.
Thanks to the PhpDig developers, and here's hoping that this will be useful for everyone as well.
Thanks,
Harry