sufehmi
06-06-2004, 08:39 AM
I'm a bit concerned about Google's domination of search, so like many others I signed up with Grub.org (http://grub.org).
Unfortunately, their server software is not open source, so I looked around again for a similar project.
***** (http://www.*****.org/docs/en/) is a good open-source search engine; however, they're not interested in implementing a distributed crawler (http://www.*****.org/docs/en/faq.html) like Grub's. And I don't know Java :P
So I was looking for a good PHP-based search engine, and found PhpDig. I just installed it, and it looks quite good.
I'm very interested in starting a project to index the Internet using PhpDig.
I think we can scale PhpDig for this. For example, we can separate the various components (indexer, search front-end, database, etc.) onto dedicated physical servers, and MySQL now has a clustering feature.
If anyone else is interested, feel free to join in.
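To make the component split above concrete, here is a minimal sketch (in Python for brevity; the real project would be PHP) of a deployment map assigning each component its own host. All hostnames are made up for illustration:

```python
# Hypothetical deployment map: one physical server per component,
# as described above. Hostnames are placeholders, not real servers.
COMPONENTS = {
    "indexer":   "indexer1.example.org",   # runs the crawler/indexer jobs
    "database":  "db1.example.org",        # MySQL, optionally clustered
    "front-end": "www.example.org",        # search front-end website
}

# Each component only needs to know the database address to cooperate.
db_host = COMPONENTS["database"]
print(db_host)
```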
This is the to-do list for this project:
# Purchase a dedicated server for the project
# Get a list of domain names by signing up [ here (http://www.verisign.com/nds/naming/tld/) ] and [ here (http://www.pir.org/registrars/zone_file_access) ]
(read [ this (http://forums.devshed.com/t139891/s.html) ] and [ this (http://www.webhostingtalk.com/showthread.php?threadid=52404) ] for details)
# Code a job allocator, which will allocate job packages to users. It will assign several domain names (from the list above) to be deep-crawled.
# Code a job manager, which will receive submissions from users and merge them into the main index.
# Modify spider.php to be able to request job packages (with user authentication), crawl the domains, and submit the results back securely. (running as PHP CGI)
# Create a simple website with basic stats, user management, and a search front-end.
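The allocator/manager workflow in the list above could look roughly like this. This is a hedged sketch in Python (the real code would be PHP alongside spider.php); all class and method names here are hypothetical, and authentication and secure transport are omitted:

```python
import hashlib

class JobAllocator:
    """Hands out fixed-size packages of domains for users to deep-crawl."""

    def __init__(self, domains, package_size=3):
        self.pending = list(domains)
        self.package_size = package_size
        self.assigned = {}  # package_id -> list of domains

    def request_package(self, user_token):
        # A real system would authenticate user_token first.
        package = self.pending[:self.package_size]
        self.pending = self.pending[self.package_size:]
        package_id = hashlib.sha1(
            (user_token + "".join(package)).encode()
        ).hexdigest()[:12]
        self.assigned[package_id] = package
        return package_id, package

class JobManager:
    """Receives crawl submissions and merges them into the main index."""

    def __init__(self):
        self.index = {}  # keyword -> set of URLs (a toy inverted index)

    def submit(self, package_id, results):
        # results maps each crawled URL to the keywords found on it;
        # merge them into the shared inverted index.
        for url, keywords in results.items():
            for kw in keywords:
                self.index.setdefault(kw, set()).add(url)

# Usage: a user requests a package, crawls it, and submits the result.
allocator = JobAllocator(["example.org", "example.net", "example.com"])
pid, domains = allocator.request_package("user42")
manager = JobManager()
manager.submit(pid, {"http://example.org/": ["search", "engine"]})
print(sorted(manager.index["search"]))
```

In the real project the allocator and manager would sit behind the website, and the modified spider.php would play the client role shown in the usage lines.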
That should be enough to get this project off the ground.
This project will be fully open and strictly non-profit.
Thanks to the PhpDig developers, and here's hoping that this will be useful for everyone as well.
Thanks,
Harry