PhpDig.net - Indexing the Internet

sufehmi · 06-06-2004, 08:39 AM

I'm a bit concerned about Google's dominance in search, so, like many others, I signed up with Grub.org.

Unfortunately, their server software is not open source, so I looked around for another similar project.

***** is a good open-source search engine; however, its developers aren't interested in implementing a distributed crawler like Grub's. Also, I don't know Java.

So I looked for a good PHP-based search engine and found PhpDig. I've just installed it, and it looks quite good.

I'm very interested in starting a project to index the Internet using PhpDig.
I think we can scale PhpDig for this. For example, we can split its components (indexer, search front-end, database, etc.) across separate physical servers, with multiple servers per component, and MySQL now has a clustering feature.

If anyone else is interested, feel free to join in.

This is the to-do list for this project:

# Purchase a dedicated server for the project
# Get domain names list by signing up [ here ] and [ here ]
(read [ this ] and [ this ] for details)
# Code a job allocator, which will allocate job packages to users. Each package assigns several domain names (from the list above) for a user to deep-crawl.
# Code a job manager, which will receive submissions from users and merge them into the main index.
# Modify spider.php (running as a PHP CGI) so it can request job packages (with user authentication), crawl the assigned domains, and securely submit the results back.
# Create a simple website with basic stats, user management, and a search front-end.
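The job allocator and job manager in the list above could be prototyped roughly as follows. This is a minimal, hypothetical sketch written in Python rather than PHP. It assumes an in-memory queue of domains, a fixed batch size, and a simple word-to-URL inverted index. The class names, `batch_size`, and the result format (URL mapped to a list of words) are illustrative only and are not part of PhpDig. A real deployment would keep this state in MySQL and authenticate users properly.

```python
from collections import deque
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class JobPackage:
    """Hypothetical job package: a batch of domains assigned to one user."""
    job_id: int
    user: str
    domains: list


class JobAllocator:
    """Hands out batches of domains from a queue (hypothetical sketch)."""

    def __init__(self, domains, batch_size=5):
        self.pending = deque(domains)  # domains not yet assigned to anyone
        self.batch_size = batch_size
        self.assigned = {}             # job_id -> JobPackage still outstanding
        self.next_id = 1

    def allocate(self, user):
        """Return the next package for `user`, or None if no work is left."""
        if not self.pending:
            return None
        count = min(self.batch_size, len(self.pending))
        batch = [self.pending.popleft() for _ in range(count)]
        job = JobPackage(self.next_id, user, batch)
        self.assigned[job.job_id] = job
        self.next_id += 1
        return job

    def requeue(self, job_id):
        """Put an abandoned job's domains back at the front of the queue."""
        job = self.assigned.pop(job_id, None)
        if job is not None:
            self.pending.extendleft(reversed(job.domains))


class JobManager:
    """Merges submitted results into a main inverted index (word -> URLs)."""

    def __init__(self, allocator):
        self.allocator = allocator
        self.index = {}

    def submit(self, job_id, user, results):
        """Merge `results` (URL -> list of words) for an outstanding job.

        Returns False if the job is unknown or belongs to another user.
        URLs whose host is not exactly one of the assigned domains are ignored.
        """
        job = self.allocator.assigned.get(job_id)
        if job is None or job.user != user:
            return False
        allowed = set(job.domains)
        for url, words in results.items():
            if urlparse(url).hostname not in allowed:
                continue  # skip pages outside this job package
            for word in words:
                self.index.setdefault(word.lower(), set()).add(url)
        del self.allocator.assigned[job_id]  # job is now complete
        return True
```

Rejecting pages outside the assigned domains is one cheap guard against a client submitting arbitrary data, and `requeue` lets the server hand out work that a volunteer abandoned.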

That should be enough to get this project off the ground.

This project will be fully open and strictly non-profit.

Thanks to the PhpDig developers, and here's hoping this will be useful for everyone as well.

Thanks,
Harry
