PhpDig.net

Old 06-06-2004, 08:39 AM   #1
sufehmi
Green Mole
 
Join Date: Jun 2004
Posts: 4
Indexing the Internet

I'm a bit concerned about Google's domination of search, so like many others I signed up for Grub.org.

Unfortunately, their server software is not open-sourced, so I looked around again to find another similar project.

***** is a good search engine and it's open source; however, they're not interested in implementing a distributed crawler like Grub. And I don't know Java.

So I was looking for a good PHP-based search engine, and found PhpDig. I just installed it, and it looks quite good.

I'm very interested in starting a project to index the Internet using PhpDig.
I think we can scale PhpDig for this. For example, we can put the various components (indexer, search front-end, database, etc.) on separate physical servers, one or more per component, and MySQL has a clustering feature now.

If anyone else is interested, feel free to join in.

This is the to-do list for this project:

# Purchase a dedicated server for the project
# Get domain names list by signing up [ here ] and [ here ]
(read [ this ] and [ this ] for details)
# Code a job allocator, which will allocate job packages to users. It will assign several domain names (from the list above) to be deep-crawled by users.
# Code a job manager, which will receive submissions from users and merge them into the main index.
# Modify spider.php to be able to request job packages (with user authentication), crawl the domains, and submit the results back securely. (running as PHP CGI)
# Create a simple website; with basic stats, user management, and search front-end.
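
The job allocator from the list above could be sketched roughly as follows. This is only a minimal illustration, written in Python for brevity; a real version would presumably live alongside spider.php. The class name JobAllocator, its methods, and the package size are all hypothetical and are not part of PhpDig.

```python
from collections import deque


class JobAllocator:
    """Hands out packages of domain names to volunteer crawlers.

    Hypothetical design: nothing here is taken from PhpDig itself.
    """

    def __init__(self, domains, package_size=3):
        self.pending = deque(domains)   # domains not yet assigned
        self.package_size = package_size
        self.assigned = {}              # package id -> (user, [domains])
        self.next_id = 1

    def request_package(self, user):
        """Return (package_id, domains) for `user`, or None if nothing is left."""
        if not self.pending:
            return None
        count = min(self.package_size, len(self.pending))
        domains = [self.pending.popleft() for _ in range(count)]
        package_id = self.next_id
        self.next_id += 1
        self.assigned[package_id] = (user, domains)
        return package_id, domains

    def abandon(self, package_id):
        """Put an unfinished package's domains back at the end of the queue."""
        _user, domains = self.assigned.pop(package_id)
        self.pending.extend(domains)
```

Keeping a record of who holds each package means a package that never comes back can be returned to the queue and handed to someone else.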

That should be enough to get this project off the ground.

This project will be fully open and strictly non-profit.

Thanks to the PhpDig developers, and here's hoping that this will be useful for everyone as well.



Thanks,
Harry
Old 06-06-2004, 08:58 AM   #2
bloodjelly
Purple Mole
 
Join Date: Dec 2003
Posts: 106
Sounds like an ambitious task, sufehmi, to put it mildly--especially considering that as of now phpDig can only spider one site at a time per database. Also, and no offense, but why would you want to do this? Google's "domination" on search, most people agree, provides relevant information quickly and easily, giving useful results in a fair manner. Do you plan on out-Googling Google? Everywhere you look some search engine is trying to top them, and they're spending millions and millions of dollars to do it. If they do, great, just as long as we still get relevant information. That's the only thing people want. So I guess my question is why would you want to compete with these big businesses, and why would you want to do something that many many other people have already almost done? I'm personally happy with my search results thus far.

Last edited by bloodjelly; 06-06-2004 at 09:03 AM.
Old 06-06-2004, 12:59 PM   #3
sufehmi
Green Mole
 
Join Date: Jun 2004
Posts: 4

Quote:
Originally posted by bloodjelly
Sounds like an ambitious task, sufehmi, to put it mildly
You're absolutely correct. And I'm not the best PHP coder either.

But if I can get people interested in this project, I think it has a good chance of succeeding.
One of the tricks is making it very easy to contribute (by enabling people to run the spider).

And I think I can get the project off the ground on my own, after which it will hopefully be interesting enough for others to join in.


Quote:
especially considering that as of now phpDig can only spider one site at a time per database.
Yes, you're correct, it needs an additional module that's able to accept results from multiple spiders and incorporate them into the main index.
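
To give a rough idea of what such a merge module would do, here is a minimal sketch in Python. It assumes a toy in-memory index (keyword -> URL -> occurrence count), which is NOT PhpDig's actual database schema; the function name merge_submission and both data shapes are made up for illustration.

```python
def merge_submission(main_index, submission):
    """Merge one spider's results into the main index, in place.

    main_index: dict mapping keyword -> dict of url -> occurrence count
    submission: dict mapping url -> dict of keyword -> occurrence count
    (Both shapes are assumptions for this sketch.)
    """
    for url, keywords in submission.items():
        # Drop stale entries so a re-crawled page does not keep old keywords.
        for postings in main_index.values():
            postings.pop(url, None)
        for keyword, count in keywords.items():
            main_index.setdefault(keyword, {})[url] = count
    # Remove keywords that no longer point at any page.
    for keyword in [k for k, v in main_index.items() if not v]:
        del main_index[keyword]
    return main_index
```

The point is that each submission replaces whatever was stored for the pages it covers, so results from different spiders can arrive in any order without leaving duplicate entries for the same page.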


Quote:
Also, and no offense, but why would you want to do this? Google's "domination" on search, most people agree, provides relevant information quickly and easily, giving useful results in a fair manner.
# One thing everyone agrees on is that Google is among the most powerful entities on the Internet at the moment.

# At the moment they're doing a great job playing it fair (for most people), but there's no guarantee for the future.

# Google is excellent, but there are a few things that I (and no doubt others) would like to improve.
(link farms, anyone? Google spammers? etc.)

# It will be one mighty interesting project


Quote:
Do you plan on out-Googling Google? Everywhere you look some search engine is trying to top them, and they're spending millions and millions of dollars to do it. If they do, great, just as long as we still get relevant information. That's the only thing people want. So I guess my question is why would you want to compete with these big businesses, and why would you want to do something that many many other people have already almost done? I'm personally happy with my search results thus far.
I'd be a total idiot if I thought that I could beat Google by myself.

But when people are working together, I think nothing is impossible.


Thanks,
Harry
Old 06-06-2004, 04:28 PM   #4
bloodjelly
Purple Mole
 
Join Date: Dec 2003
Posts: 106
Well good luck, let's hear how your project progresses.
Old 08-03-2004, 04:18 PM   #5
MySQLwebmaster
Green Mole
 
Join Date: Aug 2004
Posts: 1
I like how you're thinking. Sounds like a great project. Best of luck. Any progress report?
Old 08-04-2004, 12:56 PM   #6
sufehmi
Green Mole
 
Join Date: Jun 2004
Posts: 4
Nope, unfortunately I'm still busy coding for phpBB and phpOpenChat, among other things.

Well anyway, this gives me the opportunity to look for a better server within my budget. I can't believe how cheap dedicated servers are nowadays (as long as you don't host anything business-critical).

In the meantime, if anyone is interested in joining, just drop me an email or post in this thread.


cheers, HS
All times are GMT -8.

