PhpDig.net


sasa 09-09-2004 01:05 PM

Google-like search engine
 
Hi All,

I've gone up and down these forums, and I am a bit confused (nothing new!) ;)

I want to create a search engine for a niche market. There may be several thousand sites in this niche.

1) I want to start by listing a few big ones...
2) let people come and request to be indexed
(the process would have a screen where they list their URL and, after approval, they would get indexed on some sort of schedule)

So some questions:

a) How is the data stored?
- Does PhpDig store the URL, Title, Description, and Keywords of the "crawled" pages in the MySQL database?
- Where is the actual INDEXED content of the pages stored?

b) How much storage is needed?
- e.g. if we have 1,000 sites with 15 pages each (a total of 15,000 pages), how much storage would be needed?

c) How quick is the code?
- Using the above example (15,000 pages), how long would a 2-word search take?

d) And most importantly, has someone put together a modded version for this kind of application?

thanks,
Sam

vinyl-junkie 09-09-2004 08:11 PM

Quote:

Originally Posted by sasa
a) How is the data stored?
- Does PhpDig store the URL, Title, Description, and Keywords of the "crawled" pages in the MySQL database?
- Where is the actual INDEXED content of the pages stored?

Just go into phpMyAdmin and look at the data structure there. It will show you all the tables and fields inside each.

Quote:

b) How much storage is needed?
- e.g. if we have 1,000 sites with 15 pages each (a total of 15,000 pages), how much storage would be needed?

You can't just go by the number of pages. It also depends on the size of those pages and how many keywords they contain. There are probably a few other factors too that don't readily come to mind.
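
To make that dependence concrete, here is a rough back-of-envelope sketch in Python. It is not taken from PhpDig: the function name, the keywords-per-page figure, and both byte sizes are placeholder assumptions, and you would replace them with numbers measured on a real index.

```python
def estimate_index_bytes(pages, avg_unique_keywords_per_page,
                         bytes_per_posting=16, bytes_per_page_record=1024):
    """Rough size of a keyword index.

    Assumes one posting row per (page, keyword) pair plus one fixed-size
    metadata record (URL, title, description) per page. The default sizes
    are illustrative guesses, not PhpDig measurements, and the keyword
    dictionary itself is ignored.
    """
    postings = pages * avg_unique_keywords_per_page
    return postings * bytes_per_posting + pages * bytes_per_page_record


if __name__ == "__main__":
    # 1,000 sites x 15 pages each, as in the question.
    # 300 unique keywords per page is a guess.
    total = estimate_index_bytes(pages=1000 * 15,
                                 avg_unique_keywords_per_page=300)
    print(f"~{total / 1_000_000:.1f} MB")
```

Doubling the average number of keywords per page roughly doubles the posting part of the estimate, which is why page count alone doesn't settle the question.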

Quote:

c) How quick is the code?
- Using the above example (15,000 pages), how long would a 2-word search take?

See above. Again, it depends.

Quote:

d) And most importantly, has someone put together a modded version for this kind of application?

thanks,
Sam

I don't know, but maybe you should get together with the person who started this thread.

sasa 09-10-2004 08:26 AM

Dear junkie,

Thanks for the reply. However, you did not give ANY answers :(

I do understand EVERYTHING depends on something else!

I have not downloaded the code or installed it yet. My host is on an "all Windows" platform. All I wanted was to get some estimates before I went and paid for Linux hosting just to try the code.

You seem to know a lot about this code... so here are a few questions:

a) How is the data stored?
- Does PhpDig store the URL, Title, Description, and Keywords of the "crawled" pages in the MySQL database?
- Where is the actual INDEXED (text) content of the pages stored?


b) In your own installation, how big is the database, and how long does a 2-word search take?

vinyl-junkie 09-10-2004 09:41 AM

Quote:

Originally Posted by sasa
I have not downloaded the code or installed it yet. My host is on an "all Windows" platform. All I wanted was to get some estimates before I went and paid for Linux hosting just to try the code.

I didn't know you hadn't downloaded the code yet. Otherwise, you'd be able to look at the database structure yourself.

One thing that you might not be aware of is that PhpDig will work on a Windows server. However, my own experience has been that it doesn't work very well there. You might have better luck than me, though. I have been told that a Windows server's settings can be tweaked so that PhpDig will work pretty well, but I've never pursued that myself, so I can't offer you any insight on that.

Quote:

You seem to know a lot about this code... so here are a few questions:

I've got everyone fooled! ;) Seriously, I really don't know that much about the code. I just know where to look up the answers to a lot of questions that are asked here in the forums.

Quote:

a) How is the data stored?
- Does PhpDig store the URL, Title, Description, and Keywords of the "crawled" pages in the MySQL database?

Yes, the database stores all those data elements in its tables.
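
As a generic illustration of how a crawler can keep page details and keywords in database tables, here is a minimal Python sketch using an in-memory SQLite database. This is not PhpDig's actual schema or code: the table names (`pages`, `keywords`, `postings`), the column names, and both functions are hypothetical, and SQLite stands in for MySQL only so the example is self-contained.

```python
import re
import sqlite3


def build_index(docs):
    """docs: iterable of (url, title, text). Returns an in-memory SQLite DB.

    Hypothetical schema for illustration only, not PhpDig's tables.
    """
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE pages (id INTEGER PRIMARY KEY, url TEXT, title TEXT);
        CREATE TABLE keywords (id INTEGER PRIMARY KEY, word TEXT UNIQUE);
        CREATE TABLE postings (page_id INTEGER, keyword_id INTEGER,
                               hits INTEGER,
                               PRIMARY KEY (keyword_id, page_id));
    """)
    for url, title, text in docs:
        page_id = db.execute(
            "INSERT INTO pages (url, title) VALUES (?, ?)", (url, title)
        ).lastrowid
        # Count how often each word occurs on this page.
        counts = {}
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            counts[word] = counts.get(word, 0) + 1
        for word, hits in counts.items():
            db.execute("INSERT OR IGNORE INTO keywords (word) VALUES (?)",
                       (word,))
            (kw_id,) = db.execute(
                "SELECT id FROM keywords WHERE word = ?", (word,)
            ).fetchone()
            db.execute("INSERT INTO postings VALUES (?, ?, ?)",
                       (page_id, kw_id, hits))
    return db


def search(db, query):
    """Return URLs of pages containing every query word, most hits first."""
    words = sorted(set(re.findall(r"[a-z0-9]+", query.lower())))
    if not words:
        return []
    marks = ",".join("?" * len(words))
    rows = db.execute(f"""
        SELECT p.url FROM postings po
        JOIN keywords k ON k.id = po.keyword_id
        JOIN pages p ON p.id = po.page_id
        WHERE k.word IN ({marks})
        GROUP BY p.id
        HAVING COUNT(DISTINCT k.id) = ?
        ORDER BY SUM(po.hits) DESC, p.url
    """, (*words, len(words))).fetchall()
    return [url for (url,) in rows]


if __name__ == "__main__":
    db = build_index([
        ("http://a.example/", "A", "vinyl records and vinyl sleeves"),
        ("http://b.example/", "B", "records only"),
        ("http://c.example/", "C", "vinyl records"),
    ])
    print(search(db, "vinyl records"))
```

In a layout like this, the page text itself is not kept as one blob. It is broken into words, and each (page, word) pair becomes a row, so a 2-word search is a lookup over those rows rather than a scan of page contents.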

Quote:

- Where is the actual INDEXED (text) content of the pages stored?

I'm not sure exactly what you need to know here. I can't speak for Charter (the forum owner/administrator), but if you're asking how the search engine works, the answer to that is probably beyond the scope of support offered in this forum.

Quote:

b) In your own installation, how big is the database, and how long does a 2-word search take?

My own database has just over 1,500 pages indexed. When I do searches, it takes less than 1 second to retrieve the results.

Hope this answers your questions. :)


All times are GMT -8.