PhpDig.net

Old 11-16-2003, 11:07 PM   #1
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
PhpDig Version 1.6.4 Released

Hi. PhpDig version 1.6.4 has been released as a minor release. The changes can be found in the Changelog file. Assuming the bugs from version 1.6.3 have been fixed in version 1.6.4, a future release will likely be a major release.
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.
Old 11-17-2003, 05:14 PM   #2
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. Please, if you've installed PhpDig version 1.6.4, I'd like to hear from you. What I am looking for is feedback, good or bad, on how version 1.6.4 is working for you. This will let me know where the code shines and where the code needs improvement, especially before releasing a major release. Thanks for helping.
Old 11-19-2003, 05:02 PM   #3
David J Harmon
Orange Mole
 
Join Date: Sep 2003
Location: Corbin KY
Posts: 45
I'm loading it up tonight to see how it works...

Wish me luck, and I'll give feedback on it.
__________________
David J Harmon
Cyberkopia.NETwork Geeks Style
http://cyberkoipia.net
Old 11-19-2003, 08:28 PM   #4
sid
Former Member
 
Join Date: Sep 2003
Posts: 34
Works excellently... Just what I wanted... Well done and good work, keep it up...

There is an idea in the "Mod requests" forum about making an "Image Search", which I have classified as "Quite Possible" for you, being such a great PHP developer. Can it be done? And don't forget to name it "PhpDig Image Search, a possible idea by sid".
Old 11-19-2003, 08:38 PM   #5
David J Harmon
Orange Mole
 
Join Date: Sep 2003
Location: Corbin KY
Posts: 45
Charter, take note of this: I would like to have PhpDig Image Search. It would work great with my site...
Old 11-20-2003, 05:48 PM   #6
David J Harmon
Orange Mole
 
Join Date: Sep 2003
Location: Corbin KY
Posts: 45
Sipping coffee and banging keys

Well, I just added 100 more hosts to my database (out of 525 hosts, 16,304 pages, 4,648,468 indexes, 264,792 keywords) and it's still working great. I've been spidering all day, although I did take a break to watch Screen Savers on TechTV. So what is on the burner for the next major upgrade? I like the idea of an image search, but I would also like to see some more options on the admin page. Other than that, I think it's a strong program.

Last edited by David J Harmon; 11-20-2003 at 05:54 PM.
Old 11-26-2003, 11:44 AM   #7
mark
Green Mole
 
Join Date: Nov 2003
Posts: 5
Questions and Comments...

Hello, I'm using 1.6.4. I really like this package; it has been fun playing with it.

I installed PhpDig and spidered a few of my sites, then was busy with other things for a few days. When I came back, I was pleasantly surprised to see that many new domains had been spidered while I was away, even though I didn't recognize any of the new sites (in other words, I'm not sure where it found the links to them). After installing PhpDig, it seemed that all spidering must be manually initiated. Is the spider actually running automatically, and if so, what algorithm does it use to branch out? Is this what I'm seeing when I see the "locked" sites in my domain list in the admin panel? Another question: what happens to all the links that the spider finds when spidering a site? Does it save them all and eventually come back and spider them as well?
Old 11-26-2003, 11:57 AM   #8
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. Depending on the level used, PhpDig will go and index sites from the links it finds. Locked sites are sites that are currently being crawled. Sometimes, if a crawl terminates prematurely, a site can remain locked, but you can unlock the site from the admin panel. The crawl process can take some time, especially with a lot of links and a high level. PhpDig will not start by itself unless you set a cron job. Link information from the sites is stored in the database tables, and text from the pages is stored in flat files.
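
To picture how a level limits what gets indexed, here is a minimal sketch of a depth-limited, breadth-first walk. It is not PhpDig's actual spider code: the `$graph` array, the page names, and the `crawl_to_depth()` function are all hypothetical, and a real spider would fetch and parse pages instead of reading an in-memory array.

```php
<?php
// Hypothetical in-memory link graph standing in for fetched pages.
$graph = array(
    'home'  => array('about', 'news'),
    'about' => array('team'),
    'news'  => array('home'),
    'team'  => array(),
);

// Breadth-first walk that follows links at most $maxDepth levels from $start.
// Returns the pages reached, in the order they were discovered.
function crawl_to_depth(array $graph, $start, $maxDepth)
{
    $visited = array($start => 0); // page => depth at which it was found
    $queue = array($start);
    while ($queue) {
        $page = array_shift($queue);
        if ($visited[$page] >= $maxDepth) {
            continue; // do not follow links beyond the level limit
        }
        $targets = isset($graph[$page]) ? $graph[$page] : array();
        foreach ($targets as $target) {
            if (!isset($visited[$target])) {
                $visited[$target] = $visited[$page] + 1;
                $queue[] = $target;
            }
        }
    }
    return array_keys($visited);
}
```

In this sketch a limit of 1 reaches the start page plus the pages it links to directly, and each extra level widens the set, which is why a high level combined with many links can take a long time.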
Old 11-26-2003, 01:27 PM   #9
David J Harmon
Orange Mole
 
Join Date: Sep 2003
Location: Corbin KY
Posts: 45
I've never had it start working by itself, and I wouldn't want it to, because I like to see which sites are being added. I have a gaming search site with visitors of all ages looking for sites, and I don't want any adult sites or other garbage to come up.
Old 11-26-2003, 01:29 PM   #10
mark
Green Mole
 
Join Date: Nov 2003
Posts: 5
Thanks Charter, that makes sense. I must have had a crawl that didn't get stopped when I thought it did.

Again this is a great tool, especially for the price, thanks!

Suggestions:

There is a config variable for holding a particular session ID tag to remove from the URLs. It would be nice if this could be an array, because there are so many different ones: SID, SESSID, etc. I tried adding it, but my PHP isn't too good, so it didn't work.
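
As a rough illustration of the array idea, the sketch below strips any of several session parameters from a URL's query string. It is an assumption-laden example, not PhpDig code: the `$sessionParams` list and the `strip_session_ids()` function are hypothetical names, and the sketch assumes URL fragments have already been removed.

```php
<?php
// Hypothetical list of session parameter names; not an actual PhpDig setting.
$sessionParams = array('PHPSESSID', 'SESSID', 'SID');

// Remove every listed query parameter from a URL.
// Parameter names are compared case-insensitively.
function strip_session_ids($url, array $names)
{
    $parts = explode('?', $url, 2);
    if (count($parts) < 2) {
        return $url; // no query string, nothing to strip
    }
    $drop = array_map('strtolower', $names);
    $kept = array();
    foreach (explode('&', $parts[1]) as $pair) {
        $pieces = explode('=', $pair, 2);
        // Keep non-empty pairs whose name is not in the drop list.
        if ($pair !== '' && !in_array(strtolower($pieces[0]), $drop, true)) {
            $kept[] = $pair;
        }
    }
    return $kept ? $parts[0] . '?' . implode('&', $kept) : $parts[0];
}
```

Calling `strip_session_ids('http://example.com/a.php?id=3&PHPSESSID=abc123', $sessionParams)` keeps the `id=3` parameter and drops the session one. Matching on exact parameter names avoids accidentally removing a parameter such as `inside=1` that merely contains "sid".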

When the crawl is done and the links found are shown, there could be an option to select a subset of these links with checkboxes, and a button to start the spidering of these links.

Is it possible to create a simplified PageRank feature (like Google), that skips all the fancy calculations, but does determine the number of links to a given page (from the PhpDig database pages) and factors this into the search results?
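
A minimal sketch of what such a link-count boost might look like follows. Everything in it is hypothetical: the `$links` pairs stand in for link data pulled from the index, and `inbound_link_counts()` and `boosted_weight()` are illustrative names, not part of PhpDig. The logarithm is just one possible way to keep heavily linked pages from swamping keyword relevance.

```php
<?php
// Hypothetical link data: each entry is array(source page, target page).
$links = array(
    array('a.html', 'c.html'),
    array('b.html', 'c.html'),
    array('a.html', 'b.html'),
    array('a.html', 'c.html'), // repeated link, counted only once
);

// Count how many distinct pages link to each target page.
function inbound_link_counts(array $links)
{
    $seen = array();
    $counts = array();
    foreach ($links as $link) {
        list($from, $to) = $link;
        $key = $from . "\n" . $to;
        if ($from === $to || isset($seen[$key])) {
            continue; // skip self-links and repeated links
        }
        $seen[$key] = true;
        $counts[$to] = isset($counts[$to]) ? $counts[$to] + 1 : 1;
    }
    return $counts;
}

// Scale a keyword weight by the log of the inbound link count,
// so a page with no inbound links keeps its original weight.
function boosted_weight($weight, $inbound)
{
    return $weight * (1 + log(1 + $inbound));
}
```

With the sample data, `c.html` has two distinct inbound links and `b.html` has one, so `c.html` would receive the larger boost for the same keyword weight.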

I'm having a little problem where I enter a search with a single keyword that is known to appear on Page X, and the results show Page X, but sometimes neither the snippet nor the page title contains the keyword. Why might the snippet not be the one that contains the keyword?

I want to allow PhpDig to jump to different domains and have my configuration set this way:
define('PHPDIG_IN_DOMAIN',true);
Is this correct for what I want?

If so, that leads to another question. I spidered a site which links to dozens of different sites right on the home page (no frames) with a depth of 2, but only pages from this domain were added. (If my config above is wrong for this, then never mind.)

When I use a spidering depth of 1, it grabs the target URL plus the links directly on that page. But what if I wanted to grab only the home page of each domain? There doesn't seem to be an easy way to do this. Could there be a search depth option of 0, which grabs only that page?

Some spiders take into account the load that they might put on the servers they crawl and space the individual downloads out. Is it possible to integrate this into PhpDig, to keep webmasters out there from banning PhpDig altogether?
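
One common way to space downloads out is to remember when each host was last contacted and sleep before the next request if too little time has passed. The sketch below shows that idea only. The `CRAWL_DELAY_SECONDS` constant and both functions are hypothetical and are not existing PhpDig settings.

```php
<?php
// Hypothetical setting: minimum number of seconds between requests to one host.
define('CRAWL_DELAY_SECONDS', 2.0);

// Return how many seconds to wait before the next request to $host,
// given a map of host => timestamp of the last request.
function seconds_to_wait($host, array $lastRequest, $now, $delay)
{
    if (!isset($lastRequest[$host])) {
        return 0.0; // first request to this host, no wait needed
    }
    return max(0.0, $delay - ($now - $lastRequest[$host]));
}

// Sleep if needed, then record the time of this request for $host.
function wait_politely($host, array &$lastRequest, $delay = CRAWL_DELAY_SECONDS)
{
    $wait = seconds_to_wait($host, $lastRequest, microtime(true), $delay);
    if ($wait > 0) {
        usleep((int) round($wait * 1000000));
    }
    $lastRequest[$host] = microtime(true);
}
```

A spider loop would call `wait_politely($host, $lastRequest)` just before each download. Tracking the delay per host means a crawl that spans several sites only slows down when it hits the same server repeatedly.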

I read the thread here, about PhpDig indexing the meta tags and comments, things the user would never see. I tried all the suggestions posted there for regular expressions to zap that, but couldn't get that to work. Maybe this could be worked out correctly and added as an option.

I was getting results where my keyword was not in the page title for result 1, and the keyword was in the page title for result 2, so I tried changing the TITLE_WEIGHT config variable from 10 to 10000 to -1000, but never saw any change in the results. Is this setting only applied at spider time, or can it be changed globally at any time?

Thanks again!

Last edited by mark; 11-26-2003 at 02:00 PM.
Old 11-26-2003, 02:16 PM   #11
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. Thanks for the suggestions. Here are some answers.

If you go to the admin panel, click a site, and click a blue arrow, you'll see the links in that (sub)tree. If you click a green check mark, PhpDig should reindex that (sub)tree using the setup in the config file.

For jumping domains, this thread might help with what you want.

Adding an array for session id is probably easy, adding PageRank is probably hard (not sure it makes sense for limited crawling), will look into the snippet issue (does the highlighted word show up if you increase the snippet length), and adding a 'wait' variable is probably easy.

With regard to tags, do you mean that you see HTML comments in the search results (can you post an example), or is it that you want to be rid of META tag description and keywords text in the search results? If the latter, comment out the code in post seven of this thread.
Old 11-26-2003, 03:20 PM   #12
mark
Green Mole
 
Join Date: Nov 2003
Posts: 5
>> For jumping domains, this thread might help with what you want.

That is great, I'll certainly try that. But the thing that this really makes me wonder about is how domain Z got into my database if I never specifically spidered it in the admin panel?


>> With regard to tags, do you mean that you see HTML comments in the search results (can you post an example), or is it that you want to be rid of META tag description and keywords text in the search results? If the latter, comment out the code in post seven of this thread.

Not comments, but meta keywords. I tried commenting that section out, then deleted my test case domain and respidered. The page still came up for the meta keyword, so I haven't found a solution for that.

What about the keyword in title weighting? I guess that should just work...?
Old 11-26-2003, 03:46 PM   #13
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Keeping that code section commented out, try deleting the test case domain and also deleting the test case domain files in the text_content directory and then do a new index. Weights are stored in a database table so a new index should change the order.
Old 11-26-2003, 04:04 PM   #14
mark
Green Mole
 
Join Date: Nov 2003
Posts: 5
Yes, I see now that when I respidered with a negative title weighting, those pages with the keyword are buried at the end. Thanks.
Old 11-26-2003, 04:11 PM   #15
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. Do you mean that after you did the thing in two posts above, the meta keywords and description still show up?
Copyright © 2001 - 2005, ThinkDing LLC. All Rights Reserved.