PhpDig Version 1.8.1 Released


Charter
07-07-2004, 03:23 PM
Hi. PhpDig version 1.8.1 has been released as a 'minor+++' release. The changes can be found in the Changelog (http://www.phpdig.net/info/changelog.txt) file. Three database tables were added. To upgrade, add the tables to your database, reconfigure the new connect.php and config.php files, and overwrite the old files with the new ones.

bloodjelly
07-08-2004, 01:33 AM
Hey - thanks for updating.:) I've been playing with the new version and it works well, but when I add a new URL with a subdirectory as the root (e.g. http://www.geocities.com/website/) phpdig still only stores the base URL in the database. In your changelog, you mention that in this version you can "Search by site or directory." Is this only a variation in the search function itself, not the spider function?

Anyway, everything else looks really great and all the changes you've made will help this become an even better search engine. Now all we need is multiple spiders :yum: (I'll stop bugging you)

Charter
07-08-2004, 08:40 AM
>> "Search by site or directory." Is this only a variation in the search function itself, not the spider function?

Hi. Yes, it's the search itself, not the spider function. What you might try is to do a limited index, setting links per so that you get a cursory spider, and then go and exclude the directories that you don't want and then reindex. This is a roundabout way to limit spidering to certain directories on sites you don't own.

With 1.8.1 there are changes to limit indexing using the links per option, and if set, the extra index info that used to be kept in the tempspider table is now deleted, so it might be the case that multiple spidering is now possible. What happens if you run two 1.8.1 spiders on two different sites, setting links per for each spider?

One other tip is that if you want to stop all spidering processes, just keep clicking the delete button, without selecting a site, until the sites being spidered go from locked to unlocked. Using the browser stop button will not necessarily stop the process on the server, but once the tempspider table is empty, spidering should stop.

bloodjelly
07-08-2004, 12:37 PM
The multiple spiders seemed to work when I ran two spiders from the web interface at the same time. Both finished correctly - nice! I also started a spider with the exec() command, which ran for a while and then stopped with links still in the temporary table and without unlocking. Most likely this is because I didn't set links per for this spider? This is the command I used:

exec("/usr/bin/php -f /home/search/admin/spider.php $site 2>&1 > /dev/null &");
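A side note on the command above: a POSIX shell applies redirections left to right, so `2>&1 > /dev/null` first points stderr at the original stdout and only then sends stdout to /dev/null. Stderr therefore still reaches the stream that exec() reads. Whether that has anything to do with the stalled spider is not established in this thread. The following is only a sketch of the same call with stdout redirected first and each argument escaped; `build_spider_command` is a hypothetical helper, not part of PhpDig, and the example URL is a placeholder.

```php
<?php
// Hypothetical helper (not part of PhpDig): builds a background spider command.
function build_spider_command($php, $script, $site)
{
    // escapeshellarg() quotes each value so that spaces or shell
    // metacharacters in a URL cannot change the command.
    // "> /dev/null 2>&1" sends stdout to /dev/null first, then points
    // stderr at the same place; the trailing "&" backgrounds the process.
    return sprintf(
        '%s -f %s %s > /dev/null 2>&1 &',
        escapeshellarg($php),
        escapeshellarg($script),
        escapeshellarg($site)
    );
}

// Paths are the ones quoted in the post; the URL is a placeholder.
$command = build_spider_command(
    '/usr/bin/php',
    '/home/search/admin/spider.php',
    'http://www.example.com/'
);

echo $command, "\n";
// exec($command); // uncomment to actually launch the spider
```

Building the string separately makes it easy to print and inspect the exact command before handing it to exec().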

As for the directory spidering issue, I think I might play around with the code to get it to do what I want, unless you plan on adding this feature to a future version. Thanks for the help.

Charter
07-08-2004, 01:13 PM
>> I also started a spider with the exec() command, which ran for a while and then stopped with links still in the temporary table and without unlocking. Most likely this is because I didn't set links per for this spider?

Hi. Hmm, not sure on this one. In spider.php is the following:

if (!isset($linksper) or (int)$linksper > LINKS_MAX_LIMIT) {
    if ($run_mode != 'cgi') {
        $linksper = RELINKS_LIMIT;
    }
    else {
        $linksper = LINKS_MAX_LIMIT;
    }
}

In 1.8.1 the links per is either set via the browser interface or by values in the config file. One of the new tables has links per for each site, but utilizing this table didn't get done for version 1.8.1, so for now, links per will be the same for all sites crawled via shell.
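To make the fallback easier to follow, here is a minimal sketch that wraps the same check in a function so it can be tried on its own. The function name `resolve_links_per` is hypothetical, and the two constant values are placeholders chosen for illustration; the real values come from the PhpDig config file.

```php
<?php
// Placeholder values for illustration only; the real ones are set in config.php.
define('LINKS_MAX_LIMIT', 20);
define('RELINKS_LIMIT', 5);

// Hypothetical wrapper around the check quoted from spider.php.
// A null $linksper stands in for "not set".
function resolve_links_per($linksper, $run_mode)
{
    if (!isset($linksper) or (int)$linksper > LINKS_MAX_LIMIT) {
        // Missing or too large: fall back to a limit chosen by run mode.
        if ($run_mode != 'cgi') {
            return RELINKS_LIMIT;
        }
        return LINKS_MAX_LIMIT;
    }
    // A value within the limit is used unchanged.
    return $linksper;
}

var_dump(resolve_links_per(null, 'cgi'));  // unset, 'cgi' mode
var_dump(resolve_links_per(null, 'web'));  // unset, any other mode
var_dump(resolve_links_per(10, 'cgi'));    // within the limit
var_dump(resolve_links_per(500, 'web'));   // above the limit
```

In this sketch an unset or oversized value never passes through: it is always replaced by one of the two configured limits, depending on the run mode.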

Anyway, back to your exec issue, I'm not sure why the spider quit. Maybe random noise, maybe not. Anything in your error logs?

bloodjelly
07-08-2004, 07:11 PM
I ran three execs this time with three different sites, and 2 of the 3 spidered completely without stopping. The third one stopped until I emptied the temporary table, and then it started up again for some reason. I checked the error log but everything looked fine - it simply paused and then picked back up where it left off after I emptied the temp table.

Anyway all three finished perfectly with a little encouragement, so multiple spiders are indeed possible. Thanks!

As for the directory dilemma, I'll post in a new thread where it's more appropriate. Thanks again for the help.

Charter
07-08-2004, 07:17 PM
Hi. Sounds like some random noise, maybe too many database connections?

>> it simply paused and then picked back up where it left off after I emptied the temp table.

The tempspider table probably wasn't really empty. It may take a few tries to be sure the table is empty, as info is constantly being placed in the table during spidering.