Spidering multiple URLs


2wheelin
05-22-2004, 06:51 PM
I need a search/spider program to index subject-specific web sites.

As user RaGe outlined below, phpDig is not able to index multiple URLs as far as I can see. Why not add this ability and make it an effective web search engine?

BTW, phpDig is a GREAT script for a single URL (site search), the best I have found! Add the multiple-URL feature and phpDig will be the best of both worlds.

So far I've seen the spider functions only deal with spidering a particular site and returning only results within the spidered URL. An option that lets the admin ignore the base URL and return only links to external URLs would allow spidering a link farm site or links page and harvesting the links back into phpDig. For example:

I built a CGI engine and have tons of links indexed on it. If I use phpDig to try to spider the links from the original engine, it returns MY URL links instead of ignoring the base URL and spidering the external links at a depth of 1. What I'm after is a URL harvester spider rather than just a site spider.
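To make the idea concrete, here is a rough sketch of the "ignore the base URL, keep only external links at depth 1" behaviour. It is written in Python rather than PHP and is not phpDig code; the names `LinkCollector` and `harvest_external_links` and the example hosts are made up for illustration, and it only looks at one page of HTML that has already been fetched.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def harvest_external_links(page_html, base_url):
    """Return the external http(s) links found on one page, in page order.

    Links whose host matches the host of base_url are skipped, so only
    off-site URLs are harvested (depth 1, no further crawling).
    """
    base_host = urlparse(base_url).netloc.lower()
    collector = LinkCollector()
    collector.feed(page_html)

    external = []
    for href in collector.hrefs:
        absolute = urljoin(base_url, href)  # resolve relative links
        parts = urlparse(absolute)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, etc.
        if parts.netloc.lower() == base_host:
            continue  # skip links back into the spidered site
        if absolute not in external:
            external.append(absolute)  # keep first occurrence only
    return external


if __name__ == "__main__":
    # Made-up links page; the hosts are placeholders.
    sample = """
    <a href="/about.html">About</a>
    <a href="http://www.example.org/">Example</a>
    <a href="http://links.example.com/page2.html">Next</a>
    <a href="http://www.example.net/dir/">Another</a>
    <a href="mailto:me@example.com">Mail</a>
    """
    for url in harvest_external_links(sample, "http://links.example.com/index.html"):
        print(url)
```

Note that this compares hosts exactly, so `www.example.com` and `example.com` would count as different sites; a real spider would probably want an admin setting for that.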

My CGI engine does this with the greatest of ease: I can spider a particular directory of DMOZ and bring back only the links and their relative URLs. If someone out there (in the Mole Squad) is proficient at both PHP and CGI, I'd be willing to make my engine available, and perhaps we can cross the spider functions into phpDig and save some raw coding time for all.

It also has admin features for visitor-added URLs, which can be edited directly rather than just spidered. At this time I see no way of editing spidered or user-submitted URLs in phpDig without doing so at the SQL level, so that might also be a useful phpDig function to consider.