New Exclude Option


josegringo
02-17-2005, 12:45 PM
Greetings!

I am trying to spider a site that has an event calendar I would like to index. The site is not my own, so the usual methods for excluding links do not apply.

My question is this: how hard would it be to add a "do not follow" rule for links? The site I am spidering has a series of links that follow this pattern:

index.htm?cYear=2002
index.htm?cYear=2003
index.htm?cYear=2004
index.htm?cYear=2005

To avoid spending a lot of time indexing pages I don't want, and to minimize the load on their site (I am using their bandwidth), I would like a rule, perhaps in the config file, that says: if cYear is not 2005, don't follow the link.

Any thoughts on this would be appreciated. I'm OK with PHP, but not good enough to dig into the code and find where to hard-code the exception. Once I know where the statement goes, I could code it, though...

Thanks,
-josegringo

Charter
02-17-2005, 01:16 PM
It can be done in the config file:

FORBIDDEN_EXTENSIONS in PhpDig < 1.8.8
FORBIDDEN in PhpDig 1.8.8+

http://www.phpdig.net/forum/showthread.php?t=1684
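As far as we know, FORBIDDEN / FORBIDDEN_EXTENSIONS holds a regular expression that links are tested against. The snippet below is a minimal sketch, not PhpDig code: it assumes the setting is matched case-insensitively and unanchored against each link URL, and the names `EXCLUDE_OLD_YEARS` and `is_excluded` are hypothetical. Python is used only to check which of the example links the pattern would block. The pattern avoids lookahead so that it should also be valid if your PhpDig version uses POSIX regular expressions rather than PCRE.

```python
import re

# Hypothetical exclusion pattern: matches any four-digit cYear from 1900 to 2099
# except 2005 (2000-2004 and 2006-2009 via 200[0-46-9], 2010-2099 via 20[1-9][0-9]).
# No lookahead is used, so the same text is also a valid POSIX extended regex.
EXCLUDE_OLD_YEARS = r"cYear=(19[0-9]{2}|200[0-46-9]|20[1-9][0-9])"


def is_excluded(url: str) -> bool:
    """Return True if the link should be skipped (assumes an unanchored, case-insensitive match)."""
    return re.search(EXCLUDE_OLD_YEARS, url, re.IGNORECASE) is not None


if __name__ == "__main__":
    links = [
        "index.htm?cYear=2002",
        "index.htm?cYear=2003",
        "index.htm?cYear=2004",
        "index.htm?cYear=2005",
    ]
    for link in links:
        # Report what the spider would do with each calendar link.
        print(link, "skip" if is_excluded(link) else "follow")
```

To apply the idea, you would presumably add the `cYear=(...)` alternative to the pattern already defined in your PhpDig config file, joining the two with `|`, and then re-spider. Check against your version's config file how the existing value is written before editing it.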

josegringo
02-17-2005, 02:48 PM
Charter,

Thanks for quickly setting me in the right direction. I spent a good part of the day running in circles. I guess I just wasn't searching for the right term in the forum.

Cheers,
-Joey