josegringo
02-17-2005, 12:45 PM
Greetings!
I am trying to spider a site that has an event calendar I would like to index. The site is not my own, so the usual methods for excluding links do not apply.
My question is this: how hard would it be to add a "do not follow" rule for links? The site I am spidering has a series of links which follow this pattern:
index.htm?cYear=2002
index.htm?cYear=2003
index.htm?cYear=2004
index.htm?cYear=2005
To avoid spending a ton of time indexing things I don't want, and to minimize the intrusion on their site (I am using their bandwidth), I would like a rule, perhaps in the config file, that basically says: if cYear is not 2005, don't follow the link.
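To pin down the logic I have in mind, here is a rough sketch. It is written in Python rather than PHP and is not taken from the spider's code; the names should_follow and ALLOWED_YEARS are made up for illustration, and I am assuming the spider has some point where each discovered link can be checked before it is queued.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical config value: the only cYear values worth following.
ALLOWED_YEARS = {"2005"}

def should_follow(url, param="cYear", allowed=ALLOWED_YEARS):
    """Return False for links whose cYear value is outside the allowed set."""
    # keep_blank_values=True so that "index.htm?cYear=" is treated as a
    # cYear link with an empty (and therefore disallowed) value.
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    values = query.get(param)
    if values is None:
        # No cYear parameter at all: leave the link to the spider's normal rules.
        return True
    return all(value in allowed for value in values)

# The four link patterns from the calendar page.
links = [
    "index.htm?cYear=2002",
    "index.htm?cYear=2003",
    "index.htm?cYear=2004",
    "index.htm?cYear=2005",
]

# Only links that pass the rule would be queued for crawling.
to_crawl = [link for link in links if should_follow(link)]
```

The idea is that links without a cYear parameter are left alone, so the rule only trims the calendar's year links and does not change how the rest of the site is spidered.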
Any thoughts on this would be appreciated. I am OK with PHP, but not good enough to dig into the code and find where to hard-code the exception. Once I knew where the statement went, I could code it, though.
Thanks,
-josegringo