07-29-2004, 06:49 PM   #1
jinkas
Green Mole
 
Join Date: Jul 2004
Posts: 8
PhpDig "clipping" links while spidering

OK, I consider myself moderately skilled at PHP, but this is something I just don't understand. As PhpDig spiders my site, it tries to follow links that are clipped (truncated) versions of links that are already there. (This additional processing really slows the script down.) I have attached the results from the most recent spidering so that you can all see and maybe help. Unfortunately, this is still a test site, and for security reasons it is open only to employees of the place where I work until we solve some authorization issues (in other words, you can't look at the code to see why this is happening). However, I can assure you that the links PhpDig is trying to follow show up neither in the source code nor in the generated HTML (the entire site is dynamic).

Anyway, on with the problem...

In the txt file (all references are to the txt file), the first error of this kind shows up in the first two 404 errors after spidered page #3. http://uuu.cae.wisc.edu/si does not exist, but it is a truncated form of uuu.cae.wisc.edu/site, which is the entryway into the rest of the site. Similar errors appear in the last two 404 errors of spidered page #3 (should be /wikiutils/), the first two 404s of page #5 (again, should be /site/), the first two 404s of page #7 (should be /site/public/), the 404 of page #11 (should be .php), the 404s in the middle of page #15 (should be /help/h****uts/, not /help/han), and in many, many other places. In fact, more than 50 clipped links were "found" in the final results. (It is a Wiki-based system, and every page that doesn't exist gives you a dynamically generated error page offering to help you create the requested page as a new page.)
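
For anyone who wants to check a spider log for the same pattern, here is a minimal sketch in Python. It is not PhpDig code and assumes nothing about PhpDig's internals; it only assumes you can pull two lists out of the log by hand or with your own parser: the URLs that returned 404 and the URLs that were spidered successfully. The function name `find_clipped_links` is made up for this illustration. A 404 URL is flagged as "clipped" when it is a proper prefix of a real URL and the cut falls in the middle of a path segment (as with /si versus /site).

```python
def find_clipped_links(not_found, spidered):
    """Map each 404 URL to the real URLs it appears to be a clipped prefix of.

    Hypothetical helper for illustration only; it is not part of PhpDig.
    """
    def normalize(url):
        # Drop the scheme and any trailing slash so that
        # "http://host/x/" and "host/x" compare equal.
        for scheme in ("http://", "https://"):
            if url.startswith(scheme):
                url = url[len(scheme):]
        return url.rstrip("/")

    good = sorted({normalize(u) for u in spidered})
    clipped = {}
    for url in not_found:
        candidate = normalize(url)
        if not candidate:
            continue
        matches = [
            g for g in good
            # Proper prefix of a real URL ...
            if g.startswith(candidate) and len(g) > len(candidate)
            # ... where the cut lands mid-segment, not at a "/" boundary.
            and g[len(candidate)] != "/"
        ]
        if matches:
            clipped[url] = matches
    return clipped


if __name__ == "__main__":
    # The /si and /site URLs come from the post; /missing is a made-up
    # example of an ordinary 404 that should not be flagged.
    bad = ["http://uuu.cae.wisc.edu/si", "http://uuu.cae.wisc.edu/missing"]
    ok = ["uuu.cae.wisc.edu/site", "http://uuu.cae.wisc.edu/site/public/"]
    for clipped_url, originals in find_clipped_links(bad, ok).items():
        print(clipped_url, "looks clipped from", originals)
```

Requiring the cut to land mid-segment keeps a genuinely missing directory from being reported just because deeper pages exist under it. It will miss any clipping that happens to stop exactly at a slash.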

I know that I've been a little verbose, but the final site will contain 8000+ pages and I would like to be able to squash this error. I just can't figure it out! Could someone please help me? Thank you so much!

-jinkas

P.S. - I cut a chunk out of the middle of the file to make it the right size for uploading. You can see at the end that the clipping seems to happen with greater and greater frequency (every 404 from at least page 201 to 297 is caused by this link clipping).

P.P.S. - It doesn't seem like the link clipping causes PhpDig to skip real links; all the real links seem to be spidered. It just makes it go much slower.

Last edited by jinkas; 07-29-2004 at 06:52 PM.