Index/Spider looping


renehaentjens
02-12-2004, 03:58 AM
I managed to get indexing/spidering into a loop. In a very small branch of the site, with no links to elsewhere, it kept repeating the same URL over and over again, with a red cross in front and "Was recently indexed" at the end.

When I stopped the process, the site was locked (of course), and MySQL gave error 145 on table tempspider: can't open file tempspider.MYI.

I had to delete the table and recreate it with phpMyAdmin.

As I have long URLs in the site, I first thought that this might have caused the problem. Why, by the way, are fields 'file' and 'path' limited to 127 chars in the spider table? That is not going to be enough for my site! Can't they be TEXT fields like in tempspider?

Anyway, my long URLs are not yet long enough to hit that limit: I currently have around 50-60 characters in path and file.
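A quick way to rule out the length limit is to measure the URLs directly. The following is only a minimal sketch: the function name `oversized_fields` is hypothetical, and it assumes that 'path' holds the directory part of the URL and 'file' the last segment, which may not match how the spider actually splits them.

```python
import posixpath
from urllib.parse import urlsplit


def oversized_fields(urls, limit=127):
    """Return (url, field, length) for each part longer than `limit`.

    Assumption: 'path' is the directory part of the URL path and
    'file' is its last segment. The real spider may split differently.
    """
    report = []
    for url in urls:
        directory, filename = posixpath.split(urlsplit(url).path)
        for field, value in (("path", directory), ("file", filename)):
            if len(value) > limit:
                report.append((url, field, len(value)))
    return report


if __name__ == "__main__":
    # example.com URLs are placeholders, not taken from the real site.
    sample = [
        "http://example.com/docs/index.html",
        "http://example.com/" + "d" * 130 + "/index.html",
    ]
    for url, field, length in oversized_fields(sample):
        print(f"{field} is {length} chars in {url}")
```

Calling it with `limit=255` shows which URLs would still not fit if the columns were widened to the VARCHAR maximum.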

So, something else must have been the cause of this looping.
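For reference, the usual guard against this kind of loop is to remember every URL already queued and never queue it twice. The following is a generic sketch of that idea, not the spider's actual code: `crawl` and `get_links` are hypothetical names, and `get_links` stands in for whatever fetches a page and extracts its links.

```python
from collections import deque
from urllib.parse import urldefrag, urljoin


def crawl(start_url, get_links, max_pages=1000):
    """Breadth-first crawl that visits each URL at most once.

    get_links is a hypothetical callable: it takes a URL and returns
    the href values found on that page.
    """
    seen = {start_url}            # every URL ever queued
    queue = deque([start_url])    # URLs still waiting to be indexed
    order = []                    # URLs in the order they were indexed
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)
        for href in get_links(url):
            # Resolve relative links and drop "#fragment" so that the
            # same page is not queued under several spellings.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return order
```

With a `seen` set like this, a small branch whose pages only link to each other ends after each page has been indexed once, and `max_pages` acts as a second safety net.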

Charter
02-13-2004, 11:19 AM
Hi. Without running a test index on the small branch of your site, it is hard to say what caused the loop, but you may find the following article useful.

http://www.databasejournal.com/features/mysql/article.php/3300511

You may also find the links below useful.

http://www.faqts.com/knowledge_base/view.phtml/aid/329
http://www.mysql.com/doc/en/Choosing_types.html

renehaentjens
02-16-2004, 06:10 AM
Thanks, Charter, for not giving up on educating me. The articles that you refer to are always interesting and to the point.

My summary on VARCHAR vs. TEXT:
VARCHAR: max 255, no loss of space, trailing spaces removed;
TEXT: in most respects like unlimited VARCHAR, but no DEFAULT value and sorting only uses the first 1024 chars.

My summary on URL length: no limit specified in RFC 1738, in Windows products up to 2083 chars (max32://max2048).

As there is no loss of space anyway, may I suggest changing the VARCHARs to 255 in the next release, except where for some reason you count on truncation to a smaller size?

I understand that checking all possible side-effects of DB corruptions in the PHP code would be a major coding effort.