PhpDig.net

08-10-2004, 12:28 PM   #1
slintz
Green Mole
 
Join Date: Aug 2004
Posts: 4
specifically slow spidering at fgets()

I've read the other posts re: slow spidering behavior and found nothing matching my situation. Please help!

After inserting traces and such into the code, I've found a consistent delay of 10 - 15 seconds for each page being indexed which occurs across a specific function call:

Code:
FILE:      robot_functions.php 
FUNCTION:  phpdigGetUrl()
STATEMENT: $answer = fgets($fp,8192);
(I've reformatted the code substantially, so I can't provide a specific line number. The fgets() occurs close to the top of the while (!$stop && !feof($fp)) { ... })

Code:
OS:      Win 2000
HTTPD:   Apache 2.0.49 (Win32)
PHP:     5.0.0
MYSQL:   4.1.3b-beta
PHPDIG:  1.8.3
As a speed check, I ran wget (cygwin) to mirror a piece of my own local site to my own drive. PhpDig took about 4 minutes to index what wget did in less than 10 seconds. Although they do different things, spidering and wget'ing are very similar, which indicates that a 25:1 timing differential should not be expected...

Thanks much!
08-10-2004, 12:32 PM   #2
slintz
Green Mole
 
Join Date: Aug 2004
Posts: 4
PS - One more helpful(?) bit of info: while PhpDig spidering is going on, I've watched my CPU activity which is mostly nothing, with occasional spikes (every 10 - 15 seconds, BTW). To me, this points to a timeout issue - but I don't know where / what layer to consider. (Also, I've reduced all PhpDig sleeps to 1 or 2 seconds and this is NOT the problem at all). Thanks again!
08-10-2004, 05:08 PM   #3
vinyl-junkie
Purple Mole
 
Join Date: Jan 2004
Posts: 694
Are you able to spider from shell? That might be a way around the problem.
08-10-2004, 06:37 PM   #4
slintz
Green Mole
 
Join Date: Aug 2004
Posts: 4
Vinyl J -

Good idea (and it made me solve some incidental installation problems), yet no go (i.e. same problem and with harder-to-read output <lol>). Anyway, as I mentioned above, the wget mirroring program doesn't have any trouble like this - it's quite zippy! That points away from the httpd software / configuration. It has all the smell of a communication timeout issue, but how do I investigate beyond the sticking fgets() ?
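One way to look past the stalled fgets() is to time each call and ask the stream layer whether it reported a timeout. The sketch below is only an illustration: tracedFgets() is a hypothetical helper, not part of PhpDig, and it assumes $fp is an already-open stream such as the one phpdigGetUrl() reads from.

```php
<?php
// Hypothetical diagnostic helper (not PhpDig code): time a single fgets()
// call and report what the stream layer says about the stream afterwards.
function tracedFgets($fp, $length = 8192)
{
    $start = microtime(true);
    $line = fgets($fp, $length);
    $elapsed = microtime(true) - $start;

    // stream_get_meta_data() exposes the 'timed_out' and 'eof' flags.
    $meta = stream_get_meta_data($fp);

    return array(
        'line'      => $line,               // data read, or false on failure
        'seconds'   => $elapsed,            // wall-clock time spent in fgets()
        'timed_out' => $meta['timed_out'],  // true if the last read timed out
        'eof'       => $meta['eof'],        // true once end of stream is seen
    );
}
```

On a socket, calling stream_set_timeout($fp, 5) before the read loop should cap how long one read can block. If a call then comes back with a long 'seconds' value and 'timed_out' set, the wait is happening at the socket layer rather than in the parsing code.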
08-15-2004, 03:05 PM   #5
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. Can't say I've experienced fgets problems. Perhaps something here might help?
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.
08-16-2004, 01:17 PM   #6
slintz
Green Mole
 
Join Date: Aug 2004
Posts: 4
Well, I've found the exact problem: the code doesn't respect the Content-Length header (or, when the response is chunked, the chunk sizes). Thus, it will always attempt to read past the end of the body. I suppose on some configurations that doesn't make a difference, but on mine it surely does! I've fully solved the problem in the test script and partially moved that solution into my own PhpDig code. If anyone cares to know more, get in touch...
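For anyone following along, here is a minimal sketch of what respecting those headers can look like. It is not the actual fix described above and not PhpDig's code: readHttpBody() and readExactly() are hypothetical names, and $fp is assumed to be a stream positioned at the start of an HTTP response. One plausible reason the over-read stalls is that the server keeps the connection open after sending the body, so a read past the end blocks until something times out.

```php
<?php
// Hypothetical helpers, not PhpDig code: read one HTTP response body from an
// open stream without reading past its end.
function readExactly($fp, $length)
{
    $data = '';
    while (strlen($data) < $length && !feof($fp)) {
        $piece = fread($fp, $length - strlen($data));
        if ($piece === false || $piece === '') {
            break; // nothing more available
        }
        $data .= $piece;
    }
    return $data;
}

function readHttpBody($fp)
{
    // Read the status line and headers, stopping at the blank line.
    $headers = array();
    while (($line = fgets($fp, 8192)) !== false) {
        $line = rtrim($line, "\r\n");
        if ($line === '') {
            break;
        }
        $parts = explode(':', $line, 2);
        if (count($parts) === 2) {
            $headers[strtolower(trim($parts[0]))] = trim($parts[1]);
        }
    }

    // Chunked: each chunk is "<hex size>CRLF<data>CRLF"; size 0 ends the body.
    if (isset($headers['transfer-encoding'])
        && stripos($headers['transfer-encoding'], 'chunked') !== false) {
        $body = '';
        while (($sizeLine = fgets($fp, 8192)) !== false) {
            $sizeField = explode(';', $sizeLine, 2); // drop any chunk extension
            $size = hexdec(trim($sizeField[0]));
            if ($size == 0) {
                break; // last chunk; trailers, if any, are not read here
            }
            $body .= readExactly($fp, $size);
            fgets($fp, 8192); // consume the CRLF after the chunk data
        }
        return $body;
    }

    // Fixed length: read exactly Content-Length bytes and stop.
    if (isset($headers['content-length'])) {
        return readExactly($fp, (int) $headers['content-length']);
    }

    // Neither header: the body ends when the server closes the connection.
    $body = '';
    while (!feof($fp)) {
        $piece = fread($fp, 8192);
        if ($piece === false || $piece === '') {
            break;
        }
        $body .= $piece;
    }
    return $body;
}
```

The point of the design is that the loop stops because it knows the body is complete, not because feof() finally turns true.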

Cheers!
08-17-2004, 03:02 PM   #7
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Will you post your mod in the Mod Submissions forum?
08-18-2004, 02:24 AM   #8
jinkas
Green Mole
 
Join Date: Jul 2004
Posts: 8
just to throw in my two cents worth...

i'm already communicating with slintz, but this isn't a problem specific only to him...the exact same thing happens to me when i try and spider my site...i always get between 10-15 seconds (sometimes up to 20) of delay / page

here is my server info:

OS: Solaris 5.8
PHP: 4.3.8
Apache: 2.0.50
MySQL: 4.0.13
PhpDig: 1.8.3

yes, i realize that some of those are older versions, but i have no control over that...i just write the webpages