PhpDig.net


hendrix 02-10-2005 01:23 PM

404 error although page exists
 
Hi,

I've had a problem indexing a particular site (please note that all other sites have been indexed without any problem).

PhpDig v1.8.7 is located at http://www.santeestrie.qc.ca/recherche

I've tried to index http://www.iugs.ca but it always returned a 404 error. So then I tried indexing a file I knew existed (http://www.iugs.ca/FR/100/RH_Recrutement.asp) but it also returned a 404 error:
------------------------------------------------
HTTP/1.1 404 Object Not Found - http://www.iugs.ca/FR/100/RH_Recrutement.asp
See http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for explanation.
404s are either dead links or something looked like a link to PhpDig so PhpDig tried to crawl it.
Optimizing tables...
Indexing complete !
------------------------------------------------

It doesn't matter which page I try to index on this site; it never works. There's no robots.txt, so that's not the problem.
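The robots.txt reasoning holds for standard parsers: a site with no robots.txt (a 404 on that file) excludes nothing. The sketch below illustrates this with Python's standard `urllib.robotparser`, not PhpDig's own robots.txt code, which may differ; `allowed_by_robots` is a hypothetical helper name, and an empty list stands in for the missing file.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_lines, user_agent, url):
    """Return True if `url` may be crawled under the given robots.txt lines.

    An empty list models a site with no robots.txt at all: with no rules,
    every URL is allowed. (Python's parser, not PhpDig's.)
    """
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse() also marks the rules as freshly read
    return parser.can_fetch(user_agent, url)


page = "http://www.iugs.ca/FR/100/RH_Recrutement.asp"

# No robots.txt: nothing is excluded.
print(allowed_by_robots([], "PhpDig", page))  # True

# A hypothetical robots.txt that blocks /FR/ for every agent.
print(allowed_by_robots(["User-agent: *", "Disallow: /FR/"], "PhpDig", page))  # False
```

So a missing robots.txt cannot by itself explain 404s on every page.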

Here are a few of my settings:

- Tried indexing with a depth of 10 and "links per" set to zero.

define('PHPDIG_IN_DOMAIN',true);
define('SPIDER_MAX_LIMIT',20);
define('RESPIDER_LIMIT',5);
define('LINKS_MAX_LIMIT',20);
define('RELINKS_LIMIT',5);
define('LIMIT_TO_DIRECTORY',false);
define('LIMIT_DAYS',0);

and from phpinfo():

allow_url_fopen = 1
safe_mode = off

Any help would be appreciated.

Regards,
Stéphane Brault
eComDEV.com

Charter 02-10-2005 02:28 PM

What if you try http://www.iugs.ca/FR/100/default.asp in the textbox? :confused:

hendrix 02-11-2005 04:56 AM

Yes, I've also tried that... :bang:

hendrix 02-11-2005 04:57 AM

In fact, I've tried to index all the links found at http://www.iugs.ca/FR

and also tried adding "default.asp" to the end of each.

Charter 02-11-2005 05:46 AM

The only other thing I can think of is that maybe the site dislikes HEAD requests so it returns a 404 Not Found even though GET requests return content.
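One way to test this hypothesis is to request the same URL with HEAD and then with GET and compare the status codes. The sketch below uses Python's standard `http.client`; the `HeadHatingHandler` toy server is a hypothetical local stand-in that simulates the suspected behaviour (404 for HEAD, 200 for GET) so the comparison can be run offline. It is not a model of how www.iugs.ca actually behaves.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HeadHatingHandler(BaseHTTPRequestHandler):
    """Hypothetical toy server: 404 for HEAD, real content for GET."""

    def do_HEAD(self):
        self.send_response(404)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):
        body = b"<html><body>page exists</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def status_for(method, host, port, path):
    """Send one request with the given method and return its status code."""
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request(method, path)
        resp = conn.getresponse()
        resp.read()  # drain the body (always empty for HEAD)
        return resp.status
    finally:
        conn.close()


def compare_head_get(host, port, path):
    """Return (HEAD status, GET status) for the same URL."""
    return status_for("HEAD", host, port, path), status_for("GET", host, port, path)


if __name__ == "__main__":
    server = ThreadingHTTPServer(("127.0.0.1", 0), HeadHatingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    print(compare_head_get(host, port, "/FR/100/default.asp"))  # (404, 200)
    server.shutdown()
```

Against a real site you would call `compare_head_get("www.iugs.ca", 80, "/FR/100/default.asp")`; a mismatch such as (404, 200) would support the HEAD theory, while matching codes would rule it out.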

hendrix 02-11-2005 09:39 AM

Is there a way to generate a "HEAD" request manually so I can see the server's response? I could open a connection to the web server (telnet www.iugs.ca 80) and issue whatever command it takes.

Charter 02-11-2005 12:21 PM

Code:

telnet www.iugs.ca 80
HEAD /FR/100/default.asp HTTP/1.1
Host: www.iugs.ca

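If telnet isn't handy, the same raw request can be sent from a short script. The sketch below writes the HEAD request above byte for byte over a plain TCP socket and parses the status line of the reply. It adds one assumption not in the telnet example: a `Connection: close` header, so the server closes the socket and the read loop ends. The helper names are hypothetical.

```python
import socket


def build_head_request(host, path):
    """Build the bytes typed into telnet above (HTTP/1.1 requires Host)."""
    lines = [
        f"HEAD {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",  # assumption: ask the server to close when done
        "",
        "",  # together these yield the blank line that ends the headers
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(response_bytes):
    """Return (status_code, reason) from the first line of a raw response."""
    first_line = response_bytes.split(b"\r\n", 1)[0].decode("iso-8859-1")
    _version, code, reason = (first_line.split(" ", 2) + [""])[:3]
    return int(code), reason


def raw_head(host, path, port=80, timeout=10):
    """Send a raw HEAD request and return the parsed status line."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_head_request(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection
                break
            chunks.append(data)
    return parse_status_line(b"".join(chunks))
```

For example, `raw_head("www.iugs.ca", "/FR/100/default.asp")` would return something like `(404, "Object Not Found")` or `(200, "OK")`, depending on what the server at the resolved address sends back.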
However, I no longer think it's a HEAD request issue, as a one-page index produced the following:

Spidering in progress... [Stop spider]
SITE : http://www.iugs.ca/
Exclude paths :
- @NONE@

Wait...
1:http://www.iugs.ca/FR/100/default.asp
(time : 00:00:19)
No link in temporary table
links found : 1
http://www.iugs.ca/FR/100/default.asp
Optimizing tables...
Indexing complete !

So, I don't really know why you'd be getting 404s. :confused:

hendrix 02-11-2005 03:49 PM

By the way, thank you for your time; it's really appreciated.

Here's what I get when I try to index the same page as you:

---------------------------------------------
HTTP/1.1 404 Object Not Found - http://www.iugs.ca/robots.txt
See http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for explanation.
404s are either dead links or something looked like a link to PhpDig so PhpDig tried to crawl it.

HTTP/1.1 404 Object Not Found - http://www.iugs.ca/FR/100/default.asp
See http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for explanation.
404s are either dead links or something looked like a link to PhpDig so PhpDig tried to crawl it.
Optimizing tables...
Indexing complete !
---------------------------------------------

I forgot to mention: I am using IIS on Windows 2000 Server. Here's my phpinfo page in case it helps:

http://www.santeestrie.qc.ca/phpinfo.php

I know IIS isn't the best web server, but I don't have a choice. Of course, I prefer Apache on Linux over IIS, but...

hendrix 02-11-2005 03:50 PM

By the way, how come your PhpDig installation didn't crawl www.iugs.ca and stopped after the first page?

Sorry for my English, I usually speak French ;)

hendrix 02-11-2005 03:55 PM

I've made a small donation through 2Checkout. I think PhpDig is great (and IIS is crap). I have it installed on a few other sites (running Apache on Linux) and have never had any problems... except on IIS, as with most PHP scripts out there.

I have PhpDig running on these sites:

http://www.emusicmag.com
http://www.soundfontdepot.com
http://www.mididepot.com

and it should soon be running here:

http://www.homemusician.net

Keep up the good work :)

Charter 02-11-2005 04:16 PM

Thanks! I want to say it almost smells like an FP issue, but I didn't see any FP reference in your phpinfo. If you do a manual HEAD request, does it give a clue? To index just one page, use zero, zero, no in the admin panel.

hendrix 02-14-2005 08:52 AM

I don't think it's an FP issue, since PhpDig and www.iugs.ca are not hosted on the same server.

hendrix 02-14-2005 07:07 PM

404 error although page exists: SOLVED!
Hey, when I try to index using the server's IP address, it works!

Could it be that my host can't resolve www.iugs.ca's IP correctly?

hendrix 02-15-2005 05:58 AM

Ha! I was right.

I called my provider this morning: www.iugs.ca used to be hosted on their server, and they still had an entry in their "hosts" file pointing to the wrong IP. They removed it and it worked instantly.
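A stale hosts-file entry like this can be spotted by scanning the file for the hostname. The sketch below parses hosts-file text (the usual "IP name [aliases...]" layout with `#` comments) and returns every IP it maps the name to. The helper name is hypothetical, and the sample uses 192.0.2.10 from the RFC 5737 documentation range, not the real old address of www.iugs.ca, which the thread doesn't give.

```python
import ipaddress


def hosts_overrides(hosts_text, hostname):
    """Return the IPs a hosts file maps `hostname` to, in file order."""
    hostname = hostname.lower()
    matches = []
    for raw_line in hosts_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], [name.lower() for name in fields[1:]]
        try:
            ipaddress.ip_address(ip)  # skip lines that don't start with an IP
        except ValueError:
            continue
        if hostname in names:
            matches.append(ip)
    return matches


# Hypothetical hosts file with a leftover entry for a site that has moved.
sample = """
127.0.0.1   localhost
192.0.2.10  www.iugs.ca iugs.ca   # stale entry left after the site moved
"""
print(hosts_overrides(sample, "www.iugs.ca"))  # ['192.0.2.10']
```

A non-empty result that disagrees with what public DNS returns for the name is a strong hint that the server doing the spidering is being sent to the wrong machine.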

So I guess it would be a good idea to try spidering an IP address instead of the full URL when troubleshooting a problem of this nature.
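A related check, useful when a server hosts several sites on one IP, is to connect to a chosen IP directly while still sending the real hostname in the `Host` header. This bypasses DNS and the hosts file but still lets name-based virtual hosting pick the right site. The sketch below does this with Python's `http.client`, which skips its own Host header when one is supplied explicitly; `EchoHostHandler` is a hypothetical local stand-in server that simply echoes the Host header it receives, so the example runs offline.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def fetch_by_ip(ip, hostname, path, port=80, timeout=10):
    """GET `path` from `ip` directly while still sending `Host: hostname`.

    Connecting to the IP bypasses DNS and hosts-file lookups. Because a Host
    header is supplied explicitly, http.client does not add its own (which
    would contain the bare IP), so name-based virtual hosts still match.
    """
    conn = http.client.HTTPConnection(ip, port, timeout=timeout)
    try:
        conn.request("GET", path, headers={"Host": hostname})
        resp = conn.getresponse()
        return resp.status, resp.read()
    finally:
        conn.close()


class EchoHostHandler(BaseHTTPRequestHandler):
    """Hypothetical local server: replies with the Host header it received."""

    def do_GET(self):
        body = self.headers.get("Host", "").encode("ascii")
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


if __name__ == "__main__":
    server = ThreadingHTTPServer(("127.0.0.1", 0), EchoHostHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    ip, port = server.server_address
    print(fetch_by_ip(ip, "www.iugs.ca", "/FR/100/default.asp", port=port))
    # (200, b'www.iugs.ca')
    server.shutdown()
```

Pointing `fetch_by_ip` at the address a resolver returns and then at the address the site's owner says is correct makes it easy to see which one actually serves the page.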


All times are GMT -8.

Copyright © 2001 - 2005, ThinkDing LLC. All Rights Reserved.