View Full Version : 404 error although page exists


hendrix
02-10-2005, 01:23 PM
Hi,

I've had a problem indexing a particular site (please note that all other sites have been indexed without any problem).

PhpDig v1.8.7 is located at http://www.santeestrie.qc.ca/recherche

I've tried to index http://www.iugs.ca but it always returned a 404 error. So then I tried indexing a file I knew existed (http://www.iugs.ca/FR/100/RH_Recrutement.asp) but it also returned a 404 error:
------------------------------------------------
HTTP/1.1 404 Object Not Found - http://www.iugs.ca/FR/100/RH_Recrutement.asp
See http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for explanation.
404s are either dead links or something looked like a link to PhpDig so PhpDig tried to crawl it.
Optimizing tables...
Indexing complete!
------------------------------------------------

No matter which page on this site I try to index, it never works. There's no robots.txt, so that's not the problem.

Here are a few of my settings:

- Tried indexing with a depth of 10 and links per set to zero.

define('PHPDIG_IN_DOMAIN',true);
define('SPIDER_MAX_LIMIT',20);
define('RESPIDER_LIMIT',5);
define('LINKS_MAX_LIMIT',20);
define('RELINKS_LIMIT',5);
define('LIMIT_TO_DIRECTORY',false);
define('LIMIT_DAYS',0);

and from phpinfo():

allow_url_fopen = 1
safe_mode = off

Any help would be appreciated

Regards,
Stéphane Brault
eComDEV.com

Charter
02-10-2005, 02:28 PM
What if you try http://www.iugs.ca/FR/100/default.asp in the textbox? :confused:

hendrix
02-11-2005, 04:56 AM
Yes, I've also tried that... :bang:

hendrix
02-11-2005, 04:57 AM
In fact, I've tried to index all the links found at http://www.iugs.ca/FR

and I also tried adding "default.asp" at the end.

Charter
02-11-2005, 05:46 AM
The only other thing I can think of is that maybe the site dislikes HEAD requests so it returns a 404 Not Found even though GET requests return content.

hendrix
02-11-2005, 09:39 AM
Is there a way to generate a "HEAD" request manually so I can see the server's response? I could open a connection to the web server (telnet www.iugs.ca 80) and issue whatever command it takes.

Charter
02-11-2005, 12:21 PM
telnet www.iugs.ca 80
HEAD /FR/100/default.asp HTTP/1.1
Host: www.iugs.ca
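
The same comparison can also be scripted instead of typed into telnet. The following is only a minimal sketch in Python (an assumption; nothing in this thread uses Python), and `compare_methods` is a hypothetical helper name, not part of PhpDig. It sends a HEAD and then a GET for the same path and returns both status codes, so a server that rejects HEAD but serves GET would show up as two different numbers.

```python
import http.client


def compare_methods(host, path, port=80, timeout=10):
    """Request the same path with HEAD and GET; return {method: status code}."""
    results = {}
    for method in ("HEAD", "GET"):
        # A fresh connection per request keeps the two checks independent.
        conn = http.client.HTTPConnection(host, port, timeout=timeout)
        try:
            conn.request(method, path)
            response = conn.getresponse()
            response.read()  # drain the body (always empty for HEAD)
            results[method] = response.status
        finally:
            conn.close()
    return results
```

For the page in question the call would be `compare_methods("www.iugs.ca", "/FR/100/default.asp")`. It should be run from the machine that hosts PhpDig, since that is the machine whose view of the site matters.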

However, I no longer think it's a HEAD request issue, as a one-page index produced the following:

Spidering in progress... [Stop spider]
SITE : http://www.iugs.ca/
Exclude paths :
- @NONE@

Wait...
1:http://www.iugs.ca/FR/100/default.asp
(time : 00:00:19)
No link in temporary table
links found : 1
http://www.iugs.ca/FR/100/default.asp
Optimizing tables...
Indexing complete !

So, I don't really know why you'd be getting 404s. :confused:

hendrix
02-11-2005, 03:49 PM
By the way, thank you for your time, it's really appreciated.

Here's what I get when I try to index the same page as you:

---------------------------------------------
HTTP/1.1 404 Object Not Found - http://www.iugs.ca/robots.txt
See http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for explanation.
404s are either dead links or something looked like a link to PhpDig so PhpDig tried to crawl it.

HTTP/1.1 404 Object Not Found - http://www.iugs.ca/FR/100/default.asp
See http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for explanation.
404s are either dead links or something looked like a link to PhpDig so PhpDig tried to crawl it.
Optimizing tables...
Indexing complete!
---------------------------------------------

I forgot to mention: I am using IIS on Win 2000 Server. Here's my phpinfo page, in case it helps:

http://www.santeestrie.qc.ca/phpinfo.php

I know IIS isn't the best web server, but I don't have a choice. Of course, I prefer Apache on Linux over IIS but...

hendrix
02-11-2005, 03:50 PM
By the way, how come your PhpDig installation didn't crawl www.iugs.ca and stopped after the first page?

Sorry for my English, I usually speak French ;)

hendrix
02-11-2005, 03:55 PM
I've made a small donation through 2Checkout. I think PhpDig is great (and IIS is crap). I have it installed on a few other sites (running Apache on Linux) and have never had any problems... except on IIS, as with most PHP scripts out there.

I have PhpDig running here:

http://www.emusicmag.com
http://www.soundfontdepot.com
http://www.mididepot.com

and it should soon be here:

http://www.homemusician.net

Keep up the good work :)

Charter
02-11-2005, 04:16 PM
Thanks! I want to say that it almost smells like an FP (http://www.phpdig.net/forum/showthread.php?t=190) issue, but I didn't see any FP reference in your phpinfo. If you do a manual HEAD request, does it give a clue? To index just one page, use zero, zero, no in the admin panel.

hendrix
02-14-2005, 08:52 AM
I don't think it's a FP issue since PhpDig and www.iugs.ca are not hosted on the same server.

hendrix
02-14-2005, 07:07 PM
Hey, when I try to index using the server's IP address, it works!

Could it be that my host can't resolve www.iugs.ca's IP?

hendrix
02-15-2005, 05:58 AM
Ha! I was right.

I called my provider this morning, and it turns out www.iugs.ca used to be hosted on their server. They still had an entry in their "hosts" file pointing to the wrong IP. They removed it and it worked instantly.

So I guess it would be a good idea to try spidering an IP address instead of the hostname-based URL when trying to figure out a problem of this nature.
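
A stale "hosts" entry like this one can also be spotted without PhpDig. What follows is a hedged sketch in Python (an assumption, since the thread itself only uses PhpDig and telnet); the helper names are hypothetical, and the expected IP has to come from somewhere trustworthy, such as the site's owner or a lookup made from a different network. The first helper shows what this machine's resolver returns for a hostname, which includes any "hosts" file override. The second requests a page by IP while still sending the hostname in the Host header, which matters on servers that use name-based virtual hosting.

```python
import http.client
import socket


def check_resolution(hostname, expected_ip):
    """Report the IPv4 addresses this machine resolves hostname to."""
    infos = socket.getaddrinfo(
        hostname, 80, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    addresses = sorted({info[4][0] for info in infos})
    return {"resolved": addresses, "matches": expected_ip in addresses}


def status_via_ip(ip, hostname, path="/", port=80, timeout=10):
    """GET path from ip, sending hostname in the Host header."""
    conn = http.client.HTTPConnection(ip, port, timeout=timeout)
    try:
        # Supplying Host explicitly stops http.client from sending the IP.
        conn.request("GET", path, headers={"Host": hostname})
        response = conn.getresponse()
        response.read()
        return response.status
    finally:
        conn.close()
```

If `check_resolution` reports `"matches": False` on the PhpDig machine while `status_via_ip` with the correct IP returns 200, the resolver on that machine is the likely culprit rather than the spider or the remote site.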