404 error although page exists
Hi,
I've had a problem indexing a particular site (please note that all other sites have been indexed without any problem). PhpDig v1.8.7 is located at http://www.santeestrie.qc.ca/recherche

I tried to index http://www.iugs.ca but it always returned a 404 error. So then I tried indexing a file I knew existed (http://www.iugs.ca/FR/100/RH_Recrutement.asp), but it also returned a 404 error:

------------------------------------------------
HTTP/1.1 404 Object Not Found - http://www.iugs.ca/FR/100/RH_Recrutement.asp
See http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for explanation.
404s are either dead links or something looked like a link to PhpDig so PhpDig tried to crawl it.
Optimizing tables...
Indexation terminée !
------------------------------------------------

It doesn't matter which page I try to index on this site; it never works. There's no robots.txt, so that's not the problem.

Here are a few of my settings:

- Tried indexing with a depth of 10 and links per set to zero.

define('PHPDIG_IN_DOMAIN',true);
define('SPIDER_MAX_LIMIT',20);
define('RESPIDER_LIMIT',5);
define('LINKS_MAX_LIMIT',20);
define('RELINKS_LIMIT',5);
define('LIMIT_TO_DIRECTORY',false);
define('LIMIT_DAYS',0);

and from phpinfo():

allow_url_fopen = 1
safe_mode = off

Any help would be appreciated.

Regards,
Stéphane Brault
eComDEV.com |
What if you try http://www.iugs.ca/FR/100/default.asp in the textbox? :confused:
|
yes, I've also tried that... :bang:
|
In fact, I've tried to index all the links found at http://www.iugs.ca/FR and also tried adding "default.asp" at the end. |
The only other thing I can think of is that maybe the site dislikes HEAD requests so it returns a 404 Not Found even though GET requests return content.
|
Is there a way to generate a "HEAD" request manually so I can see the server's response? I could open a connection to the web server (telnet www.iugs.ca 80) and issue whatever command it takes.
|
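For anyone who would rather script the manual check than type into telnet, here is a minimal Python sketch. It builds the same plain-text request one would enter by hand, sends it over a socket, and returns the parsed status line, so the HEAD and GET answers for one URL can be compared. The function names (build_request, parse_status_line, fetch_status) are made up for this illustration and are not part of PhpDig; the sketch assumes plain HTTP on port 80.

```python
import socket
from typing import Tuple


def build_request(method: str, host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.0 request, i.e. the text one would type into telnet."""
    lines = [
        f"{method} {path} HTTP/1.0",
        f"Host: {host}",
        "Connection: close",
        "",  # the two empty strings produce the blank line that ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(raw: bytes) -> Tuple[str, int, str]:
    """Split a status line such as 'HTTP/1.1 404 Object Not Found' into its parts."""
    first_line = raw.split(b"\r\n", 1)[0].decode("iso-8859-1")
    version, code, *reason = first_line.split(" ", 2)
    return version, int(code), reason[0] if reason else ""


def fetch_status(method: str, host: str, path: str = "/",
                 port: int = 80, timeout: float = 10.0) -> Tuple[str, int, str]:
    """Send one request and return (version, status code, reason phrase)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(method, host, path))
        data = b""
        # Read only until the first line of the response is complete.
        while b"\r\n" not in data:
            chunk = sock.recv(4096)
            if not chunk:
                break
            data += chunk
    if not data:
        raise ConnectionError("server closed the connection without answering")
    return parse_status_line(data)
```

Calling fetch_status("HEAD", "www.iugs.ca", "/FR/100/default.asp") and then the same with "GET" would show whether the server treats the two methods differently. Running it on the machine that hosts PhpDig matters, because that is where the spider's requests originate.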
Code:
telnet www.iugs.ca 80

Spidering in progress... [Stop spider]
SITE : http://www.iugs.ca/
Exclude paths :
- @NONE@
Wait...
1:http://www.iugs.ca/FR/100/default.asp
(time : 00:00:19)
No link in temporary table
links found : 1
http://www.iugs.ca/FR/100/default.asp
Optimizing tables...
Indexing complete !

So, I don't really know why you'd be getting 404s. :confused: |
By the way, thank you for your time, it's really appreciated.
Here's what I get when I try to index the same page as you:

---------------------------------------------
HTTP/1.1 404 Object Not Found - http://www.iugs.ca/robots.txt
See http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for explanation.
404s are either dead links or something looked like a link to PhpDig so PhpDig tried to crawl it.
HTTP/1.1 404 Object Not Found - http://www.iugs.ca/FR/100/default.asp
See http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for explanation.
404s are either dead links or something looked like a link to PhpDig so PhpDig tried to crawl it.
Optimizing tables...
Indexation terminée !
---------------------------------------------

I forgot to mention: I am using IIS on Win 2000 Server. Here's my phpinfo page, if that is of any help to you: http://www.santeestrie.qc.ca/phpinfo.php

I know IIS isn't the best web server, but I don't have a choice. Of course, I prefer Apache on Linux over IIS, but... |
By the way, how come your PhpDig installation stopped after the first page instead of crawling the rest of www.iugs.ca?
Sorry for my English, I usually speak French ;) |
I've made a small donation through 2Checkout. I think PhpDig is great (and IIS is crap). I have it installed on a few other sites (running Apache on Linux) and have never had any problems... except on IIS, as with most PHP scripts out there.
I have PhpDig running on:
http://www.emusicmag.com
http://www.soundfontdepot.com
http://www.mididepot.com
and it should soon be on:
http://www.homemusician.net
Keep up the good work :) |
Thanks! I want to say that it almost smells like an FP issue, but I didn't see any FP reference in your phpinfo. If you do a manual HEAD request, does it give a clue? To index just one page, use zero, zero, no in the admin panel.
|
I don't think it's an FP issue, since PhpDig and www.iugs.ca are not hosted on the same server.
|
SOLVED!
Hey, when I try to index using the server's IP address, it works!
Could it be that my host can't resolve www.iugs.ca to the right IP? |
Ha! I was right.
I called my provider this morning, and it turns out www.iugs.ca used to be hosted on their server. They still had an entry in their "hosts" file pointing to the wrong IP. They removed it and it worked instantly. So I guess it's a good idea to try spidering the IP address instead of the hostname when troubleshooting a problem of this nature. |
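A stale "hosts" entry like this one can also be spotted without spidering at all. The Python sketch below asks the local resolver which IPv4 addresses it returns for a hostname and compares them with addresses known to be correct. The helper names (resolve_ipv4, stale_addresses) and the sample addresses in the comments are hypothetical; the reference addresses would have to come from somewhere trustworthy, such as the site's administrator or a lookup run on another network. It assumes the system resolver consults the hosts file, which is the usual setup but not guaranteed.

```python
import socket
from typing import Iterable, List, Set


def resolve_ipv4(hostname: str) -> Set[str]:
    """Return the IPv4 addresses this machine's resolver gives for hostname.

    On typical setups the system resolver checks the local hosts file as well
    as DNS, so a leftover hosts entry shows up in this result.
    """
    infos = socket.getaddrinfo(
        hostname, 80, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}


def stale_addresses(local: Iterable[str], reference: Iterable[str]) -> List[str]:
    """List addresses the local resolver returns that the reference set lacks."""
    return sorted(set(local) - set(reference))


# Hypothetical usage, with a made-up reference address:
#   suspicious = stale_addresses(resolve_ipv4("www.iugs.ca"), ["192.0.2.10"])
#   A non-empty result means this machine resolves the name to an address
#   the reference lookup does not know about.
```

Note that spidering by IP, as suggested above, may not reach the right site if the server hosts several sites on one address and picks between them by hostname, so comparing what the name resolves to can be the more reliable test in that case.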
Copyright © 2001 - 2005, ThinkDing LLC. All Rights Reserved.