PhpDig.net

Rolandks 09-19-2003 03:44 AM

No indexing IIS 6 Win 2003 Server
 
I spent a lot of time trying to find out what the problem is with the new IIS 6 on Windows 2003 Server.

PhpDig does not index IIS 6 websites at the moment.

I also tried to index an IIS 6 site from a Linux system, with the same result. (Email me and I will send you the web page to test it.)

Results of indexing:

### IIS 6 - Log file ####
#Fields: date time c-ip c-session cs(Referer) sc-Protocol sc-uri sc-status
2003-09-18 19:41:27 62.142.48.115 1033 217.160.xx.xx 80 HTTP/1.1 HEAD /robots.txt 400 - BadRequest
2003-09-18 19:41:27 62.141.48.115 1034 217.160.xx.xx 80 HTTP/1.1 HEAD // 400 - BadRequest
2003-09-18 19:41:27 62.141.48.115 1035 217.160.xx.xx 80 HTTP/1.1 HEAD / 400 - BadRequest
2003-09-18 19:41:27 62.141.48.115 1036 217.160.xx.xx 80 HTTP/1.1 HEAD /robots.txt 400 - BadRequest
op=HEAD arg=http://www.my-domain.de/ result="400 Bad Request"

## Windows 2003 Monitoring ###
<-> Filter: http
----------------------------------
HTTP: HEAD Request from Client
HTTP: Request Method =HEAD
HTTP: Uniform Resource Identifier =//
HTTP: Protocol Version =HTTP/1.1
HTTP: Host =www.my-domain.de
HTTP: Accept = */*
HTTP: Accept-Charset = iso-8859-1
HTTP: Accept-Encoding =identity
HTTP: User-Agent =PhpDig/1.6.2 (PHP; MySql)
------
HTTP: Response to Client; HTTP/1.1; Status Code = 400 - Bad Request
HTTP: Protocol Version =HTTP/1.1
HTTP: Status Code = Bad Request
HTTP: Reason =Bad Request
HTTP: Content-Length =20
HTTP: Content-Type =text/html
HTTP: Connection =close

I will also ask in a Windows newsgroup to find out the reason for this.

I have read about other problems with error 400. Does PhpDig send only requests that conform to the HTTP RFC? See RFC 2616.

-Roland-

Charter 09-19-2003 09:47 AM

Hi. With HEAD [your_site]/robots.txt HTTP/1.1 it produces the following:

Content-Length: 24

The robots.txt file contains the following:
Code:

User-agent: *
Disallow:

What happens if you just delete the robots.txt file?

What do you get?

Rolandks 09-19-2003 10:13 AM

OK, it is deleted. You can try again. It's just the same in my tests.

-Roland-

Charter 09-19-2003 10:22 AM

Hi. Please can you post the results like you did above? Maybe there will be something in there, or are the results just like those above?

Rolandks 09-19-2003 12:24 PM

Hmm, the monitor log is only possible if I start it 2 seconds before I dig.

This is wrong, IMHO!
robot_functions.php, line 286

Code:

  $request =
  "HEAD $path HTTP/1.1\n"
  ."Host: $host$sport\n"
  .$cookiesSendString
  .$auth_string
  ."Accept: */*\n"
  ."Accept-Charset: ".PHPDIG_ENCODING."\n"
  ."Accept-Encoding: identity\n"
  ."User-Agent: PhpDig/".PHPDIG_VERSION." (PHP; MySql)\n\n";

The header lines of the HEAD request are NOT terminated by CRLF, only
by LF ('\n'). A bare LF does not conform to the RFC: each header line must end with CRLF!

See:
http://www.w3.org/Protocols/rfc2616/...c2.html#sec2.2

Quote:

HTTP/1.1 defines the sequence CR LF as the end-of-line marker for all
protocol elements except the entity-body (see appendix 19.3 for
tolerant applications). The end-of-line marker within an entity-body
is defined by its associated media type, as described in section 3.7.

CRLF = CR LF
-Roland-
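
The CRLF rule quoted above can be checked mechanically. What follows is a minimal sketch, not PhpDig code: `hasBareLf()` is a hypothetical helper name, and the two request strings are shortened examples. It reports whether a request string contains an LF that is not preceded by a CR.

```php
<?php
// Hypothetical helper, not part of PhpDig: returns true if $request
// contains a line feed that is not preceded by a carriage return.
function hasBareLf(string $request): bool
{
    // (?<!\r)\n matches an LF whose previous character is not CR.
    return preg_match('/(?<!\r)\n/', $request) === 1;
}

// Shortened example requests (assumed for illustration only).
$lfOnly = "HEAD / HTTP/1.1\nHost: www.my-domain.de\n\n";
$crlf   = "HEAD / HTTP/1.1\r\nHost: www.my-domain.de\r\n\r\n";

var_dump(hasBareLf($lfOnly)); // bool(true)
var_dump(hasBareLf($crlf));   // bool(false)
```

A check like this only inspects the string; it says nothing about how a particular server reacts to bare LF.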

Charter 09-19-2003 12:43 PM

Hi. I believe the problem is that the script uses \n and your machine needs \r\n.

Please try this to fix the problem: First make a backup of the robot_functions.php file. Then in robot_functions.php, do the following:
  1. find:

    PHP Code:

    $auth_string = 'Authorization: Basic '.base64_encode($components['user'].':'.$components['pass'])."\n"

    and replace with:

    PHP Code:

    $auth_string = 'Authorization: Basic '.base64_encode($components['user'].':'.$components['pass'])."\r\n"

  2. find:

    PHP Code:

    $cookiesSendString .= "Cookie: ".$cookieString['string']."\n"

    and replace with:

    PHP Code:

    $cookiesSendString .= "Cookie: ".$cookieString['string']."\r\n"

  3. find:

    PHP Code:

    @ini_set('user_agent','PhpDig/'.PHPDIG_VERSION.' (PHP; MySql)'."\n".phpDigMakeCookies($cookiesToSend,$path)); 

    and replace with:

    PHP Code:

    @ini_set('user_agent','PhpDig/'.PHPDIG_VERSION.' (PHP; MySql)'."\r\n".phpDigMakeCookies($cookiesToSend,$path)); 

  4. find:

    PHP Code:

    $request =
    "HEAD $path HTTP/1.1\n"
    ."Host: $host$sport\n"
    .$cookiesSendString
    .$auth_string
    ."Accept: */*\n"
    ."Accept-Charset: ".PHPDIG_ENCODING."\n"
    ."Accept-Encoding: identity\n"
    ."User-Agent: PhpDig/".PHPDIG_VERSION." (PHP; MySql)\n\n"

    and replace with:

    PHP Code:

    $request =
    "HEAD $path HTTP/1.1\r\n"
    ."Host: $host$sport\r\n"
    .$cookiesSendString
    .$auth_string
    ."Accept: */*\r\n"
    ."Accept-Charset: ".PHPDIG_ENCODING."\r\n"
    ."Accept-Encoding: identity\r\n"
    ."User-Agent: PhpDig/".PHPDIG_VERSION." (PHP; MySql)\r\n\r\n"

  5. find:

    PHP Code:

    $req1 = "HEAD $path HTTP/1.1\n"
    ."Host: $host$sport\n"
    .$cookiesSendString
    .$auth_string
    ."Accept: */*\n"
    ."Accept-Charset: ".PHPDIG_ENCODING."\n"
    ."Accept-Encoding: identity\n"
    ."User-Agent: PhpDig/".PHPDIG_VERSION." (PHP; MySql)\n\n"

    and replace with:

    PHP Code:

    $req1 = "HEAD $path HTTP/1.1\r\n"
    ."Host: $host$sport\r\n"
    .$cookiesSendString
    .$auth_string
    ."Accept: */*\r\n"
    ."Accept-Charset: ".PHPDIG_ENCODING."\r\n"
    ."Accept-Encoding: identity\r\n"
    ."User-Agent: PhpDig/".PHPDIG_VERSION." (PHP; MySql)\r\n\r\n"


I think that's all of them that absolutely need to be changed. I also think you could just do a search and replace, changing all \n to \r\n in the files.

As a general rule of thumb, I believe it's like this for different OS:

Windows uses \r\n
Macintosh (classic Mac OS) uses \r
*nix uses \n
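
Whatever the local convention, HTTP header lines end with CRLF on every platform. One caveat with a blanket replace of \n by \r\n is that any line already ending in \r\n would become \r\r\n. A minimal sketch of a normalization that avoids this, for strings built at run time (`toCrlf()` is a hypothetical helper, not part of PhpDig, and the sample string is made up):

```php
<?php
// Hypothetical helper, not part of PhpDig: converts every line ending
// (CRLF, lone CR, or lone LF) to CRLF without doubling existing CRs.
function toCrlf(string $text): string
{
    // Alternation order matters: \r\n must be tried before \r alone,
    // so an existing CRLF is matched as one unit.
    return preg_replace('/\r\n|\r|\n/', "\r\n", $text);
}

// Sample with all three line-ending styles mixed together.
$mixed = "Accept: */*\nAccept-Encoding: identity\r\nHost: example\r";

var_dump(toCrlf($mixed) === "Accept: */*\r\nAccept-Encoding: identity\r\nHost: example\r\n"); // bool(true)
```

This works on string values only; it is not a substitute for editing the escape sequences in the source files as described in the steps above.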

Charter 09-19-2003 12:47 PM

Quote:

Originally posted by Rolandks
Hmm, the monitor log is only possible if I start it 2 seconds before I dig.

Question: where is the relevant line in Spider.php?

Are the header lines of the HEAD requests split by CRLF or only
by LF ('\n')? A bare LF does not conform to the RFC: each header line must end with CRLF!

-Roland-

Ah, I see you were already thinking that. To test, I wrote a script to do a HEAD request on your machine. With only \n I received 400 Bad Request, but with \r\n it worked fine.
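
A test of that kind might look roughly like the sketch below. It is not the script used in this thread: `buildHeadRequest()` and `sendHead()` are hypothetical names, the header set is cut down to a minimum, and the usage lines are commented out because they need network access and a host of your own.

```php
<?php
// Sketch only; not the script used in the thread. Both function names
// are hypothetical.

// Builds a HEAD request whose lines all end with $eol.
function buildHeadRequest(string $host, string $path, string $eol): string
{
    $lines = [
        "HEAD $path HTTP/1.1",
        "Host: $host",
        "Accept: */*",
        "Connection: close",
    ];
    // One $eol after each header line, plus one more for the empty
    // line that ends the header section.
    return implode($eol, $lines) . $eol . $eol;
}

// Sends the request and returns the status line, or null on failure.
function sendHead(string $host, string $request, int $port = 80): ?string
{
    $fp = @fsockopen($host, $port, $errno, $errstr, 10);
    if ($fp === false) {
        return null;
    }
    stream_set_timeout($fp, 10); // do not wait forever for a reply
    fwrite($fp, $request);
    $status = fgets($fp);
    fclose($fp);
    return $status === false ? null : rtrim($status);
}

// Usage (needs network access; replace the host with your own):
// foreach (["\n", "\r\n"] as $eol) {
//     $req = buildHeadRequest('www.my-domain.de', '/', $eol);
//     echo json_encode($eol), ' => ', sendHead('www.my-domain.de', $req), "\n";
// }
```

Keeping the line terminator as a parameter makes it easy to send the same request both ways and compare the status lines.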

Rolandks 09-19-2003 01:40 PM

Thanks :)

I think this should be changed in the next version so that it conforms to the RFC; otherwise users who update will have to apply this fix again.

I wrote above:

See:
http://www.w3.org/Protocols/rfc2616/...c2.html#sec2.2

Quote:

HTTP/1.1 defines the sequence CR LF as the end-of-line marker for all
protocol elements except the entity-body (see appendix 19.3 for
tolerant applications). The end-of-line marker within an entity-body
is defined by its associated media type, as described in section 3.7.
CRLF = CR LF

Microsoft IIS 6 is designed for NEW security ;) and it applies the RFC strictly rather than behaving as a tolerant application.

-Roland-


All times are GMT -8.
