ARGH - Indexing only ever gets 1 link! HELP


DanBUK
10-23-2003, 08:00 AM
Hi,
I have installed PhpDig on my server, but I cannot figure out why it's not indexing.
I have checked MySQL privileges and file permissions, and looked at the HTTP headers that are sent; they look correct. All I get from any domain is the following:

Spidering in progress...
SITE : http://freebox.mine.nu/
Exclude paths :
- @NONE@
1:http://freebox.mine.nu/
(time : 00:00:00)
No link in temporary table
links found : 1
http://freebox.mine.nu/
Optimizing tables...
Indexing complete ! [Back] to admin interface.

Help

Regards,
Daniel.

Rolandks
10-23-2003, 09:48 AM
Originally posted by DanBUK
... my server ...

And now we all have to guess which operating system and which PHP version you use on "your server" :D

Please post your OS and PHP version - it is important!

DanBUK
10-23-2003, 09:57 AM
Sorry, I haven't slept in 36 hours...

Running:
Vanilla Kernel 2.4.22
Apache 2.0.47
MySQL 4.0.13-r4
PHP 4.3.3-r2

I have also added the l_time FLOAT column to the db,
and changed \n to \r\n for the headers.
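
For anyone following along, the \n -> \r\n change mentioned above could look roughly like this. It is only a sketch: build_get_request() is a made-up helper name for illustration, not a PhpDig function, and PhpDig's own request code may be structured differently.

```php
<?php
// Hypothetical helper (not part of PhpDig): builds a minimal HTTP/1.0 GET request.
function build_get_request($host, $path = '/')
{
    $lines = array(
        "GET $path HTTP/1.0",
        "Host: $host",
    );
    // Header lines are separated by CRLF; an empty line (a second CRLF) ends the header block.
    return implode("\r\n", $lines) . "\r\n\r\n";
}

// Print the request with the line endings made visible.
echo str_replace(array("\r", "\n"), array('\r', '\n'), build_get_request('freebox.mine.nu')), "\n";
```

The point is simply that every line ends in \r\n and the request finishes with an empty line.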

Cheers,
Dan.

DanBUK
10-24-2003, 08:08 AM
Also, I have tested on many different domains. The ones hosted on the same box as PhpDig are, due to my BIND setup, given the internal IP rather than the global one, so that's not causing it.
I'm really confused as to why it's not working...

Charter
10-24-2003, 05:23 PM
Hi. Does anything in this (http://www.phpdig.net/showthread.php?threadid=127) thread help?

DanBUK
10-25-2003, 03:51 AM
I'd already read that one; that was why I pointed out that BIND (the DNS server) has two "views": if an external IP asks about my domain, it gets the 'real' IP; if an internal IP asks about the domain, it gets the internal 192.168.x.x address.
I have tried setting PHPDIG_DEFAULT_INDEX as well.
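
A quick way to see which of the two views the PHP box actually gets is to resolve the name from PHP itself. This is a rough sketch and not PhpDig code; is_private_ipv4() is a helper name made up for illustration, and the hostname is just the one from the spider output.

```php
<?php
// Hypothetical helper: true if $ip falls in an RFC 1918 private IPv4 range.
function is_private_ipv4($ip)
{
    $o = array_map('intval', explode('.', $ip));
    if (count($o) !== 4) {
        return false;
    }
    return $o[0] === 10
        || ($o[0] === 172 && $o[1] >= 16 && $o[1] <= 31)
        || ($o[0] === 192 && $o[1] === 168);
}

$host = 'freebox.mine.nu';      // host from the spider output above
$ip   = gethostbyname($host);   // returns $host unchanged when the lookup fails

if ($ip === $host) {
    echo "DNS lookup failed for $host\n";
} else {
    echo "$host resolves to $ip", is_private_ipv4($ip) ? " (private range)\n" : " (public)\n";
}
```

If this prints a 192.168.x.x address, the spider is talking to the internal view.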

I'm lost!

Charter
11-12-2003, 09:16 AM
Oops, this thread got lost.

Has anything changed?

DanBUK
11-12-2003, 09:23 AM
No change. I've had a bit of a further fiddle, to no joy. :(

I ended up installing htdig instead.

Charter
11-12-2003, 09:41 AM
Hi. I can't seem to duplicate the problem. I crawled your site at level one and found nine links. Did anything in this (http://www.phpdig.net/showthread.php?threadid=73) thread help?

schade
11-13-2003, 04:36 AM
Same problem here :confused:

I'm running PHP Version 4.3.2
BUT the Server API is CGI (could this be a problem?)
And it's running on AIX.

The output is:

Spidering in progress...

--------------------------------------------------------------------------------
SITE : http://www.kvis.org/
Exclude paths :
- @NONE@
No link in temporary table

--------------------------------------------------------------------------------

links found : 0
...Was recently indexed
Optimizing tables...
Indexing complete !
--------------------------------------------------------------------------------
[Back] to admin interface.

And I've tested tons of sites - same result.

Can I turn some kind of debugging on?
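
Failing any PhpDig-specific switch, plain PHP error reporting can be turned up at the top of the script being run. This is generic PHP, not a PhpDig debugging option, and calls that are prefixed with @ will stay silent regardless.

```php
<?php
// Generic PHP settings, not a PhpDig feature: report and display everything.
error_reporting(E_ALL);
ini_set('display_errors', '1');

// Confirm what is now in effect.
echo "error_reporting = ", error_reporting(), "\n";
echo "display_errors  = ", ini_get('display_errors'), "\n";
```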

Charter
11-14-2003, 06:22 PM
Hi. There seems to be a problem with PhpDig and CGI mode, but I currently don't have access to PHP in CGI mode. If someone out there with PHP in CGI mode, who is having problems like those posted above, can offer access, then I could try to locate the problem.

Charter
11-15-2003, 03:24 PM
Hi. To schade: I purchased a hosting account that runs PHP in CGI mode, and I also set an open_basedir restriction. I was able to crawl several sites using this account without incident, except for your site, where I received the same results as you did. Now I no longer think that there is a CGI mode problem. Rather, it seems the PhpDig problem is related to something site-specific. Are you able to set up a plain demo page without JavaScript on your account and crawl it?

schade
11-19-2003, 02:32 AM
Hi,

I just created a simple test-page

http://www.kvis.org/test/

But I still have the same problem.

schade
11-19-2003, 06:04 AM
I've been digging into robot_functions.php and found the reason for my error: fsockopen() fails.

This small program demonstrates the error:

<?php
$fp = fsockopen("www.schade.dk", 80, $errno, $errstr, 30);
if (!$fp) {
    // Connection (or name resolution) failed; report why.
    echo "$errstr ($errno)<br>\n";
} else {
    // Every header line ends in CRLF; a blank line ends the request.
    fputs($fp, "GET / HTTP/1.0\r\nHost: www.schade.dk\r\n\r\n");
    while (!feof($fp)) {
        echo fgets($fp, 128);
    }
    fclose($fp);
}
?>


Returning this error message:

----

Warning: fsockopen(): php_network_getaddresses: getaddrinfo failed: System error returned in errno (is your IPV6 configuration correct? If this error happens all the time, try reconfiguring PHP using --disable-ipv6 option to configure) in /home/www/php/test.php on line 2

Warning: fsockopen(): unable to connect to www.schade.dk:80 in /home/www/php/test.php on line 2
No such file or directory (2)

----

hmmm, searching the net I found:

http://bugs.php.net/bug.php?id=11058

... that's all for now, but I'll keep digging :-)
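
Since the warning points at getaddrinfo, one way to narrow it down is to do the name lookup and the connect as two separate steps. A rough sketch, assuming nothing about PhpDig itself; diagnose_connect() is a name made up for this test.

```php
<?php
// Hypothetical diagnostic, not PhpDig code: separates DNS failure from connect failure.
function diagnose_connect($host, $port = 80, $timeout = 5)
{
    // gethostbyname() returns its argument unchanged when the lookup fails.
    $ip = gethostbyname($host);
    if ($ip === $host && ip2long($host) === false) {
        return "resolve-failed";
    }
    // Connect by IP so no further name lookup is involved.
    $fp = @fsockopen($ip, $port, $errno, $errstr, $timeout);
    if (!$fp) {
        return "connect-failed: $errstr ($errno)";
    }
    fclose($fp);
    return "ok";
}

echo diagnose_connect('www.schade.dk'), "\n";
```

If this reports resolve-failed, the problem sits in name resolution on the box rather than in the socket code.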

Charter
11-19-2003, 09:15 AM
Hi. When I run your snippet, I get the following output:

HTTP/1.1 200 OK
Date: Wed, 19 Nov 2003 17:12:24 GMT
Server: Apache/1.3.27 (Unix) (Red-Hat/Linux) Chili!Soft-ASP/3.6.2 FrontPage/4.0.4.3 mod_auth_pgsql/0.9.12
Last-Modified: Tue, 11 Nov 2003 13:23:10 GMT
ETag: "1ba40ca-1177-3fb0e2be"
Accept-Ranges: bytes
Content-Length: 4471
Connection: close
Content-Type: text/html