Spidering issue with my site


pager
01-15-2004, 12:07 PM
Hello, I'm trying to set up phpdig for a web site. I can get it to spider other web sites, but not mine.

I have tried both locally from the command line and remotely from another server.

Any time I try to spider it, the web page freezes for about 30 seconds after I click the "Dig This!" button and then goes to the result page with:

Spidering in progress...

SITE : http://dev.videx.com/
Exclude paths :
- @NONE@
No link in temporary table


links found : 0
...Was recently indexed
Optimizing tables...
Indexing complete !
[Back] to admin interface.

The site is, if you didn't notice ;), dev.videx.com, and I have managed to spider other servers in our domain (like www.videx.com).

I have removed the robots.txt file from the site, but there is still a .htaccess file restricting access to the /search folder. Otherwise the site is a basic CSS / PHP one on a Mac OS X 10.3 server, and I am using phpdig version 1.6.2.

I have modified my config file to not search through .css files, but still no luck.

Any suggestions?

pager
01-16-2004, 01:56 PM
Anyone? Anyone? Bueller?

Well, I've done some more searching and it turns out that the spidering will hang on any Mac OS X 10.3 site that I configure (including a default site with one web page!).

It works fine spidering Mac OS X 10.2 servers, however, so I think it has something to do with the Apache config on the server.

The site that I can't get phpdig to spider is http://dev.videx.com/ and it is running with the following config:

OS: Mac OS X 10.3
Apache: 1.3.28
PHP: 4.3.2
phpdig: 1.6.2

I have tried turning on error logging for PHP, but the log file is never created. My php.ini file is:

include_path=".:/Library/WebServer/php"
log_errors = On
error_log = ".:/Library/WebServer/log.txt"
error_reporting = E_ALL
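
One possible reason the log file never appears: the error_log value above starts with ".:", which is include_path syntax (a colon-separated list of directories). As far as I know, error_log expects a single file path, and the user the web server runs as must be able to write there. A rough Python sketch for sanity-checking such a value follows. The check_error_log_path helper is hypothetical, not part of PHP or phpdig, and it assumes Unix-style paths.

```python
import os


def check_error_log_path(path):
    """Return a list of likely problems with a php.ini error_log value.

    Assumes Unix-style paths, where ':' separates entries in a path list
    (as in include_path) and should not appear in a single file path.
    """
    problems = []
    if ":" in path:
        problems.append("contains ':' (looks like a path list, not one file)")
    if not os.path.isabs(path):
        problems.append("is not an absolute path")
    parent = os.path.dirname(path) or "."
    if not os.path.isdir(parent):
        problems.append("parent directory %r does not exist" % parent)
    elif not os.access(parent, os.W_OK):
        # Only meaningful when run as the same user as the web server.
        problems.append("parent directory %r is not writable" % parent)
    return problems


if __name__ == "__main__":
    for problem in check_error_log_path(".:/Library/WebServer/log.txt"):
        print(problem)
```

If the checks pass for a plain path such as /Library/WebServer/log.txt but nothing is logged, permissions for the Apache user would be the next thing to look at.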

Feel free to attempt to spider http://dev.videx.com/ and let me know if it works :)

Charter
01-18-2004, 08:28 AM
Hi. Below are the results at search depth one for http://dev.videx.com/. When you try to crawl this site, what shows up in your Apache log files?

links found : 17
http://dev.videx.com/
http://dev.videx.com/favicon.ico
http://dev.videx.com/index.html
http://dev.videx.com/products/index.html
http://dev.videx.com/News/index.html
http://dev.videx.com/about/index.html
http://dev.videx.com/products/downloads/manuals/accesscontrol/cyberaudit_manual.pdf
http://dev.videx.com/products/support.html
http://dev.videx.com/products/download.html
http://dev.videx.com/products/listing.html
http://dev.videx.com/news/tradeshows.html
http://dev.videx.com/news/careers.html
http://dev.videx.com/map.html
http://dev.videx.com/news/press.html
http://dev.videx.com/news/studies.html
http://dev.videx.com/about/privacy.html
http://dev.videx.com/about/contact.html
Optimizing tables...
Indexing complete !
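
Since the page clearly has links when crawled from elsewhere, "links found : 0" points at the fetch rather than the page itself. To rule out the HTML, a saved copy of the page can be run through a small link extractor. This is only a sketch using Python's standard library; it is not how phpdig parses pages, and the names LinkCollector and same_site_links are made up for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the raw value of every href attribute in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "href" and value:
                self.hrefs.append(value)


def same_site_links(html, base_url):
    """Return absolute, de-duplicated links that stay on base_url's host."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(base_url).netloc
    links = []
    for href in collector.hrefs:
        # Resolve relative links and drop any #fragment.
        absolute = urldefrag(urljoin(base_url, href))[0]
        if urlparse(absolute).netloc == host and absolute not in links:
            links.append(absolute)
    return links


if __name__ == "__main__":
    sample = '<a href="/map.html">Map</a> <a href="about/contact.html">Contact</a>'
    for link in same_site_links(sample, "http://dev.videx.com/"):
        print(link)
```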

pager
01-19-2004, 09:23 AM
I cleared my Apache logs, restarted Apache, and ran an index. Here are the results in the log files:

access log:

12.17.172.219 - - [19/Jan/2004:09:12:30 -0800] "GET / HTTP/1.1" 200 7404
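
Read field by field, that single entry says the server answered one request for / with status 200 and 7404 bytes, and that no further pages were requested. A minimal Python sketch for pulling those fields out of a line in this format (the regular expression assumes the common log format shown above, and parse_access_line is a made-up helper name):

```python
import re

# Common log format: host ident user [time] "request" status size
ACCESS_LINE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_access_line(line):
    """Return a dict of fields for one access-log line, or None if no match."""
    match = ACCESS_LINE.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    # Apache writes "-" when no body bytes were sent.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry


if __name__ == "__main__":
    line = '12.17.172.219 - - [19/Jan/2004:09:12:30 -0800] "GET / HTTP/1.1" 200 7404'
    print(parse_access_line(line))
```

Counting the parsed entries per spider run is a quick way to see whether the spider ever got past the first page.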


error log:

Processing config directory: /etc/httpd/sites/*.conf
Processing config file: /etc/httpd/sites/0000_any_80_.conf
Processing config file: /etc/httpd/sites/virtual_host_global.conf
[Mon Jan 19 09:11:22 2004] [notice] Apache/1.3.28 (Darwin) PHP/4.3.2 configured -- resuming normal operations
[Mon Jan 19 09:11:22 2004] [notice] Accept mutex: flock (Default: flock)

It doesn't look very helpful to me.

I still can't index the site from other Mac OS X 10.3 servers. I timed the delay between clicking the "Dig This!" button and the spider page coming up with 0 results, and it is about 3 minutes and 20 seconds.

pager
01-19-2004, 10:13 AM
Well, I just updated my phpdig to 1.6.5 and tried out indexing the site.

It works up to a point with the web interface and then gives me the following message from the web browser:

Could not open the page “http://12.17.172.219/phpdig1/admin/spider.php” after trying for 60 seconds.

All the pages that it indexes up to that point are fine. I am going to try it from the command line, where the timeout should not apply.

pager
01-19-2004, 11:05 AM
Everything is working fine now with phpdig 1.6.5. Apparently there was something in the PHP code in 1.6.2 that was causing a problem.

So, in case anyone wants to know, phpdig 1.6.5 works on Mac OS X 10.3.