Error creating initial records


dnuttall
10-16-2004, 05:06 AM
I've got what appears to be a VERY basic install of PHPDig 1.8.3 on Fedora Core 2 and Whitebox Enterprise Linux 3.0 and I get the same error on both when trying to index local files that are either HTML or flat ASCII/text.

The errors are as follows:
======== START ERRORS ========
HTTP/1.1 403 Forbidden - http://wbox01.devel.local/phpdig/documentation//
See http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for explanation.

HTTP/1.1 403 Forbidden - http://wbox01.devel.local/phpdig/documentation/
See http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for explanation.
SITE : http://wbox01.devel.local/
Exclude paths :
- @NONE@

HTTP/1.1 403 Forbidden - http://wbox01.devel.local/phpdig/documentation/
See http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for explanation.
1:http://wbox01.devel.local/phpdig/documentation/
(time : 00:00:05)
No link in temporary table
links found : 1
http://wbox01.devel.local/phpdig/documentation/
Optimizing tables...
Indexing complete !
========= END ERRORS ===========

Am I supposed to modify something in config.php to overcome this?

The sites are trying to run just standard HTTP/port 80. I don't understand what's making it think it needs to look at the raw data with SSL/port 443.

Please "instruct" me on which part of the "fine manual" will correct my problem.

TIA.

Dave Nuttall
San Antonio, TX

vinyl-junkie
10-16-2004, 06:18 AM
Welcome to the forum, dnuttall. We're glad to have you here. :D

What is the root level you're trying to index? Is this an intranet? Is there something in .htaccess that prevents it from being displayed or indexed? Is it password protected? There are a number of things that could cause the site not to be spidered. More detail on the site would help.
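To narrow down questions like these, it can help to request the failing URL directly and see which status the web server itself returns, independent of the spider. Below is a minimal sketch in Python, not part of PHPDig; the function names `http_status` and `describe` are made up for illustration. Note that 403 is an HTTP status code ("Forbidden": the server understood the request but refuses it) and is unrelated to port 443. On a directory URL such as the one in the log, common causes include a missing index file with directory listing disabled, file permissions, or access rules in the server configuration, though which one applies here is not known from the thread.

```python
import sys
import urllib.error
import urllib.request


def http_status(url, timeout=10):
    """Return the HTTP status code the server sends back for url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx/5xx responses; the code is on the exception.
        return err.code


def describe(status):
    """Give a short, human-readable hint for a status code."""
    if status == 403:
        return "403 Forbidden: the server refused the request (check permissions and access rules)"
    if 200 <= status < 300:
        return f"{status}: OK"
    return f"{status}: see RFC 2616 section 10"


if __name__ == "__main__":
    # Usage: python check_url.py http://wbox01.devel.local/phpdig/documentation/
    for url in sys.argv[1:]:
        print(url, "->", describe(http_status(url)))
```

If the same URL also returns 403 in a browser or with this script, the problem is on the web server side rather than in the spider's configuration.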

dnuttall
10-16-2004, 06:30 AM
"What is the root level you're trying to index?"

For all intents and purposes, either a folder/directory in the path of the PHPDig site or a folder/directory aliased to the PHPDig site.

"Is this an intranet? Is there something in .htaccess that prevents it from being displayed or indexed? Is it password protected? There are a number of things that could cause the site not to be spidered. More detail on the site would help."

Yes, basically it is an intranet/development/exploration site. I have one on VMWare and another attempt on a full machine.

There are NO .htaccess files of any type.

Ideally, I'd like to experiment with filesystems that are shared/mounted via Samba and aliased to the PHPDig site, but for the moment I'd just like to see it work in the least complex mode, which I assumed would be to spider files that are local to the server/machine.

Thanks for thinking about it.
Dave

dnuttall
10-18-2004, 04:16 AM
After tinkering with my configuration, I've been able to make the spider find files that are on a "local/intranet" type server.

Is there a way to AVOID the (apparent) requirement that there be a URL reference to the file(s) that you want to index?

I'd like to be able to give the spider a folder/directory name and just let it romp and ravage the contents.

I've got a short piece of PHP that I evolved yesterday that can look at a folder and determine what sort of file the spider has found.
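For anyone following along, the general idea of walking a folder and guessing each file's type can be sketched like this. This is not the PHP script mentioned above (which isn't posted here) and not a PHPDig feature; it is a hypothetical Python illustration, and `classify_folder` is an invented name. It guesses the type from the file extension only, without reading file contents.

```python
import mimetypes
import os


def classify_folder(root):
    """Walk root recursively and return {file path: guessed MIME type}."""
    found = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            # guess_type looks only at the extension; it returns (type, encoding).
            mime, _encoding = mimetypes.guess_type(path)
            found[path] = mime or "unknown"
    return found


if __name__ == "__main__":
    import sys

    # Usage: python classify_folder.py /path/to/folder
    for path, mime in sorted(classify_folder(sys.argv[1]).items()):
        print(f"{mime}\t{path}")
```

A listing like this only tells you what is in the folder; getting the spider to index those files without a URL pointing at them is a separate question.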

If this should be on a different forum/thread, just scream at me.

TIA.
Dave Nuttall
San Antonio, TX