PhpDig.net

Troubleshooting: spidering does *nothing* ? (http://www.phpdig.net/forum/showthread.php?t=1046)

davenewt 07-12-2004 06:25 AM

Hang on, just tried re-indexing the root and it seems to be working... sorry, not sure how or why, but I'm getting a ton of output, which is a good sign :)

Will report back...

Charter 07-12-2004 06:30 AM

Server error logs... ;)

davenewt 07-12-2004 06:56 AM

Server logs... I thought that's what you meant. Anyway, it seems I had to click "Dig This" to add the site first, then use the "Update Form" part of the Admin interface and click the green tick icon next to "Root" to re-index.

However...

At first this picked up every link into the forum directory, so I added a robots.txt file to exclude that directory.
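The exclusion described here is an ordinary robots.txt rule. As a rough way to sanity-check such a file before re-spidering, the Python sketch below feeds hypothetical robots.txt contents to the standard library's `urllib.robotparser` (not PhpDig's own parser) and asks whether two URLs may be fetched. The `/newsboard/` path comes from the "Exclude paths" line in the spider log later in this thread; the file contents and the `PhpDig` user-agent string are assumptions.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt served at http://knet/robots.txt; the
# /newsboard/ path matches the "Exclude paths" entry in the spider log.
ROBOTS_TXT = """\
User-agent: *
Disallow: /newsboard/
"""


def build_parser(robots_text: str) -> RobotFileParser:
    """Parse robots.txt rules from a string instead of fetching them."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    for url in ("http://knet/index.php", "http://knet/newsboard/index.php"):
        verdict = "allowed" if rules.can_fetch("PhpDig", url) else "blocked"
        print(url, verdict)
```

With these rules the root index page is allowed and anything under `/newsboard/` is blocked for every user agent, because `User-agent: *` applies to all crawlers.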

Now I have found the main problem. So far, the only navigation on the site is a dynamic JavaScript menu that is added to the HTML at runtime, so the body of the HTML contains no links (apart from the ones to the forums mentioned above).

How can I spider subdirectories that aren't linked from the root index file? Do I need to add a line of plain navigation links to the bottom of the index page so the spider knows about the subdirectories, or can I tell the spider to find them some other way?

Almost there :) Thanks so much for the help.

Cheers,
Dave.
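The underlying problem can be illustrated with a small sketch. A spider such as PhpDig presumably follows only the links present in the HTML the server returns; a menu that JavaScript builds inside the browser never appears there. The Python sketch below uses the standard library's `html.parser` to list the `<a href>` links in a hypothetical page resembling the one described: one plain link to the forums and a menu filled in by a script. It shows the general idea only and is not PhpDig's link-extraction code.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags that exist in the raw HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def static_links(html: str) -> list[str]:
    """Return the links a crawler that does not run JavaScript could see."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


# Hypothetical page: the forum link is plain HTML, but the menu entry
# for dir1/ is only created when a browser runs the script.
PAGE = """
<html><body>
<a href="newsboard/">Forums</a>
<div id="menu"></div>
<script>
  var item = document.createElement("a");
  item.href = "dir1/";
  item.textContent = "Section 1";
  document.getElementById("menu").appendChild(item);
</script>
</body></html>
"""

if __name__ == "__main__":
    print(static_links(PAGE))
```

Only `newsboard/` is reported; `dir1/` does not exist until a browser executes the script. That is why a plain-HTML fallback line of links on the index page (as suggested in the post above) gives a spider something to follow.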

Charter 07-12-2004 07:33 PM

>> How can I spider sub-directories which aren't linked to from the root index file?

Check out PhpDig version 1.8.2... :D

davenewt 07-13-2004 12:15 AM

To update from 1.8.0, can I just copy over all the files, or is there a safer process? Just checking in case I'm about to screw up something else :)

Charter 07-13-2004 12:03 PM

Hi. Yes, copy over all the files, add the new tables, and then use the new connect.php and config.php files.

davenewt 07-14-2004 12:40 AM

Okay, I'm doing that... but I still don't see an easy way to index the entire site when my index.php file contains no static HTML links to the pages in my subdirectories.

I found the line:
Code:

define(LIMIT_TO_DIRECTORY,true);        //limit index to directory, no sub dir, set in URL
in config.php, which I thought might have something to do with it. There were no single quotes around the constant name, so I added them and changed the line to:
Code:

define('LIMIT_TO_DIRECTORY',false);        //limit index to directory, no sub dir, set in URL
to see if it made any difference, but it didn't.

So even with this latest version, it seems I still need to spider all the subdirectories manually, yes?

davenewt 07-14-2004 12:53 AM

Hmmm, nothing is being written to my text_content directory any more, and the spider is AGAIN not picking up the basic links in the index file. We seem to be back where we were in the last post on page 1 of this discussion, a problem that magically fixed itself (see the first post of page 2) without me doing anything, as far as I remember. So I don't know what to do next. Back to square one :(

Charter 07-14-2004 01:02 AM

Hi. Set define('LIMIT_TO_DIRECTORY',true); and then index http://www.domain.com/dir1/dir2/ or some such thing, assuming the page at dir1/dir2/ has links to other pages.
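The `config.php` comment describes `LIMIT_TO_DIRECTORY` as "limit index to directory, no sub dir, set in URL". Read literally, that suggests a rule like the one in the Python sketch below: a link is followed only if it stays on the same host and directly in the directory of the starting URL. This is a hypothetical reading of that comment (`directory_of` and `stays_in_directory` are made-up names), not PhpDig's actual code, so check the PhpDig source before relying on it. The `www.domain.com` URL is the placeholder used in the reply above.

```python
from urllib.parse import urljoin, urlsplit


def directory_of(path: str) -> str:
    """Return the directory part of a URL path, e.g. '/dir1/dir2/'."""
    if not path:
        return "/"
    return path[: path.rfind("/") + 1]


def stays_in_directory(start_url: str, link: str) -> bool:
    """Hypothetical 'same directory, no subdirectories' check.

    The link is resolved against the start URL, then kept only if it is
    on the same scheme and host and lives directly in the start directory.
    """
    start = urlsplit(start_url)
    target = urlsplit(urljoin(start_url, link))
    same_site = (target.scheme, target.netloc) == (start.scheme, start.netloc)
    return same_site and directory_of(target.path) == directory_of(start.path)


if __name__ == "__main__":
    start = "http://www.domain.com/dir1/dir2/"
    for link in ("page.php", "sub/page.php", "/index.php"):
        print(link, stays_in_directory(start, link))
```

Under this reading, starting the spider at a directory URL such as http://www.domain.com/dir1/dir2/ keeps it inside that one directory, which is why the page at that URL needs to link to the other pages there.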

davenewt 07-14-2004 02:02 AM

Ok, now I get:

Code:

SITE : http://knet/
Exclude paths :
- newsboard/
1:http://knet/index.php
(time : 00:00:05)
No link in temporary table

--------------------------------------------------------------------------------

links found : 1
http://knet/index.php
Optimizing tables...
Indexing complete !

BUT there's a link to test.php in index.php that STILL isn't found (this is the problem I mentioned above, the one that 'magically' fixed itself). Forget about subdirectories for a moment; I need the spider to start picking up bog-standard links again first :(

Thanks again.

Charter 07-14-2004 02:23 AM

Hi. If you do an update after indexing the main page, does it start up?

davenewt 07-14-2004 02:30 AM

Nope. Nothing. When I click on the Update Form button (after selecting the site, obviously) I get:

Code:

Found tree :
Click on the cross to delete the branch
Click on the green sign to update it
Click on the noway sign to exclude from future indexings
Warning ! Exclude will delete indexed entries


[Back] to admin interface.

(i.e. nothing at all under "Found tree :", not a single document) :(

Charter 07-14-2004 02:32 AM

Is http://knet/ pointed to 127.0.0.1 in the Hosts file?

davenewt 07-14-2004 02:44 AM

No, only localhost. Maybe I should spider localhost instead, since I'm running all this on the server itself?
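The question here is whether the server can resolve the name `knet` to itself. One quick check is to read the hosts file (`/etc/hosts` on Unix-like systems, `C:\Windows\System32\drivers\etc\hosts` on Windows). The Python sketch below parses hosts-file text and returns every address a hostname is mapped to; both sample files are hypothetical, one with only a `localhost` entry as described in the reply above and one with `knet` added as an alias.

```python
def hosts_lookup(hosts_text: str, hostname: str) -> list[str]:
    """Return every address that hosts-file text maps hostname to."""
    wanted = hostname.lower()
    addresses = []
    for raw_line in hosts_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        address, *names = line.split()
        if wanted in (name.lower() for name in names):
            addresses.append(address)
    return addresses


# Hypothetical hosts file with only a localhost entry.
CURRENT_HOSTS = """\
# sample hosts file
127.0.0.1   localhost
"""

# Hypothetical variant with knet added as an alias of the loopback address.
PATCHED_HOSTS = """\
# sample hosts file
127.0.0.1   localhost knet
"""

if __name__ == "__main__":
    print(hosts_lookup(CURRENT_HOSTS, "knet"))
    print(hosts_lookup(PATCHED_HOSTS, "knet"))
```

The first lookup finds nothing and the second finds `127.0.0.1`. A hosts file is only one way a name can resolve (DNS or, on a Windows network, NetBIOS may also supply `knet`), so a live lookup such as `socket.gethostbyname("knet")` run on the server itself is the more direct test.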

Charter 07-14-2004 02:45 AM

Does it work with localhost?


All times are GMT -8.

Powered by vBulletin® Version 3.7.3
Copyright ©2000 - 2025, Jelsoft Enterprises Ltd.
Copyright © 2001 - 2005, ThinkDing LLC. All Rights Reserved.