spidering does *nothing* ?
Hi folks,
I've installed phpdig and solved the logging-in problem with the help of your forums here, but have come across another. Namely, spidering my site doesn't appear to be working, and it isn't throwing out any errors or anything. All I get is:
Code:
Spidering in progress...
...and then, nothing. It just sits there. I've waited a while, wondering if the 20-page site I'm testing it with is maybe taking 5 minutes per page, but no. Still nothing. What gives? Any help gratefully appreciated.
Thanks, Dave. |
PS I've tried adding /index.php and just / to the path, but still no joy. Help! :)
|
Hi. In this thread, although a bit dated, there is talk of various issues. Does anything there help?
|
Thanks Charter, I've looked into both those threads, but still no joy. When I uncomment the //print $answer line in robot_functions.php I get the following output. Does this shed any light for anyone here?
Code:
Spidering in progress...
Thanks, Dave. |
Hi. Is magic_quotes_runtime On or Off?
|
Off. Why?
Thanks for the quick response :-) Dave. |
Just a quoting bug when magic_quotes_runtime is on... :(
Anything showing up in your error logs? |
PS: Not using version 1.8.1? On Win? Set USE_IS_EXECUTABLE_COMMAND to 0 in the config.php file.
|
Thanks, have got a little further now. Changed USE_IS_EXECUTABLE_COMMAND to 0 as you suggested, and now get the following output:
Code:
Spidering in progress...
Glad we're getting somewhere though! What do I need to do next, to get it to spider the entire site?
Thanks, Dave. |
Hi. Look in the text_content directory, at the file with the highest number. What is in that file, the contents from the main page or something else? If something else, is it showing a 404 message page? If so, then add a robots.txt page to web root and see if it will go.
simple robots.txt file:
User-agent: *
Disallow:
|
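As a rough illustration of what that two-line file means, the sketch below checks it with Python's standard `urllib.robotparser`. This is not PhpDig's own parser, and the helper name `allows` and the test URL are made up for the example. An empty `Disallow:` permits everything, while `Disallow: /` blocks everything.

```python
from urllib.robotparser import RobotFileParser


def allows(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# The file suggested in the thread: an empty Disallow permits everything.
ALLOW_ALL = "User-agent: *\nDisallow:\n"
# For contrast: a single slash blocks the whole site.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

# Hypothetical URL, used only to exercise the rules.
url = "http://localhost/newsboard/index.php"
print(allows(ALLOW_ALL, url))  # True
print(allows(BLOCK_ALL, url))  # False
```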
Hi. There's only a 1.txt file, and it's the content of the index page. This includes text which is a link in the HTML, so it's obviously missing something...
Thanks, Dave. |
What's in the file, or at least the suspicious looking piece, and how does it compare to the HTML of the index page?
|
In the HTML:
Code:
<span class="xhead">Latest News <a href="/newsboard/index.php">[View Archive]</a></span>
Code:
Latest News [View Archive] |
That's as it should work, strip away the tags and leave the text. PhpDig looks for links prior to that. Is there anything showing in your error logs?
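The two steps described here (collect the links, then strip the tags and keep the text) can be sketched with Python's standard `html.parser`. This is only an illustration of the idea, not PhpDig's actual code; the class and function names are invented for the example.

```python
from html.parser import HTMLParser


class LinkAndTextParser(HTMLParser):
    """Collects href values from <a> tags and all text between tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.text_parts.append(data)


def links_and_text(html):
    """Return (links, text) where text has tags removed and whitespace collapsed."""
    parser = LinkAndTextParser()
    parser.feed(html)
    parser.close()
    text = " ".join("".join(parser.text_parts).split())
    return parser.links, text


# The HTML fragment quoted earlier in the thread.
snippet = ('<span class="xhead">Latest News '
           '<a href="/newsboard/index.php">[View Archive]</a></span>')
links, text = links_and_text(snippet)
print(links)  # ['/newsboard/index.php']
print(text)   # Latest News [View Archive]
```

So seeing "Latest News [View Archive]" in the text file is consistent with the link having been noted separately before the tags were stripped.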
|
I don't see any error log files within the phpdig directories...?
|
Hang on, just tried re-indexing the root and it seems to be working... sorry, not sure how or why, but I'm getting a ton of output, which is a good sign :)
Will report back... |
Server error logs... ;)
|
Server logs... thought that's what you meant... anyway, seems I had to "Dig This" to add the site to start with, then use the "Update Form" part of the Admin interface and click the green tick icon next to "Root" to re-index.
However... Doing this initially started to pick up all links to the forum directory, so I added a robots.txt file to exclude this directory. Now I have discovered the main problem. The only navigation on the site so far is via a dynamic javascript menu which is added to the HTML at runtime. There are no links (except to the aforementioned forums) in the body of the HTML. How can I spider sub-directories which aren't linked to from the root index file? Do I need to add a line of navigation to the bottom of the index page which will make the spider aware of the subdirectories, or can I tell the spider to look for them some other way? Almost there :) Thanks so much for the help. Cheers, Dave. |
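To illustrate why a menu written out by JavaScript is invisible to a spider that only reads the HTML source, here is a small sketch using Python's standard `html.parser`. It assumes the spider does not execute scripts; the page content and the paths `/products/`, `/about/` and `/forum/` are hypothetical. The second page adds a plain line of navigation, as suggested in the post above.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects href values from <a> tags that exist in the static HTML."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def static_links(html):
    collector = HrefCollector()
    collector.feed(html)
    collector.close()
    return collector.hrefs


# Hypothetical page: the menu only exists after the script runs in a browser.
JS_MENU_PAGE = """<html><body>
<script>
var items = ["/products/", "/about/"];
for (var i = 0; i < items.length; i++) {
  document.write('<a href="' + items[i] + '">' + items[i] + '</a>');
}
</script>
<p><a href="/forum/">Forum</a></p>
</body></html>"""

# Same page with a plain line of navigation added at the bottom.
STATIC_NAV = '<p><a href="/products/">Products</a> | <a href="/about/">About</a></p>'
PAGE_WITH_NAV = JS_MENU_PAGE.replace("</body>", STATIC_NAV + "\n</body>")

print(static_links(JS_MENU_PAGE))   # ['/forum/']
print(static_links(PAGE_WITH_NAV))  # ['/forum/', '/products/', '/about/']
```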
>> How can I spider sub-directories which aren't linked to from the root index file?
Check out PhpDig version 1.8.2... :D |
To update from 1.8.0 can I just copy over all the files, or is there a safer process? Just checking in case I'm about to screw up something else :)
|
Hi. Yes, copy over all the files, add the new tables, and then use the new connect.php and config.php files.
|
Okay, am doing that... but I still don't see an easy way to index the entire site when my index.php file contains no static HTML links to my subdirectories' pages.
I found the line: Code:
define(LIMIT_TO_DIRECTORY,true); //limit index to directory, no sub dir, set in URL
Code:
define('LIMIT_TO_DIRECTORY',false); //limit index to directory, no sub dir, set in URL
So even with this latest version, it seems I still need to spider all the subdirectories manually, yes? |
Hmmm, there is no longer anything being put into my text_content directory. Nor is the spidering process picking up on basic links in the index file AGAIN. It seems we're back to the same stage as the last post on page 1 of this discussion. Which magically fixed itself (see first post of page 2) seemingly without me doing anything (that I remember). So I don't know what to do next. Back to square one :(
|
Hi. Set define('LIMIT_TO_DIRECTORY',true); and then index http://www.domain.com/dir1/dir2/ or some such thing, assuming the page at dir1/dir2/ has links to other pages.
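Going only by the inline comment on that setting ("limit index to directory, no sub dir, set in URL"), the effect can be pictured as a URL filter like the one below. This is a guess at the behaviour for illustration, not PhpDig's implementation, and the function name `in_same_directory` is made up.

```python
from urllib.parse import urlsplit


def directory_of(path):
    """Return the directory part of a URL path, always ending in '/'."""
    return path.rpartition("/")[0] + "/"


def in_same_directory(start_url, candidate_url):
    """True if candidate_url is on the same host and in exactly the same
    directory as start_url (subdirectories are excluded)."""
    start, cand = urlsplit(start_url), urlsplit(candidate_url)
    if (start.scheme, start.netloc) != (cand.scheme, cand.netloc):
        return False
    return directory_of(cand.path) == directory_of(start.path)


start = "http://www.domain.com/dir1/dir2/"
print(in_same_directory(start, "http://www.domain.com/dir1/dir2/page.html"))      # True
print(in_same_directory(start, "http://www.domain.com/dir1/dir2/sub/page.html"))  # False
print(in_same_directory(start, "http://www.domain.com/dir1/other.html"))          # False
```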
|
Ok, now I get:
Code:
SITE : http://knet/
Thanks again. |
Hi. If you do an update after indexing the main page, does it start up?
|
Nope. Nothing. When I click on the Update Form button (after selecting the site, obviously) I get:
Code:
Found tree : |
Is http://knet/ pointed to 127.0.0.1 in the Hosts file?
|
No, only localhost. Maybe I should spider localhost then (seeing as I'm performing all this on the server)?
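For anyone wanting to check the same thing, here is a small sketch that parses hosts-file text and reports which address a name maps to. The sample contents are assumed for illustration (only a `localhost` entry, as described above), and `parse_hosts` is a made-up helper, not part of PhpDig.

```python
def parse_hosts(text):
    """Map each hostname found in hosts-file text to its IP address."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping.setdefault(name.lower(), ip)  # first entry wins
    return mapping


# Assumed contents: only localhost is mapped.
SAMPLE_HOSTS = """\
# sample hosts file (contents assumed for illustration)
127.0.0.1   localhost
"""

hosts = parse_hosts(SAMPLE_HOSTS)
print(hosts.get("localhost"))  # 127.0.0.1
print(hosts.get("knet"))       # None

# With an extra entry, knet would map to the local machine as well.
hosts = parse_hosts(SAMPLE_HOSTS + "127.0.0.1   knet\n")
print(hosts.get("knet"))       # 127.0.0.1
```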
|
Does it work with localhost?
|
...nope, that results in exactly the same result. Spiders the page, finds only a link to that page, and no pages (not even Root) is shown on the "update form" page :(
|
...and my command of the English language is also deteriorating rapidly :)
|
Thoughts...
|
nope, nope and nope again :(
I wish I knew what happened to make it work last time! |
Interestingly, using the previous phpdig install (I renamed the directory "phpdig1" when I installed the new version in "phpdig"), things are very different. Using the admin script, spidering localhost, this is what I get:
Code:
SITE : http://localhost/
I can then go back to http://localhost using "Update Form" and lo and behold, a "[x] [tick] Root [arrow]" part appears! When I click on the tick to re-index, it starts picking up the links and spidering properly. Does that give you any clues as to what might be different with the new install? |