PhpDig.net

12-13-2004, 09:16 AM   #1
kevinz
1.8.5 won't spider

I just downloaded phpDig 1.8.5 and installed it, and am having some trouble getting it to spider my site.

I had installed version 1.6.2 and had it working well before, but I decided to start over as if it were a new install, as my databases are pretty small. I downloaded 1.8.5 into a directory on my server at /var/www/coreinitiative/htdocs/search. I created a new phpDig DB in MySQL called 'phpdig-ci' and a MySQL user 'phpdig-ci-user', and gave the user a password and privileges over the DB. I ran chown -R on the directories text_content/, include/ and admin/temp/ to hand them to the Apache user and group. I edited include/config.php to change the admin password and set ABSOLUTE_SCRIPT_PATH to '/var/www/coreinitiative/htdocs/search'.

Running admin/install.php seems to work fine, and doesn't give me any errors. The tables are created.

I can access admin/index.php and enter the URI of my site, http://www.coreinitiative.org. It seems to correctly find and read the index page, but doesn't find any links. It only finds one page.

Running the spider from the command line gives:
Code:
www:/var/www/coreinitiative/htdocs/search# php4 -f admin/spider.php http://www.coreinitiative.org
3499: old priority 0, new priority 18
Spidering in progress...
-----------------------------
SITE : http://www.coreinitiative.org/
Exclude paths :
- @NONE@
XDuplicate of an existing document
1:http://www.coreinitiative.org/
(time : 00:00:06)
No link in temporary table
links found : 1
Optimizing tables...
Indexing complete !
www:/var/www/coreinitiative/htdocs/search#
Any suggestions on what I'm doing wrong? My problem seems similar to other postings here that talk about not spidering all of a site.

Thanks for your help and suggestions.

-Kevin

12-13-2004, 06:13 PM   #2
vinyl-junkie
Hi, Kevin. PhpDig works differently now than it did with 1.6.2. If you want it to spider more than just the one link, set a non-zero value for "links per" and "search depth." That should take care of your problem.
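To illustrate why a zero search depth yields a single page, here is a minimal Python sketch of a depth-limited crawl over an in-memory link graph. It is not phpDig's code: the `crawl` function, the `SITE` data, and the reading of "links per" as a per-page cap where 0 means no cap are all assumptions made for illustration.

```python
from collections import deque


def crawl(start, links, depth_limit, links_per=0):
    """Breadth-first crawl over an in-memory link graph.

    links: dict mapping a page to the pages it links to (hypothetical data).
    depth_limit: how many link levels to follow from the start page.
    links_per: cap on links followed from each page; 0 is treated as
               "no cap" here, which is an assumption, not confirmed behaviour.
    """
    seen = [start]
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth >= depth_limit:
            continue  # depth budget used up: index the page, follow nothing
        outgoing = links.get(url, [])
        if links_per > 0:
            outgoing = outgoing[:links_per]
        for target in outgoing:
            if target not in seen:
                seen.append(target)
                queue.append((target, depth + 1))
    return seen


# Hypothetical site layout, not taken from any real site in this thread.
SITE = {
    "/": ["/about.php", "/news.php", "/contact.php"],
    "/about.php": ["/team.php"],
    "/news.php": ["/archive.php"],
}

print(crawl("/", SITE, depth_limit=0))  # only the start page
print(crawl("/", SITE, depth_limit=1))  # start page plus its direct links
print(crawl("/", SITE, depth_limit=2, links_per=2))
```

With `depth_limit=0` the crawl stops at the start page, which matches the "links found : 1" symptom above.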

12-14-2004, 12:56 AM   #3
Xavi
Very similar, but with a slight difference in the exclusion list

I'm having nearly the same trouble as Kevinz. I was also running phpdig 1.6.2 before. I tried an update (new files over the old ones, updated templates, connect.php, etc.) on my localhost (wampp on Win2k Pro, Apache 1.3.26, PHP 4.3.1, MySQL 3.23 I think...) before uploading it to the right place on the production server, a Linux server where phpdig 1.6.2 had always spidered my website, but it didn't work. It finds only one link, and I tried many combinations of search depth and links per page (0, 0; 3, 3; 10, 10; 0, 10; 10, 0), but no luck.

By the way, it says something about excluding .*.php and .*.php3. My whole site is built from php3 files, but that was also the case with phpdig 1.6.2...
I couldn't find where to set up the exclusion list (to delete .*.php and .*.php3, just in case). I saw a table in the database, but it was empty... (???)

Just in case, I also tried a clean new installation (with a new database, in case the update process had caused trouble with the files or the database), but the results were the same.

So far, I've asked the sysadmin to completely delete our phpdig 1.6.2, but I'd like to be able to put a new search engine there... And I like phpdig a lot...

The URL of my site:

http://estel.bib.ub.es/ecolo/

(search disabled :-( until I see phpdig 1.8.5 or higher working as well as it always did on this site I administer)

Hints welcome (I'm not a computer scientist)

And thanks for all your hard and nice work with phpdig! ;-)

Last edited by Xavi; 12-14-2004 at 01:06 AM. Reason: completing information (2)

12-14-2004, 03:16 AM   #4
vinyl-junkie
Xavi, did you read my post in this thread? Did you try what I said to do?

12-14-2004, 05:42 AM   #5
kevinz
THANKS!

vinyl-junkie, thanks, that worked. Note that it took two tries, however. First, I just reindexed the site and increased the links and depth both to 20. This didn't work; I got the same result. I then deleted the site entirely and re-input the URL, with the depth and links set to 20. Now, it's going to town indexing the pages. It's been running 20 minutes now, and is on the 127th page.

Note that most of my content is .php files, just like Xavi's. I, too, thought that this was the problem and didn't know where to set the inclusion, or turn off the exclusion. However, it doesn't seem to be necessary; I'm indexing the php files just fine, it seems.

So, '0' is no longer a code for 'unlimited' depth or links? Is there a code? How do I index all the links on pages with more than 20 links on them?

Thanks, again, so much for your help with my problem.

-Kevin

12-14-2004, 11:37 AM   #6
Xavi
Hi Vinyl-Junkie:

Yes, I had read your message, tried what you suggested, and already reported what I got (did you read my whole message?).

I've tried again. Same results. This time I tried the combination of 20 for "search depth" and 20 for "links per". The strings are in Catalan, but the structure of the output is the same as in English.
I tell phpdig to dig this:

http://estel.bib.ub.es/ecolo

or

http://estel.bib.ub.es/ecolo/index.php3?lg=en
(because I added the var lg in the code, to make it compatible with the language var in the whole site)

And the results:

---
SITE : http://estel.bib.ub.es/
Exclou les rutes :
- cgi-bin/
- .*.php
- .*.php3
1:http://estel.bib.ub.es/ecolo/
(temps : 00:00:11)

No existeix l'enllaç a la taula temporal
enllaços trobats : 1
http://estel.bib.ub.es/ecolo/
Optimizing tables...
Indexat complet!
[Enrere] a la pàgina d'administració.
---

Any ideas of what can be wrong?

And by the way, where can I define or reset the exclusion paths, just in case?

Thanks, Xavier

12-14-2004, 11:52 AM   #7
Charter
Look in the config and set LIMIT_TO_DIRECTORY to false. When LIMIT_TO_DIRECTORY is set to true, only links in that directory get indexed.
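As a rough illustration of what a directory limit does, the Python sketch below decides whether a link should be followed. It is only a sketch under assumptions: `should_follow` is a hypothetical helper, not phpDig's implementation, and the `/other/page.php3` URL is made up for the example.

```python
import posixpath
from urllib.parse import urlparse


def should_follow(start_url, link_url, limit_to_directory):
    """Return True if link_url should be followed from start_url.

    Off-site links are never followed in this sketch. With
    limit_to_directory set, the link must also live under the
    directory of the start URL.
    """
    start = urlparse(start_url)
    link = urlparse(link_url)
    if link.netloc != start.netloc:
        return False
    if not limit_to_directory:
        return True
    if start.path.endswith("/"):
        start_dir = start.path
    else:
        # A start URL without a trailing slash is treated as a file,
        # so its parent directory is used.
        start_dir = posixpath.dirname(start.path).rstrip("/") + "/"
    return (link.path or "/").startswith(start_dir)


START = "http://estel.bib.ub.es/ecolo/"
INSIDE = "http://estel.bib.ub.es/ecolo/index.php3?lg=en"
OUTSIDE = "http://estel.bib.ub.es/other/page.php3"  # hypothetical URL

print(should_follow(START, INSIDE, True))    # True
print(should_follow(START, OUTSIDE, True))   # False
print(should_follow(START, OUTSIDE, False))  # True
```

With the limit switched off, anything on the same host is a candidate, so the other settings (depth, exclusions) decide what gets indexed.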
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.

12-15-2004, 08:30 AM   #8
Xavi
I did, but no change, so far...

Hi Charter. I changed LIMIT_TO_DIRECTORY to true, but got the same results while trying to spider http://estel.bib.ub.es/ecolo

I've rechecked the pages on my site to see if there was some exclude tag, but there is none.

What does "No link in temporary table" mean? Is it a clue?
Can somebody try to spider my site, to see if there is a problem with the info on my site??? (it worked fine when dug by phpdig 1.6.2...)

Thanks for your nice software, and for your support.

Xavi
---

SITE : http://estel.bib.ub.es/
Exclude paths :
- cgi-bin/
- .*.php
- .*.php3
1:http://estel.bib.ub.es/ecolo/
(time : 00:00:09)

No link in temporary table
links found : 1
http://estel.bib.ub.es/ecolo/
Optimizing tables...
Indexing complete ! [Back] to admin interface.

12-15-2004, 08:42 AM   #9
Charter
First, go to http://estel.bib.ub.es/robots.txt and edit the robots.txt file:
Code:
# remove these two lines
Disallow: *.php
Disallow: *.php3
Next, set search depth to a large number, links per to zero, and LIMIT_TO_DIRECTORY to false, and try an index.
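To see how those two Disallow lines could produce the `.*.php` and `.*.php3` entries under "Exclude paths" and block every linked page, here is a small Python sketch. It assumes the spider turns `*` into `.*` and matches each result as a regular expression against the path, which the log output suggests but this thread does not confirm. The `ROBOTS_TXT` sample, `exclude_patterns`, and `is_excluded` are hypothetical, and User-agent grouping is ignored.

```python
import re

# Hypothetical reconstruction of a robots.txt like the one discussed above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: *.php
Disallow: *.php3
"""


def exclude_patterns(robots_txt):
    """Turn Disallow lines into regex-style exclude paths (assumed behaviour)."""
    patterns = []
    for line in robots_txt.splitlines():
        field, _, value = line.partition(":")
        value = value.strip()
        if field.strip().lower() == "disallow" and value:
            patterns.append(value.lstrip("/").replace("*", ".*"))
    return patterns


def is_excluded(path, patterns):
    """True if the path matches any exclude pattern from its start."""
    return any(re.match(p, path.lstrip("/")) for p in patterns)


patterns = exclude_patterns(ROBOTS_TXT)
print(patterns)                                    # ['cgi-bin/', '.*.php', '.*.php3']
print(is_excluded("/ecolo/", patterns))            # False: the start page is indexed
print(is_excluded("/ecolo/index.php3", patterns))  # True: every .php3 link is dropped
```

Under these assumptions the start page gets through but every link to a .php or .php3 page is excluded, which is consistent with "links found : 1". Removing the two Disallow lines leaves only `cgi-bin/` in the list.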

12-17-2004, 02:43 AM   #10
Xavi
It worked finally! :-)

Thanks, Charter, that was it!
Shortly, my sysadmin will put phpdig (1.8.6) back in place as our search engine.
Cheers, thanks again for the support, and Merry Christmas
Xavier

All times are GMT -8.