#1 |
Green Mole
Join Date: Oct 2003
Posts: 2
1.8.5 won't spider
I just downloaded phpDig 1.8.5 and installed it, and am having some trouble getting it to spider my site.

I had installed ver 1.6.2 and had it working well before, but I decided to start over as if it were a new install, as my databases are pretty small. I downloaded 1.8.5 into a directory on my server at /var/www/coreinitiative/htdocs/search. I created a new phpDig DB in MySQL called 'phpdig-ci' and a MySQL user 'phpdig-ci-user', and gave it a password and privileges over the DB. I ran chown -R on the directories text_content/, include/ and admin/temp/ to give them to the Apache user and group. I edited include/config.php to change the admin password and set ABSOLUTE_SCRIPT_PATH to '/var/www/coreinitiative/htdocs/search'.

Running admin/install.php seems to work fine and doesn't give me any errors. The tables are created. I can access admin/index.php and enter the URI of my site, http://www.coreinitiative.org. It seems to correctly find and read the index page, but doesn't find any links. It only finds one page. Running the spider from the command line gives:

Code:
    www:/var/www/coreinitiative/htdocs/search# php4 -f admin/spider.php http://www.coreinitiative.org
    3499: old priority 0, new priority 18
    Spidering in progress...
    -----------------------------
    SITE : http://www.coreinitiative.org/
    Exclude paths :
    - @NONE@
    XDuplicate of an existing document
    1:http://www.coreinitiative.org/
    (time : 00:00:06)
    No link in temporary table
    links found : 1
    Optimizing tables...
    Indexing complete !
    www:/var/www/coreinitiative/htdocs/search#

Thanks for your help and suggestions.

-Kevin
#2 |
Purple Mole
Join Date: Jan 2004
Posts: 694
Hi, Kevin. PhpDig works differently now than it did with 1.6.2. If you want it to spider more than just the one link, set a non-zero value for "links per" and "search depth." That should take care of your problem.
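
To picture why both settings matter, here is a toy model in Python. It is not phpDig's code: the `crawl` function, the `site` link graph, and the exact meaning given to the two limits are assumptions made for illustration. It only shows how a depth limit and a per-page link limit together bound a crawl, and why zero for both leaves just the start page.

```python
from collections import deque


def crawl(graph, start, search_depth, links_per):
    """Breadth-first crawl of an in-memory link graph (hypothetical model).

    graph: dict mapping a page to the list of pages it links to.
    search_depth: how many link hops away from start may be followed.
    links_per: how many links may be followed from each page.
    """
    seen = [start]
    queue = deque([(start, 0)])
    while queue:
        page, depth = queue.popleft()
        if depth >= search_depth:
            continue  # depth budget used up: do not follow links from this page
        for link in graph.get(page, [])[:links_per]:
            if link not in seen:
                seen.append(link)
                queue.append((link, depth + 1))
    return seen


# Made-up site structure, for illustration only.
site = {
    "/": ["/a.php", "/b.php", "/c.php"],
    "/a.php": ["/a1.php"],
    "/b.php": ["/b1.php"],
}

only_start = crawl(site, "/", 0, 0)
one_hop_two_links = crawl(site, "/", 1, 2)
everything = crawl(site, "/", 2, 20)
```

In this model `only_start` is `["/"]`, `one_hop_two_links` is `["/", "/a.php", "/b.php"]`, and `everything` contains all six pages.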
#3 |
Green Mole
Join Date: Dec 2004
Posts: 5
I'm having nearly the same trouble as Kevinz. I was also running phpDig 1.6.2 before. I tried an update (new files over the old ones, updated templates, connect.php, etc.) on my localhost (WAMPP on Win2k Pro, Apache 1.3.26, PHP 4.3.1, MySQL 3.23, I think) before uploading it to the right place on the production server, a Linux server, to spider my website as I always did with phpDig 1.6.2, but it didn't work. It finds only one link, and I tried many combinations of search depth and links per page (0, 0; 3, 3; 10, 10; 0, 10; 10, 0), but no luck.

By the way, it says something about excluding .*.php and .*.php3. My whole site is built from php3 files, but that was also the case with phpDig 1.6.2. I couldn't find where to set up the exclusion list (to delete .*.php and .*.php3, just in case). I saw a table in the database, but it was empty (???).

I also tried a blank new installation (with a new database, in case something had gone wrong when updating the files or the database), but the results were the same.

So far, I've asked the sysadmin to completely delete our phpDig 1.6.2, but I'd like to be able to put a new search engine there. And I like phpDig a lot.

The URL of my site: http://estel.bib.ub.es/ecolo/ (search disabled :-( until I see phpDig 1.8.5 or higher working fine again, as it always did with this site I administer).

Hints welcome (I'm not a computer scientist). And thanks for all your hard and nice work with phpDig! ;-)

Last edited by Xavi; 12-14-2004 at 01:06 AM. Reason: completing information (2)
#4 |
Purple Mole
Join Date: Jan 2004
Posts: 694
Xavi, did you read my post in this thread? Did you try what I said to do?
#5 |
Green Mole
Join Date: Oct 2003
Posts: 2
THANKS!
vinyl-junkie, thanks, that worked. Note that it took two tries, however. First, I just reindexed the site with links and depth both increased to 20. This didn't work; I got the same result. I then deleted the site entirely and re-entered the URL, with depth and links set to 20. Now it's going to town indexing the pages. It's been running for 20 minutes and is on the 127th page.

Note that most of my content is .php files, just like Xavi's. I, too, thought that this was the problem and didn't know where to set the inclusion or turn off the exclusion. However, it doesn't seem to be necessary; I'm indexing the PHP files just fine, it seems.

So, '0' is no longer a code for 'unlimited' depth or links? Is there a code? How do I index all the links on pages with more than 20 links on them?

Thanks again, so much, for your help with my problem.

-Kevin
#6 |
Green Mole
Join Date: Dec 2004
Posts: 5
Hi Vinyl-Junkie:

Yes, I had read your message, tried what you suggested, and reported what I got in my previous post (did you read my whole message?). I've tried again, this time with 20 for "search depth" and 20 for "links per". Same results. The strings are in Catalan, but the structure of the output is the same as in English.

I tell phpDig to dig this: http://estel.bib.ub.es/ecolo or http://estel.bib.ub.es/ecolo/index.php3?lg=en (because I added the variable lg to the code, to make it compatible with the language variable used across the whole site).

And the results:

---

    SITE : http://estel.bib.ub.es/
    Exclou les rutes :
    - cgi-bin/
    - .*.php
    - .*.php3
    1:http://estel.bib.ub.es/ecolo/
    (temps : 00:00:11)
    No existeix l'enllaç a la taula temporal
    enllaços trobats : 1
    http://estel.bib.ub.es/ecolo/
    Optimizing tables...
    Indexat complet!
    [Enrere] a la pàgina d'administració.

---

Any idea what could be wrong? And by the way, where can I define or reset the exclusion paths, just in case?

Thanks,
Xavier
#7 |
Head Mole
Join Date: May 2003
Posts: 2,539
Look in the config and set LIMIT_TO_DIRECTORY to false. When LIMIT_TO_DIRECTORY is true, only links in that directory get indexed.
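
As a rough picture of what that restriction means, here is a small Python sketch. It is not phpDig's implementation: `is_allowed` is a hypothetical helper, and it assumes "that directory" is the directory part of the start URL and that links to other hosts are skipped in either mode.

```python
from urllib.parse import urlparse


def is_allowed(start_url, link_url, limit_to_directory):
    """Decide whether a link would be followed (hypothetical model)."""
    start, link = urlparse(start_url), urlparse(link_url)
    if start.netloc != link.netloc:
        return False  # assumption: other hosts are never followed
    if not limit_to_directory:
        return True  # whole site is fair game
    start_path = start.path or "/"
    # Directory part of the start URL, e.g. "/ecolo/index.php3" -> "/ecolo/".
    base = start_path[: start_path.rfind("/") + 1]
    return (link.path or "/").startswith(base)


start = "http://estel.bib.ub.es/ecolo/"
inside = "http://estel.bib.ub.es/ecolo/index.php3?lg=en"
outside = "http://estel.bib.ub.es/other/page.php3"  # made-up path

inside_limited = is_allowed(start, inside, True)
outside_limited = is_allowed(start, outside, True)
outside_unlimited = is_allowed(start, outside, False)
```

Under these assumptions `inside_limited` is `True`, `outside_limited` is `False`, and `outside_unlimited` is `True`.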
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension. |
#8 |
Green Mole
Join Date: Dec 2004
Posts: 5
Hi Charter. I changed LIMIT_TO_DIRECTORY to true, but I get the same results when trying to spider http://estel.bib.ub.es/ecolo

I've rechecked the pages of my site to see if there was some exclude tag, but there is none. What does "No link in temporary table" mean? Is it a clue? Can somebody try to spider my site, to see if there is a problem with the information on it? (It worked fine when dug by phpDig 1.6.2...)

Thanks for your nice software, and for your support.

Xavi

---

    SITE : http://estel.bib.ub.es/
    Exclude paths :
    - cgi-bin/
    - .*.php
    - .*.php3
    1:http://estel.bib.ub.es/ecolo/
    (time : 00:00:09)
    No link in temporary table
    links found : 1
    http://estel.bib.ub.es/ecolo/
    Optimizing tables...
    Indexing complete !
    [Back] to admin interface.
#9 |
Head Mole
Join Date: May 2003
Posts: 2,539
First, go to http://estel.bib.ub.es/robots.txt and edit the robots.txt file:
Code:
    # remove these two lines
    Disallow: *.php
    Disallow: *.php3
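
The connection between robots.txt and the "Exclude paths" list in the spider output can be sketched in Python. This is only a guess at the mechanism, not phpDig's actual code: the helper names, the sample robots.txt content (including the `/cgi-bin/` line), the `*` to `.*` conversion, and the regex matching are all assumptions chosen so that the patterns come out like the ones printed above. The sketch also ignores User-agent sections.

```python
import re


def exclude_patterns(robots_txt):
    """Collect Disallow values and turn '*' into '.*' (assumed conversion)."""
    patterns = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if line.lower().startswith("disallow:"):
            value = line.split(":", 1)[1].strip()
            if value:
                patterns.append(value.lstrip("/").replace("*", ".*"))
    return patterns


def is_excluded(path, patterns):
    """True if any exclusion pattern matches somewhere in the path."""
    return any(re.search(p, path) for p in patterns)


# Hypothetical robots.txt, before and after removing the two lines.
before = "User-agent: *\nDisallow: /cgi-bin/\nDisallow: *.php\nDisallow: *.php3\n"
after = "User-agent: *\nDisallow: /cgi-bin/\n"

patterns_before = exclude_patterns(before)
patterns_after = exclude_patterns(after)

index_page_blocked = is_excluded("/ecolo/", patterns_before)
php3_link_blocked = is_excluded("/ecolo/index.php3", patterns_before)
php3_link_blocked_after = is_excluded("/ecolo/index.php3", patterns_after)
```

In this model `patterns_before` is `["cgi-bin/", ".*.php", ".*.php3"]`. The start page `/ecolo/` is not excluded, but every link ending in .php3 is, which would leave exactly one page indexed. With the two lines removed, the .php3 link is no longer excluded.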
#10 |
Green Mole
Join Date: Dec 2004
Posts: 5
Thanks, Charter, that was it!

In short, my sysadmin will put phpDig (1.8.6) back as our search engine. Cheers, thanks again for the support, and Merry Christmas.

Xavier