PhpDig.net

Old 11-19-2004, 12:46 PM   #1
Ensim
Green Mole
 
Join Date: Nov 2004
Posts: 4
Spiders site first time, but update doesn't spider

Hello,

I recently installed PHPDig. It indexes my site correctly the first time and the search works great.

But if I remove a page and then run "update sites" in the admin, or run a "forceall" update from the command line, nothing is actually updated (it takes 0 seconds) and the deleted page is still in the database. My LIMIT_DAYS is 0 in config.php.

Here is the output of the command line spidering:

------------- force reindex of site7811: old priority 0, new priority 18
Spidering in progress...
-----------------------------
SITE : http://www.mysite.org/
Exclude paths :
- @NONE@
No link in temporary table
links found : 0
Optimizing tables...
Indexing complete !

Any ideas on this?

If I can't get it to respider the site, then I'll have to find another search solution. I hope someone has an answer though since I already spent time on PHPDig and the search works great after the first spidering of the site.

Thanks,
John
Old 11-19-2004, 04:37 PM   #2
vinyl-junkie
Purple Mole
 
Join Date: Jan 2004
Posts: 694
Don't know how you tried updating, but try this.

Go into the Admin panel. On the right side of the screen, click to highlight the site you want to update, then click on Update Form. That will take you to a second screen. Click on the green checkmark for whichever branch you want to update. Clicking on the one next to Root will update everything.
Old 11-22-2004, 03:32 PM   #3
Ensim
Green Mole
 
Join Date: Nov 2004
Posts: 4
That did it! Kudos to you and the PHPDig people for making a viable free PHP only search solution!

Thanks so much. It updates correctly through the browser now that I have commented out the "set_time_limit" calls, which kept it from updating under PHP safe mode.

I re-read the docs and still can't get the command line version to work (but I now see what I missed concerning the web admin).

I guess I can cron wget to update the site's index via a web call to spider.php?site_id=1&mode=small, unless you know the correct command-line way to re-index a site?

Here is what I am calling:
/usr/bin/php -f /path.to.it/phpdig/admin/spider.php forceall http://www.oursite.org

But it never indexes anything:
SITE : http://www.oursite.org/
Exclude paths :
- @NONE@
No link in temporary table
links found : 0
Optimizing tables...


Thanks
Old 11-22-2004, 05:41 PM   #4
vinyl-junkie
Purple Mole
 
Join Date: Jan 2004
Posts: 694
You might want to have a look at the phpdig documentation for command line indexing. If you've followed that and indexing still doesn't work, I don't know what to tell you. I've tried indexing my site via command line, and it's touchy at best.
Old 11-23-2004, 08:17 AM   #5
Ensim
Green Mole
 
Join Date: Nov 2004
Posts: 4
Thanks again, Vinyl-Junkie. I gave up on the command line version. I've had problems with command line PHP before; it doesn't appear to be exactly the same PHP that runs in Apache on our server.

For those interested, here is the command line I came up with that works. It uses wget to call the PHP script via Apache:

/usr/bin/wget --timeout 3600 --http-user your_htaccess_username --http-passwd your_htaccess_password "http://www.mysite.org/phpdig/admin/update.php?path=&site_id=1&exp=1" 1>/dev/null 2>/dev/null

(where it is all one line with no breaks)

Here's the explanation of the command:

/usr/bin/wget
Replace this with the path to your wget; find it by running "which wget" at the SSH/telnet prompt.

--timeout 3600
This is the timeout in seconds. I removed the sleep(5) in spider.php; if you keep it, increase this timeout a lot. My update.php call takes 20 seconds, so an hour should be plenty for me. I also removed all the "set_time_limit" calls from PHPDig, since under my PHP safe mode they gave errors that resulted in "update form" not reindexing the site.

--http-user & --http-passwd
I turned off the PHPDig admin user/pass by setting this in config.php:
define('PHPDIG_ADM_AUTH','0'); Then I password protected the admin directory using Apache .htaccess files, and passed that username and password in these parameters.

"http://www.mysite.org/phpdig/admin/update.php?path=&site_id=1&exp=1"
This link should work in your browser (after logging in) and force a site reindex. I got it by selecting my site in the admin, clicking "update form", and copying the link on the green checkmark next to "root". I had to include the quotes so that the full link reaches wget as one argument.

1>/dev/null 2>/dev/null
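This is optional. It discards wget's status output, so cron has nothing to mail or log. Note that the redirect by itself does not stop wget from saving a copy of the fetched page in the working directory on each run (which you may want as a record of the indexing); add -O /dev/null if you want the page discarded as well.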

Then you can add this to your cronjob so it can reindex automatically. Hopefully this will help people and be my small contribution back to PHPDig.
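
For anyone who would rather keep the command in a small script than paste it straight into the crontab, here is a minimal sketch of a wrapper. It is only an illustration of the command explained above, not something from the PHPDig docs: the variable names, the build_cmd helper and the --run switch are made up for this example, and -O /dev/null is added on the assumption that you do not want wget to keep a copy of the page. As far as I know, newer wget releases spell the password option --http-password, so check "wget --help" on your server.

```shell
#!/bin/sh
# Sketch of a cron wrapper for the wget-based reindex (names are hypothetical).
# Adjust the path, URL and credentials for your own server.
WGET='/usr/bin/wget'
URL='http://www.mysite.org/phpdig/admin/update.php?path=&site_id=1&exp=1'
HTTP_USER='your_htaccess_username'
HTTP_PASS='your_htaccess_password'

# Print the command that would be run, with the URL shown in single quotes,
# so the quoting can be checked before anything touches the server.
build_cmd() {
    printf '%s --timeout=3600 --http-user=%s --http-passwd=%s -O /dev/null %s\n' \
        "$WGET" "$HTTP_USER" "$HTTP_PASS" "'$URL'"
}

if [ "${1:-}" = "--run" ]; then
    # Real run: fetch update.php through Apache and discard both the page
    # (-O /dev/null) and wget's status output (the redirects).
    "$WGET" --timeout=3600 --http-user="$HTTP_USER" --http-passwd="$HTTP_PASS" \
        -O /dev/null "$URL" >/dev/null 2>&1
else
    # Dry run (default): only show the command.
    build_cmd
fi
```

Run it with no arguments to see the command, and with --run from cron to actually reindex. The script location and schedule in the crontab are up to you.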
Old 11-24-2004, 10:42 AM   #6
siliconkibou
Green Mole
 
Join Date: Dec 2003
Posts: 11
Thanks for this insight, Ensim!

I never even thought about using the wget command.

Works great!

It should be noted that some cron setups will drop the "http://www.mysite.org/phpdig/admin/update.php?path=&site_id=1&exp=1" due to the double quotes.

Using single quotes like 'http://www.mysite.org/phpdig/admin/update.php?path=&site_id=1&exp=1' seems to work fine in all cases.
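
A small sketch of why the quotes matter at all, for anyone curious. This only shows what a plain POSIX sh does with the URL; it does not explain why some cron setups mishandle double quotes, which I can't account for. The variable names are made up for the example.

```shell
#!/bin/sh
# Illustration only: how a POSIX shell treats the update URL under each quoting style.
URL_SINGLE='http://www.mysite.org/phpdig/admin/update.php?path=&site_id=1&exp=1'
URL_DOUBLE="http://www.mysite.org/phpdig/admin/update.php?path=&site_id=1&exp=1"

# In a plain shell both quoting styles produce the same single argument.
if [ "$URL_SINGLE" = "$URL_DOUBLE" ]; then
    echo "single and double quotes give the same argument"
fi

# With no quotes, each & ends the command and runs it in the background,
# so the program only receives the text before the first &.
# A child shell is used here so the demonstration stays contained.
URL_UNQUOTED=$(sh -c 'printf "%s\n" http://www.mysite.org/phpdig/admin/update.php?path=&site_id=1&exp=1')
printf 'unquoted argument: %s\n' "$URL_UNQUOTED"
```

Single quotes are the safer habit in general, since nothing inside them is expanded by the shell.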

Thanks again, and awesome work!
Old 11-30-2004, 01:17 PM   #7
Ensim
Green Mole
 
Join Date: Nov 2004
Posts: 4
Glad I could help, Siliconkibou!

Looks like Indeh found a more elegant solution, kudos on that.
All times are GMT -8.