Newbie


bheyse
01-09-2006, 09:23 AM
I'm really new to using PhpDig, and I can't figure out what I'm doing wrong. I installed the application through Plesk, so I'm sure it's installed right. The only problem I noticed at first is that everything was installed under http, while my entire website is actually served over https (I didn't think this would be much of a problem, though). But I can't get anything to work. Whenever I try to index https://www.ihrimjournal.com (with a search depth of 50 and links per level set to 0), I get this:

SITE : https://www.ihrimjournal.com/
Exclude paths :
- @NONE@
1:https://www.ihrimjournal.com/
(time : 00:00:05)

No link in temporary table
________________________________________
links found : 1
https://www.ihrimjournal.com/
Optimizing tables...
Indexing complete !

Why is it only finding one link? Does it have to do with the fact that this is a subscription-based site? Most of the pages aren't accessible until you are logged in. The site is an online magazine, and I need PhpDig to index a bunch of pages from an archives folder so that users can search for past articles.

What am I doing wrong?

Dave A
01-09-2006, 12:17 PM
Well, if I were you, I would check that the temp directory inside the admin directory has a chmod setting of 777, which allows read and write access. I have found that it is sometimes set to 755, which can stop the temp folder from being written to, and that appears to stop the temp files from being created.
If you try a deeper dig at your website with settings of 10 levels and 10 links per level, this sometimes gets the robot going.
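The permission check suggested above can be sketched in a few lines of Python. This is only an illustration, not part of PhpDig: the function name `describe_dir_permissions` is made up, and the demo runs against a throwaway directory instead of a real install. Note that 755 still lets the owner write, so it only blocks the spider when the web server runs as a different user than the one that owns the directory.

```python
import os
import stat
import tempfile


def describe_dir_permissions(path):
    """Return (octal mode string, writable by the current user?) for a directory."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return format(mode, "03o"), os.access(path, os.W_OK)


# Throwaway directory standing in for the spider's temp folder (hypothetical).
demo_dir = tempfile.mkdtemp()

os.chmod(demo_dir, 0o755)  # owner can write; group and others can only read/enter
print(describe_dir_permissions(demo_dir))

os.chmod(demo_dir, 0o777)  # everyone can read, write and enter
print(describe_dir_permissions(demo_dir))
```

To check a real install, call the function with the actual path of the temp directory while running as the same user the web server uses.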

I hope this helps you out.
Heaps of regards
Dave A

bheyse
01-12-2006, 08:15 AM
No, that's not it. But I came across this:

PhpDig can spider sites served on another port other than the default 80 but spidering 443 https:// may be met with limited success.

Could that be my problem, since pretty much my whole website is on the https:// side? I'm not even sure what exactly that sentence means. Does anyone know what it's saying?
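One way to read that sentence: an http:// URL defaults to port 80 and is sent as plain text, while an https:// URL defaults to port 443 and needs an encrypted (TLS) connection first, so a spider that only speaks plain HTTP may connect to port 443 but not get usable pages back. The sketch below only shows how a URL maps to a port and whether encryption is needed. It makes no claim about how PhpDig connects internally, and `connection_plan` is a made-up name.

```python
from urllib.parse import urlsplit

# Well-known default ports for the two schemes discussed in this thread.
DEFAULT_PORTS = {"http": 80, "https": 443}


def connection_plan(url):
    """Work out the host, the port, and whether TLS is needed for a URL."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    port = parts.port or DEFAULT_PORTS[scheme]  # explicit port wins over the default
    return {"host": parts.hostname, "port": port, "needs_tls": scheme == "https"}


print(connection_plan("https://www.ihrimjournal.com/"))
print(connection_plan("http://example.com:8080/"))  # example.com is a placeholder host
```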

Dave A
01-12-2006, 08:26 AM
Hi, one thing you may wish to try is to index, say, ten levels and ten links per level.
Sometimes on a new index, setting the links to zero (all links found) will stop the spider from indexing, and a light indexing on the first go often gets it done.
Later updates can be made to go deeper by changing the
"Use values from Update sites table if present and use
default values if values absent from table?" option in the admin panel.

Hope this helps..

bheyse
01-12-2006, 10:36 AM
That doesn't work either. I still get only one link indexed. Just because I was curious, I tried indexing one of my other sites to see if it would work, and it works fine. So I know it has to be something with the site. The only differences between the two sites are that (A) the first one is on the secure side of the server, and (B) it's also password protected (via PHP).
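The password protection alone could explain a "links found : 1" result: a spider that is not logged in only ever receives the login page, and a login form usually has no article links to follow. The Python sketch below illustrates that idea with two invented HTML snippets. `LOGIN_PAGE`, `MEMBER_PAGE` and the archive file names are hypothetical, not taken from the actual site.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


# What an anonymous visitor (or spider) might see: a form, no links.
LOGIN_PAGE = """<html><body><form method="post" action="login.php">
<input name="user"><input name="pass" type="password">
<input type="submit"></form></body></html>"""

# What a logged-in subscriber might see instead.
MEMBER_PAGE = """<html><body>
<a href="archives/2005-12.php">December 2005</a>
<a href="archives/2005-11.php">November 2005</a>
</body></html>"""

print(extract_links(LOGIN_PAGE))   # nothing for a spider to follow
print(extract_links(MEMBER_PAGE))  # the archive links it would need
```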

I read somewhere about needing to provide a robots.txt with a username and password so that it could index protected pages, but I thought that was referring to sites protected via .htaccess (which I don't think I'm using). There's also what I said above about sites that start with https:// (which mine does) having limited success. Does anyone know if either of these two things could be my problem, and if so, how to fix it?
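On the .htaccess point: that style of protection normally means HTTP Basic authentication, where the client sends the username and password in an `Authorization` header on every request. A login form written in PHP usually works differently (a form post plus a session cookie), so a header like this would not help there. The Python sketch below only shows what the Basic header looks like. `basic_auth_header` is a made-up helper, the credentials are placeholders, and nothing here describes how PhpDig handles credentials.

```python
import base64


def basic_auth_header(username, password):
    """Build the header a client sends for HTTP Basic authentication."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Placeholder credentials, for illustration only.
print(basic_auth_header("user", "pass"))
```

Because the value is only base64 encoded, not encrypted, this kind of protection is normally combined with https://.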

Thanks!