PhpDig.net

PhpDig.net (http://www.phpdig.net/forum/index.php)
-   Troubleshooting (http://www.phpdig.net/forum/forumdisplay.php?f=22)
-   -   Spidering....links found : 0 (http://www.phpdig.net/forum/showthread.php?t=889)

-IAN- 04-29-2004 10:06 AM

Spidering....links found : 0
 
Hey, I finally got the database created and connected. It was a pain because our administrator won't allow PHP to do file uploads, reads, or writes after we got hacked three weeks ago. The install code relies on fopen and on writing or creating files. I finally got it to work, though.
But now when I go to spider our site I just get the following:

Spidering in progress...

--------------------------------------------------------------------------------
SITE : http://clarknexsen/
Exclude paths :
- @NONE@
No link in temporary table

--------------------------------------------------------------------------------

links found : 0
...Was recently indexed
Optimizing tables...
Indexing complete !
--------------------------------------------------------------------------------
[Back] to admin interface.

I noticed that in other posts you've suggested changing the robots.txt file to:

User-agent: *
Disallow: /go.php

Now where exactly do I find the "robots.txt" file? Do I need to contact my administrator for it (I access the server remotely)?
thanks!

vinyl-junkie 04-29-2004 05:29 PM

What value does 'LIMIT_DAYS' have in your config.php file? If it's set to the default value of 7 and it's been fewer than 7 days since you tried to re-spider your site, this is probably why nothing is being indexed.
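For reference, the setting lives in config.php. A sketch of what the line looks like, assuming the standard 1.8.x config layout (check your own file for the exact spot):

```php
// config.php -- minimum number of days before a page that has already
// been indexed will be re-spidered. Zero allows immediate re-spidering.
define('LIMIT_DAYS', 0);
```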

Here is a page that tells you all about the robots.txt file. I don't think that's where your problem is with phpDig, but you'd do well to familiarize yourself with that anyway.


BTW, welcome to the forum. We're glad you decided to join us. :)

-IAN- 04-30-2004 06:09 AM

It is seven; I can change it to zero. I looked in the website directory (like the page you linked recommends, e.g. http://www.foobar.com/robots.txt) and there wasn't a robots.txt file. Should there be one?

Note: I just retried it with the LIMIT_DAYS set to zero, and I still got the same result.....

vinyl-junkie 04-30-2004 06:22 PM

Quote:

Originally posted by -IAN-
It is seven; I can change it to zero. I looked in the website directory (like the page you linked recommends, e.g. http://www.foobar.com/robots.txt) and there wasn't a robots.txt file. Should there be one?
Not necessarily. It's really only needed if you want portions of your site not to be spidered.
Quote:

Note: I just retried it with the LIMIT_DAYS set to zero, and I still got the same result.....
What search depth did you choose? If you left the default at zero, only the root page will be indexed. That might be the problem.
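If you do decide you want one later, a minimal robots.txt is just a plain text file at the web root (e.g. http://www.foobar.com/robots.txt) listing the paths to keep spiders out of; the /private/ line here is only an illustration:

```
User-agent: *
Disallow: /go.php
Disallow: /private/
```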

-IAN- 05-11-2004 05:57 AM

Nope. I tried it again just to be sure, with a spidering depth of 20, and still got the same result.

Are there any files that need fwrite or fopen for this to work? Maybe that is the source of the problem?

..sorry for the late reply

-Jonathan

vinyl-junkie 05-11-2004 05:51 PM

If this is an internet site, perhaps posting the link would help.

Regarding your question about which files need write permission, check out the documentation here, as there is a discussion on which directories need to have write access.
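As a rough sketch of what your administrator will need to do -- the install root and directory names here (admin/temp, text_content) are illustrative stand-ins, so go by the documentation's list for your version:

```shell
# Stand-in install root for demonstration; substitute your real phpDig path.
PHPDIG_ROOT=/tmp/phpdig_demo
mkdir -p "$PHPDIG_ROOT/admin/temp" "$PHPDIG_ROOT/text_content"

# chmod 777 is the blunt fix; chown-ing the directories to the web-server
# user and using 755 is safer if your admin has root access.
chmod 777 "$PHPDIG_ROOT/admin/temp" "$PHPDIG_ROOT/text_content"
ls -ld "$PHPDIG_ROOT/admin/temp" "$PHPDIG_ROOT/text_content"
```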

-IAN- 05-12-2004 06:16 AM

Thanks Pat! I am checking with the network administrator about making those directories writable....

-Jonathan

-IAN- 05-18-2004 09:45 AM

Okay, he made them writable, but I still get nothing...any ideas?

It's actually an intranet site.

vinyl-junkie 05-18-2004 05:18 PM

If you're sure you have the proper directory permissions set, make sure all your database tables are empty. Also, make sure you have LIMIT_DAYS set to zero in the config file. Then try spidering again.

If that still doesn't work, post any error messages you're getting.

roger 05-21-2004 02:22 PM

I had similar problems, among others (:-); I'm still a newbie. To solve this I had to delete the site and try again, since I had originally spidered with 0 levels and couldn't re-spider with other levels afterwards.

iankim 08-05-2004 04:47 PM

having same problem -- can't spider without deleting first
 
I'm having the exact same problem:
-- I don't have a robots.txt file anywhere.
-- LIMIT_DAYS is set to zero.
-- Unless I delete the site first, spidering returns the same message that others are getting:

Quote:

links found : 0
...Was recently indexed
Optimizing tables...
Indexing complete !
--------------------------------------------------------------------------------
[Back] to admin interface.

vinyl-junkie 08-05-2004 06:28 PM

Welcome to the forum, iankim. :D

What search depth and "links per" did you choose? If the search depth was zero, all you're going to get is the starting page, if this is the first time you're indexing the site. A "links per" depth of zero means to check for all links at each search depth.

Hope this helps. :)

iankim 08-05-2004 11:41 PM

'search depth' and 'links per' not related to the problem
 
Thanks for your response, and for your welcome! :-)

I set search depth to 3 or 4, usually (but I tried a range of numbers).

I set links per to 0, usually (but I tried a range of different numbers for this, too).

I'm sure my problem is not related to this.

vinyl-junkie 08-06-2004 03:30 AM

Does your server run in safe mode? If so, check out this thread.

rispbiz 08-24-2004 10:46 AM

Same problem!
 
I seem to be having the very same problem.

I have tried everything posted in this thread and still get the same result.

I have no problem indexing most websites but a few come up with this problem.

Spidering in progress...

--------------------------------------------------------------------------------
SITE : http://www.hotdial.net/
Exclude paths :
-
- @NONE@
No link in temporary table

--------------------------------------------------------------------------------

links found : 0
...Was recently indexed
Optimizing tables...
Indexing complete !


I have tried changing the config file, deleting the site, and all different levels and links.

I had the site's webmaster delete the robots.txt file, and also tried adding an allow rule to it. I also checked permissions.

I also have XAV SE, and it indexes this site without error.

There are only a few sites that I seem to have this problem with. Strangely enough, one of the others offhand is http://www.hotmail.com.

Huh, is it possible the "hot" has anything to do with it?

I am stumped!

Oh, I am running the newest version, 1.8.3.

Thanks
2-surf.net



Powered by vBulletin® Version 3.7.3
Copyright ©2000 - 2024, Jelsoft Enterprises Ltd.
Copyright © 2001 - 2005, ThinkDing LLC. All Rights Reserved.