Spidering....links found : 0


-IAN-
04-29-2004, 11:06 AM
Hey, I finally got the database created and connected. It was a pain because our administrator won't allow PHP to do file uploads, reads, or writes after we got hacked 3 weeks ago. The install code relies on fopen to write or create the file. I finally got it to work, though.
But now when I go to spider our site I just get the following:

Spidering in progress...

--------------------------------------------------------------------------------
SITE : http://clarknexsen/
Exclude paths :
- @NONE@
No link in temporary table

--------------------------------------------------------------------------------

links found : 0
...Was recently indexed
Optimizing tables...
Indexing complete !
--------------------------------------------------------------------------------
[Back] to admin interface.

I noticed that in other posts you've suggested changing the robots.txt file to:

User-agent: *
Disallow: /go.php

Now where exactly do I find the "robots.txt" file? Do I need to contact my administrator for it (I access the server remotely)?
Thanks!

vinyl-junkie
04-29-2004, 06:29 PM
What value does 'LIMIT_DAYS' have in your config.php file? If it's set to the default value of 7 and it's been fewer than 7 days since you tried to re-spider your site, this is probably why nothing is being indexed.
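
To make the LIMIT_DAYS idea concrete, here is a minimal Python sketch of a "was recently indexed" check. It is not phpDig's actual code; the function name `recently_indexed` and its parameters are made up for illustration, and it only assumes that the spider skips a page when fewer than LIMIT_DAYS days have passed since the last index.

```python
from datetime import datetime, timedelta


def recently_indexed(last_indexed, now, limit_days):
    """Return True when a page was indexed less than limit_days ago.

    Hypothetical helper, not part of phpDig. A page that was never
    indexed (last_indexed is None) is never treated as recent.
    """
    if last_indexed is None:
        return False
    return now - last_indexed < timedelta(days=limit_days)


now = datetime(2004, 4, 30, 7, 9)
first_run = datetime(2004, 4, 29, 11, 6)

# With the default of 7 days, a page indexed yesterday is skipped.
print(recently_indexed(first_run, now, 7))
# With a limit of 0 days, the same page is eligible again.
print(recently_indexed(first_run, now, 0))
```

Under that assumption, a limit of zero means nothing is ever considered "recently indexed", which is why setting it to zero is the usual first thing to try.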

Here (http://www.robotstxt.org/wc/exclusion.html) is a page that tells you all about the robots.txt file. I don't think that's where your problem is with phpDig, but you'd do well to familiarize yourself with that anyway.
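
If you want to see what a robots.txt rule like the one quoted above actually blocks, a rough sketch using Python's standard `urllib.robotparser` module follows. The user agent name "phpdig" and the example.com URLs are placeholders chosen for illustration, and this shows how Python's parser reads the rules, not how phpDig itself does it.

```python
from urllib.robotparser import RobotFileParser

# The two-line robots.txt discussed in this thread.
rules = [
    "User-agent: *",
    "Disallow: /go.php",
]

parser = RobotFileParser()
parser.parse(rules)

# "phpdig" is an assumed user agent name; any name matches "User-agent: *".
blocked = parser.can_fetch("phpdig", "http://www.example.com/go.php")
allowed = parser.can_fetch("phpdig", "http://www.example.com/index.html")

print(blocked)  # the /go.php path is disallowed
print(allowed)  # every other path stays open to spiders
```

A rule like this only keeps spiders away from /go.php; it does not stop the rest of the site from being indexed.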


BTW, welcome to the forum. We're glad you decided to join us. :)

-IAN-
04-30-2004, 07:09 AM
It is seven; I can change it to zero. I looked in the website directory (as the page you linked recommended, e.g. http://www.foobar.com/robots.txt) and there wasn't a robots.txt file. Should there be one?

Note: I just retried it with the LIMIT_DAYS set to zero, and I still got the same result.....

vinyl-junkie
04-30-2004, 07:22 PM
Originally posted by -IAN-
It is seven; I can change it to zero. I looked in the website directory (as the page you linked recommended, e.g. http://www.foobar.com/robots.txt) and there wasn't a robots.txt file. Should there be one?

Not necessarily. It's really only needed if you want portions of your site not to be spidered.

Originally posted by -IAN-
Note: I just retried it with the LIMIT_DAYS set to zero, and I still got the same result.....

What search depth did you choose? If you left the default at zero, only the root will be indexed. That might be the problem.

-IAN-
05-11-2004, 06:57 AM
Nope, I tried it again with a spidering depth of 20 just to be sure and still got the same result.

Are there any files that need fwrite or fopen for this to work? Maybe that is the source of the problem?

..sorry for the late reply

-Jonathan

vinyl-junkie
05-11-2004, 06:51 PM
If this is an internet site, perhaps posting the link would help.

Regarding your question about which files need write permission, check out the documentation here (http://www.phpdig.net/navigation.php?action=doc#toc4), as there is a discussion on which directories need to have write access.
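
As a quick way to confirm permissions before asking an administrator, a small Python sketch like the one below lists which directories the current user cannot write to. The directory names in the usage lines are hypothetical placeholders; take the real list from the phpDig documentation linked above.

```python
import os


def unwritable_dirs(paths):
    """Return the paths that are missing, not directories, or not writable."""
    return [
        path
        for path in paths
        if not (os.path.isdir(path) and os.access(path, os.W_OK))
    ]


# Placeholder directory names, not taken from the phpDig docs.
candidates = ["path/to/phpdig/dir_one", "path/to/phpdig/dir_two"]
for path in unwritable_dirs(candidates):
    print("not writable:", path)
```

Note that this checks the user running the script, which may differ from the user the web server runs PHP as.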

-IAN-
05-12-2004, 07:16 AM
Thanks Pat! I am checking with the network administrator about making those directories writable....

-Jonathan

-IAN-
05-18-2004, 10:45 AM
Okay, he made them writable but I still get nothing... any ideas?

It's actually an intranet site.

vinyl-junkie
05-18-2004, 06:18 PM
If you're sure you have the proper directory permissions set, make sure all your database tables are empty. Also, make sure you have LIMIT_DAYS set to zero in the config file. Then try spidering again.

If that still doesn't work, post any error messages you're getting.

roger
05-21-2004, 03:22 PM
I had similar problems, among others (:-), still a newbie. To solve this I had to delete the site and try again, as I had spidered with 0 levels and couldn't re-spider with other levels.

iankim
08-05-2004, 05:47 PM
i'm having the exact same problem:
-- i don't have a robots.txt file anywhere.
-- limit_days is set to zero
-- unless i delete the site first, spidering returns the same message that others are getting:

links found : 0
...Was recently indexed
Optimizing tables...
Indexing complete !
--------------------------------------------------------------------------------
[Back] to admin interface.

vinyl-junkie
08-05-2004, 07:28 PM
Welcome to the forum, iankim. :D

What search depth and "links per" did you choose? If the search depth was zero, all you're going to get is the starting page if this is the first time you're indexing the site. A "links per" value of zero means that all links are checked at each search depth.
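
Here is a rough Python sketch of how search depth and "links per" interact, using a tiny in-memory link graph instead of a real site. The `crawl` function, the graph, and the reading of "links per" as a cap on new links kept at each depth level are all assumptions made for illustration; this is not phpDig's implementation.

```python
def crawl(start, links, max_depth, links_per=0):
    """Walk a link graph level by level and return pages in visit order.

    max_depth=0 returns only the starting page.
    links_per=0 keeps every new link found at each depth level;
    a positive value keeps only that many per level.
    """
    seen = [start]
    level = [start]
    for _ in range(max_depth):
        found = []
        for page in level:
            for target in links.get(page, []):
                if target not in seen and target not in found:
                    found.append(target)
        if links_per > 0:
            found = found[:links_per]
        if not found:
            break
        seen.extend(found)
        level = found
    return seen


# Hypothetical site: the root links to /a and /b, and /a links to /c.
site = {"/": ["/a", "/b"], "/a": ["/c"], "/b": [], "/c": []}

print(crawl("/", site, max_depth=0))               # ['/']
print(crawl("/", site, max_depth=2))               # ['/', '/a', '/b', '/c']
print(crawl("/", site, max_depth=2, links_per=1))  # ['/', '/a', '/c']
```

With a depth of zero the walk never leaves the starting page, so "links found : 0" on a first run would be expected in that case.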

Hope this helps. :)

iankim
08-06-2004, 12:41 AM
thanks for your response, and for your welcome! :-)

i set search depth to 3 or 4, usually (but i tried a range of numbers)

i set links per to 0, usually (but i tried a range of different numbers for this, too)

i'm sure my problem is not related to this.

vinyl-junkie
08-06-2004, 04:30 AM
Does your server run in safe mode? If so, check out this thread (http://www.phpdig.net/showthread.php?threadid=221).

rispbiz
08-24-2004, 11:46 AM
I seem to be having the very same problem.

I have tried everything posted in this thread and still get the same thing.

I have no problem indexing most websites, but a few come up with this problem.

Spidering in progress...

--------------------------------------------------------------------------------
SITE : http://www.hotdial.net/
Exclude paths :
-
- @NONE@
No link in temporary table

--------------------------------------------------------------------------------

links found : 0
...Was recently indexed
Optimizing tables...
Indexing complete !


I have tried changing the config file, deleting the site, and all different levels and links.

I tried having the webmaster for the site delete the robots.txt file, and tried adding an allow to the robots.txt file. I also checked permissions.

I also have XAV se and it indexes this site without error.

There are only a few sites that I seem to have this problem with. Strangely enough, one of the others offhand is http://www.hotmail.com

HUH, is it possible the "hot" has anything to do with it?

I am stumped!

Oh, I am running the newest version, 1.8.3

Thanks
2-surf.net

Dave A
08-24-2004, 12:30 PM
links found : 0
...Was recently indexed
Optimizing tables...
Indexing complete !

When I get this I usually delete the site from the admin page and then go back and respider it using different settings. Most times it works a treat.

But I am a new boy at using this software...

vinyl-junkie
08-24-2004, 02:31 PM
Originally posted by rispbiz
There are only a few sites that I seem to have this problem with. Strangely enough, one of the others offhand is http://www.hotmail.com

I'd be willing to bet that hotmail.com won't allow you to index them. There's a thread here in the forum about something similar, but I'm unable to find it at the moment.

rispbiz
08-24-2004, 04:08 PM
That could be the case with hotmail.com, but the result is the same with www.hotdial.net, which does allow indexing, and it is like the spider doesn't even try indexing the URL.

Spidering in progress...

--------------------------------------------------------------------------------
SITE : http://www.hotdial.net/
Exclude paths :
-
- @NONE@
No link in temporary table

--------------------------------------------------------------------------------

links found : 0
...Was recently indexed
Optimizing tables...
Indexing complete !

If possible, could someone try indexing this site with their phpDig and let me know if they have a problem indexing it? If another phpDig site can't index it, then it would more than likely be a problem with the URL rather than an issue with my spider.

If other phpDig search engines can index it, then I would have to figure out where the problem is in my engine.

Thanks for any help.
2-surf.net

vinyl-junkie
08-24-2004, 04:53 PM
Originally posted by rispbiz
That could be the case with hotmail.com, but the result is the same with www.hotdial.net, which does allow indexing, and it is like the spider doesn't even try indexing the URL.

I don't know about that. I got the same results as you. I've been helping in the forum for quite a while, and usually if a site can be indexed, I won't have a problem doing so.

rispbiz
08-24-2004, 05:06 PM
I will work with the webmaster and see if I can get him to change the index page, and then try to reindex. Maybe there is something on the page that phpDig doesn't like.

Thank you for trying the URL for me and for the quick responses.

With both of us unable to index the site, a lot of questions remain. Is it the site or something to do with phpDig? HUH???

Thank You
2-surf.net

vinyl-junkie
08-24-2004, 06:15 PM
Originally posted by rispbiz
With both of us unable to index the site, a lot of questions remain. Is it the site or something to do with phpDig? HUH???

I'm betting it's a problem with the site.

Good luck on getting it solved!

rispbiz
08-25-2004, 10:45 AM
Here is what the problem was, which makes no sense.

The website had a robots.txt file that only had this line.

Disallow: 4.15.191.215

I had the webmaster remove the robots.txt and it indexed fine.

How come the Disallow: is causing a problem with the spider?

Thank You,
2-Surf.net

Charter
08-25-2004, 11:44 AM
That is not standard robots.txt format.
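
For anyone who wants to sanity-check a robots.txt file before blaming the spider, here is a minimal Python sketch of a format check. The function name `robots_txt_problems` is made up, and the checks are a simplification of the robots.txt conventions (rules belong under a User-agent line, and Disallow values are URL paths), not a full validator and not what phpDig does internally.

```python
def robots_txt_problems(text):
    """Return a list of human-readable problems found in robots.txt text."""
    problems = []
    seen_user_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if ":" not in line:
            problems.append(f"line {number}: missing 'field: value' separator")
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            seen_user_agent = True
        elif field in ("allow", "disallow"):
            if not seen_user_agent:
                problems.append(
                    f"line {number}: {field} rule before any User-agent line"
                )
            if value and not value.startswith("/"):
                problems.append(
                    f"line {number}: value {value!r} is not a URL path"
                )
    return problems


for problem in robots_txt_problems("Disallow: 4.15.191.215"):
    print(problem)
```

Run against the single line from hotdial.net, this reports both issues: there is no User-agent line, and an IP address is not a path.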

vinyl-junkie
08-25-2004, 05:53 PM
I think they might have been trying to ban someone from visiting their site. Trouble is, you don't do that via robots.txt. You do it via .htaccess.