PhpDig.net (http://www.phpdig.net/forum/index.php)
Troubleshooting: Some sites won't index (http://www.phpdig.net/forum/showthread.php?t=127)

Charter 10-13-2003 05:23 PM

Quote:
>Was recently indexed

Did it index the first time? Are you trying to reindex?

vvvvv 10-13-2003 05:30 PM

>Did it index the first time?

Nope. :( This is all I ever got and I've tried it on several different URLs.

SITE : http://www.mysite.com/
Exclude paths :
- @NONE@
No link in temporary table

mike221 10-13-2003 05:42 PM

I had the same problem, but it went away after performing the mods published here.

My server (where PhpDig is): Apache/1.3.28 (Unix) mod_auth_passthrough/1.8 mod_gzip/1.3.26.1a mod_log_bytes/1.2 mod_bwlimited/1.0 PHP/4.3.3 FrontPage/5.0.2.2634 mod_ssl/2.8.15 OpenSSL/0.9.7a on Linux.

Out of 265 servers with all kinds of configurations, I still have problems indexing a couple that run Netscape.

Good Luck

vvvvv 10-13-2003 05:58 PM

OK, I'll give that a spin. Thanks for the suggestion, mike221.

Looks like a late night cup of coffee for me. :)

vvvvv 10-13-2003 06:33 PM

OK, I did the mods, but it's still the same. :(

Any other ideas? Again, I much appreciate the help.

rayvd 10-14-2003 07:32 AM

You've probably already checked this... there's no robots.txt file on your server preventing the crawling is there? :)
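
For anyone who wants to run that robots.txt check offline, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the URLs, and the user-agent string "PhpDig" are assumptions for illustration; substitute the real file from your server and whatever agent name your spider actually sends.

```python
from urllib.robotparser import RobotFileParser


def is_crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content, not taken from any real site.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    # "PhpDig" as the agent name is an assumption; check your spider's settings.
    for url in ("http://www.mysite.com/",
                "http://www.mysite.com/private/page.html"):
        verdict = "allowed" if is_crawl_allowed(SAMPLE_ROBOTS_TXT, "PhpDig", url) else "blocked"
        print(url, "->", verdict)
```

If the start URL itself comes back as blocked, a spider that honors robots.txt would have nothing to follow, which could look like an empty link table.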

Tanasja 10-17-2003 03:40 AM

Hi,

I think I have the same problem. The first time, the indexing went fine. Then I changed some filenames. When reindexing, the old filenames were picked up and the new ones skipped. Also, when I tried to index the new filename directly, the indexer couldn't find it.

I read the posts on this item and tried the following things:
- delete and reindex the site (several ways)
- delete and reinstall the database
- empty dir text_content (not keepalive) and dir admin/temp (which stayed empty when reindexing)
- change config: LIMIT_DAYS=1 and PHPDIG_DEFAULT_INDEX=false
- run spider.php from a browser

I also see suggestions like:
- lynx from command line
- adjusting the routing table on the machine with the webserver

I don't understand these suggestions. Can anybody explain them? Or are there other options left? Maybe useful information: I host my sites at a provider.

Can anybody help?
Greetings from Amsterdam,
Tanasja

Charter 10-17-2003 03:52 PM

Hi vvvvv. Maybe this is a JavaScript issue? Does setting PHPDIG_DEFAULT_INDEX to false have any effect?

Hi Tanasja. Just to be sure, when you say "the old filenames were taken and the new ones skipped" are the new links in the files you are trying to index?
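
To illustrate the JavaScript question, here is a rough sketch of how a crawler that only reads `href` attributes in static HTML would see a page. It is not PhpDig's actual link extraction; the class name, function name, and sample page are made up for the example. Links that exist only inside a script or as `javascript:` URLs never show up in the collected list.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags found in static HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # javascript: pseudo-links cannot be followed without running a script.
        if href and not href.lower().startswith("javascript:"):
            self.links.append(href)


def extract_links(html: str) -> list:
    """Return the followable links a simple static crawler would find."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


# Hypothetical page: one plain link, one javascript: link, one script-written link.
SAMPLE_PAGE = """
<html><body>
<a href="about.html">About</a>
<a href="javascript:openPage('news')">News</a>
<script>document.write('<a href="contact.html">Contact</a>');</script>
</body></html>
"""

if __name__ == "__main__":
    print(extract_links(SAMPLE_PAGE))
```

If a start page builds all of its navigation this way, a static crawler could end up with no links to follow at all.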

web newsroom 05-19-2004 12:35 PM

Same Problems
 
I'm sure that it's something quite simple.

I have been able to successfully spider certain sites and currently show the following stats.

Last Run : May 19, 2004
Pages : 5025 Entries
Index : 1397195 Entries
Keywords : 230416 Entries
Temporary : 110440 Entries


However, I still cannot seem to spider our own site.

Will a fully qualified subdomain.mydomain.com work? I have changed the robots.txt and the .htaccess file and am still stumped.


All times are GMT -8.