PhpDig.net


jalerta 10-05-2003 03:25 PM

Some sites won't index
 
Hi All,

I have installed PHPDig-1.6.2 on a Red Hat Linux 8.1 server running Apache 2.0 and MySQL version 3.23.56 with PHP 4.2.2.

I am having problems with some sites not indexing and just giving me the following message.

SITE : http://www.somedomain.com/
Exclude paths :
- @NONE@
No link in temporary table

--------------------------------------------------------------------------------

links found : 0
...Was recently indexed
Optimizing tables...
Indexing complete !


I am sure that there are more than 10 links on the index.html page of this site, but still nothing.

On other domains on this server PHPDig works correctly.

Can anyone give me any idea as to what is happening?

Thanks in advance.


Jeff

Charter 10-05-2003 03:41 PM

Hi. Did you index these sites recently, or are the sites in separate directories of one domain, like http://www.domain.com/dirone/index.php and http://www.domain.com/dirtwo/index.php? You can change the reindex timeframe with define('LIMIT_DAYS',7); in the config file.
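
The "...Was recently indexed" line in the output suggests that a site is skipped when its last index date falls inside the LIMIT_DAYS window. The Python sketch below only illustrates that kind of check; it is not PhpDig's code, and the function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta


def was_recently_indexed(last_indexed, now, limit_days=7):
    """Return True if last_indexed falls within limit_days before now.

    Hypothetical illustration of a LIMIT_DAYS-style check; not PhpDig code.
    """
    if last_indexed is None:
        # Never indexed before, so there is nothing to skip.
        return False
    return now - last_indexed < timedelta(days=limit_days)
```

Under this assumption, lowering LIMIT_DAYS shortens the window in which a second run is treated as a repeat of a recent one.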

jalerta 10-05-2003 06:36 PM

Thanks for the reply.

I have been trying to get it to work with that specific domain and have read other posts here about problems with reindexing a recently indexed site.

So, I have repeatedly deleted the MySQL database and re-installed it using the install.php script.

I am only indexing from the top level directory using "www.domainname1.com" and "www.domainname2.com".

I have also tried "www.domainname.com/index.html" without any success.

I have tried indexing 3 domains on the same server. Only one indexed. The other 2, including the domain that I really want to index, did not.

Both domains gave the same message listed in the post above.


Jeff

Charter 10-05-2003 06:49 PM

Hi. To start over and index from scratch, do the following:
  1. empty all the PhpDig database tables
  2. delete all files that may be in the temp dir
  3. delete all files in the text_content dir except keepalive.txt
  4. run spider.php from a browser or command prompt
Before running spider.php from the command prompt, set the following to 1 in the config file if only one level is wanted:
PHP Code:

define('SPIDER_MAX_LIMIT',1);
define('SPIDER_DEFAULT_LIMIT',1);
define('RESPIDER_LIMIT',1); 

Also, in the config file, set the following to 1 if more frequent reindexing is wanted:
PHP Code:

define('LIMIT_DAYS',1); 

Emptying the database tables is part of the process to restart from scratch. The files in the text_content directory also need to be deleted, except for the keepalive.txt file.
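
Steps 2 and 3 above (clearing the temp and text_content directories while keeping keepalive.txt) could be scripted. This is a minimal Python sketch, not part of PhpDig; the function name is hypothetical and the directory paths in the usage note are assumptions about where PhpDig is installed.

```python
from pathlib import Path


def clear_dir_except(directory, keep=("keepalive.txt",)):
    """Delete regular files directly inside directory, skipping names in keep.

    Subdirectories are left alone. Returns the sorted names of removed files.
    """
    removed = []
    for entry in Path(directory).iterdir():
        if entry.is_file() and entry.name not in keep:
            entry.unlink()
            removed.append(entry.name)
    return sorted(removed)
```

From the PhpDig install directory, clear_dir_except("text_content") would leave only keepalive.txt, and clear_dir_except("admin/temp", keep=()) would empty the temp directory. Emptying the database tables (step 1) still has to be done separately in MySQL.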

jalerta 10-05-2003 08:04 PM

I followed your instructions but still nothing.

The message this time was:

2935: old priority 0, new priority 18
Spidering in progress...
-----------------------------
SITE : http://www.somedomain.com/
Exclude paths :
- @NONE@
No link in temporary table
links found : 0
...Was recently indexed
Optimizing tables...
Indexing complete !

Just to recap the installation instructions so I am sure that I got everything right ...

I unTARed the phpdig files into a temp directory and then copied all the files into the www.somedomain.com/search directory.

I changed the permissions on the admin/temp, includes and text_content directories to 777 to allow write access to everyone. ( Security issue that I will worry about when I get PHPDig running )

I copied the _connect.php file to connect.php and edited it to add the MySQL hostname, username, password and database name. I cleared the PHPDIG_DB_PREFIX field.

I then ran the install.php file from a web browser ( although at first it complained about not finding the init_db.sql file, which I then copied to the admin directory).

Once the database was created and the tables were installed I tried to index www.somedomain.com with no success.

Was there anything else that I was supposed to do? Am I missing any permissions or something?

Any other suggestions?


Thanks for the help.


Jeff

Charter 10-06-2003 02:53 PM

Hi. That sounds correct. What type of files are you trying to index: *.asp, *.shtml, etcetera? Do you notice if indexing works on some file types but not others?

jalerta 10-06-2003 03:58 PM

I am trying to index plain .html files.

I have done some more tests and I have tried to index 10 different virtual domain sites that reside on my server.

I have discovered that of the 10 sites I tried to index only 1 site worked. 9 sites would not index.

Looking further, I discovered that the only site that would index was a site that had moved to another provider.

The directory structure and files for the web site still resided on my server but the DNS now points to another server.

All the other virtual domains that I tried to index had DNS entries that pointed to my server IP address.

Does this tell you anything?

Jeff

Charter 10-06-2003 04:36 PM

Can you try lynx from command line instead? An example is in this thread.

jalerta 10-06-2003 06:41 PM

I tried using Lynx, with no success.

Lynx would just sit there saying "Making HTTP connection to www.somedomain.com".

I was wondering if the issue in this case could be that the web server is behind a NAT'ed firewall?

Also, the web sites are on the same machine as the DNS service.

So, on the internal network the server has an IP address, for example, of 10.1.1.100. However, in the DNS the domain has an IP address of 123.123.123.1.

In this case, Lynx is trying to open the web site that DNS says is at 123.123.123.1, while the server that the web site is really on is at 10.1.1.100. So no connection can be established.

Is this a possible explanation for the problem?

Has anyone run into this problem before?

Any and all help is greatly appreciated.

Thanks,

Jeff
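
The mismatch described in the post above (DNS gives the public address while the server itself only holds the internal one) can be checked from the web server. The Python sketch below is one possible way to do it; the function names are hypothetical, and the list of local addresses has to be supplied by hand.

```python
import ipaddress
import socket


def resolves_to_local(resolved_ip, local_ips):
    """True if the address a hostname resolved to belongs to this machine."""
    local = {ipaddress.ip_address(ip) for ip in local_ips}
    return ipaddress.ip_address(resolved_ip) in local


def diagnose(hostname, local_ips, resolver=socket.gethostbyname):
    """Describe where hostname points relative to this server's own addresses."""
    resolved = resolver(hostname)
    if resolves_to_local(resolved, local_ips):
        return f"{hostname} -> {resolved}: local address, reachable directly"
    # Assumption: a non-local answer means the request has to pass through
    # the NAT firewall, which may not route it back to this machine.
    return f"{hostname} -> {resolved}: not a local address, depends on the firewall"
```

With the example addresses from the post, diagnose("www.somedomain.com", ["10.1.1.100"]) would report a non-local address if DNS answers 123.123.123.1, which matches the hanging Lynx connection.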

rayvd 10-08-2003 10:04 AM

This is definitely a NAT problem. I am experiencing the same thing and am trying to figure out a rule to get around it. What I want is for the webserver to reply on the same interface the request came in on, instead of doing NAT on the packet.

If your setup isn't too complex, you may just be able to set up a rule specifying that outbound packets to a given IP should not be NAT'd, or should be NAT'd only in some specific way. I am hoping to find a way to tell the system not to do NAT on packets with a certain flag marked ... I'm using ipf on FreeBSD, but I would guess iptables would have this functionality as well...

rayvd 10-08-2003 10:31 AM

Well, fixed my problem by adjusting the routing table on the machine with the webserver. :)

In your case, why not add an explicit entry to your /etc/hosts file pointing to the internal address instead of the external one?
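
A hosts entry is one line with the IP address first and the hostnames after it. Using the example addresses from earlier in the thread, the entry would look like the second line of SAMPLE_HOSTS below. The Python sketch is a hypothetical helper for checking which address a hosts file gives for a name; it reads only the text passed to it, and the sample content is illustrative.

```python
# Illustrative hosts-file content; the addresses are the examples from the thread.
SAMPLE_HOSTS = """\
127.0.0.1    localhost
10.1.1.100   www.somedomain.com somedomain.com   # internal address
"""


def hosts_lookup(hosts_text, hostname):
    """Return the first IP that hosts-file text maps hostname to, or None."""
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        ip, *names = line.split()
        if hostname.lower() in (name.lower() for name in names):
            return ip
    return None
```

To check the real file, pass its contents, for example hosts_lookup(open("/etc/hosts").read(), "www.somedomain.com"). Whether the system consults /etc/hosts before DNS depends on its resolver configuration.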

jalerta 10-08-2003 10:46 AM

Rayvd,

Yep, that worked.

Thanks for the help.

I hope PHPDig will eventually have the ability to directly index a site based on the location of files in the file system instead of only by FQDN/IP address.

Again, thanks for the help.


Jeff

vvvvv 10-12-2003 08:29 PM

I have the same problem:

SITE : http://www.blah-blah-blah.com/
Exclude paths :
- @NONE@
No link in temporary table


>Well, fixed my problem by adjusting the routing table on the machine with the webserver.

I can't do that cause I have a simple hosting account. :(
Any suggestions? And thanks in advance for any help.

Charter 10-13-2003 05:01 PM

Perhaps in config.php change PHPDIG_DEFAULT_INDEX to false?

vvvvv 10-13-2003 05:17 PM

thanks Charter but still the same:

--------------------------------------------------------------------------------
SITE : http://www.somesite.com/
Exclude paths :
- @NONE@
No link in temporary table

--------------------------------------------------------------------------------

links found : 0
...Was recently indexed
Optimizing tables...
Indexing complete !

--------------------------------------------------------------------------------

Any other ideas? Much appreciate the help.

