PhpDig.net

09-29-2006, 03:04 PM   #1
digdug
Green Mole
 
Join Date: Sep 2006
Posts: 6
spider.php problem

Hi,

I just installed PhpDig and integrated it into my website. The install went OK (i.e., the PhpDig tables were created). But then in index.php, when I tried to index a site by entering its URI and clicking 'Dig This!', which is supposed to take me to spider.php, the spider.php page could not be accessed. This is the browser error message: "The page cannot be displayed ...Cannot find server or DNS Error".

Then I refreshed the page. This time the page loaded, but it didn't seem to work either. Below is the PhpDig message:

Spidering in progress... [Stop spider]
Optimizing tables...
Indexing complete !
--------------------------------------------------------------------------------
[Back] to admin interface.

Yes, just those five lines, which convinces me that spider.php isn't doing anything.

By the way, there is one issue on my server: allow_url_fopen is set to 0. I tried to work around it by adding ini_set('allow_url_fopen', '1') at the top of every PHP page.

I don't know whether allow_url_fopen or another issue is the cause of the problem.

Could somebody help?

Thanks in advance.
09-30-2006, 09:29 AM   #2
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
PhpDig needs allow_url_fopen set to On. If you are using a PHP version later than 4.3.4, allow_url_fopen can only be set in php.ini or httpd.conf. There is a list here that lets you know 'what is allowed where' when using the ini_set function.
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.
09-30-2006, 04:30 PM   #3
digdug
Green Mole
 
Join Date: Sep 2006
Posts: 6
I just contacted the server admin, and he switched allow_url_fopen to 1 (On). But the same thing still happened: I entered the website name, clicked 'Dig This!', and spider.php was still blank, with no indexing activity.

By the way, the server I use runs Linux with Apache 1.3.27 and PHP 4.2.3.

I copied the EXACT same phpdig folder to another website on a different server (a Windows server this time), and voila, it works. Even when I entered the address of the website on the Linux server, it could crawl and index that website.

Then I thought it might be a path issue (because all the pages are in /usr/home/....../public_html/ and this search folder is in /usr/home/....../public_html/phpdig), so I tried changing 'ABSOLUTE_SCRIPT_PATH' to '/usr/home/../../phpdig', but it still wouldn't work.

What else should I do?

By the way, I noticed the following paragraph in the documentation:

"Note that if your OS/setup is, for example, a CGI load-balanced cluster of servers, it may not be possible to index sites on the cluster, as there cannot be a connection back to the load-balanced address. Also note that PhpDig is a web spider and search engine, meaning that you may have to edit your hosts file with something like "127.0.0.1 www.domain.com" in order to get PhpDig to crawl on localhost."

What does this mean?
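The hosts-file line in that documentation paragraph maps a hostname to an IP address, so requests for www.domain.com go to the local machine instead of being looked up through DNS. As a rough illustration (a Python sketch, not part of PhpDig; www.domain.com is only the documentation's placeholder), this parses hosts-file text and shows which address each name would map to:

```python
def parse_hosts(text):
    """Map each hostname in hosts-file text to the IP address on its line.

    Lines look like "127.0.0.1 www.domain.com"; anything after '#' is a
    comment. Only the first mapping seen for a hostname is kept.
    """
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # an address with no hostnames maps nothing
        address, hostnames = fields[0], fields[1:]
        for name in hostnames:
            # Hostnames are case-insensitive, so normalize to lowercase.
            mapping.setdefault(name.lower(), address)
    return mapping
```

For example, parsing a file containing `127.0.0.1 www.domain.com` yields `{"www.domain.com": "127.0.0.1"}`, i.e. the spider would connect to localhost when it asks for that name.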
10-01-2006, 06:50 AM   #4
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Are these directories set to 777 permissions on the Linux server?
Code:
	[PHPDIG_DIR]/text_content
	[PHPDIG_DIR]/includes
	[PHPDIG_DIR]/admin/temp
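One way to check those permissions without eyeballing `ls -l` output is a short script. This is a minimal Python sketch, not something that ships with PhpDig; the directory names come from the list above, and the install path you pass in (e.g. a hypothetical "/path/to/phpdig") is your own:

```python
import os
import stat

# The three directories listed above, relative to the PhpDig install
# directory ([PHPDIG_DIR] in the post).
WRITABLE_DIRS = ["text_content", "includes", os.path.join("admin", "temp")]


def check_permissions(phpdig_dir, expected=0o777):
    """Map each directory to (octal mode string or None, matches expected).

    A missing directory is reported as (None, False).
    """
    report = {}
    for rel in WRITABLE_DIRS:
        path = os.path.join(phpdig_dir, rel)
        try:
            # S_IMODE strips the file-type bits, leaving e.g. 0o755.
            mode = stat.S_IMODE(os.stat(path).st_mode)
        except FileNotFoundError:
            report[rel] = (None, False)
            continue
        report[rel] = (oct(mode), mode == expected)
    return report
```

Calling `check_permissions("/path/to/phpdig")` returns something like `{"text_content": ("0o777", True), ...}`, so any directory reported as `False` is one to fix.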
Load balancing can mean that a domain name resolves to multiple IP addresses, so PhpDig basically doesn't know what to do.
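To see whether a domain resolves to several addresses in the way described above, you can list what it resolves to. A minimal Python sketch using the standard `socket.getaddrinfo`; www.domain.com in the usage note is just the documentation's placeholder:

```python
import socket


def resolve_all(hostname):
    """Return the distinct IP addresses `hostname` resolves to, in order.

    A name behind DNS-based load balancing may return several addresses;
    a single-server host usually returns one (or one IPv4 plus one IPv6).
    """
    addresses = []
    for _family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(hostname, None):
        ip = sockaddr[0]  # first element of sockaddr is the address string
        if ip not in addresses:
            addresses.append(ip)
    return addresses
```

Running `resolve_all("www.domain.com")` on the server itself shows what PhpDig would see; several IPv4 addresses for one name hint at round-robin load balancing, while an IPv4 plus an IPv6 address alone does not.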
10-01-2006, 10:18 AM   #5
alexandercer
Green Mole
 
Join Date: Sep 2006
Posts: 4
Please have a look at your database and make sure that it really does not contain any data about the site you sent the spider to.
And by the way, do not spider more than one site; PhpDig will not work well.
That is all.
10-01-2006, 02:07 PM   #6
digdug
Green Mole
 
Join Date: Sep 2006
Posts: 6
Yes, all 3 directories are already set to 777.

By the way, the website already has a search function that was set up by a previous developer using generic Perl scripts provided by the hosting company. Could that be the cause of the problem? (For example, alexandercer mentioned not spidering more than one site because PhpDig will not work well; this existing Perl-based search function may or may not affect the indexing. Just a thought.)

Also, alexandercer, I already checked the database and made sure it contained no data from any spidered site.

By the way, I looked around the PhpDig database and compared it with the PhpDig database on another website where it works. The one difference I found: in the 'sites' table, the date in the 'upddate' column is shown in what looks like a wrong format, i.e. 20061001185646. In the other database, where PhpDig works, it is shown in this format: 2006-10-01 18:56:46.

Does this date-formatting difference tell us something? Or maybe it's useless and I just dug down too deep.
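The two 'upddate' values describe the same moment written two ways: 20061001185646 is the compact YYYYMMDDHHMMSS form and 2006-10-01 18:56:46 the delimited form. MySQL versions before 4.1 displayed TIMESTAMP columns in the compact form, while 4.1 and later use the delimited one, so if 'upddate' is a TIMESTAMP column (an assumption; check the table definition), the difference may simply reflect different MySQL versions on the two servers. This Python sketch parses either form into the same datetime:

```python
from datetime import datetime

# The two display formats reported for the 'sites.upddate' column.
DELIMITED = "%Y-%m-%d %H:%M:%S"  # e.g. 2006-10-01 18:56:46
COMPACT = "%Y%m%d%H%M%S"         # e.g. 20061001185646


def parse_upddate(value):
    """Parse an 'upddate' value in either format into a datetime."""
    for fmt in (DELIMITED, COMPACT):
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue  # not this format; try the next one
    raise ValueError(f"unrecognized date format: {value!r}")
```

Both `parse_upddate("20061001185646")` and `parse_upddate("2006-10-01 18:56:46")` return `datetime(2006, 10, 1, 18, 56, 46)`.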
10-02-2006, 07:09 PM   #7
digdug
Green Mole
 
Join Date: Sep 2006
Posts: 6
Hello,

Guess what? It works now. And the culprit was robots.txt.

I deleted it and the spidering worked like a charm.
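robots.txt is the Robots Exclusion file that well-behaved spiders read before fetching pages, and deleting it fixing the problem suggests PhpDig honors it. If the file contained a blanket rule such as `Disallow: /` under `User-agent: *`, every compliant crawler would index nothing. Deleting the file also lets every other crawler in, so a narrower rule may be preferable. This Python sketch uses the standard `urllib.robotparser` to check what a given robots.txt allows; the "PhpDig" user-agent token is an assumption, so check what your PhpDig version actually sends:

```python
from urllib import robotparser


def allowed(robots_txt, user_agent, url):
    """Return True if `robots_txt` permits `user_agent` to fetch `url`."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A robots.txt like this shuts out every compliant crawler.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# Hypothetical alternative: let one named crawler in (an empty Disallow
# means "allow everything") while still blocking all others.
ALLOW_PHPDIG = """\
User-agent: PhpDig
Disallow:

User-agent: *
Disallow: /
"""
```

With these, `allowed(BLOCK_ALL, "PhpDig", url)` is `False`, while `allowed(ALLOW_PHPDIG, "PhpDig", url)` is `True` and any other user agent is still refused.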

Thanks Charter and alexandercer for your help!
10-04-2006, 01:14 AM   #8
Dave A
Purple Mole
 
Join Date: Aug 2004
Location: North Island New Zealand
Posts: 170
Sounds like the file is not being found in the path.
In the admin panel, where you add the domain and tell it to dig, did you also set the depth it should go down to?

I will have a look and see what I can find out for you, but from what I can see it seems to be path-related.
10-18-2006, 07:25 AM   #9
alexandercer
Green Mole
 
Join Date: Sep 2006
Posts: 4
Wow, congratulations!
But excuse me, I am confused: I cannot find the file called robots.txt. Where did you find it, or is it just a config file that you wrote yourself?
I think it could help us understand the pretty mole better.
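robots.txt is normally not part of the crawler; it is a plain-text file that a site owner, hosting company, or earlier developer places at the root of the website being crawled, so it only exists if someone created it. Crawlers look for it at the root of the host, never in a subdirectory. A small Python sketch showing where that is for any page URL (the www.domain.com URLs are placeholders):

```python
from urllib.parse import urljoin


def robots_url(page_url):
    """Return the robots.txt URL crawlers check for the site hosting page_url.

    Joining with the absolute path "/robots.txt" keeps the scheme, host,
    and port but replaces the page's path and query string.
    """
    return urljoin(page_url, "/robots.txt")
```

For instance, `robots_url("http://www.domain.com/phpdig/admin/index.php")` gives `http://www.domain.com/robots.txt`, which you can open in a browser to see whether the file exists and what it blocks.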

All times are GMT -8.


Powered by vBulletin® Version 3.7.3
Copyright ©2000 - 2024, Jelsoft Enterprises Ltd.
Copyright © 2001 - 2005, ThinkDing LLC. All Rights Reserved.