PhpDig.net

Old 02-23-2004, 04:06 PM   #1
paullind
Orange Mole
 
Join Date: Jan 2004
Posts: 30
Unable to create the content file and crontab not working and ixwebhosting

Hi

I just switched hosts in another attempt to get PhpDig to work. My last host ran PHP in safe mode and had no crontab support, so I'm now trying IXWebhosting.com. Here are their main specs:

IXWebhosting.com
Linux version 2.4.22, (Red Hat Linux 7.3 2.96-110)
php4

I have 2 problems at the moment:

1] 'Warning : Unable to create the content file ../text_content/4.txt ! '

I can manually enter a site for spidering through admin/index.php, but I only get partial success: the result includes 'Warning : Unable to create the content file ../text_content/4.txt ! '. My 'text_content' folder has the correct permissions, and every site gives the same error message (with a different txt file number).

I call it partial success because I can still search the site successfully afterwards. I would like to know why the warning shows up, though, in case it causes other problems, such as my next question regarding crontab.

2] Spider works ok manually but not through CRONTAB method. Any suggestions for troubleshooting methods here? Here's the nitty gritty:

Searched other threads and found this command to use:
/usr/bin/php -f /path/to/admin/spider.php cronlist2.txt >> spider.log

Here cronlist2.txt contains a list of full URLs, one per line, e.g. http://www.phpdig.net

All it does is produce a blank spider.log file. When I enter the sites manually (through admin/index.php) they work as described above.

Thanks in advance,

Paul L
Old 02-23-2004, 04:30 PM   #2
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. For part one, just to be sure: are the following directories set to chmod 777?

[PHPDIG_DIR]/text_content
[PHPDIG_DIR]/include
[PHPDIG_DIR]/admin/temp

For part two, cd to the admin directory and use this command:

php -f spider.php cronlist2.txt > spider.log 2>&1

What does the spider.log contain now?
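
A note on why the `2>&1` matters: with `>> spider.log` alone, only standard output lands in the log, so anything written to standard error is lost, which is one possible way to end up with a blank file. A minimal sketch, assuming a POSIX shell; `noisy` is a made-up stand-in for a failing command, not part of PhpDig:

```shell
#!/bin/sh
# Stand-in for a command that reports its problem on stderr only
# (hypothetical; it is not spider.php).
noisy() {
    echo "something went wrong" >&2
}

workdir=$(mktemp -d)

# Only stdout is appended to the log, so the error never reaches the file.
noisy >> "$workdir/stdout_only.log"

# stdout is sent to the log first, then stderr is pointed at the same place.
noisy > "$workdir/both.log" 2>&1

if [ -s "$workdir/stdout_only.log" ]; then
    echo "stdout_only.log has content"
else
    echo "stdout_only.log is empty"
fi
echo "both.log contains: $(cat "$workdir/both.log")"
```

If spider.log is still empty with `2>&1` in place, the command is probably producing no output at all, which would point away from a redirection problem.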
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.
Old 02-23-2004, 05:13 PM   #3
paullind
Orange Mole
 
Join Date: Jan 2004
Posts: 30
crontab - commercial host

hi Charter

Got the first issue straightened out. I had previously installed PhpDig and had it running on my own server, then uploaded the files to my new commercial server. I guess the text files already being there caused the error, because when I deleted all the old ones the problem disappeared.

Regarding the Crontab issue:
I don't have any shell level access, i can only enter the commands in a cron tab GUI, I'll try them with your suggested modifications.

- Tried it still no luck

It seems a lot of people here own and operate their server, or at least have full access to it. I've learned now that when looking for a commercial host for PhpDig you need to ensure that the host has 1] PHP safe mode off and 2] optionally, shell-level access, or at least a crontab feature.

Thx again

Last edited by paullind; 02-23-2004 at 05:34 PM.
Old 02-23-2004, 06:08 PM   #4
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. Are you using cPanel? Some interfaces allow cron jobs to be set that way. If your interface allows it, just use the following and then view the spider.log file using FTP:
Code:
php -f spider.php cronlist2.txt > spider.log 2>&1
Another thought: maybe your host doesn't allow cron jobs to write to a file? If that is the case, then use:
Code:
php -f spider.php cronlist2.txt
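
One thing to keep in mind with the relative names in these commands: cron usually starts a job in the account's home directory, not in admin/, so spider.php, cronlist2.txt and spider.log may not resolve the way they do after a manual cd. A minimal sketch of the effect, assuming a POSIX shell; the temporary directory layout is made up for the demonstration:

```shell
#!/bin/sh
# Hypothetical layout standing in for the real install: <tmp>/admin/cronlist2.txt
base=$(mktemp -d)
mkdir "$base/admin"
echo "http://www.phpdig.net" > "$base/admin/cronlist2.txt"

# Report whether the relative name cronlist2.txt resolves from directory $1.
# The subshell keeps the cd from leaking into the caller.
check_from() {
    ( cd "$1" && if [ -f cronlist2.txt ]; then echo found; else echo missing; fi )
}

from_elsewhere=$(check_from "$base")      # like a job started outside admin/
from_admin=$(check_from "$base/admin")    # like a job that first changes into admin/
echo "outside admin: $from_elsewhere"
echo "inside admin:  $from_admin"
```

Where the cron panel accepts a full command line, the same idea would be `cd /path/to/admin && php -f spider.php cronlist2.txt > spider.log 2>&1`, with /path/to being the placeholder used earlier in the thread. Using absolute paths for every file is another option.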
Old 02-24-2004, 03:56 PM   #5
paullind
Orange Mole
 
Join Date: Jan 2004
Posts: 30
Crontab still no luck - troubleshooting???

Tried with and without the spider log output, still no luck. It does produce a blank spider log, so I think it can write output all right.

Tried simplifying it too, avoiding the cronlist file:
/usr/bin/php -f /path/to//admin/spider.php http://www.xxxxx.com
Still no luck.

Can anyone think of a way to troubleshoot this problem?
Old 02-25-2004, 05:08 PM   #6
paullind
Orange Mole
 
Join Date: Jan 2004
Posts: 30
troubleshoot cron job

to troubleshoot cron job a little:

created a file spider2.php, the contents of which simply print out 'hello world'

Worked fine, could output it to a log file too.

Tried setting 777 permission on spider.php, still no luck
Old 02-26-2004, 03:01 PM   #7
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. What is the content of the spider2.php file? Is it something like the following:
PHP Code:
<?php
echo "hello world";
?>
and then "hello world" shows up in the log file?

When you check your phpinfo, is register_argc_argv set to on? If not, in the spider.php file, try setting $_SERVER variables as in this thread.
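
Related to the register_argc_argv question, it may also be worth checking which kind of PHP binary the cron job runs: the CLI build always fills in $argv, while a CGI build may not. The first line of `php -v` names the build in parentheses. A minimal sketch, assuming a POSIX shell; `sapi_of` is a hypothetical helper and the two sample version lines are made up for illustration, not taken from this host:

```shell
#!/bin/sh
# Extract the SAPI name from the first line of `php -v`, which looks like
#   PHP <version> (<sapi>) (built: ...)
sapi_of() {
    printf '%s\n' "$1" | sed -n '1s/^PHP [^ ]* (\([^)]*\)).*/\1/p'
}

# Illustrative version lines (invented for this sketch).
sample_cli='PHP 4.3.4 (cli) (built: Feb 23 2004 12:00:00)'
sample_cgi='PHP 4.3.4 (cgi) (built: Feb 23 2004 12:00:00)'

echo "sample 1: $(sapi_of "$sample_cli")"
echo "sample 2: $(sapi_of "$sample_cgi")"

# On a real host, if a php binary is on the PATH:
if command -v php >/dev/null 2>&1; then
    echo "this host: $(sapi_of "$(php -v)")"
fi
```

Without shell access, the same `php -v` command could be scheduled once through the cron panel with its output redirected to a file.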
Old 03-01-2004, 08:14 AM   #8
paullind
Orange Mole
 
Join Date: Jan 2004
Posts: 30
apache mod no argv passing - getting closer though

Hi

Settings: register_argc_argv=on

I've discovered that an Apache module install of PHP does not permit passing variables from a shell command to a PHP script's argv variable.

My hosting provider suggested this fix for spider.php; I placed it just inside the first if statement:
foreach ($_GET as $name => $value)
{
    $argv = explode("+", $name);
    array_shift($argv);
}
// this is to print out what's passed
foreach ($argv as $key => $value)
{
    echo "the key is $key the value is $value ";
}

It prints out the following in a log file:
-----------------
the key is 1 the value is http:www.yahoo.com Usage: php -f spider.php [option]
Opts: all (default)
forceall
http://something
filename [containing list of urls]
--------------------


So it now seems to be passing the website URL to the spider, but unfortunately it is doing nothing with it, as the site does not show up on the list of spidered sites in the admin page.

Should I place this code elsewhere in the script? Or modify it?

Thanks again,
Old 03-01-2004, 09:39 AM   #9
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. Untested, but try the following. In spider.php inside the first if statement, right before the $br = "\n"; line, place the following:
PHP Code:
foreach($_GET as $name => $value) {
    $argv = explode("+", $name);
    $argc = count($argv);
    array_shift($argv);
}
Old 03-01-2004, 04:32 PM   #10
paullind
Orange Mole
 
Join Date: Jan 2004
Posts: 30
argv and argc and spider.php

Hi charter

i tried your solution with and without this little extra bit to view argv/c:
foreach ($argv as $key => $value)
{
    echo "the key is $key the value is $value ";
}
echo "argc is $argc ";

The LOG file printed out looks like this:
--------------------------------
the key is 0 the value is /path/to/spider/phpdig/admin/spider.php
the key is 1 the value is http:www.cdncc.com
argc is 2

Usage: php -f spider.php [option]
Opts: all (default)
forceall
http://something
filename [containing list of urls]
-----------------------------------

Still no result of site added to spidered list.

Do the values of argv and argc look correct?

Should 'filename' in the log report above be the site url being spidered, or the 'http://something' list the site I am trying to spider?

Is something in config.php preventing spider.php from doing its thing?


Getting closer.....
Old 03-01-2004, 05:39 PM   #11
paullind
Orange Mole
 
Join Date: Jan 2004
Posts: 30
small edit

meant to say 'http://www.cdncc.com' in the middle of the last message
Old 03-02-2004, 04:16 PM   #12
paullind
Orange Mole
 
Join Date: Jan 2004
Posts: 30
can pass argv/c , but including config.php screws spider

review:

Trying to use shell scripting/crontab to call spider and make it spider list of websites. Apache mod php

Have set up correct crontab command, it calls spider.php and gives it the file with the list of websites(only one there now)

In spider.php $argv have values as:
the key is 0 the value is /path/to/phpdig/admin/spider.php
the key is 1 the value is /path/to/phpdig/admin/cronlist2.txt
$argc is 2

spider.php includes config.php around line 82, and the script does not make it any further beyond this include statement.

Inside config.php at line 16 I believe this 'if' statement terminates the spidering process:
--------------------
if ((isset($relative_script_path)) && ($relative_script_path != ".") && ($relative_script_path != "..")) {
exit();
}
if (eregi("config.php",$_SERVER['SCRIPT_FILENAME']) || eregi("config.php",$_SERVER['REQUEST_URI'])) {
exit();
}
---------------------
My $relative_script_path is /path/to/phpdig/, so it will exit in the first 'if'.

Why exit here? Should my $relative_script_path be something different?

Has anyone ever combined all the include files into one massive spider.php and run it to avoid potential errors with include files?

Thx again
Old 03-02-2004, 04:22 PM   #13
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. Try adding the path in the config.php file like so:
PHP Code:
if ((isset($relative_script_path)) &&
    ($relative_script_path != ".") &&
    ($relative_script_path != "..") &&
    ($relative_script_path != "/path/to/phpdig/")) {
    exit();
}
Old 03-02-2004, 05:20 PM   #14
paullind
Orange Mole
 
Join Date: Jan 2004
Posts: 30
different mysql connection info for shell?

I applied your code above and it has gotten me further along.

I've made it to the include statement in config.php for the connect.php .

The connect script does not seem to work when accessed this way (shell command)

I can manually enter sites to spider, connection ok that way.

In phpMyAdmin I get this message at the beginning:
MySQL 3.23.49-log running on 69.49.xxx.yy as abcdef@69.49.aaa.bb

I normally use the first number as the host value in the connection script. I tried the second one too, with the same result: the connect script does not make it beyond:
$id_connect = @mysql_connect(PHPDIG_DB_HOST,PHPDIG_DB_USER,PHPDIG_DB_PASS);

I guess a shell script cannot access MySQL the same way? I'll ask my hosting service about that one.

Thx again,

paul L
Old 03-04-2004, 09:06 AM   #15
paullind
Orange Mole
 
Join Date: Jan 2004
Posts: 30
So close I can taste it

My host did something which now allows the MySQL connection script to work when called from shell/crontab. My output log file is now:
----------------------------
26412: old priority 0, new priority 18
Spidering in progress...*http://www.lockmonsters.com/ Locked*
*http://www.lockmonsters.com/ Locked*
*http://www.lockmonsters.com/ Locked*
*http://www.lockmonsters.com/ Locked*
*http://www.lockmonsters.com/ Locked*
*http://www.lockmonsters.com/ Locked*
*http://www.lockmonsters.com/ Locked*
*http://www.lockmonsters.com/ Locked*
*http://www.lockmonsters.com/ Locked*
*http://www.lockmonsters.com/ Locked*
*http://www.lockmonsters.com/ Locked*
*http://www.lockmonsters.com/ Locked*
---------------------------------
When I go to my browser and admin page, the site shows up in the spidered list (yippee!), but as locked.

I'll try to figure out why it showed up as locked now. (When I enter the site manually through the browser the site spiders alright)