#1 |
Orange Mole
Join Date: Jan 2004
Posts: 30
|
Unable to create the content file and crontab not working and ixwebhosting
Hi
I just switched hosts in another attempt to get phpdig to work. My last host ran PHP in safe mode and had no crontab support, so I'm now trying IXWebhosting.com. Their main specs: Linux version 2.4.22 (Red Hat Linux 7.3 2.96-110), php4.

I have 2 problems at the moment:

1] 'Warning : Unable to create the content file ../text_content/4.txt ! '
I can manually enter a site for spidering through admin/index.php, but I only get partial success: the warning above appears as part of the result. My 'text_content' folder has the correct permissions, and every site gives the same error message (with a different txt file number). I call it partial success because I can still search the site successfully afterwards, but I would like to know why the warning shows up in case it causes other problems, such as with my next question regarding crontab.

2] Spider works OK manually but not through the crontab method.
Any suggestions for troubleshooting here? Here's the nitty gritty: I searched other threads and found this command to use:
/usr/bin/php -f /path/to/admin/spider.php cronlist2.txt >> spider.log
cronlist2.txt contains a list of full URLs, one per line, e.g. http://www.phpdig.net
All it does is produce a blank spider.log file. When I enter the sites manually (through admin/index.php) they work as above.

Thanks in advance,
Paul L
#2 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. For part one, just to be sure: do the following directories have chmod 777 permissions?
[PHPDIG_DIR]/text_content
[PHPDIG_DIR]/include
[PHPDIG_DIR]/admin/temp
For part two, cd to the admin directory and use this command:
php -f spider.php cronlist2.txt > spider.log 2>&1
What does the spider.log contain now?
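If checking the permissions by hand is awkward, something like the following might help. It is only a minimal sketch: `check_writable` is a hypothetical helper name, and `/path/to/phpdig` stands in for the real install path. It reports whether each directory exists and is writable by whichever user runs the script, which may not be the same user the web server runs as.

```shell
#!/bin/sh
# Hypothetical helper: report whether each given directory exists and is
# writable by the user running this script.
check_writable() {
    for dir in "$@"; do
        if [ -d "$dir" ] && [ -w "$dir" ]; then
            echo "ok: $dir"
        else
            echo "NOT writable: $dir"
        fi
    done
}

# Example call (the path is an assumed placeholder, not a real location):
# PHPDIG_DIR=/path/to/phpdig
# check_writable "$PHPDIG_DIR/text_content" "$PHPDIG_DIR/include" "$PHPDIG_DIR/admin/temp"
```

Any line that starts with "NOT writable" points at a directory that is missing or that the current user cannot write to.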
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension. |
#3 |
Orange Mole
Join Date: Jan 2004
Posts: 30
|
crontab - commercial host
hi Charter
Got the first issue straightened out. I had previously installed phpdig and had it running on my own server, then uploaded the files to my new commercial server. I guess the text files already being there caused the error, because when I deleted all the old ones the problem disappeared.

Regarding the crontab issue: I don't have any shell-level access; I can only enter commands in a crontab GUI. I'll try them with your suggested modifications.
- Tried it, still no luck.

It seems a lot of people here own and operate their server, or have full access to it. I've learned now that when looking for a commercial host for phpdig you need to ensure the host has:
1] PHP safe mode off
2] optionally, shell-level access, or at least a crontab feature.

Thx again
Last edited by paullind; 02-23-2004 at 05:34 PM.
#4 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. Are you using Cpanel? Some interfaces allow cron jobs to be set that way. If your interface allows it, just use the following and then view the spider.log file using FTP:
Code:
php -f spider.php cronlist2.txt > spider.log 2>&1
Code:
php -f spider.php cronlist2.txt
#5 |
Orange Mole
Join Date: Jan 2004
Posts: 30
|
Crontab still no luck - troubleshooting???
Tried with and without the spider log output, still no luck. It does produce a blank spider log, so I think it can write output all right.
Tried simplifying it too, avoiding the cronlist file:
/usr/bin/php -f /path/to//admin/spider.php http://www.xxxxx.com
Still no luck. Can anyone think of a way to troubleshoot this problem?
#6 |
Orange Mole
Join Date: Jan 2004
Posts: 30
|
troubleshoot cron job
To troubleshoot the cron job a little:
I created a file spider2.php, the contents of which simply print out 'hello world'. It worked fine, and I could output it to a log file too. Tried setting 777 permissions on spider.php, still no luck.
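Since a blank log says very little, one option is to wrap the command so the log always records what was run, from which directory, and with what exit status. This is a rough sketch, not anything shipped with phpdig: `run_logged` is a hypothetical helper name, and it assumes the cron GUI can run a small shell script.

```shell
#!/bin/sh
# Hypothetical helper: run a command, append its stdout and stderr to a log
# file, and record the working directory, the command line and the exit status.
run_logged() {
    log="$1"
    shift
    {
        echo "--- $(date) ---"
        echo "cwd: $(pwd)"
        echo "cmd: $*"
        "$@"
        status=$?
        echo "exit status: $status"
    } >> "$log" 2>&1
    return "$status"
}

# Example call, using the command from earlier in this thread:
# run_logged spider.log php -f spider.php cronlist2.txt
```

If the log then shows a non-zero exit status or an error message, the problem is in how the script starts; if it shows status 0 and nothing else, the script is probably exiting quietly somewhere inside.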
#7 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. What is the content of the spider2.php file? Is it something like the following:
PHP Code:
When you check your phpinfo, is register_argc_argv set to on? If not, in the spider.php file, try setting $_SERVER variables as in this thread.
#8 |
Orange Mole
Join Date: Jan 2004
Posts: 30
|
apache mod no argv passing - getting closer though
Hi
Settings: register_argc_argv=on
I've discovered that an Apache mod install of PHP does not permit passing variables from a shell command to a PHP script's argv variable. My ISP host suggested this fix for spider.php; I placed it just inside the first if statement of spider.php:
PHP Code:
foreach ($_GET as $name => $value) {
    $argv = explode("+", $name);
    array_shift($argv);
}
// this is to print out what's passed
foreach ($argv as $key => $value) {
    echo "the key is $key the value is $value ";
}
It prints out the following in a log file:
-----------------
the key is 1 the value is http:www.yahoo.com
Usage: php -f spider.php [option]
Opts: all (default) forceall http://something filename [containing list of urls]
--------------------
So it now seems to be passing the website URL to the spider, but unfortunately the spider does nothing with it: the site does not show up in the list of spidered sites on the admin page. Should I place this code elsewhere in the script? Or modify it?
Thanks again,
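One thing that may help narrow this down (an assumption, not something confirmed in this thread): the php binary that cron invokes may be a CGI build rather than a CLI build, and the two handle command-line arguments differently. On PHP 4.3 and later, the first line of `php -v` names the SAPI in parentheses, for example `(cli)` or `(cgi)`. A small sketch that pulls that token out of a version string; `sapi_from_version` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: extract the SAPI name (the first parenthesised token)
# from the first line of `php -v` output passed in as a string.
# Prints nothing if the line does not match the expected layout.
sapi_from_version() {
    printf '%s\n' "$1" | sed -n '1s/^PHP [^ ]* (\([^)]*\)).*/\1/p'
}

# Example call from a cron job, appending the result to the log:
# sapi_from_version "$(php -v)" >> spider.log
```

If the result is something other than `cli`, that would be consistent with the host's `$_GET` workaround being needed in the first place.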
#9 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. Untested, but try the following. In spider.php inside the first if statement, right before the $br = "\n"; line, place the following:
PHP Code:
#10 |
Orange Mole
Join Date: Jan 2004
Posts: 30
|
argv and argc and spider.php
Hi charter
I tried your solution with and without this little extra bit to view argv/argc:
PHP Code:
foreach ($argv as $key => $value) {
    echo "the key is $key the value is $value ";
}
echo "argc is $argc ";
The log file printed out looks like this:
--------------------------------
the key is 0 the value is /path/to/spider/phpdig/admin/spider.php
the key is 1 the value is http:www.cdncc.com
argc is 2
Usage: php -f spider.php [option]
Opts: all (default) forceall http://something filename [containing list of urls]
-----------------------------------
Still no result: the site is not added to the spidered list. Do the values of argv and argc look correct? Should 'filename' in the log report above be the site URL being spidered, or should 'http://something' list the site I am trying to spider? Is something in config.php preventing spider.php from doing its thing?
Getting closer.....
#11 |
Orange Mole
Join Date: Jan 2004
Posts: 30
|
small edit
Meant to say 'http://www.cdncc.com' in the middle of the last message.
#12 |
Orange Mole
Join Date: Jan 2004
Posts: 30
|
can pass argv/c , but including config.php screws spider
review:
Trying to use shell scripting/crontab to call the spider and make it spider a list of websites.
Apache mod PHP.
I have set up the correct crontab command; it calls spider.php and gives it the file with the list of websites (only one there now).
In spider.php, $argv has these values:
the key is 0 the value is /path/to/phpdig/admin/spider.php
the key is 1 the value is /path/to/phpdig/admin/cronlist2.txt
$argc is 2
spider.php includes config.php around line 82, and the script does not make it any further beyond this include statement. Inside config.php at line 16, I believe this 'if' statement terminates the spidering process:
--------------------
if ((isset($relative_script_path)) && ($relative_script_path != ".") && ($relative_script_path != "..")) {
    exit();
}
if (eregi("config.php",$_SERVER['SCRIPT_FILENAME']) || eregi("config.php",$_SERVER['REQUEST_URI'])) {
    exit();
}
---------------------
My $relative_script_path is /path/to/phpdig/, so it will exit in the first 'if'. Why exit here? Should my $relative_script_path be something different?
Has anyone ever combined all the include files into one massive spider.php and run it to avoid potential errors with include files?
Thx again
#13 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. Try adding the path in the config.php file like so:
PHP Code:
#14 |
Orange Mole
Join Date: Jan 2004
Posts: 30
|
different mysql connection info for shell?
I applied your code above and it has gotten me further along.
I've made it to the include statement in config.php for connect.php. The connect script does not seem to work when accessed this way (shell command). I can manually enter sites to spider, and the connection is OK that way.
In phpMyAdmin I get this message at the beginning:
MySQL 3.23.49-log running on 69.49.xxx.yy as abcdef@69.49.aaa.bb
I normally use the first address as the host value in the connection script. I tried the second one also, with the same result: the connect script does not make it beyond:
$id_connect = @mysql_connect(PHPDIG_DB_HOST,PHPDIG_DB_USER,PHPDIG_DB_PASS);
I guess a shell script cannot access MySQL the same way? I'll ask my hosting service about that one.
Thx again,
paul L
#15 |
Orange Mole
Join Date: Jan 2004
Posts: 30
|
So close I can taste it
My host did something which now allows the MySQL connection script to work when called from shell/crontab. My output log file is now:
----------------------------
26412: old priority 0, new priority 18
Spidering in progress...
*http://www.lockmonsters.com/ Locked*
*http://www.lockmonsters.com/ Locked*
*http://www.lockmonsters.com/ Locked*
*http://www.lockmonsters.com/ Locked*
*http://www.lockmonsters.com/ Locked*
*http://www.lockmonsters.com/ Locked*
*http://www.lockmonsters.com/ Locked*
*http://www.lockmonsters.com/ Locked*
*http://www.lockmonsters.com/ Locked*
*http://www.lockmonsters.com/ Locked*
*http://www.lockmonsters.com/ Locked*
*http://www.lockmonsters.com/ Locked*
---------------------------------
When I go to the admin page in my browser, the site shows up in the spidered list (yippee!), but as locked. I'll try to figure out why it showed up as locked now. (When I enter the site manually through the browser, the site spiders all right.)