Spidering Problems on a Windows Server Website


vinyl-junkie
02-08-2004, 07:09 PM
I haven't forgotten about the fact that I said I would try and solve that authentication problem for a site on a Windows server. Instead, I wanted to bypass the authentication process for now and just see if I could spider this website at all. Sure enough, I ran into a problem. Here's the error I'm getting.
Fatal error: Call to undefined function: is_executable() in c:\hosting\webhost4life\member\vinyljunkie2\search\admin\robot_functions.php on line 665

The line that error refers to is the is_executable() call here:
if (USE_IS_EXECUTABLE_COMMAND == 1) {
$is_exec_command_msword = is_executable(PHPDIG_PARSE_MSWORD);
I found the code in config.php which references USE_IS_EXECUTABLE_COMMAND and changed the value for that variable to zero, but all I got was a bunch more errors after refreshing my phpdig database and trying to start over again.

Any ideas what the problem could be?

mbruere
02-09-2004, 02:07 AM
Hi. PhpDig 1.6.4+ has an option in the config file to bypass is_executable. Change define('USE_IS_EXECUTABLE_COMMAND','1'); to define('USE_IS_EXECUTABLE_COMMAND','0'); in the config file.

Regards,

Mathieu
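
For anyone hitting the same fatal error: the PHP manual's changelog for is_executable() notes that the function only became available on Windows in PHP 5.0.0, which would explain an "undefined function" message on an older Windows build. Besides the config switch above, a defensive check can be sketched as follows. The helper name safe_is_executable() is hypothetical and is not part of PhpDig.

```php
<?php
// Hypothetical helper, not part of PhpDig: returns false instead of
// triggering a fatal error when is_executable() is missing from the build.
function safe_is_executable($path)
{
    if (!function_exists('is_executable')) {
        return false; // treat the external parser as unavailable
    }
    return is_executable($path);
}

// A path that does not exist is reported as not executable.
var_dump(safe_is_executable('/no/such/parser'));
```

With a guard like this, a missing function simply disables the external parser check rather than stopping the spider.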

vinyl-junkie
02-09-2004, 06:25 AM
Yes, I know. As I stated in my first post, I changed that to zero and just got a bunch of different errors. I have no idea why. :( Would it be helpful if I posted those error messages too?

vinyl-junkie
02-09-2004, 07:21 AM
I'm replying to myself here. :D After posting earlier, I went on reading some other posts in the forum and came across a couple of others with problems similar to mine with that is_executable variable. OK, I thought. I'll just double-check what I've done and attempt to spider my site one more time.

I refreshed all the phpdig database tables, then attempted to spider my site again. This time, the spidering page showed up like this:
SITE : http://www.techtipscentral.net/
Exclude paths :
- test/
- 1ClickDBFree/
Unable to create temp directory
For what it's worth, I ended up having to drop that table and then re-create it to get the phpdig database refreshed so I could try spidering again.

Now I'm sure that I have that is_executable variable being set properly, so what is the problem this time?

vinyl-junkie
02-09-2004, 07:54 AM
Sorry for so many posts, but I'm replying to myself again. I found this thread (http://www.phpdig.net/showthread.php?s=&threadid=221&highlight=Unable+to+create+temp+directory) in the forums, which describes the exact same problem I'm having, including the error messages.

I tried what they said they did, creating an admin/temp directory, and also Charter's suggestion about commenting out that section of code. I'm still getting that long list of errors given in that prior post. Worse yet, it seems to only be spidering the root directory and then stopping. I have *way* more content than just one page. Help!

Charter
02-09-2004, 09:47 AM
Hi. Is safe mode on? What errors do you receive?

vinyl-junkie
02-09-2004, 10:02 AM
Re: safe mode. I have no idea. Is that something I would have to ask my web host? I didn't find anything about that in their knowledge base. If safe mode is on, is there no way around it?

Here are the errors I'm getting (full directory extension is replaced by "pathname" here):
Warning: fopen(../admin/temp/27583251.tmp): failed to open stream: Permission denied in c:\pathname\search\admin\robot_functions.php on line 705

Warning: fwrite(): supplied argument is not a valid stream resource in c:\pathname\search\admin\robot_functions.php on line 707

Warning: fclose(): supplied argument is not a valid stream resource in c:\pathname\search\admin\robot_functions.php on line 709

Warning: filesize(): Stat failed for ../admin/temp/27583251.tmp (errno=2 - No such file or directory) in c:\pathname\search\admin\robot_functions.php on line 710

Notice: Undefined variable: revisit_after in c:\pathname\search\admin\spider.php on line 374
1:http://www.techtipscentral.net/
(time : 00:00:06)
No link in temporary table

Charter
02-09-2004, 10:57 AM
Hi. Check your phpinfo (http://www.php.net/manual/en/function.phpinfo.php) for safe_mode (http://www.php.net/features.safe-mode) to see if it's on or off. Also check that the following directories have 777 permission:

[PHPDIG_DIR]/text_content
[PHPDIG_DIR]/include
[PHPDIG_DIR]/admin/temp
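
Since Unix-style modes such as 777 may not map directly onto Windows file permissions, it can be more telling to let PHP try an actual write. The following is a minimal sketch, assuming it is saved as a script in the PhpDig root directory; the function name can_write_temp_file() and the probe file name are made up for illustration.

```php
<?php
// Hypothetical probe, not part of PhpDig: tries to create, write, and delete
// a small file in $dir, roughly what the spider does with its .tmp files.
function can_write_temp_file($dir)
{
    if (!is_dir($dir)) {
        return false;
    }
    $file = rtrim($dir, '/\\') . DIRECTORY_SEPARATOR . 'probe_' . mt_rand() . '.tmp';
    $handle = @fopen($file, 'w'); // @ hides the "Permission denied" warning
    if ($handle === false) {
        return false;
    }
    $ok = fwrite($handle, 'test') !== false;
    fclose($handle);
    unlink($file);
    return $ok;
}

// Assumes this script sits in the PhpDig root, next to these directories.
$base = dirname(__FILE__);
foreach (array('text_content', 'include', 'admin/temp') as $sub) {
    $result = can_write_temp_file($base . '/' . $sub) ? 'writable' : 'NOT writable';
    echo $sub . ': ' . $result . "\n";
}
```

If a directory is reported as not writable here, the account the web server runs PHP under probably lacks write access, whatever the FTP client shows.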

vinyl-junkie
02-09-2004, 12:18 PM
safe_mode is off for both local and master value.
safe_mode_exec_dir is off for both.
safe_mode_gid is off for both.
safe_mode_include_dir has "no value" for both.
open_basedir has "no value" for both.
disable_functions has "no value" for both.
disable_classes has "no value" for both.

All three of those directories you had me check have permission level 777.

I was reading some of the stuff from the link you posted. Is it possible that the reason I'm unable to execute these commands is that I created the PHPDIG directories with the username/password that I was given for my web host account, but the database username/password are different? When it comes to this sort of thing, I'm way out of my element.

Charter
02-11-2004, 12:14 PM
Hi. What is allow_url_fopen (http://www.php.net/manual/en/function.fopen.php) set to in the phpinfo?
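
As an alternative to reading through the whole phpinfo page, the settings asked about in this thread can be printed with ini_get(). This is only a sketch; describe_setting() is a hypothetical helper name, and the list of directives is simply the ones mentioned above.

```php
<?php
// Hypothetical helper: returns a readable value for one php.ini directive.
// ini_get() returns false when the running PHP version does not know the
// directive at all, and an empty string when it is set to nothing.
function describe_setting($name)
{
    $value = ini_get($name);
    if ($value === false) {
        return '(not available)';
    }
    if ($value === '') {
        return '(no value)';
    }
    return $value;
}

foreach (array('safe_mode', 'open_basedir', 'disable_functions', 'allow_url_fopen') as $name) {
    echo $name . ' = ' . describe_setting($name) . "\n";
}
```

Note that this shows the local value only, while phpinfo lists both local and master values.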

vinyl-junkie
02-11-2004, 06:48 PM
That is set to On for both local and master values.

Charter
02-11-2004, 09:18 PM
>> ...I created the PHPDIG directories with the username/password that I was given for my web host account...

Hi. Do you mean that you logged into your account and then created the directories? Who's the owner of the directories? Perhaps send an email to your host. For some reason it seems that the directories are not accessible to the script.

vinyl-junkie
02-13-2004, 11:27 PM
I contacted my Web host, and they changed the permissions for my account. That did the trick, and now I have spidered my site! :D

I just have to add that I have one of the best Web hosts around. I turned in the support ticket about 10:15 p.m. tonight, and received a response about 45 minutes later. This isn't the first time either that I have received fast service like that. Pretty cool, huh?

Now, to look at that authentication screen issue with IIS that I said I would work on....

vinyl-junkie
02-14-2004, 12:03 AM
But...

And here is where Windows is so darn picky. I'm testing the actual search page now and getting the following error: Undefined offset: 1 in pathname\search\libs\search_function.php on line 468
Line 468 in search_function.php looks like this:
list($title,$text) = explode("\n",$first_words);
I've discovered already with PHP scripts that Windows is very fussy about making sure variables are initialized before calling a function. However, I initialized $title and $text to nulls at the start of the function, but that didn't help. Any ideas on how to fix this?

Charter
02-14-2004, 01:03 PM
Hi. What do you get when you call the following from the browser using your Windows account?

<?php
$test = "This is\na test.";
list($first,$second) = explode("\n",$test);
echo $first . "<br>" . $second;
?>

vinyl-junkie
02-14-2004, 01:32 PM
With just that code in a web page, I get this:
This is
a test.

Charter
02-14-2004, 01:53 PM
Hi. In robot_functions.php $first_words is created as follows:

$first_words = $titre_resume."\n".ereg_replace('(@@@.*)','',wordwrap($page_desc['content'].$text[0], SUMMARY_LENGTH, '@@@'));

Try checking the first_words column of the spider table to verify that something like title \n some text appears in the column.

vinyl-junkie
02-14-2004, 02:06 PM
This is actually a very small website (only 20 spider-able files right now), so I was able to browse the entire "spider" table. There is no entry in the first_words field with "\n" anywhere in that field.

Charter
02-14-2004, 02:29 PM
>> There is no entry in the first_words field with "\n" anywhere in that field.

Hi. The \n is actually a newline. Did you see anything like the following in the first_words column?

title

some text

vinyl-junkie
02-14-2004, 06:07 PM
Yes, but only a few entries. Most just have the page name. For example, the first_words field for one entry has ContactMe.asp. Here's one that had the page description as part of that field.
CD Trustee On The Web: Introduction Would you like to have your CD Trustee music database display on your website as dynamic content pages? I've written a tutorial to show you how.
If this could be broken out between title and text, it would be like so:
CD Trustee On The Web: Introduction

Would you like to have your CD Trustee music database display on your website as dynamic content pages? I've written a tutorial to show you how.
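
The "Undefined offset: 1" notice is what list() raises when explode() returns only one element, which happens for entries such as ContactMe.asp that contain no newline. Whether the notice is displayed depends on the server's error reporting level. A minimal sketch of a more tolerant split follows; split_first_words() is a hypothetical helper, not code from PhpDig's search_function.php.

```php
<?php
// Hypothetical helper, not PhpDig code: always returns at least two
// elements (title, text), padding with '' when there is no newline.
function split_first_words($first_words)
{
    return array_pad(explode("\n", $first_words), 2, '');
}

// Entry without a newline: $title is "ContactMe.asp", $text is "".
list($title, $text) = split_first_words("ContactMe.asp");
echo '[' . $title . '] [' . $text . "]\n";

// Entry with a newline separating title and text.
list($title, $text) = split_first_words("CD Trustee On The Web: Introduction\nWould you like to have your CD Trustee music database display on your website as dynamic content pages?");
echo '[' . $title . '] [' . $text . "]\n";
```

This only hides the symptom. Entries with no newline still suggest that the page content was not stored during the crawl.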

vinyl-junkie
02-14-2004, 06:30 PM
Don't know if this helps or not (should have mentioned it earlier), but when I spidered this site it seemed to take a really long time (about 3 or 4 minutes), and we're talking about a site with around 20 pages total. With my Unix site, I could spider a few hundred pages in the same length of time.

Also, just for grins I decided to update the spidering just now and got this error:
HTTP/1.1 502 Gateway Error
Server: Microsoft-IIS/5.0
Date: Sun, 15 Feb 2004 02:18:08 GMT
Connection: close
Content-Length: 186
Content-Type: text/html

CGI Timeout

The specified CGI application exceeded the allowed time for processing. The server has deleted the process.

Charter
02-14-2004, 06:48 PM
Hi. It looks like you may be running into the issue listed in point one of this (http://www.phpdig.net/showthread.php?threadid=58) thread, for which a full solution is currently not available. Perhaps the 'undefined offset' error is related to CGI timeout. One suggestion is to delete the site from the admin panel, empty the tables, and set LIMIT_DAYS to zero in the config.php file and then crawl on a per page basis where search depth is zero (available since version 1.6.5) or one. Also, if interested see this (http://www.phpdig.net/showthread.php?threadid=513) thread about the LIMIT_DAYS constant.

vinyl-junkie
02-14-2004, 07:41 PM
I decided to clear the tables and reset that config.php value you suggested, then spider the site the conventional way. If that hadn't worked, I would have gone and spidered the pages individually as you suggested. It did work though, and the site searches seem to be okay now.

A Windows site is definitely touchy. :rolleyes: Now I know why my main website is on Unix.

Charter
02-20-2004, 07:44 PM
OT: Thanks vinyl-junkie for helping here and elsewhere! :D