Setting HTTP_USER_AGENT for spidering


tester
01-13-2004, 06:15 PM
Hi Charter,

This is related to a problem that was posted in the troubleshooting section recently.

We have a website in which certain pages require authentication. This is performed by a function that is included on all protected pages. To allow these pages to be indexed, we check the HTTP_USER_AGENT from the request headers and let the page load when it matches the value we expect from the spider. The problem is that when phpdig spiders the page, the user agent is always the default value of "PHP/4.2.2", no matter what we set it to in the function phpdigTestUrl() in robot_functions.php.
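To make the setup concrete, here is a minimal sketch of the kind of check described above. The function name is_spider_request() and the token "ExampleSpider/1.0" are hypothetical, since the real names and values are not given in this thread. Note that the User-Agent header is supplied by the client, so any client that knows the token can send it.

```php
<?php
// Hypothetical token; the actual value the site checks for is not given here.
define('SPIDER_USER_AGENT', 'ExampleSpider/1.0');

// Returns true when the request's User-Agent header equals the expected token.
// $server is normally the $_SERVER superglobal.
function is_spider_request($server)
{
    return isset($server['HTTP_USER_AGENT'])
        && $server['HTTP_USER_AGENT'] === SPIDER_USER_AGENT;
}

if (!is_spider_request($_SERVER)) {
    // Not the spider: hand over to the site's normal authentication here.
}
```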

Is there a way to programmatically set User-agent to something secure so that the authentication mechanism is still dependable?

Thanks.

Charter
01-14-2004, 10:11 AM
Hi. What is user_agent set to in your PHP info? Are you crawling via shell or browser interface?
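If it is easier than reading through the PHP info page, something like the sketch below should show the current value and try to override it for one script run. It assumes your PHP build honours the user_agent directive for its URL wrappers, which may not hold on older versions, and "ExampleSpider/1.0" is only a placeholder value.

```php
<?php
// Read the current user_agent directive; it is empty (or false) when unset.
$current = ini_get('user_agent');
if ($current === false || $current === '') {
    $current = '(not set)';
}
echo 'user_agent before: ' . $current . "\n";

// Try to override the directive for this script run only.
// 'ExampleSpider/1.0' is a placeholder, not a value taken from phpdig.
ini_set('user_agent', 'ExampleSpider/1.0');
echo 'user_agent after: ' . ini_get('user_agent') . "\n";
```

If the second line still shows the old value, the override did not take effect in your environment.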

tester
01-14-2004, 01:54 PM
Hi Charter,

We are currently using the browser interface for crawling, with the intention of using shell later when we set up the indexing as a cron job.

In either case the spidering script is accessing the pages and providing a user agent of "PHP/4.2.2" (default before PHP version 4.3.0). The code in robot_functions.php allows the setting of the User-agent header, so why is this overridden?

The php.ini has nothing set for user_agent. Is there some other way to set the user_agent to our liking?

Charter
01-15-2004, 06:37 AM
Hi. What output do you get when you run the code snippet below, from php.net (http://www.php.net/manual/en/function.get-browser.php)?

<?php
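// Print the raw User-Agent header sent with this request.
echo $_SERVER['HTTP_USER_AGENT'] . "<hr />\n";
// get_browser() needs the browscap directive to be set in php.ini.
$browser = get_browser();
// List each capability that browscap reports for this user agent.
foreach ($browser as $name => $value) {
    echo "<b>$name</b> $value <br />\n";
}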
?>