PhpDig.net

PhpDig.net (http://www.phpdig.net/forum/index.php)
-   How-to Forum (http://www.phpdig.net/forum/forumdisplay.php?f=33)
-   -   Exclude links with certain url variables (http://www.phpdig.net/forum/showthread.php?t=439)

jclementson 01-26-2004 02:35 AM

Exclude links with certain url variables
 
Hi there,

Every page on my website has a link to a printer-friendly version of the same page, done with [thispage.php?print=y]

I need to exclude these links from the spidering process, but without excluding other url variables such as [news.php?story=11]

Basically, I need a way to tell the spider not to follow links containing a specific string (in this case '?print=y'). I can't find an existing feature for this, so can someone point me to the right function and explain how to modify it?

Thanks

TSO 01-26-2004 04:12 AM

Quote:

Every page on my website has a link to a printer-friendly version of the same page, done with [thispage.php?print=y]

I just started working on this case too. After line 412 in "search_function.php", add:
$content['file'] = preg_replace("'print=y'si", "", $content['file']);
(the line before it is: $url = eregi_replace("([a-z0-9])[/]+... )

This strips "print=y" away. The downside is that you get duplicates when searching (pages with and without "print" both show up, because only the URL is filtered). Let's keep looking...
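For comparison, a slightly more careful URL rewrite can be sketched in plain PHP. The helper name strip_query_param() is hypothetical and not part of PhpDig. It drops the whole print=y pair instead of the bare substring, so "news.php?story=11&print=y" becomes "news.php?story=11" rather than ending in a dangling "&". Like the one-liner above, it only rewrites the URL; the printer-friendly page is still fetched and indexed.

```php
<?php
// Hypothetical helper (not part of PhpDig): remove one query parameter
// from a URL, keeping all other name=value pairs in their original order.
function strip_query_param(string $url, string $param): string
{
    // Split "page.php?a=1&b=2" into the path and the query string.
    $pos = strpos($url, '?');
    if ($pos === false) {
        return $url; // no query string, nothing to strip
    }
    $path  = substr($url, 0, $pos);
    $query = substr($url, $pos + 1);

    // Keep every non-empty pair whose name differs from $param.
    $kept = [];
    foreach (explode('&', $query) as $pair) {
        $name = explode('=', $pair, 2)[0];
        if ($pair !== '' && $name !== $param) {
            $kept[] = $pair;
        }
    }

    // Drop the "?" entirely when no parameters are left.
    return $kept ? $path . '?' . implode('&', $kept) : $path;
}

echo strip_query_param('thispage.php?print=y', 'print'), "\n";
echo strip_query_param('news.php?story=11&print=y', 'print'), "\n";
echo strip_query_param('news.php?story=11', 'print'), "\n";
```

Working on whole pairs rather than raw text is what keeps the remaining URL well-formed; it does not solve the duplicate problem by itself.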

jclementson 01-26-2004 04:20 AM

Thanks, that's a useful start.

I'm looking at function phpdigExplore in robot_functions.php, but I can't figure it out yet.

jclementson 01-26-2004 04:54 AM

Got it!

In robot_functions.php, I've added a test at the end of function phpdigDetectDir.

This is how I've done it for the test I need, showing lines 537 onwards. My addition is at line 543:

//test the exclude with robots.txt
if (phpdigReadRobots($exclude,$link['path'].$link['file']) == 1
    || isset($exclude['@ALL@'])
) {
    $link['ok'] = 0;
}
//exclude if specific variable set
//strpos() returns 0 for a match at the very start and false for no match,
//so compare against false explicitly
if (strpos($link['file'], 'print=y') !== false) {
    $link['ok'] = 0;
}
//print "<pre>"; print_r($link); print "</pre>\n";
return $link;
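The same test can be pulled out into a standalone function and tried outside the spider. This is a sketch under assumptions: has_query_pair() is a hypothetical name, and $link['file'] is assumed to hold the file name plus its query string, as the test above implies. Parsing the query string means only a real print=y parameter matches, whereas a substring search for 'print=y' would also catch something like 'reprint=yes'.

```php
<?php
// Hypothetical standalone check: true when the query string of $file
// contains the parameter $name with exactly the value $value.
function has_query_pair(string $file, string $name, string $value): bool
{
    // parse_url() returns null when there is no query, false on a malformed URL.
    $query = parse_url($file, PHP_URL_QUERY);
    if (!is_string($query)) {
        return false;
    }
    // Decode "a=1&print=y" into an associative array.
    parse_str($query, $params);
    return isset($params[$name]) && $params[$name] === $value;
}

var_dump(has_query_pair('thispage.php?print=y', 'print', 'y'));
var_dump(has_query_pair('news.php?story=11', 'print', 'y'));
var_dump(has_query_pair('page.php?reprint=yes', 'print', 'y'));
// A plain substring search does match the last URL:
var_dump(strpos('page.php?reprint=yes', 'print=y') !== false);
```

Inside phpdigDetectDir() the check would then read: if (has_query_pair($link['file'], 'print', 'y')) { $link['ok'] = 0; }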

TSO 01-26-2004 06:42 AM

I got it too... somehow
I edited "search_function.php" a bit. It's messy, so I won't post it here. Anyway, it works pretty well, though not perfectly. This feature would be a nice addition to future versions.
I have different language versions, so I don't want to remove search results permanently.

JoNtE 02-25-2004 12:19 AM

Use config.php constant?
 
Found this in the config.php file:

PHP Code:

// regular expression to ban useless external links in index
define('BANNED','^ad\.|banner|doubleclick'); 

Change it to:
PHP Code:

define('BANNED','^ad\.|banner|doubleclick|print=y'); 

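I guess this could be used to exclude URLs containing strings that match the regular expression, although the comment says it is meant for external links, so it may not apply to internal ones.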
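A quick way to see what the edited pattern catches before running the spider is to test it against a few sample URLs. This is a sketch with assumptions: it uses preg_match() with '#' delimiters and the case-insensitive i flag, which is not necessarily how PhpDig applies BANNED internally, and BANNED_STRICT is a hypothetical tighter variant that only matches a whole print=y query parameter.

```php
<?php
// Pattern from the edited config.php line.
const BANNED_LOOSE = '^ad\.|banner|doubleclick|print=y';
// Hypothetical tighter variant: "print=y" must start a query parameter
// (preceded by ? or &) and end at & or at the end of the URL.
const BANNED_STRICT = '^ad\.|banner|doubleclick|[?&]print=y(&|$)';

// Assumed matching rule: case-insensitive PCRE match anywhere in the URL.
function is_banned(string $pattern, string $url): bool
{
    return preg_match('#' . $pattern . '#i', $url) === 1;
}

$samples = ['thispage.php?print=y', 'news.php?story=11', 'page.php?reprint=yes'];
foreach ($samples as $url) {
    printf("%-22s loose=%s strict=%s\n", $url,
        is_banned(BANNED_LOOSE, $url) ? 'yes' : 'no',
        is_banned(BANNED_STRICT, $url) ? 'yes' : 'no');
}
```

The loose pattern is a plain substring match, so it also bans 'page.php?reprint=yes'; anchoring on [?&] avoids that.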

I have the same problem, but I haven't tested this possible solution yet. I'll be back with the result.

// JoNtE


All times are GMT -8.

Copyright © 2001 - 2005, ThinkDing LLC. All Rights Reserved.