PhpDig.net

Old 01-26-2004, 02:35 AM   #1
jclementson
Green Mole
 
Join Date: Jan 2004
Posts: 3
Exclude links with certain URL variables

Hi there,

Every page on my website has a link to a printer-friendly version of the same page, done with [thispage.php?print=y]

I need to exclude these links from the spidering process, but without excluding other url variables such as [news.php?story=11]

Basically I need a way to tell the spidering process not to follow links containing a specific string (in this case '?print=y'). I can't find an existing feature for this, so can someone point me to the right function and how to modify it?

Thanks
Old 01-26-2004, 04:12 AM   #2
TSO
Green Mole
 
Join Date: Jan 2004
Posts: 2
Quote:
Every page on my website has a link to a printer-friendly version of the same page, done with [thispage.php?print=y]
I've just started working on this case too. After line 412 in "search_function.php", add:
$content['file'] = preg_replace("'print=y'si", "", $content['file']);
(line before: $url = eregi_replace("([a-z0-9])[/]+... )

This strips "print=y" from the URL. The drawback is that you get duplicates when searching (the pages without "print" and the ones with "print"), because only the URL is filtered. Let's keep looking...
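A minimal sketch of the stripping idea, tried outside PhpDig. The helper name `stripPrintParam()` and the sample URLs are hypothetical and not part of PhpDig; the sketch only shows one way to remove a `print=y` parameter and tidy the `?` or `&` it leaves behind.

```php
<?php
// Hypothetical helper, not part of PhpDig: remove a "print=y" query
// parameter from a URL and clean up the separator it leaves behind.
function stripPrintParam(string $url): string
{
    // Drop "print=y" plus the "&" after it, keeping the leading "?" or "&".
    $url = preg_replace('/([?&])print=y(&|$)/i', '$1', $url);
    // Remove a trailing "?" or "&" left when it was the last parameter.
    return rtrim($url, '?&');
}

echo stripPrintParam('thispage.php?print=y'), "\n";
echo stripPrintParam('news.php?story=11&print=y'), "\n";
echo stripPrintParam('news.php?print=y&story=11'), "\n";
```

Normalizing the URL this way does not stop the printer-friendly page from being fetched, which fits the duplicate results described above; it only changes the address that gets stored.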
Old 01-26-2004, 04:20 AM   #3
jclementson
Green Mole
 
Join Date: Jan 2004
Posts: 3
Thanks, that's a useful start.

I'm looking at function phpdigExplore in robot_functions.php, but I can't figure it out yet.
Old 01-26-2004, 04:54 AM   #4
jclementson
Green Mole
 
Join Date: Jan 2004
Posts: 3
Got it!

In robot_functions.php, I've added a test at the end of function phpdigDetectDir.

This is how I've done it for the test I need, showing lines 537 onwards. My addition is at line 543:

//test the exclude with robots.txt
if (phpdigReadRobots($exclude,$link['path'].$link['file']) == 1
    || isset($exclude['@ALL@'])
    ) {
    $link['ok'] = 0;
}
//exclude if specific variable set
if (strpos($link['file'],'print=y') !== false) {
    $link['ok'] = 0;
}
//print "<pre>"; print_r($link); print "</pre>\n";
return $link;
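
The same check can be tried outside PhpDig. This sketch assumes a `$link` array with `file` and `ok` keys, as in the snippet above; the helper name `excludeIfContains()` and the sample values are hypothetical. It compares the result of `strpos()` against `false` explicitly, because a match at position 0 returns `0`, which a bare `if` would treat as no match.

```php
<?php
// Hypothetical stand-alone helper, not PhpDig code: mark a link as
// not-OK when its file part contains any of the given substrings.
function excludeIfContains(array $link, array $needles): array
{
    foreach ($needles as $needle) {
        // strpos() returns 0 for a match at the start of the string,
        // so compare against false instead of relying on truthiness.
        if (strpos($link['file'], $needle) !== false) {
            $link['ok'] = 0;
            break;
        }
    }
    return $link;
}

$printable = excludeIfContains(['file' => 'thispage.php?print=y', 'ok' => 1], ['print=y']);
$story     = excludeIfContains(['file' => 'news.php?story=11', 'ok' => 1], ['print=y']);

var_dump($printable['ok']); // int(0)
var_dump($story['ok']);     // int(1)
```

Taking a list of substrings keeps the door open for excluding more than one URL variable without adding another `if` each time.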
Old 01-26-2004, 06:42 AM   #5
TSO
Green Mole
 
Join Date: Jan 2004
Posts: 2
I got it too... somehow
I edited "search_function.php" a bit. It's a bit messy, so I won't post it here. Anyway, it works pretty well, though not perfectly. This feature would be a nice addition to future versions.
I have different language versions, so I don't want to strip search results out permanently.
Old 02-25-2004, 12:19 AM   #6
JoNtE
Green Mole
 
Join Date: Feb 2004
Posts: 1
Use config.php constant?

Found this in the config.php file:

PHP Code:
// regular expression to ban useless external links in index
define('BANNED','^ad\.|banner|doubleclick'); 
change it to:
PHP Code:
define('BANNED','^ad\.|banner|doubleclick|print=y'); 
I guess this could be used to exclude URLs containing strings that match the regular expression.

I have the same problem, but I haven't tested this possible solution yet. I'll be back with the result.

// JoNtE
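
One way to check the extended pattern before re-spidering is to run it against a few sample URLs. This sketch only exercises the regular expression itself: it assumes the pattern is applied case-insensitively and does not show how PhpDig uses `BANNED` internally. The `isBanned()` helper and the sample URLs are hypothetical.

```php
<?php
// Pattern from the post above; in config.php it is set with define('BANNED', ...).
$banned = '^ad\.|banner|doubleclick|print=y';

// Hypothetical helper: true when the URL matches the pattern.
function isBanned(string $pattern, string $url): bool
{
    // "~" is used as the delimiter because the pattern contains no "~".
    return preg_match('~' . $pattern . '~i', $url) === 1;
}

var_dump(isBanned($banned, 'thispage.php?print=y')); // bool(true)
var_dump(isBanned($banned, 'news.php?story=11'));    // bool(false)
var_dump(isBanned($banned, 'news.php?print=yes'));   // bool(true): substring match
```

Because `print=y` is not anchored, it also matches values such as `print=yes`. The comment in config.php describes `BANNED` as a filter for external links, so whether it also affects links within the same site still needs to be tested.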