Indexing META-Tags


Rolandks
10-09-2003, 02:03 AM
Version 1.6.2 with PHP 4.3.2 is indexing the META tags "description", "DC.subject" and "keywords". Is this for the same reason that HTML comments get indexed?

I think this part (in robot_functions.php) doesn't work with PHP > 4.3.2:

//delete content of head, script, and style tags
$text = eregi_replace("<head[^<>]*>.*</head>"," ",$text);
$text = eregi_replace("<script[^>]*>.*</script>"," ",$text);
$text = eregi_replace("<style[^>]*>.*</style>"," ",$text);
// clean tags
$text = eregi_replace("(</?[a-z0-9 ]+>)",'\1 ',$text);

Try a site with this head section:

<head><!-- ID 566789 - generated by CMS -->
<!-- Global Meta Beginn, Template: meta.tpl -->
<META http-equiv="content-type" content="text/html;charset=ISO-8859-1">
<META HTTP-EQUIV="Content-Language" CONTENT="de">
<META NAME="description" CONTENT="Informationen, and your description here is indexing">
<META NAME="keywords" CONTENT="This, keywords, here, are, indexing, allin, PHPDIG ">
<META NAME="publisher" CONTENT="This is also indexing">
<META NAME="copyright" CONTENT="This is also indexing">
<META NAME="creation_Date" CONTENT="11/04/2003">
<META NAME="expires" CONTENT="never">
<META HTTP-EQUIV="Pragma" content="no-cache">
<META NAME="ROBOTS" CONTENT="INDEX,FOLLOW">
<META NAME="revisit-after" CONTENT="7 days">
<META NAME="DC.format" content="text/html">
<META NAME="DC.Date" content="2003-11-04">
<META NAME="DC.contributor" CONTENT="Your Name">
<META NAME="DC.subject" CONTENT="This, keywords, here, are, indexing, allin, PHPDIG">
<META NAME="DC.description" CONTENT="This, keywords, here, are, indexing, in, PHPDIG">
<META NAME="DC.title" CONTENT="This, keywords, here, are, indexing, allin, PHPDIG">
<META NAME="DC.language" CONTENT="de">
<META NAME="DC.type" CONTENT="Information">
<!-- Global Meta End -->
<title>Title of Homepage</title>
<link href="stylesheet.css" rel="stylesheet" media="screen">
</head>


Any hints on how to solve the "eregi_replace" problem? Would "phpdigExclude" perhaps work as a workaround for this area?

Charter
10-09-2003, 06:41 PM
//delete content of head, script, and style tags
$text = eregi_replace("<head[^<>]*>.*</head>"," ",$text);
$text = eregi_replace("<script[^>]*>.*</script>"," ",$text);
$text = eregi_replace("<style[^>]*>.*</style>"," ",$text);
// clean tags
$text = eregi_replace("(</?[a-z0-9 ]+> )",'\\1 ',$text);


Hi. If you want to strip META tags, perhaps add something like:

$text = eregi_replace("<meta[^>]*>"," ",$text);
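
A note for newer PHP versions: the ereg family was deprecated in PHP 5.3 and removed in PHP 7, so there the same idea would have to be written with preg_replace. A minimal sketch, assuming a hypothetical helper name stripMetaTags (it is not part of phpdig):

```php
<?php
// Hypothetical helper, not part of phpdig: replaces every <meta ...> tag with a space.
// The "i" modifier makes the match case-insensitive, as eregi_replace was.
function stripMetaTags($text)
{
    return preg_replace('/<meta[^>]*>/i', ' ', $text);
}

$html = '<META NAME="keywords" CONTENT="This, keywords"><title>Title of Homepage</title>';
echo stripMetaTags($html), "\n";
```

Like the eregi_replace line, this stops at the first ">" it sees, so a CONTENT value that itself contains ">" would be cut short.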

BernhardG
10-10-2003, 01:00 AM
Hi,


The "head" regex SHOULD filter everything between <head> and </head>. This includes every meta tag! But there is a problem if someone does not set the </head> correctly. The problem with most of the available HTTP indexing search engines is that they assume every site uses perfect HTML markup, which is not realistic :-(

Bernhard
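
To illustrate the point about a missing </head>: one way a parser could be made more forgiving is to fall back to the opening <body> tag when no </head> is found. This is only a sketch under assumptions; stripHead is a hypothetical name, the fallback rule is my own choice and not what phpdig does, and it uses preg_replace rather than eregi_replace.

```php
<?php
// Hypothetical helper, not part of phpdig.
function stripHead($text)
{
    // Normal case: remove everything from <head ...> to the first </head>.
    // "s" lets "." match newlines, "i" ignores case, ".*?" stops at the first </head>.
    // "\b" keeps the pattern from matching tags that merely start with "head".
    $stripped = preg_replace('/<head\b[^<>]*>.*?<\/head>/is', ' ', $text, -1, $count);
    if ($count > 0) {
        return $stripped;
    }
    // Fallback (assumption): no </head> found, so cut from <head ...> up to,
    // but not including, the opening <body ...> tag.
    return preg_replace('/<head\b[^<>]*>.*?(?=<body[\s>])/is', ' ', $text);
}

$valid  = "<html><head>\n<META NAME=\"keywords\" CONTENT=\"x\">\n</head><body>Hi</body></html>";
$broken = "<html><head>\n<META NAME=\"keywords\" CONTENT=\"x\">\n<body>Hi</body></html>";

echo stripHead($valid), "\n";
echo stripHead($broken), "\n";
```

If neither </head> nor <body> is present, the text is returned unchanged, so badly broken pages would still need other handling.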

Charter
10-10-2003, 04:24 AM
Thanks, missed that line, and I even quoted it. ;)

Rolandks
10-10-2003, 07:21 AM
Originally posted by BernhardG
But there is a problem if someone does not set the </head> correctly.

Hmm, what is wrong with my <head> </head>?

The site with this HEAD validates as HTML 4.01 at www.W3C.org.

-Roland-

BernhardG
10-10-2003, 08:05 AM
Oh yes, your markup is correct. My eyes were shut when I wrote my reply, sorry!
But invalid markup is a general problem; every parser has to deal with it :-(
By the way:
Shouldn't the line
$text = eregi_replace("<head[^<>]*>.*</head>"," ",$text);
look like this (as every other line in your quote):
$text = eregi_replace("<head[^>]*>.*</head>"," ",$text);
Or why do we need this additional '<'?

Bernhard
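
As far as I can tell, the two character classes behave the same on a well-formed opening tag and only differ when a stray '<' appears before the tag's closing '>'. A small check, written with preg_match as an assumption for newer PHP versions; the malformed sample string is made up for the test:

```php
<?php
$withLt    = '/<head[^<>]*>/i';  // refuses to cross another "<"
$withoutLt = '/<head[^>]*>/i';   // only stops at ">"

$wellFormed = '<head><!-- ID 566789 - generated by CMS -->';
$malformed  = '<head <META NAME="expires" CONTENT="never">';

// On valid markup both patterns capture the same opening tag.
preg_match($withLt, $wellFormed, $a);
preg_match($withoutLt, $wellFormed, $b);
var_dump($a[0] === $b[0]);

// With a stray "<" inside the tag, only the [^>] variant still matches.
var_dump(preg_match($withLt, $malformed));
var_dump(preg_match($withoutLt, $malformed));
```

So the extra '<' makes the pattern slightly stricter on broken markup, while on a valid page such as the one quoted above it should make no difference.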

Rolandks
10-23-2003, 01:54 AM
Okay, I have spent a little more time on this problem :)

phpdig is only indexing these two META tags:

<META NAME="description" CONTENT="Informationen, and your description here is indexing">
<META NAME="keywords" CONTENT="This, keywords, here, are, indexing, allin, PHPDIG ">

admin\robot_functions.php (line 755):

if (is_array($tags)) {
    if (isset($tags['description'])) {
        $page_desc = phpdigCleanHtml($tags['description']);
    }
    if (isset($tags['keywords'])) {
        $page_keywords = phpdigCleanHtml($tags['keywords']);
    }
}


Perhaps this is the problem: it is a feature which you can't disable?

Charter
10-25-2003, 07:46 AM
Hi. You should just be able to comment that piece of code out to remove the indexing of the two meta tags mentioned. As always, try it on a demo page first. ;)
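
An alternative to commenting the block out would be to guard it with a switch. A minimal sketch, assuming a hypothetical constant INDEX_META_TAGS (not an existing phpdig setting) and a stand-in for phpdigCleanHtml so that the example runs on its own:

```php
<?php
// Hypothetical switch, not an existing phpdig setting.
define('INDEX_META_TAGS', false);

// Stand-in for phpdig's phpdigCleanHtml(); the real function is not reproduced here.
function cleanHtmlStub($text)
{
    return trim(strip_tags($text));
}

// Returns array(description, keywords); both stay empty when the switch is off.
function extractMetaText($tags)
{
    $page_desc = '';
    $page_keywords = '';
    if (INDEX_META_TAGS && is_array($tags)) {
        if (isset($tags['description'])) {
            $page_desc = cleanHtmlStub($tags['description']);
        }
        if (isset($tags['keywords'])) {
            $page_keywords = cleanHtmlStub($tags['keywords']);
        }
    }
    return array($page_desc, $page_keywords);
}

$tags = array(
    'description' => 'Informationen, and your description here is indexing',
    'keywords'    => 'This, keywords, here, are, indexing, allin, PHPDIG',
);
list($desc, $keywords) = extractMetaText($tags);
var_dump($desc, $keywords);
```

With the switch set to false, both values stay empty, which has the same effect as commenting the block out but is easier to revert.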