PhpDig.net


Rolandks 10-09-2003 01:03 AM

Indexing META-Tags
 
Version 1.6.2 with PHP 4.3.2 is indexing the META tags "description", "DC.subject" and "keywords". Is this for the same reason that HTML comments get indexed? :confused:

I think this part doesn't work with PHP > 4.3.2 (robot_functions.php):
Code:

//delete content of head, script, and style tags
$text = eregi_replace("<head[^<>]*>.*</head>"," ",$text);
$text = eregi_replace("<script[^>]*>.*</script>"," ",$text);
$text = eregi_replace("<style[^>]*>.*</style>"," ",$text);
// clean tags
$text = eregi_replace("(</?[a-z0-9 ]+>)",'\1 ',$text);

Try a site with this META block:
Code:

<head><!-- ID 566789 - generated by CMS -->
    <!-- Global Meta Beginn, Template: meta.tpl -->
    <META http-equiv="content-type" content="text/html;charset=ISO-8859-1">
    <META HTTP-EQUIV="Content-Language" CONTENT="de">
    <META NAME="description" CONTENT="Informationen, and your description here is indexing">
    <META NAME="keywords" CONTENT="This, keywords, here, are, indexing, allin, PHPDIG ">
    <META NAME="publisher" CONTENT="This is also indexing">
    <META NAME="copyright" CONTENT="This is also indexing">
    <META NAME="creation_Date" CONTENT="11/04/2003">
    <META NAME="expires" CONTENT="never">
    <META HTTP-EQUIV="Pragma" content="no-cache">
    <META NAME="ROBOTS" CONTENT="INDEX,FOLLOW">
    <META NAME="revisit-after" CONTENT="7 days">
    <META NAME="DC.format" content="text/html">
    <META NAME="DC.Date" content="2003-11-04">
    <META NAME="DC.contributor" CONTENT="Your Name">
    <META NAME="DC.subject" CONTENT="This, keywords, here, are, indexing, allin, PHPDIG">
    <META NAME="DC.description" CONTENT="This, keywords, here, are, indexing, in, PHPDIG">
    <META NAME="DC.title" CONTENT="This, keywords, here, are, indexing, allin, PHPDIG">
    <META NAME="DC.language" CONTENT="de">
    <META NAME="DC.type" CONTENT="Information">
<!-- Global Meta End -->
    <title>Title of Homepage</title>
    <link href="stylesheet.css" rel="stylesheet" media="screen">
 </head>

Any hints on how to solve the "eregi_replace" problem? Or would "phpdigExclude" perhaps work as a workaround for this area?

Charter 10-09-2003 05:41 PM

Quote:

PHP Code:

//delete content of head, script, and style tags
$text = eregi_replace("<head[^<>]*>.*</head>"," ",$text);
$text = eregi_replace("<script[^>]*>.*</script>"," ",$text);
$text = eregi_replace("<style[^>]*>.*</style>"," ",$text);
// clean tags
$text = eregi_replace("(</?[a-z0-9 ]+>)",'\\1 ',$text);


Hi. If you want to strip META tags, perhaps add something like:
PHP Code:

$text = eregi_replace("<meta[^>]*>"," ",$text);
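
For readers on newer PHP versions, here is a minimal sketch of the same idea using preg_replace(), since the ereg family (including eregi_replace()) was deprecated in PHP 5.3 and removed in PHP 7.0. The helper name stripMetaTags() is made up for illustration and is not part of PhpDig. Note that this simple pattern would cut a tag short if a CONTENT value itself contained a ">" character.

```php
<?php
// Sketch only: preg_replace() stands in for eregi_replace().
// stripMetaTags() is a hypothetical helper name, not a PhpDig function.
function stripMetaTags($text)
{
    // "i" makes the match case-insensitive, so <META ...> and <meta ...> both match.
    // [^>]* also matches newlines, so a tag split over several lines is covered.
    return preg_replace('/<meta[^>]*>/i', ' ', $text);
}

$html  = '<head><META NAME="keywords" CONTENT="This, keywords, here">'
       . '<title>Title of Homepage</title></head>';
$clean = stripMetaTags($html);

echo $clean, "\n";
```

The title survives because only the META element itself is replaced by a space.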


BernhardG 10-10-2003 12:00 AM

Hi,


The "head" regex SHOULD filter everything between <head> and </head>. This includes every meta tag! But there is a problem if someone does not set the </head> correctly. The problem with most of the available HTTP indexing search engines is that they assume every site uses perfect HTML markup, but this is not realistic :-(

Bernhard
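
The point about a missing </head> can be illustrated with a small sketch. It uses preg_replace() with the "i" and "s" modifiers as a stand-in for the eregi_replace() call quoted above (an assumption made so the example runs on current PHP); stripHead() and the two sample pages are hypothetical.

```php
<?php
// Hypothetical helper; preg_replace() is used in place of eregi_replace().
function stripHead($text)
{
    // "s" lets "." match newlines, "i" ignores case.
    return preg_replace('/<head[^<>]*>.*<\/head>/is', ' ', $text);
}

// Same page twice: once with a closing </head>, once without.
$wellFormed   = "<html><head>\n<META NAME=\"description\" CONTENT=\"secret words\">\n"
              . "</head><body>Visible text</body></html>";
$missingClose = "<html><head>\n<META NAME=\"description\" CONTENT=\"secret words\">\n"
              . "<body>Visible text</body></html>";

$a = stripHead($wellFormed);   // head block, including the META tag, is removed
$b = stripHead($missingClose); // pattern cannot match, text comes back unchanged

echo $a, "\n", $b, "\n";
```

With the closing tag present, the META content is gone; without it, the pattern finds nothing to replace and the META content stays in the text handed to the indexer.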

Charter 10-10-2003 03:24 AM

Thanks, missed that line, and I even quoted it. ;)

Rolandks 10-10-2003 06:21 AM

Quote:

Originally posted by BernhardG
But there is a problem if someone does not set the </head> correctly.
Hmm, what is wrong with my <head> </head>? :confused:

The site with this HEAD was checked as valid HTML 4.01 at www.W3C.org.

-Roland-

BernhardG 10-10-2003 07:05 AM

Oh yes, your markup is correct. My eyes were shut when I wrote my reply, sorry!
But wrong markup is a general problem; every parser has it :-(
By the way:
Shouldn't the line
$text = eregi_replace("<head[^<>]*>.*</head>"," ",$text);
look like this (like every other line in your quote):
$text = eregi_replace("<head[^>]*>.*</head>"," ",$text);
Or why do we need this additional '<'?

Bernhard
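
A small sketch of what the additional '<' changes, again using preg_replace() as an assumed stand-in for eregi_replace(). On a valid <head> both patterns behave the same; they only differ when the opening tag itself is malformed. The sample strings and variable names are invented for illustration.

```php
<?php
$withLt    = '/<head[^<>]*>.*<\/head>/is'; // character class as in robot_functions.php
$withoutLt = '/<head[^>]*>.*<\/head>/is';  // character class as in the other lines

$valid  = '<head><title>T</title></head><body>text</body>';
$broken = '<head <title>T</title></head><body>text</body>'; // opening tag never closed

// Valid markup: both patterns strip the head block.
$validA = preg_replace($withLt, ' ', $valid);
$validB = preg_replace($withoutLt, ' ', $valid);

// Broken opening tag: [^<>]* refuses to run across the next "<", so nothing matches;
// [^>]* swallows "<title" as if it were part of the <head ...> tag and still matches.
$brokenA = preg_replace($withLt, ' ', $broken);
$brokenB = preg_replace($withoutLt, ' ', $broken);

var_dump($validA === $validB, $brokenA === $broken, $brokenB);
```

So the stricter class only matters for malformed opening tags, where it makes the pattern give up instead of treating the following tag as part of <head>.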

Rolandks 10-23-2003 12:54 AM

Okay, I have spent a little more time on this problem :)

phpdig is only indexing these two META tags:

<META NAME="description" CONTENT="Informationen, and your description here is indexing">
<META NAME="keywords" CONTENT="This, keywords, here, are, indexing, allin, PHPDIG ">

admin\robot_functions.php (755):
PHP Code:

if (is_array($tags)) {
    if (isset($tags['description'])) {
      $page_desc = phpdigCleanHtml($tags['description']);
    }
    if (isset($tags['keywords'])) {
      $page_keywords = phpdigCleanHtml($tags['keywords']);
    }


Perhaps this is the problem: it is a feature which you can't disable :confused:

Charter 10-25-2003 06:46 AM

Hi. You should just be able to comment that piece of code out to remove the indexing of the two meta tags mentioned. As always, try it on a demo page first. ;)
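
As an alternative to commenting the block out, the same effect could be had with a switch around it. This is only a sketch under stated assumptions: the $index_meta_tags flag is hypothetical (the thread suggests 1.6.2 has no such setting), cleanHtmlStandIn() merely stands in for phpdigCleanHtml() so the example runs on its own, and the shape of $tags (lower-cased META names mapped to their CONTENT values) is assumed from the code quoted above.

```php
<?php
// Hypothetical switch, not an existing PhpDig setting.
$index_meta_tags = false;

// Stand-in for phpdigCleanHtml(), only so this sketch is self-contained.
function cleanHtmlStandIn($text)
{
    return trim(strip_tags($text));
}

// Assumed shape of $tags: META names mapped to their CONTENT values.
$tags = array(
    'description' => 'Informationen, and your description here is indexing',
    'keywords'    => 'This, keywords, here, are, indexing, allin, PHPDIG ',
);

$page_desc     = '';
$page_keywords = '';

// With the flag off, neither META value reaches the text to be indexed.
if ($index_meta_tags && is_array($tags)) {
    if (isset($tags['description'])) {
        $page_desc = cleanHtmlStandIn($tags['description']);
    }
    if (isset($tags['keywords'])) {
        $page_keywords = cleanHtmlStandIn($tags['keywords']);
    }
}

var_dump($page_desc, $page_keywords);
```

Setting the flag back to true restores the original behavior, which makes it easier to compare results on a demo page.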


All times are GMT -8.
