PhpDig.net

Old 10-09-2003, 01:03 AM   #1
Rolandks
Purple Mole
Join Date: Sep 2003
Location: Kassel, Germany
Posts: 119
Indexing META-Tags

PhpDig 1.6.2 with PHP 4.3.2 is indexing the content of the META tags "description", "DC.subject" and "keywords". Is this caused by the same thing as the indexing of HTML comments?

I think this part of robot_functions.php doesn't work with PHP > 4.3.2:
Code:
//delete content of head, script, and style tags
$text = eregi_replace("<head[^<>]*>.*</head>"," ",$text);
$text = eregi_replace("<script[^>]*>.*</script>"," ",$text);
$text = eregi_replace("<style[^>]*>.*</style>"," ",$text);
// clean tags
$text = eregi_replace("(</?[a-z0-9 ]+>)",'\1 ',$text);
Try a site with this head section:
Code:
<head><!-- ID 566789 - generated by CMS -->
    <!-- Global Meta Beginn, Template: meta.tpl -->
    <META http-equiv="content-type" content="text/html;charset=ISO-8859-1">
    <META HTTP-EQUIV="Content-Language" CONTENT="de">
    <META NAME="description" CONTENT="Informationen, and your description here is indexing">
    <META NAME="keywords" CONTENT="This, keywords, here, are, indexing, allin, PHPDIG ">
    <META NAME="publisher" CONTENT="This is also indexing">
    <META NAME="copyright" CONTENT="This is also indexing">
    <META NAME="creation_Date" CONTENT="11/04/2003">
    <META NAME="expires" CONTENT="never">
    <META HTTP-EQUIV="Pragma" content="no-cache">
    <META NAME="ROBOTS" CONTENT="INDEX,FOLLOW">
    <META NAME="revisit-after" CONTENT="7 days">
    <META NAME="DC.format" content="text/html">
    <META NAME="DC.Date" content="2003-11-04">
    <META NAME="DC.contributor" CONTENT="Your Name">
    <META NAME="DC.subject" CONTENT="This, keywords, here, are, indexing, allin, PHPDIG">
    <META NAME="DC.description" CONTENT="This, keywords, here, are, indexing, in, PHPDIG">
    <META NAME="DC.title" CONTENT="This, keywords, here, are, indexing, allin, PHPDIG">
    <META NAME="DC.language" CONTENT="de">
    <META NAME="DC.type" CONTENT="Information">
<!-- Global Meta End -->
    <title>Title of Homepage</title>
    <link href="stylesheet.css" rel="stylesheet" media="screen">
 </head>
Any hints on how to solve the "eregi_replace" problem? Would "phpdigExclude" perhaps work as a workaround for this area?
Old 10-09-2003, 05:41 PM   #2
Charter
Head Mole
Join Date: May 2003
Posts: 2,539
Quote:
PHP Code:
//delete content of head, script, and style tags
$text = eregi_replace("<head[^<>]*>.*</head>"," ",$text);
$text = eregi_replace("<script[^>]*>.*</script>"," ",$text);
$text = eregi_replace("<style[^>]*>.*</style>"," ",$text);
// clean tags
$text = eregi_replace("(</?[a-z0-9 ]+>)",'\1 ',$text);
Hi. If you want to strip META tags, perhaps add something like:
PHP Code:
$text = eregi_replace("<meta[^>]*>"," ",$text);
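For anyone trying this on a current PHP: `eregi_replace` was removed in PHP 7.0, so the sketch below swaps in `preg_replace` with the `i` modifier as a case-insensitive stand-in. The helper name `stripMetaTags` and the sample input are made up for illustration and are not part of PhpDig.

```php
<?php
// Hypothetical helper, not part of PhpDig: replaces every <meta ...> tag with a space.
// preg_replace with the "i" modifier stands in for the case-insensitive eregi_replace.
function stripMetaTags(string $text): string
{
    return preg_replace('/<meta[^>]*>/i', ' ', $text);
}

// Sample input invented for this sketch.
$html  = '<META NAME="keywords" CONTENT="a, b"><p>Body</p>';
$clean = stripMetaTags($html);
```

The tag, including its CONTENT attribute, is replaced as a whole; text outside the tag is left alone.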
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.
Old 10-10-2003, 12:00 AM   #3
BernhardG
Green Mole
Join Date: Oct 2003
Location: Püttlingen (Saar) - Germany
Posts: 8
Hi,


The "head" regex SHOULD filter everything between <head> and </head>, and that includes every meta tag! But there is a problem if someone does not set the </head> correctly. The problem with most of the available HTTP indexing search engines is that they assume every site uses perfect HTML markup, but this is not realistic :-(
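A small sketch of that point, under some assumptions: `preg_replace` stands in for `eregi_replace` (which is gone in PHP 7.0 and later), the `s` modifier is added so that `.` also matches newlines, and the helper name `stripHead` and both sample pages are invented for illustration.

```php
<?php
// Hypothetical helper mirroring the quoted head-stripping line; not PhpDig code.
// "i" ignores case, "s" lets "." match newlines so the pattern can span several lines.
function stripHead(string $text): string
{
    return preg_replace('/<head[^<>]*>.*<\/head>/is', ' ', $text);
}

// Sample pages invented for this sketch.
$closed   = "<head>\n<META NAME=\"keywords\" CONTENT=\"secret\">\n</head><body>Visible</body>";
$unclosed = "<head>\n<META NAME=\"keywords\" CONTENT=\"secret\">\n<body>Visible</body>";

$a = stripHead($closed);   // the head block, META tag included, is replaced by a space
$b = stripHead($unclosed); // no </head>, so the pattern never matches and nothing is removed
```

With the closing tag present the whole head disappears; without it the text comes back unchanged, so whatever was in the head stays in the text handed to the later cleaning steps.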

Bernhard
__________________
phpCMS - Content Management System
http://www.phpcms.de/
Old 10-10-2003, 03:24 AM   #4
Charter
Thanks, missed that line, and I even quoted it.
Old 10-10-2003, 06:21 AM   #5
Rolandks
Quote:
Originally posted by BernhardG
But there is a problem if someone does not set the </head> correctly.
Hmm, what is wrong with my <head> and </head>?

The site with this HEAD validates as HTML 4.01 at www.W3C.org.

-Roland-
Old 10-10-2003, 07:05 AM   #6
BernhardG
Oh yes, your markup is correct. My eyes were shut when I wrote my reply, sorry!
But wrong markup is a general problem; every parser has this problem :-(
By the way:
Shouldn't the line
$text = eregi_replace("<head[^<>]*>.*</head>"," ",$text);
look like this (like every other line in your quote):
$text = eregi_replace("<head[^>]*>.*</head>"," ",$text);
Or why do we need this additional '<'?
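To see where the two patterns can differ at all, here is a sketch using `preg_replace` as a stand-in for `eregi_replace` (removed in PHP 7.0). The variable names and both sample strings are invented, and the malformed input is only one guess at a case where the extra '<' matters; the thread does not say why it is there.

```php
<?php
// Both patterns are preg_replace stand-ins for the eregi_replace lines in the thread.
$strict = '/<head[^<>]*>.*<\/head>/is'; // the opening tag may not contain another "<"
$loose  = '/<head[^>]*>.*<\/head>/is';  // the opening tag may contain "<"

// Well-formed input: both patterns remove the same span.
$valid       = '<head><title>T</title></head><p>Body</p>';
$validStrict = preg_replace($strict, ' ', $valid);
$validLoose  = preg_replace($loose, ' ', $valid);

// Malformed input (made up): the <head tag is never closed with ">".
$broken       = '<head lang="de" <title>T</title></head><p>Body</p>';
$brokenStrict = preg_replace($strict, ' ', $broken); // no match, text unchanged
$brokenLoose  = preg_replace($loose, ' ', $broken);  // matches from "<head" through "</head>"
```

On valid markup the two are interchangeable. They only diverge when the opening tag itself is broken, where the stricter class refuses to match.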

Bernhard
Old 10-23-2003, 12:54 AM   #7
Rolandks
Okay, I have spent a little more time on this problem.

PhpDig is only indexing these two META tags:

<META NAME="description" CONTENT="Informationen, and your description here is indexing">
<META NAME="keywords" CONTENT="This, keywords, here, are, indexing, allin, PHPDIG ">

admin\robot_functions.php (line 755):
PHP Code:
if (is_array($tags)) {
    if (isset($tags['description'])) {
      $page_desc = phpdigCleanHtml($tags['description']);
    }
    if (isset($tags['keywords'])) {
      $page_keywords = phpdigCleanHtml($tags['keywords']);
    }

Perhaps this is the problem: it is a feature that you can't disable.

Last edited by Rolandks; 10-23-2003 at 12:57 AM.
Old 10-25-2003, 06:46 AM   #8
Charter
Hi. You should just be able to comment that piece of code out to remove the indexing of the two meta tags mentioned. As always, try it on a demo page first.
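If commenting code in and out gets tedious, the same idea could be wrapped in a switch. This is only a sketch: `$indexMeta`, `collectMetaText` and `cleanHtmlStandIn` are hypothetical names, none of them exist in PhpDig, and the stand-in cleaner is not the real `phpdigCleanHtml`.

```php
<?php
// Hypothetical stand-in for phpdigCleanHtml(); the real PhpDig function may behave differently.
function cleanHtmlStandIn(string $value): string
{
    return trim(strip_tags($value));
}

// $indexMeta is an assumed switch, not an existing PhpDig setting.
function collectMetaText(array $tags, bool $indexMeta): array
{
    $result = ['description' => '', 'keywords' => ''];
    if (!$indexMeta) {
        // Roughly what commenting the block out does, assuming both values otherwise stay empty.
        return $result;
    }
    foreach (array_keys($result) as $name) {
        if (isset($tags[$name])) {
            $result[$name] = cleanHtmlStandIn($tags[$name]);
        }
    }
    return $result;
}

// Sample tag array invented for this sketch.
$tags = ['description' => 'Some <b>description</b>', 'keywords' => 'php, search'];
$on   = collectMetaText($tags, true);
$off  = collectMetaText($tags, false);
```

With the switch off both values stay empty, so nothing from the two tags would reach the index in this sketch.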