
Extracting H2 tag


mdavila
09-13-2005, 04:16 PM
Hi,

I added the code you suggested to robot_functions.php to pull the h2 tag instead of the title tag. It works, but it pulls both the first and the second h2 tag.

This is the code I pasted in:

//extracts title
if (preg_match_all('/< *h2 *>(.*?)< *\/ *h2 *>/is',$text,$regs,PREG_SET_ORDER)) {

    // assumes there are at least three h2 tags
    $title = trim($regs[0][1]." ".$regs[1][1]." ".$regs[2][1]);
}
else {
    $title = "";
}

The result shows " Contact UsContact Us"

The page http://dobleweb1.doble.com/contactus/ has two h2 tags, but I only want to show the second one.

Any suggestions?

Thanks,

-Marc

Charter
09-14-2005, 03:59 AM
If you only want the second H2 tag, try:

$title = trim($regs[1][1]);

Instead of the following:

$title = trim($regs[0][1]." ".$regs[1][1]." ".$regs[2][1]);

mdavila
09-14-2005, 08:02 AM
When I try that, it brings up "Untitled" and "search.php" for most of them.

http://doble.phpslave.com/search.php

-Marc

Charter
09-14-2005, 08:57 AM
Are you using the following?

if (preg_match_all('/< *h2 *>(.*?)< *\/ *h2 *>/is',$text,$regs,PREG_SET_ORDER)) {
    // assumes there are at least two h2 tags
    $title = trim($regs[1][1]);
}
else {
    $title = "";
}
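
The snippet above reads $regs[1] without checking that a second match exists. A minimal sketch of a guarded version follows, using the same pattern. extract_h2() and the sample markup are hypothetical and are not part of robot_functions.php.

```php
<?php
// Hypothetical helper, not part of robot_functions.php.
// Returns the text of the h2 at position $index (0-based), or ""
// when the text contains fewer h2 tags than that.
function extract_h2($text, $index = 1) {
    if (preg_match_all('/< *h2 *>(.*?)< *\/ *h2 *>/is', $text, $regs, PREG_SET_ORDER)
        && isset($regs[$index])) {
        return trim($regs[$index][1]);
    }
    return "";
}

// Made-up sample markup with two h2 tags.
$page = "<h2>Contact Us</h2><p>...</p><h2>Offices</h2>";
var_dump(extract_h2($page));                // string(7) "Offices"
var_dump(extract_h2("<h2>Only one</h2>"));  // string(0) ""
```

In the robot code this would be called as $title = extract_h2($text); the empty string plays the same role as the else branch above.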

mdavila
09-14-2005, 09:38 AM
Here is the code:

//extracts title
if (preg_match_all('/< *h2 *>(.*?)< *\/ *h2 *>/is',$text,$regs,PREG_SET_ORDER)) {
    $title = trim($regs[1][1]);
}
else {
    $title = "";
}

Charter
09-14-2005, 10:26 AM
Keep that code and increase CHUNK_SIZE in the config file; 4096 may be enough. If not, increase it further until both H2 tags fall within the same chunk.
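
To see why the chunk size matters, here is a rough simulation. It assumes the pattern is applied to one chunk of CHUNK_SIZE bytes at a time; str_split() only stands in for however the robot actually reads the page, and the function name and sample page are made up for illustration.

```php
<?php
// Simulation only: the real reading code in the robot may differ.
// Looks for the second h2 inside the first $chunk_size bytes of $html.
function second_h2_in_first_chunk($html, $chunk_size) {
    $chunks = str_split($html, $chunk_size);
    $text = $chunks[0];
    if (preg_match_all('/< *h2 *>(.*?)< *\/ *h2 *>/is', $text, $regs, PREG_SET_ORDER)
        && isset($regs[1])) {
        return trim($regs[1][1]);
    }
    return "";
}

// Made-up page: two h2 tags separated by 2000 bytes of filler.
$html = "<h2>Contact Us</h2>" . str_repeat("x", 2000) . "<h2>Offices</h2>";

var_dump(second_h2_in_first_chunk($html, 1024)); // string(0) ""
var_dump(second_h2_in_first_chunk($html, 4096)); // string(7) "Offices"
```

With the smaller size the second h2 lies beyond the first chunk, so only one match is found; with the larger size both tags are in the same chunk.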

mdavila
09-14-2005, 01:50 PM
That seems to have done the trick!

Thanks,
-Marc :o