Too few pages indexed, Umlaut problem
salzbermat
12-15-2004, 05:53 AM
Hi there,
I just upgraded from 1.6.0 to 1.8.5.
The site contains about 800 pages, but now all of a sudden only 250 are indexed. I made sure the max level of depth and links is set to 20 in both the index admin panel and the config file, but to no avail. I am making heavy use of phpdigInclude and phpdigExclude comments in the "middle" of the code, but this hasn't changed. What might be the problem?
Secondly, when they appear at the beginning of a title or description string, umlauts (e.g. Ä, or &Auml; as an HTML entity) are displayed in lower case even if they're upper case.
Any clue?
Thanks,
Bernd
Charter
12-15-2004, 06:27 AM
1) Set search depth to a large number, links per to zero, and LIMIT_TO_DIRECTORY to false.
2) In config.php find:
"&auml" => "ä",
And afterwards add:
"&Auml" => "Ä",
"&Euml" => "Ë",
"&Iuml" => "Ï",
"&Uuml" => "Ü",
In robot_functions.php find:
//tries to replace htmlentities by ascii equivalent
foreach ($spec as $entity => $char) {
    $text = eregi_replace ($entity."[;]?",$char,$text);
    $title = eregi_replace ($entity."[;]?",$char,$title);
}
And beforehand add:
//tries to replace htmlentities by ascii equivalent
foreach ($spec as $entity => $char) {
    $text = ereg_replace ($entity."[;]?",$char,$text);
    $title = ereg_replace ($entity."[;]?",$char,$title);
}
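Why the extra case-sensitive pass helps can be sketched roughly like this. It is only an illustration, not PhpDig's code: the two-entry $spec array, the replace_entities() helper and the sample title are made up, and preg_replace() stands in for ereg_replace()/eregi_replace(), which were removed in PHP 7. Presumably the case-insensitive loop lets the "&auml" entry match "&Auml;" as well, so the capital is lost.

```php
<?php
// Hypothetical, shortened stand-in for PhpDig's $spec array (entity => character).
$spec = [
    '&auml' => 'ä',
    '&Auml' => 'Ä',
];

// Replace each entity, with or without its trailing semicolon.
// $caseInsensitive = true mimics eregi_replace; false mimics ereg_replace.
function replace_entities(string $text, array $spec, bool $caseInsensitive): string
{
    foreach ($spec as $entity => $char) {
        $pattern = '/' . preg_quote($entity, '/') . ';?/' . ($caseInsensitive ? 'i' : '');
        $text = preg_replace($pattern, $char, $text);
    }
    return $text;
}

$title = '&Auml;rzte und &auml;rzte';

// Case-insensitive pass only: the "&auml" entry also matches "&Auml;".
$insensitiveOnly = replace_entities($title, $spec, true);

// Case-sensitive pass first, then the case-insensitive pass for leftovers.
$both = replace_entities(replace_entities($title, $spec, false), $spec, true);

echo $insensitiveOnly, "\n"; // ärzte und ärzte
echo $both, "\n";            // Ärzte und ärzte
```

The case-insensitive pass is still kept as a fallback, so oddly cased entities such as "&AUML;" get translated to something rather than nothing.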
salzbermat
12-15-2004, 08:38 AM
Thanks a lot! Works great!
As of 1.8.6, more entities are shown incorrectly in the search results. So I dug around in the code and came across the following question:
Why do you use the custom $spec array instead of just reversing the function of htmlentities?
E.g., replace your existing code in robot_functions.php:
// first case-sensitive and then case-insensitive
//tries to replace htmlentities by ascii equivalent
foreach ($spec as $entity => $char) {
    $text = ereg_replace ($entity."[;]?",$char,$text);
    $title = ereg_replace ($entity."[;]?",$char,$title);
}
//tries to replace htmlentities by ascii equivalent
foreach ($spec as $entity => $char) {
    $text = eregi_replace ($entity."[;]?",$char,$text);
    $title = eregi_replace ($entity."[;]?",$char,$title);
}
With this:
$trans = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES);
$trans = array_flip($trans);
$text = strtr($text, $trans);
$title = strtr($title, $trans);
With PHP 4.3 and later, you could even make use of the new html_entity_decode() function.
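A minimal sketch of that variant, in case it helps. The sample string is made up, and the 'UTF-8' charset argument is an assumption; pass whatever charset the indexed pages actually use.

```php
<?php
// Made-up sample text mixing named and numeric entities.
$text = '&Auml;rzte in K&ouml;ln &amp; &#220;berlingen';

// ENT_QUOTES also decodes single- and double-quote entities.
// The charset ('UTF-8' here) is an assumption about the target encoding.
$decoded = html_entity_decode($text, ENT_QUOTES, 'UTF-8');

echo $decoded, "\n"; // Ärzte in Köln & Überlingen
```

Unlike the flipped translation table, this also handles numeric entities such as &#220;.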
Charter
12-16-2004, 10:00 AM
>> Why do you use the custom $spec array instead of just reversing the function of htmlentities?
Because in a land long, long ago and far, far away... HTML page content may not be in correct form, and & # 039; versus & # 39; (without spaces) may cause an issue.
$text = "Ä ä Ä ä"; // and so forth
$trans = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES);
$trans = array_flip($trans);
$text = strtr($text, $trans);
echo $text; // prints Ä ä Ä ä
So you specify them in the $spec array, and PhpDig "tries to replace htmlentities by ascii equivalent." Just add to the $spec array those entities you want translated, and PhpDig should do the rest. Of course TMTOWTDI.
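To illustrate the point about malformed content, here is a rough sketch on current PHP. The sample strings are made up, preg_replace() stands in for the ereg functions, and the single pattern is only a stand-in for one $spec entry, not PhpDig's actual code.

```php
<?php
// Flipped table: keys are exact entity strings such as "&Auml;" and "&#039;".
$trans = array_flip(
    get_html_translation_table(HTML_ENTITIES, ENT_QUOTES | ENT_HTML401, 'UTF-8')
);

// Well-formed entity: an exact key in the table, so it is decoded.
$wellFormed = strtr('&Auml;pfel', $trans);

// Missing semicolon: no key matches, so the text is left untouched.
$noSemicolon = strtr('&Auml pfel', $trans);

// Short numeric form: the table holds "&#039;", not "&#39;", so no match either.
$shortQuote = strtr('it&#39;s', $trans);

// A $spec-style pattern with an optional semicolon copes with the sloppy form.
$tolerant = preg_replace('/&Auml;?/', 'Ä', '&Auml pfel');

echo $wellFormed, "\n";  // Äpfel
echo $noSemicolon, "\n"; // &Auml pfel
echo $shortQuote, "\n";  // it&#39;s
echo $tolerant, "\n";    // Ä pfel
```

The trade-off is that the flipped table needs no maintenance but only matches exact keys, while a hand-maintained list of patterns can be as forgiving as you make it.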