PhpDig.net


salzbermat 12-15-2004 05:53 AM

Too few pages indexed, Umlaut problem
 
Hi there,

I just upgraded from 1.6.0 to 1.8.5.

The site contains about 800 pages, now all of a sudden only 250 are indexed. I made sure the max level of depth and links is set to 20 in both the index admin panel and the config file, but to no avail. I am making heavy use of phpdiginclude and exclude comments in the "middle" of the code, this hasn't changed though. What might be the problem?

Secondly, when an umlaut (e.g. Ä, or &Auml; as an HTML entity) appears at the beginning of a title or description string, it is displayed in lower case even if it is upper case.

Any clue?

Thanks,
Bernd

Charter 12-15-2004 06:27 AM

1) Set search depth to a large number, links per to zero, and LIMIT_TO_DIRECTORY to false.

2) In config.php find:
PHP Code:

"&auml" => "ä"

And afterwards add:
PHP Code:

"&Auml" => "Ä",
"&Euml" => "Ë",
"&Iuml" => "Ï",
"&Uuml" => "Ü"

In robot_functions.php find:
PHP Code:

//tries to replace htmlentities by ascii equivalent
foreach ($spec as $entity => $char) {
      $text = eregi_replace ($entity."[;]?",$char,$text);
      $title = eregi_replace ($entity."[;]?",$char,$title);
}


And beforehand add:
PHP Code:

//tries to replace htmlentities by ascii equivalent
foreach ($spec as $entity => $char) {
      $text = ereg_replace ($entity."[;]?",$char,$text);
      $title = ereg_replace ($entity."[;]?",$char,$title);
}
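
To see why the order of the two loops matters, here is a minimal sketch. It assumes a trimmed-down, two-entry stand-in for the $spec array and a hypothetical helper named replace_entities(). It also swaps in preg_replace() for ereg_replace()/eregi_replace(), since the ereg functions were removed in PHP 7; it is not PhpDig's actual code.

```php
<?php
// Hypothetical, trimmed-down stand-in for PhpDig's $spec array.
$spec = array(
    "&auml" => "ä",
    "&Auml" => "Ä",
);

// Hypothetical helper: replace each entity, with or without the trailing ";".
function replace_entities($text, $spec, $ignoreCase)
{
    foreach ($spec as $entity => $char) {
        $pattern = '/' . preg_quote($entity, '/') . ';?/' . ($ignoreCase ? 'i' : '');
        $text = preg_replace($pattern, $char, $text);
    }
    return $text;
}

$input = "&Auml;pfel und &auml;hnliches";

// Case-insensitive pass only: "&auml" is tried first and also matches "&Auml".
$onlyInsensitive = replace_entities($input, $spec, true);

// Case-sensitive pass first, then the case-insensitive pass as a fallback.
$bothPasses = replace_entities(replace_entities($input, $spec, false), $spec, true);

echo $onlyInsensitive, "\n"; // äpfel und ähnliches
echo $bothPasses, "\n";      // Äpfel und ähnliches
```

With only the case-insensitive loop, the upper-case entity is swallowed by the lower-case entry. Running the case-sensitive loop beforehand leaves nothing for the second loop to get wrong.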



salzbermat 12-15-2004 08:38 AM

Thanks a lot! Works great!

oli 12-16-2004 08:24 AM

As of 1.8.6, more entities are shown incorrectly in the search results. So I dug around in the code and came across the following question:

Why do you use the custom $spec array instead of just reversing the function of htmlentities?

e.g. replace your existing code in robot_functions.php:

Code:

// first case-sensitive and then case-insensitive
//tries to replace htmlentities by ascii equivalent

foreach ($spec as $entity => $char) {
      $text = ereg_replace ($entity."[;]?",$char,$text);
      $title = ereg_replace ($entity."[;]?",$char,$title);
}
//tries to replace htmlentities by ascii equivalent
foreach ($spec as $entity => $char) {
      $text = eregi_replace ($entity."[;]?",$char,$text);
      $title = eregi_replace ($entity."[;]?",$char,$title);
}

With this:

Code:

$trans = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES);
$trans = array_flip($trans);
$text = strtr($text, $trans);
$title = strtr($title, $trans);

With PHP 4.3 or later, you could even use the html_entity_decode() function.
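
A minimal sketch of that html_entity_decode() variant, assuming a current PHP version and UTF-8 source files. The sample strings are made up for illustration and do not come from PhpDig:

```php
<?php
// Sample strings are made up for illustration.
$title = "&Auml;pfel &amp; Birnen";
$text  = "Gr&uuml;&szlig;e, it&#039;s here";

// ENT_QUOTES also decodes single-quote entities; the charset is stated explicitly.
$title = html_entity_decode($title, ENT_QUOTES, 'UTF-8');
$text  = html_entity_decode($text, ENT_QUOTES, 'UTF-8');

echo $title, "\n"; // Äpfel & Birnen
echo $text, "\n";  // Grüße, it's here
```

This only covers well-formed entities, i.e. those that end with a semicolon.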

Charter 12-16-2004 10:00 AM

>> Why do you use the custom $spec array instead of just reversing the function of htmlentities?

Because in a land long, long ago and far, far away... HTML page content may not be in correct form, and & # 039; versus & # 39; (without spaces) may cause an issue.
PHP Code:

$text = "Ä ä &Auml &auml"; // and so forth

$trans = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES);
$trans = array_flip($trans);
$text = strtr($text, $trans);

echo $text; // prints Ä ä &Auml &auml

So you specify them in the $spec array, and PhpDig "tries to replace htmlentities by ascii equivalent." Just add to the $spec array those entities you want translated, and PhpDig should do the rest. Of course TMTOWTDI.
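
The difference can be sketched like this, assuming a current PHP version with UTF-8 strings and a hypothetical two-entry stand-in for the $spec array. preg_replace() stands in for the ereg functions, which no longer exist in PHP 7 and later:

```php
<?php
// Hypothetical two-entry stand-in for PhpDig's $spec array.
$spec = array("&auml" => "ä", "&Auml" => "Ä");

// Two well-formed entities followed by two that lack the closing ";".
$input = "&Auml; &auml; &Auml &auml";

// The built-in decoder only touches entities that end with ";".
$strict = html_entity_decode($input, ENT_QUOTES, 'UTF-8');

// $spec-style replacement: the ";" is optional, so sloppy markup is decoded too.
$lenient = $input;
foreach ($spec as $entity => $char) {
    $lenient = preg_replace('/' . preg_quote($entity, '/') . ';?/', $char, $lenient);
}

echo $strict, "\n";  // Ä ä &Auml &auml
echo $lenient, "\n"; // Ä ä Ä ä
```

The trade-off is that the lenient version only knows the entities listed in the array, which is why missing ones have to be added by hand.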

