PhpDig.net

12-15-2004, 05:53 AM   #1
salzbermat
Green Mole
 
Join Date: Dec 2003
Posts: 5
Too few pages indexed, Umlaut problem

Hi there,

I just upgraded from 1.6.0 to 1.8.5.

The site contains about 800 pages, but since the upgrade only 250 are indexed. I made sure the max depth and links settings are both set to 20 in the index admin panel and in the config file, but to no avail. I make heavy use of phpdiginclude and exclude comments in the "middle" of the code, but that hasn't changed. What might be the problem?

Secondly, when an umlaut stands at the beginning of a title or description string (e.g. Ä, or &Auml; as an HTML entity), it is displayed in lower case even if it is upper case.

Any clue?

Thanks,
Bernd
12-15-2004, 06:27 AM   #2
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
1) Set the search depth to a large number, links per to zero, and LIMIT_TO_DIRECTORY to false.

2) In config.php find:
PHP Code:
"&auml" => "ä"
And afterwards add:
PHP Code:
"&Auml" => "Ä",
"&Euml" => "Ë",
"&Iuml" => "Ï",
"&Uuml" => "Ü"
In robot_functions.php find:
PHP Code:
//tries to replace htmlentities by ascii equivalent
foreach ($spec as $entity => $char) {
      $text = eregi_replace ($entity."[;]?",$char,$text);
      $title = eregi_replace ($entity."[;]?",$char,$title);
}

And beforehand add:
PHP Code:
//tries to replace htmlentities by ascii equivalent
foreach ($spec as $entity => $char) {
      $text = ereg_replace ($entity."[;]?",$char,$text);
      $title = ereg_replace ($entity."[;]?",$char,$title);
}
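
Why the extra case-sensitive pass helps can be sketched as follows. This is only an illustration, not PhpDig's actual code: the two-entry $spec subset, the helper name replace_entities(), and the sample title are assumptions, and preg_replace() stands in for ereg_replace()/eregi_replace(), which no longer exist in PHP 7 and later.

```php
<?php
// Hypothetical two-entry subset of the $spec array. Keys omit the trailing
// semicolon, as in config.php; the pattern below makes it optional.
$spec = [
    '&auml' => 'ä',
    '&Auml' => 'Ä',
];

// Hypothetical helper: $flags is '' for case-sensitive matching and 'i' for
// case-insensitive matching (stand-ins for ereg_replace / eregi_replace).
function replace_entities(string $text, array $spec, string $flags): string
{
    foreach ($spec as $entity => $char) {
        $pattern = '/' . preg_quote($entity, '/') . ';?/' . $flags;
        $text = preg_replace($pattern, $char, $text);
    }
    return $text;
}

$title = '&Auml;pfel und &auml;hnliches';

// Case-insensitive pass only: "&auml" is tried first and also matches "&Auml;".
$insensitive_only = replace_entities($title, $spec, 'i');

// Case-sensitive pass first, then the case-insensitive pass as a fallback.
$sensitive_first = replace_entities(
    replace_entities($title, $spec, ''),
    $spec,
    'i'
);

echo $insensitive_only, "\n"; // prints äpfel und ähnliches
echo $sensitive_first, "\n";  // prints Äpfel und ähnliches
```

With the case-insensitive loop alone, whichever key comes first in the array wins for both cases, which matches the lower-cased umlauts described in the first post.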

__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.
12-15-2004, 08:38 AM   #3
salzbermat
Green Mole
 
Join Date: Dec 2003
Posts: 5
Thanks a lot! Works great!
12-16-2004, 08:24 AM   #4
oli
Green Mole
 
Join Date: Mar 2004
Posts: 1
As of 1.8.6, more entities are shown incorrectly in the search results. So I dug around in the code and came across the following question:

Why do you use the custom $spec array instead of just reversing the function of htmlentities?

e.g. replace your existing code in robot_functions.php:

Code:
// first case-sensitive and then case-insensitive
//tries to replace htmlentities by ascii equivalent

foreach ($spec as $entity => $char) {
      $text = ereg_replace ($entity."[;]?",$char,$text);
      $title = ereg_replace ($entity."[;]?",$char,$title);
}
//tries to replace htmlentities by ascii equivalent
foreach ($spec as $entity => $char) {
      $text = eregi_replace ($entity."[;]?",$char,$text);
      $title = eregi_replace ($entity."[;]?",$char,$title);
}
With this:

Code:
$trans = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES);
$trans = array_flip($trans);
$text = strtr($text, $trans);
$title = strtr($title, $trans);
With PHP 4.3 and later, you could even use the new html_entity_decode() function.
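
A minimal sketch of the html_entity_decode() route, assuming the indexed text should end up as UTF-8; the sample string is made up for illustration and is not taken from PhpDig:

```php
<?php
// Made-up sample title containing named and numeric entities.
$title = '&Auml;rger &amp; &Uuml;bermut &#039;heute&#039;';

// ENT_QUOTES also decodes single-quote entities. The charset is passed
// explicitly so the result does not depend on the PHP version's default.
$decoded = html_entity_decode($title, ENT_QUOTES, 'UTF-8');

echo $decoded, "\n"; // prints Ärger & Übermut 'heute'
```

This only covers well-formed entities that end with a semicolon.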
12-16-2004, 10:00 AM   #5
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
>> Why do you use the custom $spec array instead of just reversing the function of htmlentities?

Because in a land long, long ago and far, far away... HTML page content may not be in correct form, and & # 039; versus & # 39; (without spaces) may cause an issue.
PHP Code:
$text = "&Auml; &auml; &Auml &auml"; // and so forth

$trans = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES);
$trans = array_flip($trans);
$text = strtr($text, $trans);

echo $text; // prints Ä ä &Auml &auml
So you specify them in the $spec array, and PhpDig "tries to replace htmlentities by ascii equivalent." Just add to the $spec array those entities you want translated, and PhpDig should do the rest. Of course TMTOWTDI.
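
The two malformed cases mentioned above can be reproduced with a short sketch. It assumes UTF-8 and a current PHP version; the one-entry $spec subset and the sample string are hypothetical, and preg_replace() is swapped in for the ereg functions, which were removed in PHP 7.

```php
<?php
// Made-up sample: a well-formed entity, one without a semicolon, and the
// two spellings of the single-quote entity.
$text = '&Auml; &Auml &#039; &#39;';

// Reverse translation table, built explicitly as UTF-8 (an assumption).
$trans = array_flip(
    get_html_translation_table(HTML_ENTITIES, ENT_QUOTES, 'UTF-8')
);
$via_table = strtr($text, $trans);

// Hypothetical one-entry $spec subset. The optional ";" in the pattern is
// what lets the semicolon-less form through.
$spec = ['&Auml' => 'Ä'];
$via_spec = $text;
foreach ($spec as $entity => $char) {
    $pattern = '/' . preg_quote($entity, '/') . ';?/';
    $via_spec = preg_replace($pattern, $char, $via_spec);
}

echo $via_table, "\n"; // prints Ä &Auml ' &#39;
echo $via_spec, "\n";  // prints Ä Ä &#039; &#39;
```

The flipped table only knows the exact strings that htmlentities() would produce, so "&Auml" without a semicolon and "&#39;" without the leading zero pass through untouched, while a $spec entry matches with or without the semicolon.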