Can't index a table construct


RedThypon
11-26-2003, 02:50 PM
Hello,

I'm using:
PhpDig Version 1.6.4
Php Version 4.3.2
Apache Version 2.0.46
Linux


I'm having a problem: PhpDig can't find words that are inside a table construct.

For example:
The HTML code looks like this:
<table><tr>
<td>word</td>
<td second word</td>
</tr></table>

If I have PhpDig search for 'word' or 'second', I get the message "no results found".

I have 3 or 4 pages with tables in them. How can I get PhpDig to index them correctly and find the words inside the tables?


Thank you for your answers

yours
RedThypoon

Charter
11-26-2003, 03:26 PM
<table><tr>
<td>word</td>
<td second word</td>
</tr></table>

Hi. Maybe just a typo but can you post the HTML here for a look? Also, does 'word' happen to be in the common words file?

RedThypon
11-26-2003, 03:43 PM
:)
'word' is just a placeholder for some text.

I can't post the whole code, it is too much,
but the whole page validates with the W3C validator.
So here is the code, showing only the table.

If you'd like to see the whole code, visit
http://www.redthypoon.de/walrus
and choose "On Stage" from the main menu and then "Marktplatz" from the menu in the window.

here's the code:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="de" xml:lang="de">

<head>
<title>Walrus Kultur e. V.</title>
<meta http-equiv="Content-Style-Type" content="text/css" />
<link rel="stylesheet" type="text/css" href="includes/style.css" />
<link href="http://www.walrus-kultur-ev.de/favicon.ico" rel="SHORTCUT ICON" />

</head>

<body>
<div id="body" style="color:#ff0000; background:url(<?php echo $pfad; ?>images/onstage.jpg);">
<div class="titel">
Marktplatz</div>
<div id="inhalt">
<div style="margin:0px 20px 0px 0px; text-align:right;">Stand: 30.09.2003</div>
<div style="margin:10px 0px 0px 0px; color:white;">
<table border="0" cellspacing="3">
<colgroup>
<col width="90" />
<col width="92" />
<col width="200" />
<col width="295" />
<col width="112" />
</colgroup>
<tr>
<td style="color:#ffff00; background:#ff0000; font-size:1.3em;" colspan="5">gesucht wird</td>
</tr>

<tr style="background:#808080; font-weight:bold">
<td><b>Chiffré-Nr.</b></td>
<td><b>Datum</b></td>
<td><b>Bezeichnung</b></td>
<td><b>Beschreibung</b></td>
<td><b>Kontakt</b></td>
</tr>
<tr style="vertical-align:top;">
<td>2-b-ons</td>
<td></td>
<td>Rivera Gitarrenverstärker</td>
<td>im SKS-Case. Einbau in 19"-Rack möglich. Edler 100 Watt Gitarrenverstärker aus den USA ------- VHB 900,- &euro;</td>
<td><a href="mailto:bla@bla.de">bla@bla.de</a></td>
</tr>
</table>
</div>
</div>
</div>
</body>
</html>


thank you for your help

yours
RedThypoon

Charter
11-26-2003, 04:02 PM
Hi. Yes, I understand. ;)

Maybe there is a typo in the HTML that is causing that block to be ignored. Can you post the HTML?

RedThypon
11-26-2003, 04:05 PM
Sorry, I forgot.
I've edited my post.

Charter
11-26-2003, 04:22 PM
Hi. I just indexed http://www.redthypoon.de/walrus/index.php?mnuid=198 at one level and then searched for the word 'Kontakt' and obtained 14 results. What word(s) do not show up in your search?

RedThypon
11-27-2003, 05:28 AM
PhpDig doesn't find these:
Rivera
Gitarre
Gitarrenverstärker

Another page with this problem is:
http://www.redthypoon.de/walrus/index.php?mnuid=189

It doesn't find the names of the people or their functions, for example:
Sascha Schabacker
Vorsitzender
Kasse


thank you

yours
RedThypoon

Charter
11-27-2003, 06:57 AM
Hi. I crawled the link in your last post and can find, for example, Vorsitzender, but I cannot find Litfaßsäule when I do a 'words begin' or 'exact words' search. However, when I do an 'any words part' search for Litfaßsäule, I get Litfaßsäule in the results. Please apply the patch in this thread (http://www.phpdig.net/showthread.php?threadid=247) to fix the highlighting issue, but this does seem like a character encoding problem. I'll need to do more checking on this issue. Thanks for bringing it to my attention.

Charter
11-27-2003, 07:25 AM
Hi. I figured out the Litfaßsäule issue. The character ß was not allowed in the searches. My bad! As a temporary fix, do the following. I'll come up with something better in the next release.

In search_function.php find:

if (eregi("[^[:alnum:]^ +^-]+",$query_to_parse)) { $query_to_parse = eregi_replace("[^[:alnum:]^ ]+"," ",$query_to_parse); }

and replace with:

if (eregi("[^[:alnum:]^ +^-^ß]+",$query_to_parse)) { $query_to_parse = eregi_replace("[^[:alnum:]^ ]+"," ",$query_to_parse); }
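The patch above only changes the condition that decides whether the query gets cleaned; the `eregi_replace()` expression is left as it was. Below is a minimal Python sketch of that check-then-strip pattern. It is not a character-for-character port of the eregi expressions (it ignores the literal `^` in the original classes), and `ALNUM` is an assumption standing in for whatever `[:alnum:]` matched on the server: ASCII letters, digits and the German umlauts, but not ß, which is consistent with the behaviour described in this thread.

```python
import re

# ASSUMPTION: stand-in for what [:alnum:] matched on the server. ASCII
# letters and digits plus the German umlauts, but not ß. The real set
# depends on PHP's regex library and the server locale.
ALNUM = "A-Za-z0-9äöüÄÖÜ"


def clean_query(query: str, extra_allowed: str = "") -> str:
    """Sketch of the search_function.php filter pattern.

    If the query contains any character outside the allowed set, every run
    of characters that are neither alphanumeric nor a space is replaced by a
    single space. extra_allowed only widens the trigger check, just as the
    patch only changes the eregi() condition.
    """
    # re.escape keeps '-' from being read as a range operator in the class.
    allowed = ALNUM + re.escape(" +-" + extra_allowed)
    if re.search("[^" + allowed + "]", query):
        # The replacement set is unchanged by the patch, so '+', '-' and ß
        # are all turned into spaces once the cleanup runs.
        query = re.sub("[^" + ALNUM + " ]+", " ", query)
    return query


print(clean_query("Litfaßsäule"))                     # Litfa säule
print(clean_query("Litfaßsäule", extra_allowed="ß"))  # Litfaßsäule
```

Because the replacement set is unchanged, ß is still stripped whenever some other disallowed character (for example a `!`) triggers the cleanup, which is one reason this can only be a temporary fix.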


This still doesn't answer why Vorsitzender shows in searches for me but not for you. Now I'm thinking this is not a character encoding issue, but rather something to do with stored keywords.

When you run the below query what do you get?

SELECT * FROM keywords WHERE keyword like 'vo%';

RedThypon
11-27-2003, 07:36 AM
Hi, thanks for the fix for ß.

You can find Vorsitzender because it appears on 2 pages.
The word Vorsitzender is also on this page:
http://www.redthypoon.de/walrus/index.php?mnuid=189

And this is the problem I mentioned at first: PhpDig can't find this page. It finds only the second page. I suppose that is because Vorsitzender is inside a table construct on the page it can't find.

When I run the SQL query I get this:
key_id  twoletters  keyword
3577    vo          voices
3545    vo          volker
3298    vo          voll
3643    vo          vordergrund
3538    vo          vorerst
3121    vo          vorname
3045    vo          vorsitzender
3037    vo          vorstand


Thank you for your help

yours
RedThypoon

Charter
11-27-2003, 07:42 AM
Hi. I am able to find Schabacker so I don't think it's the table-construct. Hmm, I wonder what's different.

RedThypon
11-27-2003, 07:45 AM
Sorry, you are too fast for me, or I don't think before I write :).

Please read my post above your last one again; I edited it.

Never mind the word Schabacker: it is on the same pages as Vorsitzender, so it has the same problem.

thanks

Charter
11-27-2003, 07:57 AM
Hi. Can you make a page like so and then crawl it?

<html>
<body>
Rivera Gitarre Gitarrenverstärker Sascha Schabacker Vorsitzender Kasse
</body>
</html>

Do you get search results with this simple page?

RedThypon
11-27-2003, 08:18 AM
Yes, with this simple page, it finds the words.

Charter
11-27-2003, 08:52 AM
Hi. Attached is a screenshot of the http://www.redthypoon.de/walrus/index.php?mnuid=189 page. Does the page look the same as it does in your browser?

When you crawl this site, do you get any 'duplicate' page notices?

RedThypon
11-27-2003, 08:55 AM
Yes, it looks the same, and yes I get the 'duplicate' notices

Charter
11-27-2003, 08:57 AM
Are the duplicate notices for the pages that contain the words you cannot find?

RedThypon
11-27-2003, 09:06 AM
No, that was the first thing I checked.

Charter
11-27-2003, 09:11 AM
What's the link to your PhpDig search page?

RedThypon
11-27-2003, 09:41 AM
Do you mean on RedThypoon.de/walrus?

There is no search page; I develop the site on my local system. Yesterday I had no time to upload it, but I will do it now.
The only difference between the local and the online version is that PhpDig is installed only on the local one.

I will upload the page now and post a message when it's ready.


thanks

yours redthypoon

Charter
11-27-2003, 10:06 AM
Okay, thanks. Maybe I will see something when I search your site.

You probably already checked, but did all of these links get indexed? Also, how many text files show when you do grep Vorsitzender * in the text_content directory?
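For anyone without a shell handy, here is a small Python sketch of the same check: it lists the files in a directory whose contents include a word, roughly what `grep -l word *` prints. The function name `files_containing` is hypothetical, and the directory you pass in is assumed to be PhpDig's `text_content` directory.

```python
from pathlib import Path


def files_containing(directory, word):
    """Return the names of files in `directory` whose contents include
    `word`, roughly what `grep -l word *` lists."""
    hits = []
    for path in sorted(Path(directory).iterdir()):
        # latin-1 decodes any byte sequence, so no file raises a decode
        # error. An ASCII word like "Vorsitzender" matches regardless of the
        # file's real encoding; words with umlauts need the right encoding.
        if path.is_file() and word in path.read_text(encoding="latin-1"):
            hits.append(path.name)
    return hits
```

For example, `len(files_containing("text_content", "Vorsitzender"))` gives the number of stored text files that mention the word.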

RedThypon
11-27-2003, 10:18 AM
Ohhhhhh aha,

I don't know what to say.

I uploaded the page and let PhpDig crawl online.

Tadaa, it finds every word.

I am very sorry for wasting your time :(

The next thing I'll do is find out why it doesn't work on my local server, to avoid such mistakes in the future.

Please excuse me.

You did / do great work, and I am really grateful for your support.

I hope that I can still get your help in the future, after this mistake.

From now on, I will test the homepage online.
Shame on me!

Thank you for everything

yours
RedThypoon

Charter
11-27-2003, 10:24 AM
Hi. No problem at all. Besides, your posts led me to the issue with ß. If you find out why it didn't work on your local server, please post your findings. Others might have the same problem, and your findings could help them. :)

RedThypon
11-27-2003, 10:45 AM
Hmm,

More shame on me: I found my mistake.

I had used PHPDIG_EXCLUDE_COMMENT and PHPDIG_INCLUDE_COMMENT, and I was sure that I had put them in the right places in my code.
I removed them before I uploaded the page.

I used them because I wanted to stop PhpDig from indexing the submenu (the menu in the window). I think it was just too late at night for me yesterday. Next time I'd better sleep on it and think about what I am doing.
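To see how misplaced markers can hide a table from the index, here is a simplified Python model. It assumes that text from an exclude comment up to the next include comment is skipped, and that an exclude comment with no include comment after it skips the rest of the page. The marker strings and the `strip_excluded` function are hypothetical stand-ins, not PhpDig's actual code; check `config.php` for the values your installation uses.

```python
# ASSUMPTION: hypothetical marker strings standing in for the values of
# PHPDIG_EXCLUDE_COMMENT and PHPDIG_INCLUDE_COMMENT.
EXCLUDE = "<!-- phpdigExclude -->"
INCLUDE = "<!-- phpdigInclude -->"


def strip_excluded(html: str) -> str:
    """Keep only the text outside exclude...include sections.

    Everything from an exclude marker up to the next include marker is
    dropped. An exclude marker with no include marker after it drops the
    rest of the page.
    """
    kept = []
    pos = 0
    while True:
        start = html.find(EXCLUDE, pos)
        if start == -1:
            kept.append(html[pos:])  # no more excluded sections
            break
        kept.append(html[pos:start])
        end = html.find(INCLUDE, start + len(EXCLUDE))
        if end == -1:
            break  # unterminated exclude: the remainder is never indexed
        pos = end + len(INCLUDE)
    return "".join(kept)


menu = "<ul><li>Submenu</li></ul>"
table = "<table><tr><td>Vorsitzender</td></tr></table>"

# Markers around the menu: the menu is skipped, the table is kept.
print(strip_excluded(EXCLUDE + menu + INCLUDE + table))
# Markers swapped: the menu is kept and the table disappears.
print(strip_excluded(INCLUDE + menu + EXCLUDE + table))
```

In this model, swapping the two comments produces exactly the symptom in this thread: the words in the table never reach the index, while the text you meant to hide does.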

Rest assured, next time I post a problem, I will have thought about it for a couple of days.

I don't know why I didn't remove these comments earlier.

I hope that people who read this topic can learn something from it. I have.

Thank you again for everything.
You are giving the best support I know.

yours
RedThypoon

Charter
11-27-2003, 10:55 AM
Thanks, but don't feel bad. I should have known about ß and other such characters. That was a silly mistake on my part, but we all make mistakes. Anyway, I don't mind at all that people post questions. If you have questions, go ahead and ask. :)