01-28-2005, 03:12 PM | #1 |
Green Mole
Join Date: Jan 2005
Location: New Jersey
Posts: 11
Indexing "<word>-<word>"?
I haven't found this in the docs or the FAQs (or anywhere else for that matter) so I'm asking here.
How do I get PHPDig to index two (or more) words joined by a hyphen as one search item (as opposed to two search items)? For example, the web page contains "foo-bar". After indexing, I can search for "foo", "bar", and "foo bar", but NOT "foo-bar". I'd like to be able to search for "foo-bar" as well. Suggestions?
01-31-2005, 09:30 AM | #2 |
Green Mole
Join Date: Jan 2005
Location: New Jersey
Posts: 11
Here's what I've found so far:
According to the docs, dashes (and many other special characters) have been allowed in indexes and searches since v1.8. Yet phpdig_functions.php has a function called phpdigEpureText() that seems to remove the very special characters the docs say are allowed. Ho, ho! There is also an entry in search_function.php that strips various characters from search queries! If you also remove the dash from $what_query_chars in that file and reindex, you can now search for words with dashes in them! At least it worked for me.
02-02-2005, 08:42 PM | #3 |
Head Mole
Join Date: May 2003
Posts: 2,539
The $what_query_chars variable negates a class of characters; the same goes for the phpdigEpureText function:
Code:
$what_query_chars = "[^".$phpdig_words_chars[PHPDIG_ENCODING]." \'.\_~@#$:&\%/;,=-]+";
if (eregi($what_query_chars,$query_to_parse)) {
  $query_to_parse = eregi_replace($what_query_chars," ",$query_to_parse);
}

$text = ereg_replace('[^'.$phpdig_words_chars[$encoding].' \'._~@#$:&%/;,=-]+',' ',$text);

Try searching on t-shirts in the online demo. When PhpDig finds a word containing a dash in the chunk it's trying to process, it will try to highlight it.

Also, try running the following query, and then search on some of the resultant words:
Code:
# add your table prefix if needed
SELECT keyword FROM keywords WHERE keyword LIKE '%-%';

Note that when processing search requests, PhpDig displays at most DISPLAY_SNIPPETS_NUM snippets, so if you are searching on several words, PhpDig stops looking for things to highlight as soon as it reaches DISPLAY_SNIPPETS_NUM. Also, if you set DISPLAY_SNIPPETS to false and DISPLAY_SUMMARY to true, PhpDig ignores DISPLAY_SNIPPETS_NUM and just displays the first words of a page, highlighting the search words only if they appear within those first words.
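To see what the negated character class does to a hyphenated query, here is a minimal sketch in PHP. It uses preg_replace() rather than eregi_replace(), because the ereg functions were removed in PHP 7, and it assumes a simplified word-character set (the hypothetical EXAMPLE_WORD_CHARS, ASCII letters and digits only) in place of $phpdig_words_chars[PHPDIG_ENCODING]. The function name epure_query() is hypothetical, not part of PhpDig.

```php
<?php
// Hypothetical stand-in for $phpdig_words_chars[PHPDIG_ENCODING]:
// ASCII letters and digits only (the real table also covers accented letters).
const EXAMPLE_WORD_CHARS = 'a-zA-Z0-9';

// Mimics the query "epure" step: every run of characters that is NOT in
// the allowed class becomes one space. $keepDash decides whether the dash
// belongs to the allowed class.
function epure_query(string $query, bool $keepDash): string
{
    $allowed = EXAMPLE_WORD_CHARS . ' \'._~@#$:&%\/;,=';
    if ($keepDash) {
        $allowed .= '-'; // last in the class, so it is a literal dash
    }
    $text = preg_replace('/[^' . $allowed . ']+/', ' ', $query);
    // Collapse repeated blanks, as PhpDig does afterwards ("no more than 1 blank").
    return trim(preg_replace('/ +/', ' ', $text));
}

echo epure_query('foo-bar', true), "\n";  // foo-bar
echo epure_query('foo-bar', false), "\n"; // foo bar
```

With the dash in the allowed class the query stays "foo-bar"; without it, the dash falls outside the class, is replaced by a space, and the query becomes the two words "foo bar".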
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension. |
02-04-2005, 07:18 AM | #4 |
Green Mole
Join Date: Jan 2005
Location: New Jersey
Posts: 11
Quote:
When you removed the dash from the class of characters, you essentially replaced the dash in a word with a space, so if you search on foo-bar, PhpDig will then search on foo and/or bar, not the whole word foo-bar.
/Quote
Okay, that explains why the results highlight "foo bar" and not "foo-bar": there is no "foo-bar" in the database tables. So how do I get PhpDig to index "foo-bar"? I'm running PhpDig 1.8.7.
02-05-2005, 02:39 AM | #5 |
Head Mole
Join Date: May 2003
Posts: 2,539
PhpDig v.1.8.7 should index foo-bar as a word, assuming that the dash is a literal dash and the word foo-bar isn't caught up in some JavaScript. Also, if the hyphenated word is longer than MAX_WORDS_SIZE, it won't be inserted into the database table as a keyword. Try making a demo page with some hyphenated words, and after you index it, see if you can search for and find them.
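The MAX_WORDS_SIZE check can be pictured with the hypothetical sketch below. It is not PhpDig's code: the limit of 30, the function name keep_indexable_words(), and the rule "keep a word only if strlen() <= limit" are all assumptions chosen for illustration. The point is that a hyphenated compound counts as one word, so it can exceed the limit even when each of its parts would not.

```php
<?php
// Hypothetical limit standing in for PhpDig's MAX_WORDS_SIZE setting;
// the real value lives in your own PhpDig configuration.
const EXAMPLE_MAX_WORDS_SIZE = 30;

// Assumed rule: a word is kept as a keyword only if its length in bytes
// does not exceed the limit. Returns the words that would survive.
function keep_indexable_words(string $text, int $maxSize): array
{
    $words = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
    return array_values(array_filter(
        $words,
        fn (string $w): bool => strlen($w) <= $maxSize
    ));
}

print_r(keep_indexable_words('omni-kuff t-shirts', EXAMPLE_MAX_WORDS_SIZE));
print_r(keep_indexable_words('a-very-long-hyphenated-compound-word', 10));
```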
02-08-2005, 04:31 PM | #6 |
Green Mole
Join Date: Jan 2005
Location: New Jersey
Posts: 11
Go to http://www.linuxnj.com/search/search.php and search for "omni-kuff". No go. "omni", "kuff", and "omni kuff" work fine. I'm going to start whittling down the page in question to see if it's something in the page...
02-08-2005, 05:01 PM | #7 |
Head Mole
Join Date: May 2003
Posts: 2,539
Okay, thanks, I see what you mean. I was probably testing with a modified version by mistake. Anyway, in PhpDig v.1.8.7, find the phpdigCleanHtml function in robot_functions.php, look for the following line, and try removing the dash from the character class.
Code:
$text = eregi_replace("[*{}()\"\r\n\t-]+"," ",$text);
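A minimal sketch of what that line does to a hyphenated word, assuming PHP 7 or later (so preg_replace() stands in for eregi_replace(), which no longer exists there). clean_blanks() is a hypothetical helper, not a PhpDig function.

```php
<?php
// Sketch of the "replace blank characters by spaces" step from
// phpdigCleanHtml(). $withDash = true mirrors the stock v.1.8.7 class;
// false mirrors the class with the dash removed.
function clean_blanks(string $text, bool $withDash): string
{
    // Same characters as "[*{}()\"\r\n\t-]": * { } ( ) " CR LF TAB (and -)
    $class = "*{}()\"\r\n\t" . ($withDash ? '-' : '');
    return preg_replace('/[' . $class . ']+/', ' ', $text);
}

echo clean_blanks("omni-kuff\tfoo-bar", true), "\n";  // omni kuff foo bar
echo clean_blanks("omni-kuff\tfoo-bar", false), "\n"; // omni-kuff foo-bar
```

Because this replacement runs while the page is being cleaned for indexing, the dash is already gone before any keyword is stored. That is consistent with "omni-kuff" not being found even though the search-side character class keeps the dash.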
02-08-2005, 05:41 PM | #8 |
Green Mole
Join Date: Jan 2005
Location: New Jersey
Posts: 11
Perfect!
Thanks loads! Now let's see if I've fixed the cron problem, and then everybody will be happy! :-)
02-22-2005, 06:09 AM | #9 |
Orange Mole
Join Date: Jan 2005
Posts: 31
Hello
Same problem here, but not resolved. I also have 1.8.7. Try a search for "0-26-110318-0" (an ISBN number): http://www.john-howe.com/search/search.php?template_demo=phpdig.html&result_page=search.php...
The indexed page: http://www.john-howe.com/portfolio/g...hp?image_id=76
The ISBN number is under the picture. I can find it, but it isn't displayed with the hyphens... How can I fix this so the results show the hyphens?
Regards, Dom
PS: I dropped the DB and reindexed the site again to be sure, but that didn't seem to change anything for the hyphens...
02-22-2005, 05:38 PM | #10 |
Head Mole
Join Date: May 2003
Posts: 2,539
02-23-2005, 12:01 AM | #11 |
Orange Mole
Join Date: Jan 2005
Posts: 31
Hello,
I'm confused too... Here is what I have in robot_functions.php, around line 147:
Code:
function phpdigCleanHtml($text) {
//htmlentities
global $spec;
//replace blank characters by spaces
$text = eregi_replace("[*{}()\"\r\n\t]+"," ",$text);
//$text = ereg_replace("[\r\n\t]+"," ",$text); // original

And around line 138 in search_function.php:
Code:
$what_query_chars = "[^".$phpdig_words_chars[PHPDIG_ENCODING]." \'.\_~@#$:&\%/;,=]+"; // epure chars \'._~@#$:&%/;,=-
if (eregi($what_query_chars,$query_to_parse)) {
  $query_to_parse = eregi_replace($what_query_chars," ",$query_to_parse);
}
$query_to_parse = ereg_replace('(['.$phpdig_words_chars[PHPDIG_ENCODING].'])[\'.\_~@#$:&\%/;,=-]+($|[[:space:]]$|[[:space:]]['.$phpdig_words_chars[PHPDIG_ENCODING].'])','\1 \2',$query_to_parse);
$query_to_parse = trim(ereg_replace(" +"," ",$query_to_parse)); // no more than 1 blank

What I can't understand is that when I look at the temp file in the "text_content" folder, the ISBN is written without the "-":
Code:
...SIBLEY HarperCollinsPublishers ISBN 0 26 110318 0 September 2, 1994 R****m House Audio: The Two Towers CD...

I can't understand it... Thanks a lot for your help and time.
Regards, Dominique
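For comparison, the second rewrite in the search_function.php excerpt above only touches punctuation that ends a word, that is, punctuation followed by the end of the query or by a blank. A sketch of that rule in PHP, again with preg_replace() in place of ereg_replace(), \s in place of [[:space:]], and a hypothetical ASCII-only EXAMPLE_WORD_CLASS instead of $phpdig_words_chars[PHPDIG_ENCODING]; strip_trailing_punct() is a hypothetical name.

```php
<?php
// Hypothetical word-character set (ASCII letters and digits only).
const EXAMPLE_WORD_CLASS = 'a-zA-Z0-9';

// Punctuation directly after a word character becomes a space only when it
// is followed by the end of the query, a blank at the end, or a blank plus
// another word character. A dash *inside* a token is therefore left alone.
function strip_trailing_punct(string $query): string
{
    $pattern = '/([' . EXAMPLE_WORD_CLASS . '])[\'._~@#$:&%\/;,=-]+'
             . '($|\s$|\s[' . EXAMPLE_WORD_CLASS . '])/';
    $query = preg_replace($pattern, '$1 $2', $query);
    // Same clean-up as the "no more than 1 blank" line.
    return trim(preg_replace('/ +/', ' ', $query));
}

echo strip_trailing_punct('0-26-110318-0'), "\n"; // 0-26-110318-0
echo strip_trailing_punct('foo- bar'), "\n";      // foo bar
```

Dashes inside a token such as the ISBN survive this step. It also runs on the search query rather than at indexing time, so on its own it would not be expected to change what is written to the text_content files.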
02-23-2005, 12:53 AM | #12 |
Head Mole
Join Date: May 2003
Posts: 2,539
This part is correct, no "-" after the "\t".
02-23-2005, 01:10 AM | #13 |
Orange Mole
Join Date: Jan 2005
Posts: 31
Hello,
Thanks... but I still have the problem. I replaced search_function.php with the original 1.8.7 file and kept robot_functions.php without the "-". I deleted and reindexed the page, and in my temp file I again get the ISBN code without "-":
Code:
...SIBLEY HarperCollinsPublishers ISBN 0 26 110318 0 September 2, 199...

http://www.john-howe.com/portfolio/g...hp?image_id=76
Sorry, but I'm really confused...
Dom
02-23-2005, 01:28 AM | #14 |
Head Mole
Join Date: May 2003
Posts: 2,539
Spidering in progress...
[Stop spider]
SITE : http://www.john-howe.com/
Exclude paths :
- ads/
- cgi-bin/
- fataneh/gallery/admin/
- flash/
- forum/
- guestbook/
- linkchecker/
- links/
- links/admin/
- mailinglist/
- news/pm/
- portfolio/gallery/admin/
- search/
- stuff/gallery/admin/
- webmail/
1:http://www.john-howe.com/portfolio/gallery/details.php?image_id=76
(time : 00:00:13)
No link in temporary table
links found : 1
http://www.john-howe.com/portfolio/gallery/details.php?image_id=76
Optimizing tables...
Indexing complete !
[Back] to admin interface.

Results 1-1, 1 total, on "ISBN" (0.05 seconds)
1. [100.00 %] :// John Howe :: Illustrator [ Portfolio ] / From Hobbiton to Mordor / Gandalf Before the Walls of Minas Tirith
limit to http://www.john-howe.com/, this path : portfolio/gallery/
...994 The Map of Tolkien's Middle-Earth Brian SIBLEY HarperCollinsPublishers ISBN - 0-26-110318-0 September 2, 1994 R****m House Audio: The Two Towers -...

Results 1-1, 1 total, on "0-26-110318-0" (0.02 seconds)
1. [100.00 %] :// John Howe :: Illustrator [ Portfolio ] / From Hobbiton to Mordor / Gandalf Before the Walls of Minas Tirith
limit to http://www.john-howe.com/, this path : portfolio/gallery/
... Map of Tolkien's Middle-Earth Brian SIBLEY HarperCollinsPublishers ISBN - 0-26-110318-0 September 2, 1994 R****m House Audio: The Two Towers - CD fro...

The only thing changed was this line:
Code:
//replace foo characters by space
$text = eregi_replace("[*{}()\"\r\n\t-]+"," ",$text);

which was changed to:
Code:
//replace foo characters by space
$text = eregi_replace("[*{}()\"\r\n\t]+"," ",$text);
02-23-2005, 01:46 AM | #15 |
Orange Mole
Join Date: Jan 2005
Posts: 31
I dropped the database, the folder, everything, and made a fresh install with only the change in robot_functions.php and... nothing.
Still the damn same! Your version is 1.8.8 rc1, isn't it? Mine is 1.8.7; maybe that's the difference... I don't know. Am I the only one with this problem on this version? I can't upgrade to 1.8.8 rc1 because of my host's DB version... A bug in 1.8.7?
Regards, Dom
PS: I'm really sorry to bother you with this.
Similar Threads | ||||
Thread | Thread Starter | Forum | Replies | Last Post |
Selective Indexing of URL Containing a <keyword> | Leith | How-to Forum | 0 | 01-21-2008 03:16 AM |
<!-- phpdigInclude --> and <!-- phpdigExclude --> doesn`t work | Paka76 | How-to Forum | 0 | 12-06-2005 06:44 AM |
search for "hold" not matching on the word foothold | mingus | Troubleshooting | 2 | 06-02-2004 09:54 PM |
Instructions for use <!-- phpdigExclude --> and <!-- phpdigInclude --> | maquido | How-to Forum | 1 | 06-02-2004 04:36 AM |
< phpdigInclude > | oliviert | Troubleshooting | 12 | 05-19-2004 03:13 AM |