PhpDig.net


beesman 12-14-2004 02:31 AM

limit search to contents of HTML tags?
 
Hi all,

I'm testing PhpDig for the first time, & while this forum is a great resource, having trawled through all the messages I can't find a solution to my problem, so any help would be greatly appreciated.

Say I have a number of HTML files with the same structure, e.g. articles with a title in <h2></h2> tags, sub-heading in <h3></h3> tags & the main content in <p class="main"></p> paras. Is it possible to set up PhpDig so that, for example, users can query title text only? Or is there an indexing solution to this issue?

Thanks in advance.
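
For anyone weighing the indexing route, here is a rough sketch of pulling only the text of one kind of tag out of a page, so that text could be indexed separately. This is not how PhpDig works internally; extractTagText and the sample markup are made up for illustration, and a regex like this only copes with simple, well-formed pages.

```php
<?php
// Hypothetical helper, not part of PhpDig: returns the text found inside each
// <$tag>...</$tag> pair, with nested tags removed and whitespace collapsed.
function extractTagText(string $html, string $tag): array
{
    $name = preg_quote($tag, '/');
    // Non-greedy match so each element is captured on its own.
    $pattern = '/<' . $name . '\b[^>]*>(.*?)<\/' . $name . '>/is';
    preg_match_all($pattern, $html, $matches);

    return array_map(function ($inner) {
        return trim(preg_replace('/\s+/', ' ', strip_tags($inner)));
    }, $matches[1]);
}

// Made-up page following the structure described in the question.
$page = '<h2>First <em>article</em></h2><h3>Sub-heading</h3><p class="main">Body text.</p>';

print_r(extractTagText($page, 'h2')); // titles only
print_r(extractTagText($page, 'h3')); // sub-headings only
```

The titles collected this way would still have to be stored and searched separately from the full page text, which is the part PhpDig does not do out of the box.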

vinyl-junkie 12-14-2004 03:24 AM

Welcome to the forum, beesman. :D

Searching by title within a page is not something phpdig was designed to do, and I don't know how much interest there would be in it. phpdig could probably be modified fairly easily to search by web page titles, but that's probably the only change of this kind that Charter would be willing to make.

Hope this helps.

beesman 12-14-2004 06:24 AM

Hi, & thanks for the speedy reply :)

Just say no if it's a request too far, but could you point me in the direction of the relevant file &/or chunk of code that I'd have to play with?

Many thanks

vinyl-junkie 12-14-2004 06:02 PM

Look at admin/spider.php. That is probably what you'd need to modify to make the kind of search index you want.

Hope this helps. :)

Spider 12-15-2004 02:41 AM

@beesman: you were one day ahead of me in posting this question. I will look into it, but I'm a PHP newbie. If you or somebody else writes a solution, I'd like to use it too.

Charter 12-15-2004 02:53 AM

Look at the phpdigCleanHtml function in the robot_functions.php file.

Spider 12-15-2004 07:05 AM

I placed this in robot_functions.php at line 161

Code:

$text = eregi_replace("<td[^>]*>.*</td>"," ",$text);

Because all my content is between td tags, I expected phpdig to show me nothing after this change. But phpdig still finds everything. Did I make a mistake?

Charter 12-15-2004 07:23 AM

Did you reindex?

Spider 12-15-2004 08:06 AM

Yes, emptied the database and reindexed.

Charter 12-15-2004 08:17 AM

So you have the following?
PHP Code:

$text = eregi_replace("<td[^>]*>.*</td>"," ",$text);
$text = preg_replace("/<[\/\!]*?[^<>]*?>/is"," ",$text);

The first removes stuff between <td...> and </td> (according to CHUNK_SIZE) and the second removes other tag-like things, so you don't really need the first one. If you want to exclude part of a page, look at this thread or look at how $title is set in the phpdigCleanHtml function in the robot_functions.php file.
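
To see the difference between the two replacements, here is a minimal sketch run on a made-up HTML string. It assumes a current PHP version, where eregi_replace is no longer available, so the first pattern is rewritten as a case-insensitive preg_replace. The function names and the sample markup are hypothetical, not PhpDig code, and the sketch ignores the CHUNK_SIZE splitting mentioned above.

```php
<?php
// Hypothetical helpers for illustration; these are not PhpDig functions.

// PCRE stand-in for the eregi_replace call: removes <td ...> through </td>,
// contents included. ".*" is greedy, so with several cells it runs from the
// first <td> to the last </td> in the string.
function dropTableCells(string $text): string
{
    return preg_replace('/<td[^>]*>.*<\/td>/is', ' ', $text);
}

// The tag-stripping pattern quoted above: each tag becomes a space,
// the text between tags is kept.
function dropTags(string $text): string
{
    return preg_replace('/<[\/\!]*?[^<>]*?>/is', ' ', $text);
}

// Collapse runs of whitespace so the results are easy to compare.
function squeeze(string $text): string
{
    return trim(preg_replace('/\s+/', ' ', $text));
}

$html = '<h2>Title</h2><table><tr><td class="x">cell text</td></tr></table><p>Body</p>';

echo squeeze(dropTags($html)), "\n";                 // tags gone, all text kept
echo squeeze(dropTags(dropTableCells($html))), "\n"; // cell text removed as well
```

If the text reaches the cleaning code in chunks, an opening <td> and its closing </td> can land in different chunks, in which case the first pattern has nothing to match.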

Spider 12-15-2004 11:41 AM

Thanks Charter, phpdigExclude and phpdigInclude do it for me! I hadn't seen that function until now.
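
For later readers, a rough sketch of the idea behind exclude/include markers. It assumes the markers are written as the HTML comments <!-- phpdigExclude --> and <!-- phpdigInclude --> (check your own PhpDig config for the exact strings). dropExcludedRegions is a made-up function that only imitates the effect and is not PhpDig's own implementation.

```php
<?php
// Hypothetical illustration only; PhpDig's own handling of these markers may differ.
function dropExcludedRegions(string $html): string
{
    // Assumed marker syntax: <!-- phpdigExclude --> ... <!-- phpdigInclude -->
    // Non-greedy, so each exclude/include pair is removed on its own.
    $pattern = '/<!--\s*phpdigExclude\s*-->.*?<!--\s*phpdigInclude\s*-->/is';
    return preg_replace($pattern, ' ', $html);
}

// Made-up page: a navigation block wrapped in the markers, then the real content.
$page = "<!-- phpdigExclude -->\n<div>Navigation menu</div>\n<!-- phpdigInclude -->\n<p>Article text</p>";

echo trim(strip_tags(dropExcludedRegions($page))), "\n"; // only the unwrapped content remains
```

To index only part of a page, the same idea is applied the other way round: wrap everything you do not want indexed in the marker pair.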

:santa:


All times are GMT -8.