PhpDig.net


haxored 01-13-2004 06:41 AM

spidering links but not their text
 
I've got a bunch of pages that each have a navigation menu, as well as some text I don't want indexed.

I want to exclude everything but a <div> tag that contains all the text I want to index. However, it seems that if I do that, none of the links in my navigation menu are spidered. That is not helpful.

How can I get phpDig to spider my entire site, but only index one section of every page?

Charter 01-14-2004 10:32 AM

Hi. You might try using the comments set by PHPDIG_EXCLUDE_COMMENT and PHPDIG_INCLUDE_COMMENT in the config file, each on its own line, to exclude a portion of a page. Depending on your navigation menu, it might not be getting indexed because PhpDig excludes certain tags from the index. If this is the case, you might try setting up a simple HTML page with links to your site, and then crawl this page. Once the crawl is done, you can delete the simple page from the admin panel.
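
A minimal sketch of such a link page, assuming made-up URLs on example.com and a made-up file name (substitute your own pages):

```html
<!-- links.html: temporary seed page; file name and URLs are hypothetical -->
<html>
<body>
<a href="http://www.example.com/index.html">Home</a>
<a href="http://www.example.com/products.html">Products</a>
<a href="http://www.example.com/contact.html">Contact</a>
</body>
</html>
```

Crawling this page should give the spider a starting point for each listed URL, and the page itself can then be removed from the index as described above.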

MaXius 01-27-2004 02:23 AM

I have a similar problem.

I have a pulldown menu (div tags), and PhpDig pulls all the menu options into the search results, so most words you would search for appear on all pages, in a big ugly mess.

Does PhpDig still crawl the links within a PHPDIG_EXCLUDE_COMMENT area even though it doesn't put these into its database?

Thanks

Charter 01-27-2004 09:58 AM

Hi. The exclude/include comments are for omitting parts of a page from indexing.

MaXius 01-27-2004 12:49 PM

Yeah, gathered that. Does it still spider the links within an excluded part, though?

Ta

Charter 01-27-2004 05:01 PM

Hi. Using the simple example below, PhpDig works as follows:
Code:

<html>
<body>
This text is indexed
<!-- phpdigExclude -->
<a href="http://www.this_link.com/is_followed.html">This text is ignored</a>
This text is also ignored
<!-- phpdigInclude -->
</body>
</html>

To change this behavior, the phpdigExplore and/or phpdigIndexFile functions in robot_functions.php could be modified.
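
Applied to the original question, a page could be marked up roughly as below. This is only a sketch: the menu links, the div ids, and the URLs are made-up placeholders, and the comment strings are assumed to match the PHPDIG_EXCLUDE_COMMENT and PHPDIG_INCLUDE_COMMENT values in your config file.

```html
<html>
<body>
<!-- phpdigExclude -->
<!-- placeholder navigation menu; links and ids are hypothetical -->
<div id="nav">
<a href="http://www.example.com/page1.html">Page 1</a>
<a href="http://www.example.com/page2.html">Page 2</a>
</div>
<!-- phpdigInclude -->
<div id="content">
Only this text should end up in the index.
</div>
</body>
</html>
```

Going by the example above, the menu links would still be followed while the menu text stays out of the index.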


All times are GMT -8.