Invision Power Board - spidering doesn't work?


george
11-26-2003, 10:48 PM
Hi everyone,

I installed PhpDig on my site recently and was very happy with it until I did a search for the word "beer" to test my search engine. I know the word only appears in one thread in my Invision Power Board forum. However, I get 193 hits.

The worst thing, though, is that if you click on the first result, the page it takes you to does not include the word "beer" (it takes you to the members list instead).

So can anyone tell me what is wrong and how I can get the spider to work correctly? Otherwise I am very happy with PhpDig (thanks Charter). Will I have to exclude the forum from the search?

george.

Charter
11-27-2003, 08:19 AM
Hi. That's a lot of beer. ;)

In the config.php file, set the following:

define('PHPDIG_DEFAULT_INDEX',false);
define('PHPDIG_SESSID_REMOVE',true);
define('PHPDIG_SESSID_VAR','s');
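
As the setting names suggest, the aim is to drop the session variable (here `s`) from each URL so that the same page is not indexed under many different addresses. Here is a minimal Python sketch of that idea. It is not PhpDig's own code, and the function name and sample URL are made up for illustration:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_session_id(url, session_var="s"):
    """Return url with the session_var query parameter removed."""
    parts = urlsplit(url)
    # Keep every query parameter except the session variable.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != session_var
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


if __name__ == "__main__":
    # Hypothetical session value, for illustration only.
    print(strip_session_id(
        "http://www.domain.com/forums/index.php?s=abc123&showtopic=42"
    ))
```

With the session value removed, two visits to the same topic give the same URL, so a spider can recognise them as one page.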

Then do the following to start from scratch:

empty all the PhpDig database tables
delete all files that may be in the temp dir
delete all files in the text_content dir except keepalive.txt
index the site
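
The file clean-up step above could be scripted roughly as follows. This is only a sketch in Python: the directory path is whatever your install uses, the function name is hypothetical, and emptying the database tables is left out because it depends on your table names.

```python
from pathlib import Path


def clear_text_content(directory, keep=("keepalive.txt",)):
    """Delete every regular file in directory except the names in keep.

    Returns the sorted names of the files that were removed.
    """
    removed = []
    # Materialise the listing first so files are not deleted mid-iteration.
    for entry in list(Path(directory).iterdir()):
        if entry.is_file() and entry.name not in keep:
            entry.unlink()
            removed.append(entry.name)
    return sorted(removed)
```

Subdirectories are left alone, and keepalive.txt stays in place as the steps above require.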

When you do the above, how much beer shows up now?

george
11-29-2003, 04:20 AM
Hi Charter,

I followed your advice. I am currently re-indexing my site (it has been going for about 1 hour now). I am indexing to a depth of 5. Is this OK? I was not really sure.

If I do a search for "beer" I get 119 hits. Again, I have the same problem where the returned pages do not actually include the word "beer".

Anything else I can try?

Best Regards, George

george
11-29-2003, 04:30 AM
Hi Charter, in the meantime I have just deleted the /forums folder from the admin panel so I don't confuse my users. So if you try the search engine, there are no beer hits now. But trust me, the problem was still there.
Thanks, George

Charter
11-29-2003, 10:09 AM
Hi. Not sure, but I'm wondering if there is some problem with .js, .cab, and .swf files. Can you try adding js, cab, and swf into the FORBIDDEN_EXTENSIONS in the config file? Also, if this doesn't solve the problem, can you set me up with a demo IPB somewhere on your site and add a couple of posts so that I can crawl there? Do you recall if all the extra beer links went to the members list? You might try crawling http://www.domain.com/forums/ at a level of two, level one for the forum links and level two for the thread links.
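
To illustrate the idea of skipping files by extension, here is a small Python sketch. The pattern below is an assumption written for this example and is not the actual value or format of FORBIDDEN_EXTENSIONS in the PhpDig config file, so check your own config.php for the real syntax.

```python
import re
from urllib.parse import urlsplit

# Illustrative pattern only: matches paths ending in .js, .cab, .swf or .ico.
FORBIDDEN = re.compile(r"\.(js|cab|swf|ico)$", re.IGNORECASE)


def is_forbidden(url):
    """Return True if the URL path ends in one of the forbidden extensions."""
    # Test only the path, so a query string does not hide the extension.
    return bool(FORBIDDEN.search(urlsplit(url).path))
```

A spider would call a check like this on every link it finds and skip the ones for which it returns True.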

george
11-29-2003, 04:39 PM
Hi Charter,

I added those extensions to the forbidden list (plus .ico as well), since I noticed it spiders my fav.ico file and this is not necessary. I will respider and see what happens.

To answer your beer question: no, each link seemed to take me to a different page, but none of them included the word "beer".

Regards, george

george
11-29-2003, 05:35 PM
Hi Charter,

I respidered and still the same problem. I am not sure how to set up another IPB forum for you to play with.

I have left the spider results so you can see what it says.

Thanks, George

Charter
12-01-2003, 11:55 AM
Hi. The problem, it seems, is that for almost every http://www.domain.com/forums/index.php?different=query&string=here the text from http://www.domain.com/forums/index.php is returned. It almost seems like some sort of redirect. Anyway, I can duplicate the problem with your site, but other Invision Power Boards seem to index without this trouble. Maybe there are some settings in the IPB to allow for spiders?
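
One way to confirm this symptom is to fetch a handful of forum URLs and check whether they all return identical text. Here is a rough Python sketch of the comparison step only. It assumes you have already saved each page's text in a dictionary keyed by URL, and the function name is made up for illustration.

```python
import hashlib
from collections import defaultdict


def group_by_content(pages):
    """Group URLs whose page text is byte-for-byte identical.

    pages maps URL -> page text. Returns a list of URL lists, one per
    group of two or more URLs that share the same text.
    """
    groups = defaultdict(list)
    for url, text in pages.items():
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        groups[digest].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]
```

If most of the forum URLs land in one group together with index.php, the board is serving the index page regardless of the query string.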

george
12-01-2003, 02:10 PM
Yes, there are various spider options in the Admin panel of Invision Power Board.

The board should recognise popular spiders such as hotbot and googlebot and will log their activity. Also, you can set the privileges for the bots. I have my privileges set to "guest", which means that the bot can see any page that a normal guest can see. I am getting spidered by hotbot and googlebot quite often and it seems to work OK.

It is weird that other IPBs work, because I have made very few modifications. I have added Google ads to my board wrapper and changed a few images, and otherwise it is a standard Invision board, version 1.2.

Here is some info from the IPB help pages:

>>>>>>

Search Engine Spiders
Enable the search engine spider recognition? - Yes/No toggle

Log all spider visits? - Yes/No toggle

Treat spider/bot as part of which group? - Choose a group that these bots are to be shown under when they index the board.

Show spider/bot in the active users list? - Choose whether they will be shown as anonymous. Yes/No toggle also.

Call Googlebot... - Name of the Googlebot, which will be shown on the active member list.

Call Microsoft / Hotbot... - Name of the Microsoft / Hotbot bot, which will be shown on the active member list.

Call Lycos... - Name of the Lycos bot, which will be shown on the active member list.

Call Ask Jeeves... - Name of the Ask Jeeves bot, which will be shown on the active member list.

Call What U Seek... - Name of the What U Seek bot, which will be shown on the active member list.

Link to this page: http://www.invisionpower.com/documentation/showdoc.php?page=31

Charter
12-01-2003, 02:57 PM
Hi. Are you able to crawl other IPB forums without experiencing the problem that appears with your boards?

george
12-02-2003, 03:49 AM
The spidering of the threads that contain "showforum" and "showtopic" in the URL is OK. I don't really want it to spider anything else.

It would be good to be able to limit spidering depending on the words in the URL. Or conversely, to refuse to spider pages that contain certain phrases in the URL.
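
To make the request concrete, the kind of filter I have in mind would look something like this Python sketch. It is purely an illustration of the idea and not an existing PhpDig option; the function and parameter names are made up.

```python
def should_spider(url, include=("showforum", "showtopic"), exclude=()):
    """Decide whether a URL should be spidered, based on phrases in it.

    A URL is refused if it contains any exclude phrase. Otherwise it is
    accepted if include is empty or if it contains an include phrase.
    """
    if any(phrase in url for phrase in exclude):
        return False
    return not include or any(phrase in url for phrase in include)
```

With the defaults shown, forum and topic pages would be followed, and everything else on the board (the members list, for example) would be skipped.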

I will try to spider some other boards and see what happens.

Thanks charter.
george

george
12-02-2003, 04:10 AM
I just spidered a few pages at a different site.

I did a search for the phrase "connection speed" which is included in one of the spidered pages and it worked fine. You can try it. I will leave it on my board.

With my forum, if you do a search for the word "chips" you will get a similar problem to "beer".

Thanks, George

Charter
12-04-2003, 02:55 PM
Hi. I have not yet been able to come up with a reason why this problem appears with your IPB but not other IPBs. :(

george
12-04-2003, 07:15 PM
Hi Charter,

I have recently realised I have another problem that might be related. I was trying to install those Google ads on my IPB forum and the session ID in the URL was interfering with the delivery of relevant ads. Then I found out that Internet Explorer was often not accepting cookies properly from my site. If the cookies were not accepted, the board would include a session ID in the URL. But now, if I get the cookie working OK (by messing around with the IE privacy settings), my URLs no longer contain the session IDs because the session data is stored in the cookie instead.

So I wonder whether, now that I am freeing myself of session IDs in URLs, PhpDig will work better. I will try to re-spider next time I have time and let you know if things are better.

If I can't get it to work it is not the end of the world, because the IPB forum has a search function anyway. I still think PhpDig is very good.

Thanks again, Glen

george
12-05-2003, 07:37 AM
Hi,

I am re-spidering and the problem seems to be fixed!

So how did I do it?

Well, I did a few things, so I am not sure which one achieved the end result:

1) I modified my privacy settings in Internet Explorer so that cookies from my website were put on the safe list. (N.B. I am running the spider.php file from within Internet Explorer.)

2) I fiddled with the cookie settings in the IPB control panel. My new settings are available in the attached image.

I used to have a "cookie name prefix" set, but I now just leave it blank. Also, I changed the "cookie path" from /forum/ to /forum

Thanks for all your help, Charter.

George