PhpDig.net

11-26-2003, 10:48 PM   #1
george
Green Mole

Join Date: Nov 2003
Posts: 10
Invision power board - spidering doesn't work?

Hi everyone,

I installed PhpDig on my site recently and was very happy with it until I searched for the word "beer" to test my search engine. I know the word appears in only one thread in my Invision Power Board forum, yet I get 193 hits.

The worst thing, though, is that if you click on the first result, the page it takes you to does not include the word "beer" (it takes you to the members list instead).

So can anyone tell me what is wrong and how I can get the spider to work correctly? Otherwise I am very happy with PhpDig (thanks, Charter). Will I have to exclude the forum from the search?

george.
11-27-2003, 08:19 AM   #2
Charter
Head Mole

Join Date: May 2003
Posts: 2,539
Hi. That's a lot of beer.

In the config.php file, set the following:
Code:
define('PHPDIG_DEFAULT_INDEX',false);
define('PHPDIG_SESSID_REMOVE',true);    // strip session IDs from URLs before indexing
define('PHPDIG_SESSID_VAR','s');        // name of the session ID parameter (IPB uses "s")
Then do the following to start from scratch:
  1. empty all the PhpDig database tables
  2. delete all files that may be in the temp dir
  3. delete all files in the text_content dir except keepalive.txt
  4. index the site
When you do the above, how much beer shows up now?
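For context, the two PHPDIG_SESSID settings above tell the spider to drop IPB's session parameter s from URLs, so that the same page reached under different sessions is indexed only once. A minimal sketch of that idea, assuming the session ID appears as a plain s=... query pair; this is not PhpDig's own code, and strip_session_id is a hypothetical helper:

```php
<?php
// Hypothetical helper: remove one query parameter (by default IPB's "s"
// session ID) from a URL, keeping the other pairs in their original form.
// Sketch only; PhpDig's real implementation is not shown in this thread.
function strip_session_id(string $url, string $var = 's'): string
{
    $qpos = strpos($url, '?');
    if ($qpos === false) {
        return $url; // no query string, nothing to strip
    }
    $base  = substr($url, 0, $qpos);
    $query = substr($url, $qpos + 1);
    $kept  = [];
    foreach (explode('&', $query) as $pair) {
        if ($pair === '') {
            continue; // skip empty segments such as "&&"
        }
        $key = explode('=', $pair, 2)[0];
        if ($key !== $var) {
            $kept[] = $pair;
        }
    }
    return $kept ? $base . '?' . implode('&', $kept) : $base;
}
```

Two URLs that differ only in their session ID, such as index.php?s=abc123&showtopic=42 and index.php?s=def456&showtopic=42, both reduce to index.php?showtopic=42.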
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.
11-29-2003, 04:20 AM   #3
george
Green Mole

Join Date: Nov 2003
Posts: 10
no luck

Hi Charter,

I followed your advice. I am currently re-indexing my site (it has been going for about an hour now). I am indexing to a depth of 5. Is this OK? I wasn't really sure.

If I search for "beer" now, I get 119 hits. Again, I have the same problem: the returned pages do not actually include the word "beer".

Anything else I can try?

Best Regards, George

Last edited by george; 11-29-2003 at 04:31 AM.
11-29-2003, 04:30 AM   #4
george
Green Mole

Join Date: Nov 2003
Posts: 10

Hi Charter, in the meantime I have deleted the /forums folder from the admin panel so I don't confuse my users. So if you try the search engine, there are no beer hits now, but trust me, the problem was still there.
Thanks, George
11-29-2003, 10:09 AM   #5
Charter
Head Mole

Join Date: May 2003
Posts: 2,539
Hi. Not sure, but I'm wondering if there is some problem with .js, .cab, and .swf files. Can you try adding js, cab, and swf into the FORBIDDEN_EXTENSIONS in the config file? Also, if this doesn't solve the problem, can you set me up with a demo IPB somewhere on your site and add a couple of posts so that I can crawl there? Do you recall if all the extra beer links went to the members list? You might try crawling http://www.domain.com/forums/ at a level of two, level one for the forum links and level two for the thread links.
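Charter's "level" suggestion refers to crawl depth counted from the start URL. A minimal sketch of a depth-limited, breadth-first crawl over an in-memory link map, assuming the start page counts as level 0 (crawl_to_depth and the sample URLs are hypothetical, and PhpDig's own depth counting may differ in detail):

```php
<?php
// Hypothetical depth-limited breadth-first crawl over an in-memory link map
// (url => list of linked urls). The start page is level 0, so with
// $maxDepth = 2 forum links sit at level 1 and thread links at level 2.
function crawl_to_depth(array $links, string $start, int $maxDepth): array
{
    $depth = [$start => 0];
    $queue = [$start];
    while ($queue) {
        $url = array_shift($queue);
        if ($depth[$url] >= $maxDepth) {
            continue; // don't follow links found beyond the limit
        }
        foreach ($links[$url] ?? [] as $next) {
            if (!isset($depth[$next])) {
                $depth[$next] = $depth[$url] + 1;
                $queue[] = $next;
            }
        }
    }
    return $depth; // url => level at which it was first reached
}
```

With a limit of 2 from /forums/, forum pages (level 1) and thread pages (level 2) are reached, but links found on thread pages, such as the members list, are not followed.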
11-29-2003, 04:39 PM   #6
george
Green Mole

Join Date: Nov 2003
Posts: 10
Hi Charter,

I added those extensions to the forbidden list (plus .ico, since I noticed it spiders my fav.ico file, which is not necessary). I will respider and see what happens.
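Assuming FORBIDDEN_EXTENSIONS works as a regular expression tested against each link (check the comment next to it in your own config.php), the blacklist discussed here (js, cab, swf, plus ico) amounts to a check like the following; FORBIDDEN_EXT_PATTERN and is_forbidden are hypothetical names:

```php
<?php
// Hypothetical pattern mirroring the extensions discussed in this thread;
// PhpDig's real FORBIDDEN_EXTENSIONS value in config.php may look different.
const FORBIDDEN_EXT_PATTERN = '/\.(js|cab|swf|ico)$/i';

// Return true if the URL's path ends in one of the forbidden extensions.
function is_forbidden(string $url): bool
{
    $path = parse_url($url, PHP_URL_PATH); // the query string is ignored
    return is_string($path) && preg_match(FORBIDDEN_EXT_PATTERN, $path) === 1;
}
```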

To answer your beer question: no, each link seemed to take me to a different page, but none of them included the word "beer".

Regards, george

Last edited by george; 11-29-2003 at 05:39 PM.
11-29-2003, 05:35 PM   #7
george
Green Mole

Join Date: Nov 2003
Posts: 10
Hi Charter,

I respidered and still have the same problem. I am not sure how to set up another IPB forum for you to play with.

I have left the spider results so you can see what it says.

Thanks, George
12-01-2003, 11:55 AM   #8
Charter
Head Mole

Join Date: May 2003
Posts: 2,539
Hi. The problem, it seems, is that for almost every http://www.domain.com/forums/index.php?different=query&string=here the text from http://www.domain.com/forums/index.php is returned. It almost seems like some sort of redirect. Anyway, I can duplicate the problem with your site, but other Invision Power Boards seem to index without this trouble. Maybe there are some settings in the IPB to allow for spiders?
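One way to confirm the symptom described here, many different query strings all returning the text of index.php, is to group crawled pages by a hash of their text. A minimal sketch, assuming you already have each URL's extracted text in an array (group_by_content is a hypothetical helper, not part of PhpDig):

```php
<?php
// Hypothetical diagnostic: group crawled URLs by an MD5 hash of the text
// they returned. Several query strings sharing one hash means they all
// received the same page (for example, the board index).
function group_by_content(array $pages): array
{
    $groups = [];
    foreach ($pages as $url => $text) {
        $groups[md5($text)][] = $url;
    }
    // keep only hashes shared by more than one URL, then reindex
    return array_values(array_filter($groups, function (array $urls): bool {
        return count($urls) > 1;
    }));
}
```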
12-01-2003, 02:10 PM   #9
george
Green Mole

Join Date: Nov 2003
Posts: 10
Yes there are various spider options in the Admin panel of Invision Power Board.

The board should recognise popular spiders such as HotBot and Googlebot and will log their activity. You can also set the privileges for the bots. I have mine set to "guest", which means the bot can see any page that a normal guest can see. I am getting spidered by HotBot and Googlebot quite often, and that seems to work OK.
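Spider recognition of this kind is typically a substring match on the User-Agent request header. The sketch below only illustrates the idea; the substrings, names and detect_spider function are assumptions, not IPB's actual list or code. A crawler whose agent string matches nothing is simply treated as an ordinary guest:

```php
<?php
// Illustrative only, not IPB's code: map lowercase User-Agent substrings
// to the display name a board might show for that spider.
const KNOWN_SPIDERS = [
    'googlebot'  => 'Googlebot',
    'lycos'      => 'Lycos',
    'ask jeeves' => 'Ask Jeeves',
];

// Return the spider's display name, or null for an unrecognised agent.
function detect_spider(string $userAgent): ?string
{
    $ua = strtolower($userAgent);
    foreach (KNOWN_SPIDERS as $needle => $name) {
        if (strpos($ua, $needle) !== false) {
            return $name;
        }
    }
    return null; // unknown agent: handled like an ordinary guest
}
```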

It is weird that other IPBs work, because I have made very few modifications. I have added Google ads to my board wrapper and changed a few images; otherwise it is a standard Invision Power Board version 1.2.

Here is some info from the IPB help pages:

>>>>>>

Search Engine Spiders
Enable the search engine spider recognition? - Yes/No toggle

Log all spider visits? - Yes/No toggle

Treat spider/bot as part of which group? - Choose a group that these bots are to be shown under when they index the board.

Show spider/bot in the active users list? - Choose whether they will be shown as anonymous. Yes/No toggle also.

Call Googlebot... - Name of the Googlebot, which will be shown on the active member list.

Call Microsoft / Hotbot... - Name of the Microsoft / Hotbot bot, which will be shown on the active member list.

Call Lycos... - Name of the Lycos bot, which will be shown on the active member list.

Call Ask Jeeves... - Name of the Ask Jeeves bot, which will be shown on the active member list.

Call What U Seek... - Name of the What U Seek bot, which will be shown on the active member list.

Link to this page: http://www.invisionpower.com/documen...oc.php?page=31
12-01-2003, 02:57 PM   #10
Charter
Head Mole

Join Date: May 2003
Posts: 2,539
Hi. Are you able to crawl other IPB forums without experiencing the problem that appears with your board?
12-02-2003, 03:49 AM   #11
george
Green Mole

Join Date: Nov 2003
Posts: 10
Spidering of the threads whose URLs contain "showforum" and "showtopic" is OK. I don't really want it to spider anything else.

It would be good to be able to limit spidering depending on the words in the URL, or conversely, to refuse to spider pages that contain certain phrases in the URL.
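The include/exclude rule asked for here could look like the sketch below: crawl a URL only if it contains one of the required words and none of the rejected ones. should_crawl and its default word list are hypothetical, not a PhpDig setting:

```php
<?php
// Hypothetical URL filter: keep IPB forum and topic views ("showforum",
// "showtopic") and skip any URL containing a rejected word.
function should_crawl(string $url, array $require = ['showforum', 'showtopic'], array $reject = []): bool
{
    foreach ($reject as $word) {
        if (strpos($url, $word) !== false) {
            return false; // a rejected word always wins
        }
    }
    if ($require === []) {
        return true; // no whitelist: allow everything not rejected
    }
    foreach ($require as $word) {
        if (strpos($url, $word) !== false) {
            return true;
        }
    }
    return false;
}
```

Note that the board index (/forums/) contains neither word, so a crawler would have to apply such a filter to discovered links only, not to the start URL.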

I will try to spider some other boards and see what happens.

Thanks, Charter.
george

Last edited by george; 12-02-2003 at 04:11 AM.
12-02-2003, 04:10 AM   #12
george
Green Mole

Join Date: Nov 2003
Posts: 10
I just spidered a few pages at a different site.

I did a search for the phrase "connection speed" which is included in one of the spidered pages and it worked fine. You can try it. I will leave it on my board.

With my forum, if you do a search for the word "chips" you will get a similar problem to "beer".

Thanks, George
12-04-2003, 02:55 PM   #13
Charter
Head Mole

Join Date: May 2003
Posts: 2,539
Hi. I have not yet been able to come up with a reason why this problem appears with your IPB but not other IPBs.
12-04-2003, 07:15 PM   #14
george
Green Mole

Join Date: Nov 2003
Posts: 10
Hi Charter,

I have recently realised I have another problem that might be related. I was trying to install those Google ads on my IPB forum, and the session ID in the URL was interfering with the delivery of relevant ads. Then I found out that Internet Explorer was often not accepting cookies properly from my site. If the cookie was not accepted, the board would include a session ID in the URL. But now that I have the cookie working (by adjusting the IE privacy settings), my URLs no longer contain session IDs, because the session data is stored in the cookie instead.

So I wonder whether, now that I am freeing myself of session IDs in URLs, PhpDig will work better. I will try to re-spider next time I have time and let you know if things are better.
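The behaviour described here (the board falls back to a session ID in the URL when the browser does not send its cookie back) suggests why a crawler that keeps and returns cookies would avoid the problem. A minimal cookie-jar sketch; whether PhpDig's spider handles cookies this way is not stated in the thread, and update_jar, cookie_header and the cookie name in the example are hypothetical:

```php
<?php
// Hypothetical cookie jar: remember name=value pairs from Set-Cookie
// response headers. Path/Domain/Expires attributes are ignored, so this
// is only a sketch of the idea, not a complete implementation.
function update_jar(array $jar, array $responseHeaders): array
{
    foreach ($responseHeaders as $line) {
        if (stripos($line, 'Set-Cookie:') !== 0) {
            continue; // not a cookie header
        }
        $value = trim(substr($line, strlen('Set-Cookie:')));
        $nameValue = explode(';', $value, 2)[0]; // drop the attributes
        [$name, $val] = array_pad(explode('=', $nameValue, 2), 2, '');
        $jar[trim($name)] = trim($val);
    }
    return $jar;
}

// Build the Cookie request header that replays the stored pairs.
function cookie_header(array $jar): string
{
    $pairs = [];
    foreach ($jar as $name => $val) {
        $pairs[] = $name . '=' . $val;
    }
    return 'Cookie: ' . implode('; ', $pairs);
}
```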

If I can't get it to work, it is not the end of the world, because the IPB forum has a search function anyway. I still think PhpDig is very good.

Thanks again, Glen
12-05-2003, 07:37 AM   #15
george
Green Mole

Join Date: Nov 2003
Posts: 10
Hi,

I am re-spidering and the problem seems to be fixed!

So how did I do it?

Well, I did a few things, so I'm not sure which one achieved the end result:

1) I modified my privacy settings in Internet Explorer so that cookies from my website were put on the safe list. (N.B. I am running the spider.php file from within Internet Explorer.)

2) I fiddled with the cookie settings in the IPB control panel. My new settings are available in the attached image.

I used to have a "cookie name prefix" set, but I now just leave it blank. I also changed the "cookie path" from /forum/ to /forum.
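Which requests get a cookie back depends on its path. The sketch below implements the path-match rule from RFC 6265, section 5.1.4; that standard postdates this thread, and browsers of the time may have matched paths more loosely, so treat it as an illustration only (path_matches is a hypothetical name):

```php
<?php
// Path-match rule from RFC 6265, section 5.1.4: does a request path
// "path-match" the cookie's Path attribute?
function path_matches(string $requestPath, string $cookiePath): bool
{
    if ($requestPath === $cookiePath) {
        return true; // identical paths always match
    }
    if (strpos($requestPath, $cookiePath) !== 0) {
        return false; // cookie path is not a prefix of the request path
    }
    // A prefix only counts on a "/" boundary.
    return substr($cookiePath, -1) === '/'
        || substr($requestPath, strlen($cookiePath), 1) === '/';
}
```

Under this rule, a cookie with path /forum/ is not sent for a request to /forum itself, whereas /forum covers both /forum and everything below /forum/. Whether that is what fixed the spidering here is not confirmed.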

Thanks for all your help, Charter.

George

Similar Threads
Thread Thread Starter Forum Replies Last Post
not work isababa Troubleshooting 3 08-30-2005 04:54 PM
Searching UBB Message Board Beans How-to Forum 3 08-04-2005 08:18 AM
Cronjob for spidering doen't work anymore with PhpDig 1.8.6 gaam Troubleshooting 0 12-22-2004 12:28 AM
It doesn't work humanitaire.ws Script Installation 8 12-15-2004 03:37 AM
This board not letting you do something? Charter Feedback & News 2 08-03-2003 11:54 PM


All times are GMT -8.


Powered by vBulletin® Version 3.7.3
Copyright ©2000 - 2024, Jelsoft Enterprises Ltd.
Copyright © 2001 - 2005, ThinkDing LLC. All Rights Reserved.