PhpDig.net

01-27-2004, 08:18 AM   #1
jamison
Green Mole
Join Date: Nov 2003
Posts: 7
Spidering vBulletin web sites?

Has anyone spidered a vBulletin site? It seems the same pages are getting reindexed with different session variables. I am using PhpDig 1.8.

Jamison
__________________
Jamison White
01-27-2004, 08:51 AM   #2
Charter
Head Mole
Join Date: May 2003
Posts: 2,539
Hi. If not already done, use this file and set the following in the config file to match the session variable name:
PHP Code:
define('PHPDIG_SESSID_REMOVE',true);     // remove SIDS from indexed URLS
define('PHPDIG_SESSID_VAR','PHPSESSID'); // name of the SID variable 
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.
03-29-2004, 03:18 PM   #3
archeire.com
Green Mole
Join Date: Mar 2004
Posts: 3
Match the session variable name?
How do I ascertain that?
03-29-2004, 08:06 PM   #4
Charter
Head Mole
Join Date: May 2003
Posts: 2,539
Hi. The session variable name can sometimes be seen in the address bar of a web browser or in the HTML source of a page.
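One rough way to check is to list the parameter names in a URL copied from the site and look for the one carrying a long random-looking value. The sketch below is in modern PHP and is not part of PhpDig: the helper name `list_query_param_names` and the example URL are made up for illustration, and the `s` parameter in it is only an assumption about how a forum might carry its session hash.

```php
<?php
// Hypothetical helper (not a PhpDig function): return the names of the
// query-string parameters found in a URL.
function list_query_param_names($url)
{
    // parse_url() returns null when the URL has no query string
    // and false when the URL is badly malformed.
    $query = parse_url($url, PHP_URL_QUERY);
    if (!is_string($query) || $query === '') {
        return [];
    }

    // parse_str() splits "a=1&b=2" into an associative array.
    parse_str($query, $params);
    return array_keys($params);
}

// Made-up example URL; paste one copied from the browser address bar instead.
$names = list_query_param_names(
    'http://www.example.com/showthread.php?s=0123456789abcdef&threadid=42'
);
print_r($names);
```

Whichever name holds the value that changes from visit to visit is the candidate for PHPDIG_SESSID_VAR.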
03-30-2004, 06:58 AM   #5
archeire.com
Green Mole
Join Date: Mar 2004
Posts: 3
So for a vBulletin page, would it be... ???
03-30-2004, 07:15 AM   #6
archeire.com
Green Mole
Join Date: Mar 2004
Posts: 3
The spider can deal with the forum pages in vBulletin but not the thread pages...

Duplicate of an existing document
21:http://www.archiseek.com/content/showthread.php
(time : 00:03:09)

22:http://www.archiseek.com/content/for...php?forumid=16
(time : 00:03:16)

Duplicate of an existing document
23:http://www.archiseek.com/content/showthread.php
(time : 00:04:12)

24:http://www.archiseek.com/content/for...php?forumid=22
(time : 00:04:18)
03-30-2004, 08:27 AM   #7
Charter
Head Mole
Join Date: May 2003
Posts: 2,539
Hi. I haven't experienced that issue on PhpDig.net. What are you using for a session variable name?
04-18-2004, 04:12 PM   #8
misterbearcom
Green Mole
Join Date: Apr 2004
Location: Cali
Posts: 10
Hi, I must be stupid... please help.

Hi Charter,

Thanks for all the great advice you give on this site. I'm reading your info and I'm still a bit confused (sorry, I'm still learning PHP and XML). The part of the config file you listed was:

Quote:
define('PHPDIG_SESSID_REMOVE',true); // remove SIDS from indexed URLS
Now, isn't that how it's already written in the config file? (I downloaded PhpDig last week, so it seems to be the most current version at the time of this writing.) And if the answer is yes, then what exactly am I supposed to do with the info to remove sessions from my spidering?

Sorry, I guess I need to read up a bit more about Linux, PHP, and XML and... and... yep. I'm a newbie.

Thanks in advance, Charter.
04-20-2004, 10:01 AM   #9
Charter
Head Mole
Join Date: May 2003
Posts: 2,539
Hi. You'd need to provide the name of the SID variable in the config file:
PHP Code:
define('PHPDIG_SESSID_REMOVE',true);     // remove SIDS from indexed URLS
define('PHPDIG_SESSID_VAR','PHPSESSID'); // name of the SID variable 
Not all SID variables are named PHPSESSID, so if you have already indexed, you might want to reindex once the SID name is set.
04-20-2004, 05:19 PM   #10
misterbearcom
Green Mole
Join Date: Apr 2004
Location: Cali
Posts: 10
Oh okay. I think I understand. So if they are using "sid" as their session id then I guess I should go into the config file, set the PHPDIG_SESSID_VAR to 'sid' and then reindex my database? If so, cool. I'll try it out. Thanks a bunch!
04-20-2004, 05:38 PM   #11
Charter
Head Mole
Join Date: May 2003
Posts: 2,539
Hi. Yep, that's it. You might have to delete the site, clean the dictionary, and then index anew so PhpDig doesn't continue to store old session info in the tables.
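For the "sid" case from the previous post, the config lines would presumably end up like this. It is only a sketch reusing the two constants quoted earlier in the thread; 'sid' is just the example name and should be replaced with whatever the target site actually uses.

```php
define('PHPDIG_SESSID_REMOVE',true); // remove SIDS from indexed URLS
define('PHPDIG_SESSID_VAR','sid');   // assumed example: name of the SID variable on the target site
```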
06-06-2004, 03:27 AM   #12
shinji
Green Mole
Join Date: Jan 2004
Location: Hamm/NRW/Germany
Posts: 26
Isn't it possible to set more than just one SID variable? That would be good when you are spidering some strange sites that use different SIDs.
06-06-2004, 09:07 AM   #13
bloodjelly
Purple Mole
Join Date: Dec 2003
Posts: 106
Quote:
Originally posted by shinji
Isn't it possible to set more than just one SID variable? That would be good when you are spidering some strange sites that use different SIDs.
Yeah, that would be great - dynamic SID variables that change depending on the one at the site (maybe Dig could check an array of known SID types when it hits a certain URL?)
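As a rough illustration of that idea, here is a sketch in modern PHP that strips any of several known session variable names from a URL. It is not PhpDig code: `strip_session_vars`, `$known_sid_names`, and the example URLs are hypothetical, and as far as this thread shows, PhpDig 1.8 itself only takes the single name given in PHPDIG_SESSID_VAR.

```php
<?php
// Hypothetical list of session variable names; adjust it to the sites being spidered.
$known_sid_names = ['PHPSESSID', 'sid', 's', 'SESSID'];

// Hypothetical helper (not a PhpDig function): drop every query parameter
// whose name appears in $sid_names. Matching is case-sensitive, and a
// "#fragment" at the end of the URL is not handled specially.
function strip_session_vars($url, array $sid_names)
{
    $parts = explode('?', $url, 2);
    if (count($parts) < 2) {
        return $url; // no query string, nothing to strip
    }
    list($base, $query) = $parts;

    $kept = [];
    foreach (explode('&', $query) as $pair) {
        $name = explode('=', $pair, 2)[0];
        if ($pair !== '' && !in_array($name, $sid_names, true)) {
            $kept[] = $pair;
        }
    }

    return $kept ? $base . '?' . implode('&', $kept) : $base;
}

// Two made-up URLs for the same thread that differ only in the session hash.
$a = strip_session_vars('http://www.example.com/showthread.php?s=aaa111&threadid=42', $known_sid_names);
$b = strip_session_vars('http://www.example.com/showthread.php?s=bbb222&threadid=42', $known_sid_names);
var_dump($a === $b); // bool(true): both reduce to .../showthread.php?threadid=42
```

Normalizing URLs this way before they are stored is what would keep the same page from being indexed again under a new session value.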
06-10-2004, 06:23 PM   #14
ChadK
Green Mole
Join Date: May 2004
Posts: 23
Is there a way to set more than one "PHPSESSID" string? I index many different "related" sites, but some use "SID", some "SESS", some "SESSID", some "PHPID", etc.
06-11-2004, 03:03 AM   #15
shinji
Green Mole
Join Date: Jan 2004
Location: Hamm/NRW/Germany
Posts: 26
Quote:
Originally posted by ChadK
Is there a way to set more than one "PHPSESSID" string? I index many different "related" sites, but some use "SID", some "SESS", some "SESSID", some "PHPID", etc.
I'm sorry to say that... but... I already asked it?
Copyright © 2001 - 2005, ThinkDing LLC. All Rights Reserved.