Spidering vBulletin web sites?


jamison
01-27-2004, 09:18 AM
Has anyone spidered a vBulletin site? It seems like the same pages are getting reindexed with different session variables. I am using PhpDig 1.8.

Jamison

Charter
01-27-2004, 09:51 AM
Hi. If not already done, use this (http://www.phpdig.net/showthread.php?threadid=429) file and set the following in the config file to match the session variable name:

define('PHPDIG_SESSID_REMOVE',true); // remove SIDS from indexed URLS
define('PHPDIG_SESSID_VAR','PHPSESSID'); // name of the SID variable
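
To illustrate why removing the SID matters, here is a minimal sketch in Python (not PhpDig's own code) of what stripping one session variable from a URL looks like. The function name `strip_session_id` and the example.com URLs are made up for illustration; the point is only that two URLs for the same page collapse to one once the session value is gone.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def strip_session_id(url, sid_var="PHPSESSID"):
    """Return url with the query parameter named sid_var removed."""
    parts = urlsplit(url)
    # Keep every query parameter except the session variable.
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name != sid_var
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


# Hypothetical URLs: the same thread fetched under two different sessions.
a = strip_session_id("http://example.com/showthread.php?PHPSESSID=abc123&threadid=429")
b = strip_session_id("http://example.com/showthread.php?PHPSESSID=def456&threadid=429")
print(a)
print(a == b)
```

Without the stripping step the two URLs differ, so a spider would treat them as two separate documents.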

archeire.com
03-29-2004, 04:18 PM
Match the session variable name?
How do I ascertain that?

Charter
03-29-2004, 09:06 PM
Hi. The session variable name can sometimes be seen in the address bar of a web browser or in the HTML source of a page.
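
If reading URLs by eye is tedious, a rough heuristic can help. The sketch below is Python, not part of PhpDig, and it assumes that session IDs look like long hexadecimal strings, which is common but not universal. The function name `guess_session_vars`, the sample URL, and its `s` parameter are all assumptions for illustration.

```python
import re
from urllib.parse import urlsplit, parse_qsl

# Assumption: a session ID is a run of 24 or more hex characters.
HEX_ID = re.compile(r"^[0-9a-fA-F]{24,}$")


def guess_session_vars(url):
    """Return names of query parameters whose values look like session IDs."""
    query = urlsplit(url).query
    return [name for name, value in parse_qsl(query) if HEX_ID.match(value)]


# Hypothetical URL copied from a browser address bar.
url = "http://example.com/showthread.php?s=0123456789abcdef0123456789abcdef&threadid=429"
print(guess_session_vars(url))
```

Any name it reports is only a candidate; it is worth confirming by loading the page twice and checking that the value changes between visits.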

archeire.com
03-30-2004, 07:58 AM
So for a vBulletin page, would it be... ???

archeire.com
03-30-2004, 08:15 AM
The spider can deal with the forum pages in vBulletin but not the thread pages...

Duplicate of an existing document
21:http://www.archiseek.com/content/showthread.php
(time : 00:03:09)

22:http://www.archiseek.com/content/forumdisplay.php?forumid=16
(time : 00:03:16)

Duplicate of an existing document
23:http://www.archiseek.com/content/showthread.php
(time : 00:04:12)

24:http://www.archiseek.com/content/forumdisplay.php?forumid=22
(time : 00:04:18)

Charter
03-30-2004, 09:27 AM
Hi. I haven't experienced that issue on PhpDig.net. What are you using for a session variable name?

misterbearcom
04-18-2004, 05:12 PM
Hi Charter,

Thanks for all the great advice you give on this site. I'm reading your info and I'm still a bit confused (sorry, I'm still learning php and xml). The part of the config file you listed was:

define('PHPDIG_SESSID_REMOVE',true); // remove SIDS from indexed URLS

Now, isn't that how it's already written in the config file? (I downloaded phpdig last week so it seems to be the most current version at the time of this writing) And if the answer is yes, then what exactly am I supposed to do with the info to remove sessions from my spidering?

Sorry, I guess I need to read up a bit more about linux, php, and xml and... and.. yep. I'm a newbie.

Thanks in advance, Charter.

Charter
04-20-2004, 11:01 AM
Hi. You'd need to provide the name of the SID variable in the config file:

define('PHPDIG_SESSID_REMOVE',true); // remove SIDS from indexed URLS
define('PHPDIG_SESSID_VAR','PHPSESSID'); // name of the SID variable

Not all SID variables are named PHPSESSID so if you already indexed, then you might want to reindex once the SID name is set.

misterbearcom
04-20-2004, 06:19 PM
Oh okay. I think I understand. So if they are using "sid" as their session id then I guess I should go into the config file, set the PHPDIG_SESSID_VAR to 'sid' and then reindex my database? If so, cool. I'll try it out. Thanks a bunch!

Charter
04-20-2004, 06:38 PM
Hi. Yep, that's it. You might have to delete the site, clean the dictionary, and then index anew so PhpDig doesn't continue to store old session info in the tables.

shinji
06-06-2004, 04:27 AM
Isn't it possible to set more than just one SID variable? It would be good when you are spidering some strange sites that use different SIDs.

bloodjelly
06-06-2004, 10:07 AM
Originally posted by shinji
Isn't it possible to set more than just one SID variable? It would be good when you are spidering some strange sites that use different SIDs.
Yeah that would be great - dynamic SID variables that change depending on the one at the site (maybe Dig could check an array for a list of known SID types when it hits a certain URL?)
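
The idea of checking a list of known SID names could be sketched roughly as follows. This is Python for illustration only and is not an existing PhpDig feature; the config discussed in this thread takes a single name. The set `KNOWN_SID_NAMES`, the function `strip_known_sids`, the case-insensitive matching, and the example URLs are all assumptions.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Illustrative list of session variable names, compared in lower case.
KNOWN_SID_NAMES = {"phpsessid", "sid", "sess", "sessid", "phpid"}


def strip_known_sids(url, sid_names=KNOWN_SID_NAMES):
    """Return url with every query parameter named in sid_names removed."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in sid_names
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


# Hypothetical URLs from three sites that each name their SID differently.
urls = [
    "http://site-a.example/page.php?SID=aaa111&id=7",
    "http://site-b.example/page.php?id=7&SESSID=bbb222",
    "http://site-c.example/page.php?PHPSESSID=ccc333",
]
cleaned = [strip_known_sids(u) for u in urls]
for u in cleaned:
    print(u)
```

One caveat with a shared list: a short name such as `sid` may be a genuine content parameter on some sites, so a per-site list would be safer than a global one.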

ChadK
06-10-2004, 07:23 PM
Is there a way to set more than one "PHPSESSID" string? I index many different "related" sites, but some use "SID", some "SESS", some "SESSID", some "PHPID", etc.

shinji
06-11-2004, 04:03 AM
Originally posted by ChadK
Is there a way to set more than one "PHPSESSID" string? I index many different "related" sites, but some use "SID", some "SESS", some "SESSID", some "PHPID", etc.

I'm sorry to say that... but... I already asked it? :bang: :D

chilling
06-11-2004, 05:55 AM
So did I :mad: