PhpDig.net


jamison 01-27-2004 08:18 AM

Spidering vBulletin web sites?
 
Has anyone spidered a vBulletin site? It seems like the same pages are getting reindexed with different session variables. I am using PhpDig 1.8.

Jamison

Charter 01-27-2004 08:51 AM

Hi. If not already done, use this file and set the following in the config file to match the session variable name:
PHP Code:

define('PHPDIG_SESSID_REMOVE',true);     // remove SIDS from indexed URLS
define('PHPDIG_SESSID_VAR','PHPSESSID'); // name of the SID variable 


archeire.com 03-29-2004 03:18 PM

Match the session variable name?
How do I ascertain that?

Charter 03-29-2004 08:06 PM

Hi. The session variable name can sometimes be seen in the address bar of a web browser or in the HTML source of a page.
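
As a rough illustration of the address-bar approach, the sketch below lists the query parameter names found in a URL so that a session-like one can be picked out by eye. The function name query_param_names, the example URL, and its parameter s are all made up for this example; none of them come from PhpDig.

```php
<?php
// Hypothetical helper, not part of PhpDig: return the names of the
// query-string parameters in a URL, in the order they appear.
function query_param_names($url)
{
    // parse_url() returns null when the URL has no query component
    // and false when the URL is seriously malformed.
    $query = parse_url($url, PHP_URL_QUERY);
    if (!is_string($query) || $query === '') {
        return [];
    }
    // parse_str() decodes "a=1&b=2" into an associative array.
    parse_str($query, $params);
    return array_keys($params);
}

// Made-up URL; a long random-looking value usually marks the session parameter.
print_r(query_param_names('http://www.example.com/showthread.php?s=0123456789abcdef&t=447'));
```

Whichever name carries the long random-looking value is the likely candidate for PHPDIG_SESSID_VAR.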

archeire.com 03-30-2004 06:58 AM

So for a vBulletin page it would be...?

archeire.com 03-30-2004 07:15 AM

The spider can deal with the forum pages in vBulletin but not the thread pages:

Duplicate of an existing document
21:http://www.archiseek.com/content/showthread.php
(time : 00:03:09)

22:http://www.archiseek.com/content/for...php?forumid=16
(time : 00:03:16)

Duplicate of an existing document
23:http://www.archiseek.com/content/showthread.php
(time : 00:04:12)

24:http://www.archiseek.com/content/for...php?forumid=22
(time : 00:04:18)

Charter 03-30-2004 08:27 AM

Hi. I haven't experienced that issue on PhpDig.net. What are you using for a session variable name?

misterbearcom 04-18-2004 04:12 PM

Hi, I must be stupid... please help.
 
Hi Charter,

Thanks for all the great advice you give on this site. I'm reading your info and I'm still a bit confused (sorry, I'm still learning PHP and XML). The part of the config file you listed was:

Quote:

define('PHPDIG_SESSID_REMOVE',true); // remove SIDS from indexed URLS
Now, isn't that how it's already written in the config file? (I downloaded PhpDig last week, so it seems to be the most current version at the time of this writing.) And if the answer is yes, then what exactly am I supposed to do with the info to remove sessions from my spidering?

Sorry, I guess I need to read up a bit more about Linux, PHP, and XML and... and... yep. I'm a newbie.

Thanks in advance, Charter.

Charter 04-20-2004 10:01 AM

Hi. You'd need to provide the name of the SID variable in the config file:
PHP Code:

define('PHPDIG_SESSID_REMOVE',true);     // remove SIDS from indexed URLS
define('PHPDIG_SESSID_VAR','PHPSESSID'); // name of the SID variable 

Not all SID variables are named PHPSESSID, so if you have already indexed, you might want to reindex once the SID name is set.

misterbearcom 04-20-2004 05:19 PM

Oh okay. I think I understand. So if they are using "sid" as their session id then I guess I should go into the config file, set the PHPDIG_SESSID_VAR to 'sid' and then reindex my database? If so, cool. I'll try it out. Thanks a bunch!

Charter 04-20-2004 05:38 PM

Hi. Yep, that's it. You might have to delete the site, clean the dictionary, and then index anew so PhpDig doesn't continue to store old session info in the tables.
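
For the case described above, where the target site uses "sid", the two config lines would presumably end up as follows. The value 'sid' is only the example from this thread; the real name has to be read off the site being spidered.

```php
define('PHPDIG_SESSID_REMOVE',true); // remove SIDS from indexed URLS
define('PHPDIG_SESSID_VAR','sid');   // assumed SID name; replace with the one the site uses
```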

shinji 06-06-2004 03:27 AM

Isn't it possible to set more than just one SID variable? That would be good when you are spidering some strange sites that use different SIDs.

bloodjelly 06-06-2004 09:07 AM

Quote:

Originally posted by shinji
Isn't it possible to set more than just one SID variable? That would be good when you are spidering some strange sites that use different SIDs.
Yeah that would be great - dynamic SID variables that change depending on the one at the site (maybe Dig could check an array for a list of known SID types when it hits a certain URL?)

ChadK 06-10-2004 06:23 PM

Is there a way to set more than one "PHPSESSID" string? I index many different "related" sites, but some use "SID", some "SESS", some "SESSID", some "PHPID", etc.

shinji 06-11-2004 03:03 AM

Quote:

Originally posted by ChadK
Is there a way to set more than one "PHPSESSID" string? I index many different "related" sites, but some use "SID", some "SESS", some "SESSID", some "PHPID", etc.
I'm sorry to say it, but... I already asked that. :D
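
Nobody in the thread answers the multi-SID question, and as far as the posts above show, PHPDIG_SESSID_VAR takes a single name. The array idea could be sketched outside PhpDig roughly like this. The function strip_session_params and the name list are hypothetical, and the sketch is a standalone illustration, not a patch to the spider.

```php
<?php
// Hypothetical standalone sketch, not PhpDig code: remove every query
// parameter whose name appears in a list of known session-variable names.
// Names are compared case-insensitively (a design choice of this sketch).
// URL fragments (#...) and HTML-escaped "&amp;" separators are not handled.
function strip_session_params($url, array $sidNames)
{
    $parts = explode('?', $url, 2);
    if (count($parts) < 2 || $parts[1] === '') {
        return $parts[0]; // no query string to clean
    }
    $drop = array_map('strtolower', $sidNames);
    $kept = [];
    foreach (explode('&', $parts[1]) as $pair) {
        if ($pair === '') {
            continue; // skip empty pieces such as "a=1&&b=2"
        }
        // The parameter name is everything before the first "=".
        $name = strtolower(urldecode(explode('=', $pair, 2)[0]));
        if (!in_array($name, $drop, true)) {
            $kept[] = $pair; // keep non-session parameters untouched
        }
    }
    return $kept ? $parts[0] . '?' . implode('&', $kept) : $parts[0];
}

// Assumed list, taken from the names mentioned in this thread.
$known = ['PHPSESSID', 'sid', 'SESS', 'SESSID', 'PHPID'];
echo strip_session_params('http://www.example.com/showthread.php?sid=abc123&t=447', $known), PHP_EOL;
```

Matching on the whole parameter name rather than on a substring keeps unrelated parameters such as "forumid" from being removed by accident.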


All times are GMT -8.