PhpDig.net

Indexing dynamically generated web pages (http://www.phpdig.net/forum/showthread.php?t=1756)

Dave A 01-18-2005 02:59 PM

Indexing dynamically generated web pages
 
I wonder if there is a way of getting PhpDig to index dynamically generated web pages?
Each time I try to spider a web site that uses dynamic page generation, the spider doesn't appear to find any content and can't index it. Perhaps there are a few things within the configuration files that need amending?
So if anyone has any ideas, could they please post an answer to the forum.

Many thanks
from Dave Downunder

Charter 01-18-2005 03:57 PM

What version of PhpDig are you using? Would you provide an example link?

Dave A 01-18-2005 06:10 PM

Reply
 
Hi Charter,
Firstly, many thanks for getting back to me so quickly.
One example of the type of dynamic files I can't seem to index can be found via www.hastings.co.nz
The spider visits and tries for a few moments, then replies with Indexed 0 files and then no link found in Temporary folder.
I have contacted the people who designed the web site, and they have said that each page is dynamically generated.
Thanks for your help with this. Typing is a little hard because I had a couple of cataract ops yesterday, and things seem just a little fuzzy around the edges until the swelling has gone.

Many regards

Dave Andrews

Charter 01-18-2005 06:47 PM

What version of PhpDig are you using?

If you use the latest version, you should see the following type of output:

Spidering in progress... [Stop spider]
SITE : http://www.hastings.co.nz/
Exclude paths :
- @NONE@
1:http://www.hastings.co.nz/
(time : 00:00:08)

HTTP/1.1 403 Forbidden - http://www.hastings.co.nz/editable/welcome.shtml/
See http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for explanation.

HTTP/1.1 403 Forbidden - http://www.hastings.co.nz/editable/welcome.shtml
See http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for explanation.
+
level 1...

HTTP/1.1 403 Forbidden - http://www.hastings.co.nz/editable/welcome.shtml
See http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for explanation.
2:http://www.hastings.co.nz/editable/welcome.shtml
(time : 00:00:20)

No link in temporary table
links found : 2
http://www.hastings.co.nz/
http://www.hastings.co.nz/editable/welcome.shtml
Optimizing tables...
Indexing complete ! [Back] to admin interface.

It has nothing to do with dynamic files. That site is giving 403s, meaning forbidden, not allowed, go away.
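
One way to confirm a 403 independently of PhpDig is to request the page directly and look at the status code that comes back. The following is only a minimal sketch using the Python standard library. The function names `fetch_status` and `describe_status` and the User-Agent string are made up for illustration and are not part of PhpDig.

```python
from http import HTTPStatus
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None if no response arrived."""
    # The User-Agent value is an arbitrary placeholder for this sketch.
    request = Request(url, headers={"User-Agent": "status-check-sketch"})
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        # urllib raises 4xx/5xx responses as HTTPError; the code is still available.
        return err.code
    except OSError:
        # Covers URLError, DNS failures, refused connections and timeouts.
        return None


def describe_status(code):
    """Turn a status code (or None) into a short human-readable string."""
    if code is None:
        return "no response"
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # Code is not one of the standard statuses known to http.HTTPStatus.
        phrase = "Unknown"
    return f"{code} {phrase}"
```

Calling `describe_status(fetch_status(url))` on one of the URLs from the spider output would show whether the server also refuses a plain request. Some servers answer differently depending on the User-Agent or other headers, so a result here does not necessarily match what the spider sees.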

Hope your eyes feel better soon. :cool:

Paul D. Buck 02-01-2005 11:03 AM

Quote:

Originally Posted by Charter
It has nothing to do with dynamic files. That site is giving 403s, meaning forbidden, not allowed, go away.

OK, my question becomes: which links is it not getting? In other words, what has to change in PhpDig to find out which links are failing? I have had these failures on pages where all the links are internal to my site (as far as *I* can tell), but I am getting these 403 rejections too ...
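
The spider output quoted earlier in the thread already names each rejected URL on its own "HTTP/1.1 403 Forbidden - URL" line, so one option that needs no change to PhpDig is to save that output and filter it. The sketch below assumes the output keeps exactly the line format shown in this thread. The function name `failing_links` is hypothetical and is not a PhpDig feature.

```python
import re

# Matches lines such as:
#   HTTP/1.1 403 Forbidden - http://www.example.com/page.shtml
# The format is assumed from the spider output shown in this thread.
STATUS_LINE = re.compile(r"^HTTP/\d\.\d\s+(\d{3})\s+(.*?)\s+-\s+(\S+)\s*$")


def failing_links(spider_output):
    """Map each URL reported with a 4xx/5xx status to its (code, reason)."""
    failures = {}
    for line in spider_output.splitlines():
        match = STATUS_LINE.match(line.strip())
        if not match:
            continue  # progress lines, "See ..." hints, blank lines
        code = int(match.group(1))
        reason = match.group(2)
        url = match.group(3)
        if code >= 400:
            # Keep the first report for each URL; repeats add nothing new.
            failures.setdefault(url, (code, reason))
    return failures
```

Run against the output above, this would list the welcome.shtml URL twice, once with and once without the trailing slash, because the spider reports them as separate links.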


All times are GMT -8.
