PhpDig.net

01-19-2005, 03:08 AM   #16
zaartix
Orange Mole
 
Join Date: May 2004
Location: russia, samara
Posts: 56
Quote:
So + works for your type of [ ] links, right? I'm not sure if you are still having a problem with [ ] type links, but remember to use + in those two regexs.
I thought that in PhpDig 1.8.7 all of this should already work?

It only works if a link contains just one pair of [ ].

01-19-2005, 03:12 AM   #17
zaartix
The first regexp isn't needed because the site doesn't have frames.

Last edited by zaartix; 01-19-2005 at 03:18 AM.

01-19-2005, 04:09 AM   #18
Charter
Head Mole
Join Date: May 2003
Posts: 2,539
>> It only works if a link contains just one pair of [ ].

So it works in example but not with PhpDig? What's a link to a page containing multiple [ ] in its links?

>> The first regexp isn't needed because the site doesn't have frames.

Other people might have frames though.

RFC 2732 states in part:
Quote:
Code:
   (3) Add "[" and "]" to the set of 'reserved' characters:

      reserved    = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" |
                    "$" | "," | "[" | "]"

   and remove them from the 'unwise' set:

      unwise      = "{" | "}" | "|" | "\" | "^" | "`"
Using reserved characters in links for anything other than their intended purpose can sometimes cause problems, as was the case in this thread (a colon was used outside of its <user>:<pass>@<host>:<port> meaning, so the PHP parse_url function could not make sense of the URL).

You might want to consider encoding your URIs according to this rather than using literal square brackets in your links.
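
As a rough illustration of that suggestion, here is a minimal sketch that percent-encodes square brackets in the path and query of a URL. It is written in Python rather than PHP, the function name encode_brackets is made up for this example, and it is not something PhpDig does for you. The host part is left alone so that bracketed IPv6 literals, the case RFC 2732 reserves [ ] for, stay intact.

```python
from urllib.parse import urlsplit, urlunsplit


def encode_brackets(url: str) -> str:
    """Percent-encode "[" and "]" in the path and query only."""
    def fix(text: str) -> str:
        return text.replace("[", "%5B").replace("]", "%5D")

    parts = urlsplit(url)
    # The netloc is not touched, so a host such as [::1] keeps its brackets.
    return urlunsplit(parts._replace(path=fix(parts.path), query=fix(parts.query)))


print(encode_brackets(
    "http://zaartix.ru/krit/index.php-razdel=about&mach[2]=news&mach[3]=79.htm"
))
```

The links on the page itself would need to be written in the encoded form for a spider to see them that way.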
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.

01-19-2005, 07:24 PM   #19
zaartix
>>So it works in example but not with PhpDig? What's a link to a page containing multiple [ ] in its links?
Yep.
Just try to dig this page:
http://zaartix.ru/krit

Sorry for the Russian on that page.

01-19-2005, 07:40 PM   #20
Charter
That page contains tons of links to 404 pages.

01-19-2005, 09:01 PM   #21
zaartix
They all lead to 404s.
So PhpDig isn't extracting all the links from the main page.

01-19-2005, 09:08 PM   #22
zaartix
I haven't uploaded the other pages, only the one page.
Why would the other pages be needed? If PhpDig finds all the links on that page and all of them are correct, then the extracting regexp is working right. Is that so?

01-20-2005, 03:07 AM   #23
Charter
PhpDig tests links, and if PhpDig gets a 404 from a link, then PhpDig does not index that link. The + works in example, so maybe try setting up an online demo with a few links.

01-20-2005, 03:23 AM   #24
zaartix
So when PhpDig parses a page, it tries to open each link right away, in the first step? I thought that PhpDig extracts all the links and puts them in the tempspider table, and then in the next step tries to open each of them.
Am I wrong?

01-20-2005, 03:53 AM   #25
Charter
Nope, that is not how it works. PhpDig does not insert server response 404s in the tempspider table. With all the links currently returning 404s, the only thing inserted into the tempspider table is the zaartix.ru/krit/ page.
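
To make the order of operations concrete, here is a minimal sketch of the behaviour described above. It is not PhpDig's code (PhpDig is PHP, this is Python), and the names queue_links and get_status as well as the example.com URLs are hypothetical. The only point is that a link answering 404 is dropped before it ever reaches the temporary queue.

```python
from typing import Callable, Iterable, List


def queue_links(links: Iterable[str], get_status: Callable[[str], int]) -> List[str]:
    """Return the links that would be placed in a temporary spider queue."""
    queue: List[str] = []
    for link in links:
        if get_status(link) == 404:  # dead link: never queued, so never indexed
            continue
        if link not in queue:        # skip exact duplicates
            queue.append(link)
    return queue


# Stand-in for real HTTP requests so the example runs offline.
statuses = {
    "http://example.com/a[1].htm": 200,
    "http://example.com/b[1][2].htm": 404,
}
print(queue_links(statuses, statuses.get))
```

With every link on a page returning 404, a filter like this leaves nothing to follow, which matches what was seen with the first version of the test page.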

01-20-2005, 08:51 PM   #26
zaartix
You can now try to dig http://zaartix.ru/krit
Please help me solve this problem.

Last edited by zaartix; 01-20-2005 at 09:08 PM.

01-20-2005, 09:31 PM   #27
Charter
There are no regular links with more than one set of [ ] square brackets in them.

01-21-2005, 03:05 AM   #28
zaartix
There are many levels of pages. Just try to dig all the available pages; there are many different types of links.
http://zaartix.ru/krit

Last edited by zaartix; 01-21-2005 at 03:20 AM.

01-21-2005, 04:33 AM   #29
Charter
Here's a one-page test...

Spider:

http://zaartix.ru/krit/index.php-razdel=about&mach[2]=news&mach[3]=79.htm

Results:

Spidering in progress... [Stop spider]
SITE : http://zaartix.ru/
Exclude paths :
- @NONE@
1:http://zaartix.ru/krit/index.php-razdel=about&mach[2]=news&mach[3]=79.htm
(time : 00:00:09)
No link in temporary table
links found : 1
http://zaartix.ru/krit/index.php-razdel=about&mach[2]=news&mach[3]=79.htm
Optimizing tables...
Indexing complete ! [Back] to admin interface.

01-21-2005, 04:55 AM   #30
Charter
Here's a multi-page test...

Spider:

http://zaartix.ru/krit/index.php-razdel=about&mach[2]=news.htm

Results:

Spidering in progress... [Stop spider]
SITE : http://zaartix.ru/
Exclude paths :
- @NONE@
1:http://zaartix.ru/krit/index.php-razdel=about&mach[2]=news.htm
(time : 00:00:10)
+ + + + + + + + + + + + + + + + + + + + + +
level 1...
2:http://zaartix.ru/krit/index.php-razdel=price&mach[2]=23.htm
(time : 00:00:34)

3:http://zaartix.ru/krit/index.php-razdel=about&mach[2]=24.htm
(time : 00:00:46)

4:http://zaartix.ru/krit/index.php-razdel=price.htm
(time : 00:01:04)

5:http://zaartix.ru/krit/index.php-razdel=quality&mach[2]=34.htm
(time : 00:01:13)

6:http://zaartix.ru/krit/index.php-razdel=contact&mach[2]=19.htm
(time : 00:01:23)

Duplicate of an existing document
7:http://zaartix.ru/krit/index.php-razdel=price&mach[2]=view.htm
(time : 00:01:40)

8:http://zaartix.ru/krit/index.php-razdel=about&mach[2]=22.htm
(time : 00:01:50)

9:http://zaartix.ru/krit/index.php-razdel=about&mach[2]=21.htm
(time : 00:01:59)

10:http://zaartix.ru/krit/index.htm
(time : 00:02:08)

11:http://zaartix.ru/krit/index.php-razdel=about&mach[2]=20.htm
(time : 00:02:17)

12:http://zaartix.ru/krit/index.php-razdel=price&mach[2]=ost.htm
(time : 00:02:25)

13:http://zaartix.ru/krit/index.php-razdel=price&mach[2]=tech.htm
(time : 00:02:34)

14:http://zaartix.ru/krit/index.php-razdel=price&mach[2]=sert.htm
(time : 00:02:43)

15:http://zaartix.ru/krit/index.php-razdel=quality&mach[2]=27.htm
(time : 00:02:51)

16:http://zaartix.ru/krit/index.php-razdel=quality&mach[2]=32.htm
(time : 00:03:00)

17:http://zaartix.ru/krit/index.php-razdel=quality&mach[2]=33.htm
(time : 00:03:09)

18:http://zaartix.ru/krit/index.php-razdel=contact&mach[2]=16.htm
(time : 00:03:17)

19:http://zaartix.ru/krit/index.php-razdel=contact&mach[2]=17.htm
(time : 00:03:26)

20:http://zaartix.ru/krit/index.php-razdel=contact&mach[2]=vacancies.htm
(time : 00:03:35)

21:http://zaartix.ru/krit/index.php-razdel=about&mach[2]=news&mach[3]=79.htm
(time : 00:03:43)

22:http://zaartix.ru/krit/index.php-razdel=about&mach[2]=news&mach[3]=78.htm
(time : 00:03:51)

23:http://zaartix.ru/krit/index.php-razdel=about&mach[2]=news&mach[3]=2.htm
(time : 00:04:01)

No link in temporary table
links found : 23
http://zaartix.ru/krit/index.php-razdel=about&mach[2]=news.htm
http://zaartix.ru/krit/index.php-razdel=price&mach[2]=23.htm
http://zaartix.ru/krit/index.php-razdel=about&mach[2]=24.htm
http://zaartix.ru/krit/index.php-razdel=price.htm
http://zaartix.ru/krit/index.php-razdel=quality&mach[2]=34.htm
http://zaartix.ru/krit/index.php-razdel=contact&mach[2]=19.htm
http://zaartix.ru/krit/index.php-razdel=price&mach[2]=view.htm
http://zaartix.ru/krit/index.php-razdel=about&mach[2]=22.htm
http://zaartix.ru/krit/index.php-razdel=about&mach[2]=21.htm
http://zaartix.ru/krit/index.htm
http://zaartix.ru/krit/index.php-razdel=about&mach[2]=20.htm
http://zaartix.ru/krit/index.php-razdel=price&mach[2]=ost.htm
http://zaartix.ru/krit/index.php-razdel=price&mach[2]=tech.htm
http://zaartix.ru/krit/index.php-razdel=price&mach[2]=sert.htm
http://zaartix.ru/krit/index.php-razdel=quality&mach[2]=27.htm
http://zaartix.ru/krit/index.php-razdel=quality&mach[2]=32.htm
http://zaartix.ru/krit/index.php-razdel=quality&mach[2]=33.htm
http://zaartix.ru/krit/index.php-razdel=contact&mach[2]=16.htm
http://zaartix.ru/krit/index.php-razdel=contact&mach[2]=17.htm
http://zaartix.ru/krit/index.php-razdel=contact&mach[2]=vacancies.htm
http://zaartix.ru/krit/index.php-razdel=about&mach[2]=news&mach[3]=79.htm
http://zaartix.ru/krit/index.php-razdel=about&mach[2]=news&mach[3]=78.htm
http://zaartix.ru/krit/index.php-razdel=about&mach[2]=news&mach[3]=2.htm
Optimizing tables...
Indexing complete ! [Back] to admin interface.
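
For anyone who wants to check a result list like the one above for links with more than one [ ] pair, here is a small sketch in Python. The helper name bracket_pairs is made up for this example, and the list holds only a handful of the 23 URLs from the results rather than all of them.

```python
import re

# A few of the URLs from the "links found" list above.
links = [
    "http://zaartix.ru/krit/index.htm",
    "http://zaartix.ru/krit/index.php-razdel=price&mach[2]=23.htm",
    "http://zaartix.ru/krit/index.php-razdel=about&mach[2]=news&mach[3]=79.htm",
    "http://zaartix.ru/krit/index.php-razdel=about&mach[2]=news&mach[3]=78.htm",
    "http://zaartix.ru/krit/index.php-razdel=about&mach[2]=news&mach[3]=2.htm",
]

# One "[...]" pair with no nested brackets inside it.
PAIR = re.compile(r"\[[^\[\]]*\]")


def bracket_pairs(url: str) -> int:
    """Count the non-nested [ ] pairs in a URL."""
    return len(PAIR.findall(url))


multi = [url for url in links if bracket_pairs(url) > 1]
for url in multi:
    print(bracket_pairs(url), url)
```

The three mach[2]=news&mach[3]=... links each contain two pairs, so links with multiple [ ] pairs were found and indexed in this run.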

All times are GMT -8.