|
01-19-2005, 03:08 AM | #16 | |
Orange Mole
Join Date: May 2004
Location: russia, samara
Posts: 56
|
Quote:
It works only if the link contains just one pair of []. |
|
01-19-2005, 03:12 AM | #17 |
Orange Mole
Join Date: May 2004
Location: russia, samara
Posts: 56
|
The first regexp isn't needed because the site doesn't have frames.
Last edited by zaartix; 01-19-2005 at 03:18 AM. |
01-19-2005, 04:09 AM | #18 | |
Head Mole
Join Date: May 2003
Posts: 2,539
|
>> working only if link contain only one pair of []
So it works in the example but not with PhpDig? What's a link to a page containing multiple [ ] in its links?
>> first regexp doesn't needed becourse site have'nt frames
Other people might have frames though. RFC 2732 states in part:
Quote:
You might want to consider encoding your URIs according to this rather than use literal square brackets in your links.
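As a rough illustration of that suggestion, here is a minimal Python sketch that percent-encodes literal square brackets in the path, query, and fragment of a URL while leaving the host alone (RFC 2732 reserves brackets there for IPv6 literals). The function name `encode_brackets` and the example.com URL are made up for illustration; this is not PhpDig code.

```python
from urllib.parse import urlsplit, urlunsplit


def encode_brackets(url: str) -> str:
    """Percent-encode '[' and ']' everywhere except the host part of a URL."""
    def esc(text: str) -> str:
        return text.replace("[", "%5B").replace("]", "%5D")

    parts = urlsplit(url)
    # The netloc is passed through untouched so an IPv6 literal such as [::1] survives.
    return urlunsplit(
        (parts.scheme, parts.netloc, esc(parts.path), esc(parts.query), esc(parts.fragment))
    )


# Hypothetical link with two pairs of brackets in the query string.
encoded = encode_brackets("http://example.com/index.php?mach[2]=news&mach[3]=79")
print(encoded)
```

A link written this way no longer contains literal brackets, so a link-extraction regexp that stumbles over [ ] should not be affected by it.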
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension. |
|
01-19-2005, 07:24 PM | #19 |
Orange Mole
Join Date: May 2004
Location: russia, samara
Posts: 56
|
>>So it works in example but not with PhpDig? What's a link to a page containing multiple [ ] in its links?
Yep. Just try to dig this page: http://zaartix.ru/krit
Sorry for the Russian on that page. |
01-19-2005, 07:40 PM | #20 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
That page contains tons of links to 404 pages.
01-19-2005, 09:01 PM | #21 |
Orange Mole
Join Date: May 2004
Location: russia, samara
Posts: 56
|
They all lead to 404s.
So PhpDig does not extract all the links from the main page. |
01-19-2005, 09:08 PM | #22 |
Orange Mole
Join Date: May 2004
Location: russia, samara
Posts: 56
|
I did not upload the other pages, only one page.
What are the other pages needed for? If PhpDig finds all the links that are on that page and all the links are correct, then the extracting regexp is working right. Is that so? |
01-20-2005, 03:07 AM | #23 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
PhpDig tests links, and if PhpDig gets a 404 from a link, then PhpDig does not index that link. The + works in example, so maybe try setting up an online demo with a few links.
01-20-2005, 03:23 AM | #24 |
Orange Mole
Join Date: May 2004
Location: russia, samara
Posts: 56
|
So when PhpDig parses a page, it tries to open each link right away, in the first step? I thought PhpDig extracts all the links and puts them in the tempspider table, and then tries to open each of the links in the next step.
Am I wrong? |
01-20-2005, 03:53 AM | #25 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Nope, that is not how it works. PhpDig does not insert server response 404s in the tempspider table. With all the links currently returning 404s, the only thing inserted into the tempspider table is the zaartix.ru/krit/ page.
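To make that concrete, here is a small Python sketch of the behavior described above: each extracted link is tested first, and links that answer 404 never reach the queue. This is only an illustration, not PhpDig's actual code (PhpDig is written in PHP); the names `links_to_queue` and `fake_status` and the example URLs are hypothetical, and the status lookup is stubbed so the sketch runs without network access.

```python
from typing import Callable, Iterable, List


def links_to_queue(links: Iterable[str], get_status: Callable[[str], int]) -> List[str]:
    """Return the links whose test request did not come back as 404, without duplicates."""
    queue: List[str] = []
    for link in links:
        if get_status(link) == 404:
            continue  # dead link: it is dropped before it could be queued
        if link not in queue:
            queue.append(link)
    return queue


# Stubbed server responses for three made-up links.
fake_status = {
    "http://example.com/krit/": 200,
    "http://example.com/krit/a[2]=1.htm": 404,
    "http://example.com/krit/b[2]=1&b[3]=2.htm": 404,
}
queued = links_to_queue(fake_status.keys(), lambda url: fake_status[url])
print(queued)
```

With every linked page returning 404, only the start page is left in the queue, which matches what the spider showed for that site.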
01-20-2005, 08:51 PM | #26 |
Orange Mole
Join Date: May 2004
Location: russia, samara
Posts: 56
|
You can try to dig http://zaartix.ru/krit now.
Please help me solve this problem.
Last edited by zaartix; 01-20-2005 at 09:08 PM. |
01-20-2005, 09:31 PM | #27 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
There are no regular links with more than one set of [ ] square brackets in them.
01-21-2005, 03:05 AM | #28 |
Orange Mole
Join Date: May 2004
Location: russia, samara
Posts: 56
|
There are many levels of pages. Just try to dig all the available pages; there are many different types of links.
http://zaartix.ru/krit
Last edited by zaartix; 01-21-2005 at 03:20 AM. |
01-21-2005, 04:33 AM | #29 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Here's a one-page test...
Spider: http://zaartix.ru/krit/index.php-razdel=about&mach[2]=news&mach[3]=79.htm
Results:
Spidering in progress... [Stop spider]
SITE : http://zaartix.ru/
Exclude paths :
- @NONE@
1:http://zaartix.ru/krit/index.php-razdel=about&mach[2]=news&mach[3]=79.htm (time : 00:00:09)
No link in temporary table
links found : 1
http://zaartix.ru/krit/index.php-razdel=about&mach[2]=news&mach[3]=79.htm
Optimizing tables...
Indexing complete !
[Back] to admin interface.
01-21-2005, 04:55 AM | #30 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Here's a multi-page test...
Spider: http://zaartix.ru/krit/index.php-razdel=about&mach[2]=news.htm
Results:
Spidering in progress... [Stop spider]
SITE : http://zaartix.ru/
Exclude paths :
- @NONE@
1:http://zaartix.ru/krit/index.php-razdel=about&mach[2]=news.htm (time : 00:00:10)
+ + + + + + + + + + + + + + + + + + + + + +
level 1...
2:http://zaartix.ru/krit/index.php-razdel=price&mach[2]=23.htm (time : 00:00:34)
3:http://zaartix.ru/krit/index.php-razdel=about&mach[2]=24.htm (time : 00:00:46)
4:http://zaartix.ru/krit/index.php-razdel=price.htm (time : 00:01:04)
5:http://zaartix.ru/krit/index.php-razdel=quality&mach[2]=34.htm (time : 00:01:13)
6:http://zaartix.ru/krit/index.php-razdel=contact&mach[2]=19.htm (time : 00:01:23)
Duplicate of an existing document
7:http://zaartix.ru/krit/index.php-razdel=price&mach[2]=view.htm (time : 00:01:40)
8:http://zaartix.ru/krit/index.php-razdel=about&mach[2]=22.htm (time : 00:01:50)
9:http://zaartix.ru/krit/index.php-razdel=about&mach[2]=21.htm (time : 00:01:59)
10:http://zaartix.ru/krit/index.htm (time : 00:02:08)
11:http://zaartix.ru/krit/index.php-razdel=about&mach[2]=20.htm (time : 00:02:17)
12:http://zaartix.ru/krit/index.php-razdel=price&mach[2]=ost.htm (time : 00:02:25)
13:http://zaartix.ru/krit/index.php-razdel=price&mach[2]=tech.htm (time : 00:02:34)
14:http://zaartix.ru/krit/index.php-razdel=price&mach[2]=sert.htm (time : 00:02:43)
15:http://zaartix.ru/krit/index.php-razdel=quality&mach[2]=27.htm (time : 00:02:51)
16:http://zaartix.ru/krit/index.php-razdel=quality&mach[2]=32.htm (time : 00:03:00)
17:http://zaartix.ru/krit/index.php-razdel=quality&mach[2]=33.htm (time : 00:03:09)
18:http://zaartix.ru/krit/index.php-razdel=contact&mach[2]=16.htm (time : 00:03:17)
19:http://zaartix.ru/krit/index.php-razdel=contact&mach[2]=17.htm (time : 00:03:26)
20:http://zaartix.ru/krit/index.php-razdel=contact&mach[2]=vacancies.htm (time : 00:03:35)
21:http://zaartix.ru/krit/index.php-razdel=about&mach[2]=news&mach[3]=79.htm (time : 00:03:43)
22:http://zaartix.ru/krit/index.php-razdel=about&mach[2]=news&mach[3]=78.htm (time : 00:03:51)
23:http://zaartix.ru/krit/index.php-razdel=about&mach[2]=news&mach[3]=2.htm (time : 00:04:01)
No link in temporary table
links found : 23
http://zaartix.ru/krit/index.php-razdel=about&mach[2]=news.htm
http://zaartix.ru/krit/index.php-razdel=price&mach[2]=23.htm
http://zaartix.ru/krit/index.php-razdel=about&mach[2]=24.htm
http://zaartix.ru/krit/index.php-razdel=price.htm
http://zaartix.ru/krit/index.php-razdel=quality&mach[2]=34.htm
http://zaartix.ru/krit/index.php-razdel=contact&mach[2]=19.htm
http://zaartix.ru/krit/index.php-razdel=price&mach[2]=view.htm
http://zaartix.ru/krit/index.php-razdel=about&mach[2]=22.htm
http://zaartix.ru/krit/index.php-razdel=about&mach[2]=21.htm
http://zaartix.ru/krit/index.htm
http://zaartix.ru/krit/index.php-razdel=about&mach[2]=20.htm
http://zaartix.ru/krit/index.php-razdel=price&mach[2]=ost.htm
http://zaartix.ru/krit/index.php-razdel=price&mach[2]=tech.htm
http://zaartix.ru/krit/index.php-razdel=price&mach[2]=sert.htm
http://zaartix.ru/krit/index.php-razdel=quality&mach[2]=27.htm
http://zaartix.ru/krit/index.php-razdel=quality&mach[2]=32.htm
http://zaartix.ru/krit/index.php-razdel=quality&mach[2]=33.htm
http://zaartix.ru/krit/index.php-razdel=contact&mach[2]=16.htm
http://zaartix.ru/krit/index.php-razdel=contact&mach[2]=17.htm
http://zaartix.ru/krit/index.php-razdel=contact&mach[2]=vacancies.htm
http://zaartix.ru/krit/index.php-razdel=about&mach[2]=news&mach[3]=79.htm
http://zaartix.ru/krit/index.php-razdel=about&mach[2]=news&mach[3]=78.htm
http://zaartix.ru/krit/index.php-razdel=about&mach[2]=news&mach[3]=2.htm
Optimizing tables...
Indexing complete !
[Back] to admin interface.