not indexing anything (need help urgently)


Fking
09-29-2004, 07:30 AM
well i want to index this
http://www.studentskigrad.com/znanie/data

but got this...
SITE : http://studentskigrad.com/
Exclude paths :
- @NONE@
1:http://studentskigrad.com/znanie/data/
(time : 00:01:01)
No link in temporary table

--------------------------------------------------------------------------------

links found : 1
http://studentskigrad.com/znanie/data/
Optimizing tables...
Indexing complete !


---------------------------

i used zeros for search depth and links per....

and of course the start path was http://studentskigrad.com/znanie/data

Charter
09-29-2004, 08:24 AM
>> i used zeros for search depth and links per....

Try using a larger search depth. ;)

Fking
09-29-2004, 09:39 AM
i set them to the maximum and still got the same thing
it just does not want to see any of the .php there....

Charter
09-29-2004, 09:46 AM
Try http://www.studentskigrad.com/znanie/data/ (with www and ending slash).

Fking
09-29-2004, 09:49 AM
i will try but need to stop the spider first because i just did something stupid
how to stop it?

Charter
09-29-2004, 09:54 AM
Keep clicking the delete button without selecting a site until the temp table remains empty.

Fking
09-29-2004, 10:12 AM
i clicked it like 50 times, and it's still locked

Fking
09-29-2004, 12:46 PM
anyway, i stopped the spider
and tried with the url u gave....well still got the same error

vinyl-junkie
09-29-2004, 06:07 PM
Regarding the locked site, read this thread (http://www.phpdig.net/forum/showthread.php?t=334).

As for the other problem, is there any sort of redirect in .htaccess?

Fking
09-29-2004, 11:55 PM
i have only one .htaccess file in the root dir, where i have set the 404 page
that's all


i'm very confused now
how is it possible for a page with just links to be impossible to index...?


please take a look at the page and lmk if something is wrong with the links

but i don't think that it may be.....they are all working in a browser...

Fking
09-30-2004, 12:31 AM
wow....it took me all day to realize the problem, in fact there were a few
i'll explain them later, because this may happen to others

but first i would like to know how to make it spider more than 20 links per page?

i have only one page for spidering with 300 links

vinyl-junkie
09-30-2004, 03:17 AM
>> but first i would like to know how to make it spider more than 20 links per page?

Just specify 0 in the "links per" field when spidering, and phpdig will spider all the links on that page.

Another thing to keep in mind, if you ever need it, is setting the search depth. This is found in the following code in config.php:

define('SPIDER_MAX_LIMIT',20); //max recurse levels in spider

Fking
09-30-2004, 07:02 AM
i fixed this one too
now it's working and indexing but....
seems like it does not index all words
i'm trying searches for random words from the files, sometimes it gets far fewer results than there are documents with these words, and sometimes it gets no results at all even though the word exists in the documents.....???

Fking
09-30-2004, 08:33 AM
it indexed around 200 files (from 380 total) and stopped
i cleaned all the records about them and started it again
now it indexes only 6!!! pages.... ???
i'm trying again and again, and still only 6 pages...


huh i'm pretty confused now

Fking
09-30-2004, 09:47 AM
hm it shows maybe around.....300-400 pluses
but i guess it follows only a small part of them, because it reports only 6 pages indexed

Charter
09-30-2004, 10:34 AM
Three things I can think of...

1) The links may not match the regex for links. Search for ([a-z]{3,5}://) in the robot_functions.php file to find the two regexes for links.

2) Some of the pages you are trying to crawl are encoded as windows-1251, but the search results appear to be using iso-8859-1 instead.

3) Some of the pages are using a whole lot of HTML entities instead of an encoding. PhpDig currently supports windows-1251 for Cyrillic.
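
To illustrate point 2, here is a rough sketch in Python (not PhpDig code) of what can happen when text stored as windows-1251 is read back as iso-8859-1. The function name and the sample word are made up for the example; the only point is that the misread string no longer equals the original, so a search for the original word would not match it.

```python
def misread_as_latin1(text: str) -> str:
    """Encode text as windows-1251, then decode the same bytes as iso-8859-1.

    Hypothetical helper: it mimics a page written in one encoding being
    read back in the other.
    """
    return text.encode("windows-1251").decode("iso-8859-1")


word = "знание"  # sample Cyrillic word, chosen only for illustration
garbled = misread_as_latin1(word)

# The garbled form is made of accented Latin letters, not Cyrillic ones.
print(repr(garbled))
print(garbled == word)
```

If something like this is going on, the words may well be in the index but in a form that a Cyrillic query cannot match.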

Fking
09-30-2004, 12:18 PM
i also think that the problem is related to the page encoding....

what can i do in order to make them spiderable?

Charter
09-30-2004, 12:36 PM
Use links that match the regex, encode the pages using windows-1251, and set define('PHPDIG_ENCODING','windows-1251'); in the config file.
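
For the "encode pages using windows-1251" part, one possible approach is sketched below in Python (an illustration only, not part of PhpDig). It assumes the page text has already been read into a string; the function name and the sample entity string are hypothetical. It turns HTML entities into real characters and then encodes the result as windows-1251.

```python
import html


def to_windows_1251(page_text: str) -> bytes:
    """Replace HTML entities with real characters, then encode as windows-1251.

    Hypothetical helper. Characters that do not exist in windows-1251
    raise UnicodeEncodeError instead of being silently dropped.
    """
    return html.unescape(page_text).encode("windows-1251")


# Sample text written entirely as numeric HTML entities (made up for the example).
sample = "&#1079;&#1085;&#1072;&#1085;&#1080;&#1077;"
encoded = to_windows_1251(sample)

# Decoding with the same encoding gives back readable Cyrillic text.
print(encoded.decode("windows-1251"))
```

The converted bytes would then be saved back as the page content, with the config setting above telling the spider to read them as windows-1251.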