Missing files, indexer and strange output


Nosmada
02-27-2004, 01:52 PM
The indexer has indexed all of my folders but has missed most of the files in each folder. So I tried reindexing just one folder to see if it would get all the files, and this is what is happening. I don't know what it is doing after level 1 with all of those times in parentheses. Should I let it keep going? What is happening, and why is it missing most files in each folder?

Duplicate of an existing document
1:http://www.posterbreak.com/americana/c9587200-native-american.shtml
(time : 00:00:09)

Duplicate of an existing document
2:http://www.posterbreak.com/americana/c95812412-american-entertainment.shtml
(time : 00:00:24)

Duplicate of an existing document
3:http://www.posterbreak.com/americana/c95812411-american-design.shtml
(time : 00:00:30)

Duplicate of an existing document
4:http://www.posterbreak.com/americana/c95812410-american-culture.shtml
(time : 00:00:36)

Duplicate of an existing document
5:http://www.posterbreak.com/americana/
(time : 00:00:49)

level 1...
(time : 00:01:09)

(time : 00:01:22)

(time : 00:01:36)

(time : 00:01:50)

(time : 00:02:05)

(time : 00:02:19)

(time : 00:02:33)

(time : 00:02:52)

(time : 00:03:07)

(time : 00:03:21)

(time : 00:03:35)

(time : 00:03:48)

(time : 00:04:03)

(time : 00:04:17)

(time : 00:04:31)

(time : 00:04:45)

(time : 00:04:59)

(time : 00:05:12)

(time : 00:05:25)

(time : 00:05:40)

(time : 00:05:55)

(time : 00:06:10)

(time : 00:06:24)

(time : 00:06:37)

(time : 00:06:51)

(time : 00:07:05)

(time : 00:07:19)

(time : 00:07:33)

Charter
02-28-2004, 05:07 PM
Hi. Hmm, I've never seen all those times in parentheses like that before. Maybe this has something to do with your MySQL being down the other day? As for the files, are there links to them?

Nosmada
02-29-2004, 03:27 PM
Are there links to which files?

Charter
02-29-2004, 03:28 PM
Hi, to the files that are not being crawled.

Nosmada
02-29-2004, 03:37 PM
Yes, there are links.

The first links in the folder are here, for example:

http://www.posterbreak.com/art/

Then if you click one of the links you will see a whole bunch of other links which are actually in the same art folder, for example:

http://www.posterbreak.com/art/c101319274-art-movements.shtml

Charter
02-29-2004, 04:15 PM
Hi. It seems that it takes PhpDig about ten minutes, depending on the machine, traffic, et cetera, to process the Keyword QuickFind links. Below is the output from crawling http://www.posterbreak.com/art/ at a search depth of one. PhpDig hits the Keyword QuickFind links first, and after it gets through those it hits the other links. Try setting a search depth of one and let the spider run for a while. What do you get?

SITE : http://www.posterbreak.com/
Exclude paths :
- _private/
- cgi-bin/
- images/
- search/
- searchsite/
- templates/
1:http://www.posterbreak.com/art/
(time : 00:00:13)
+ + + + + + + + + + + + + + +
level 1...
2:http://www.posterbreak.com/art/c5964-museum-landscapes.shtml
(time : 00:11:05)

3:http://www.posterbreak.com/art/c101318251-museum-religious-art.shtml
(time : 00:11:17)

4:http://www.posterbreak.com/art/c6461-museum-still-life.shtml
(time : 00:11:28)

5:http://www.posterbreak.com/art/c101310996-museum-tours.shtml
(time : 00:11:41)

6:http://www.posterbreak.com/art/c101319580-special-mediums.shtml
(time : 00:11:52)

7:http://www.posterbreak.com/privacy.shtml
(time : 00:11:59)

8:http://www.posterbreak.com/art/c10135861-art-by-nationality.shtml
(time : 00:12:10)

9:http://www.posterbreak.com/art/c101319277-four-centuries.shtml
(time : 00:12:27)

10:http://www.posterbreak.com/art/c101319312-museum-floral.shtml
(time : 00:12:38)

11:http://www.posterbreak.com/art/c101319275-museum-figurative.shtml
(time : 00:12:49)

12:http://www.posterbreak.com/art/c101319282-museum-artists.shtml
(time : 00:12:59)

13:http://www.posterbreak.com/art/c101319594-museum-abstract.shtml
(time : 00:13:10)

14:http://www.posterbreak.com/art/c101319274-art-movements.shtml
(time : 00:13:22)

15:http://www.posterbreak.com/contact.shtml
(time : 00:13:33)

16:http://www.posterbreak.com/
(time : 00:13:39)

No link in temporary table

--------------------------------------------------------------------------------

links found : 16
http://www.posterbreak.com/art/
http://www.posterbreak.com/art/c5964-museum-landscapes.shtml
http://www.posterbreak.com/art/c101318251-museum-religious-art.shtml
http://www.posterbreak.com/art/c6461-museum-still-life.shtml
http://www.posterbreak.com/art/c101310996-museum-tours.shtml
http://www.posterbreak.com/art/c101319580-special-mediums.shtml
http://www.posterbreak.com/privacy.shtml
http://www.posterbreak.com/art/c10135861-art-by-nationality.shtml
http://www.posterbreak.com/art/c101319277-four-centuries.shtml
http://www.posterbreak.com/art/c101319312-museum-floral.shtml
http://www.posterbreak.com/art/c101319275-museum-figurative.shtml
http://www.posterbreak.com/art/c101319282-museum-artists.shtml
http://www.posterbreak.com/art/c101319594-museum-abstract.shtml
http://www.posterbreak.com/art/c101319274-art-movements.shtml
http://www.posterbreak.com/contact.shtml
http://www.posterbreak.com/
Optimizing tables...
Indexing complete !
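
For anyone who wants to sanity-check the link count outside the spider, here is a rough Python sketch of the filtering that seems to be going on: collect the anchors on a page, then drop javascript, offsite, and excluded-path links. This is not PhpDig's code. The function and class names are made up for illustration, and matching the exclude paths as prefixes of the URL path is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse, urldefrag

# Exclude paths copied from the spider output above.
EXCLUDE_PATHS = ("_private/", "cgi-bin/", "images/",
                 "search/", "searchsite/", "templates/")


class LinkCollector(HTMLParser):
    """Collects the raw href value of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def crawlable_links(page_url, html, exclude_paths=EXCLUDE_PATHS):
    """Return unique same-site links, skipping javascript and excluded paths."""
    collector = LinkCollector()
    collector.feed(html)
    site = urlparse(page_url).netloc
    seen = []
    for href in collector.hrefs:
        href = href.strip()
        if href.lower().startswith("javascript:"):
            continue  # javascript links are not followed
        # Resolve relative links and drop any #fragment.
        url, _fragment = urldefrag(urljoin(page_url, href))
        parts = urlparse(url)
        if parts.scheme not in ("http", "https") or parts.netloc != site:
            continue  # offsite link
        path = parts.path.lstrip("/")
        if any(path.startswith(prefix) for prefix in exclude_paths):
            continue  # assumed: exclude paths match as path prefixes
        if url not in seen:
            seen.append(url)
    return seen
```

Feed it the saved HTML of http://www.posterbreak.com/art/ and compare the length of the result with the "links found" figure. The sketch does not count the starting page itself, so expect it to be off by one if the page does not link back to itself.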

Nosmada
03-03-2004, 01:48 AM
Hi Charter,

Thanks for looking into it. It seems that many files in the folder are still missing from the index. Since I don't really want the indexer to follow the related search results, maybe I should comment them out (for the indexer, that is) so that it won't take the time to follow them?

Charter
03-03-2004, 02:17 AM
Hi. On the page http://www.posterbreak.com/art/ there should be sixteen links, assuming a search depth of one and not counting javascript, offsite, or excluded-path links. The exclude/include comments work as described in this thread (http://www.phpdig.net/showthread.php?threadid=383), so maybe consider putting the QuickFind links in a separate file and including that file in the shtml files using something like the following:

<!--#include file="filename.shtml" -->

Then you could 'turn off' the QuickFind links, and instead include a file with a space, when indexing. Just an idea. :)
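
One way to do the swap without forgetting to put the real file back is a small wrapper script. The following is only a sketch in Python: the helper name quickfind_disabled and the file name quickfind.shtml are hypothetical, and it assumes the include file is writable from wherever you start the spider.

```python
import shutil
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def quickfind_disabled(include_file):
    """Temporarily replace the included file with a single space.

    The original is copied to <name>.bak first and moved back when the
    block exits, even if the indexing run fails part-way through.
    """
    include_file = Path(include_file)
    backup = include_file.with_name(include_file.name + ".bak")
    shutil.copy2(include_file, backup)
    try:
        include_file.write_text(" ")  # a file with just a space, as suggested
        yield include_file
    finally:
        shutil.move(str(backup), str(include_file))
```

Usage would look something like this, with the spider started however you normally start it:

```python
with quickfind_disabled("quickfind.shtml"):  # hypothetical file name
    pass  # run the indexer here; the QuickFind links are blank meanwhile
```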