Indexing Help...I am missing something


tscholle
08-02-2005, 03:28 PM
Hello All:

I have not had any luck finding posts exactly like this and need a little help. Unfortunately my site is on an intranet, so I cannot provide a link for you to review; I will do the best that I can to explain this.

PhpDig v.1.8.7

All of my data is stored in a directory on the site that is broken into a year directory and then a month directory, so it looks like this:

-->Archives
   -->2005
      -->January

All of the month directories contain a bunch of HTML files that are listed in an HTML file called published.html, which is also in the month directory.

Everything seems to go fine when I set up PhpDig, and the index database looks fine. However, when I go to search, every link takes you to the published.html file and not the HTML page that has the data you really want.

What am I doing wrong? Am I choosing something wrong in the search depth?

When I enter what should be indexed I do put in something like this...

http://archive/archives/1990/jan/published.html
http://archive/archives/1990/feb/published.html
http://archive/archives/1990/mar/published.html
http://archive/archives/1990/apr/published.html
http://archive/archives/1990/may/published.html
http://archive/archives/1990/jun/published.html
http://archive/archives/1990/jul/published.html
http://archive/archives/1990/aug/published.html
http://archive/archives/1990/sep/published.html
http://archive/archives/1990/oct/published.html

Is that wrong?

Any help or advice that anyone could offer would be GREATLY appreciated, and I thank you in advance!

Tom Scholle
tjscholle@cbs.com
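
A list of seed URLs like the one in the post above can be generated rather than typed by hand. The following is only a sketch in Python, not part of PhpDig: the function name `seed_urls` is hypothetical, the host and path pattern are copied from the URLs in the post, and the `nov` and `dec` directory names are assumed to follow the same three-letter pattern.

```python
# Month directory names as they appear in the URLs in the post.
# "nov" and "dec" are assumptions; the post only lists jan through oct.
MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]


def seed_urls(years, base="http://archive/archives"):
    """Build one published.html URL per year/month directory."""
    return [
        f"{base}/{year}/{month}/published.html"
        for year in years
        for month in MONTHS
    ]


# Print the seed list for a single year, one URL per line.
for url in seed_urls([1990]):
    print(url)
```

The printed lines could then be pasted into whatever field accepts the URLs to index.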

Charter
08-02-2005, 06:00 PM
So each published.html page contains links to other pages in the archives/year/month/ directory? Try setting LIMIT_TO_DIRECTORY to false and PHPDIG_IN_DOMAIN to true (both in the config file) and then, from the admin panel, use a large search depth, set links per to zero, and use the no option.

tscholle
08-04-2005, 11:11 AM
I am afraid everything still comes back pointing to published.html. Should I change how I set the dig

from http://archive/archives/1990/jan/published.html

to http://archive/archives/1990/jan

Would that fix it? Am I limiting it too much?

Charter
08-04-2005, 11:55 AM
What does the HTML from one of the published.html look like? Just attach one of the published.html files, if you will, so I can have a look-see. Also, if you can, attach a screenshot showing the trouble area. This will help me get a better understanding.

tscholle
08-04-2005, 02:39 PM
I have added a zip file with a published.html and a screen shot of the results. I hope that helps. Thank you for your help!

Charter
08-04-2005, 02:56 PM
When you click the published.html link, like the one shown in the screenshot, where are you taken? Also attach one of those 16634f0b.html type files so I can look-see and test.

tscholle
08-04-2005, 04:47 PM
I guess at this point it would help you to know that these pages get created by our NRCS (newsroom computer system). This is an archive of a show's rundown.

When I click on a link like the one in the screenshot I am taken directly to the published.html for that month and not to the story it is referencing.

I hope I am answering the questions correctly here...

Again I thank you for this help!

Tom

Charter
08-04-2005, 09:20 PM
Okay, I did a test using the following setup:

http://www.phpdig.net/temp/published.html
http://www.phpdig.net/temp/16634f0b.html

Where published.html contained the first three links:

<A HREF="16634f0b.html">chinese orch &nbsp;</A><BR>
<A HREF="18634f0b.html">the bit &nbsp;</A><BR>
<A HREF="1c634f0b.html">wt:pegasus &nbsp;</A><BR>

And PhpDig v.1.8.7 was limited to indexing a couple of links.

PhpDig printed out the following:

Spidering in progress... [Stop spider]
SITE : http://www.phpdig.net/
Exclude paths :
- @NONE@
1:http://www.phpdig.net/temp/published.html
(time : 00:00:06)
+
level 1...
2:http://www.phpdig.net/temp/16634f0b.html
(time : 00:00:16)
No link in temporary table
links found : 2
http://www.phpdig.net/temp/published.html
http://www.phpdig.net/temp/16634f0b.html
Optimizing tables...
Indexing complete ! [Back] to admin interface.

A test search on orch yielded the attached image.

What happens if you directly index the following:

http://archive/archives/????/???/16634f0b.html
(replacing the ?'s with year and month info)

If you want to see 16634f0b.html, what do you type in your browser:

http://archive/archives/YYYY/MMM/16634f0b.html or something else?
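
One way to check what the relative links in published.html should point to, independently of PhpDig, is to resolve them against the page's own URL. The sketch below uses only the Python standard library; the class `LinkCollector` and the function `resolve_links` are hypothetical names, the base URL is one of the intranet URLs from the first post, and the three links are the ones quoted earlier in this post. It says nothing about how PhpDig itself resolves links.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names,
        # so <A HREF="..."> is seen as tag "a" with attribute "href".
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def resolve_links(base_url, html):
    """Return absolute URLs for every link in html, resolved against base_url."""
    parser = LinkCollector()
    parser.feed(html)
    return [urljoin(base_url, href) for href in parser.hrefs]


# Base URL from the first post; links as quoted from published.html.
BASE = "http://archive/archives/1990/jan/published.html"
SAMPLE = """
<A HREF="16634f0b.html">chinese orch &nbsp;</A><BR>
<A HREF="18634f0b.html">the bit &nbsp;</A><BR>
<A HREF="1c634f0b.html">wt:pegasus &nbsp;</A><BR>
"""

for url in resolve_links(BASE, SAMPLE):
    print(url)
```

If the addresses printed here differ from what actually works in the browser, something in the real page (for example a BASE tag, frames, or a redirect) is probably changing where the links lead, and that would be worth mentioning in a reply.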