
How to index a directory with PDF files


simonced
02-11-2004, 07:54 AM
Hello everybody,

I couldn't find an answer to this question, so I'm posting it here in detail:
I want to index a folder on my site which contains only PDF files.
These files are added through an admin page, and I want them to be searchable with PhpDig.
(A cron job would run once a day and index new or changed files.)
So I don't want to index over HTTP, but directly from the filesystem.
(Not FTP either...)
Is this possible?

Thanks a lot in advance.

Charter
02-11-2004, 08:05 AM
Hi. You could make one filename.html that links to the PDF files you want crawled, and index filename.html with a search depth of one. After the index is done, go to the admin panel, click a site, click the update button, click a blue arrow, and delete filename.html on the right-hand side if you don't want it to show up in the search results.

simonced
02-12-2004, 12:28 AM
Thanks for such a quick reply :)

I see, that's a good approach.
So I suppose the file that lists the PDFs could be a PHP script?
That makes it easy, in fact.
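A minimal sketch of such a listing script, assuming it sits in the same directory as the PDFs and is saved as listingfiles.php (the name used in the shell command in this thread). The function name `pdf_listing_html` is hypothetical and not part of PhpDig; the page simply prints one link per PDF so the spider can follow them at depth one.

```php
<?php
// Hypothetical listing page for PhpDig to crawl at depth one.
// Assumption: this script lives in the same directory as the PDF files.

/**
 * Build an HTML page with one link per .pdf file found in $dir.
 * Note: glob() is case-sensitive on most filesystems, so only
 * lowercase ".pdf" extensions are matched here.
 */
function pdf_listing_html(string $dir): string
{
    // glob() returns false on error; fall back to an empty list.
    $files = glob(rtrim($dir, '/') . '/*.pdf') ?: [];
    sort($files); // stable, alphabetical order

    $html = "<!DOCTYPE html>\n<html><head><title>PDF files</title></head><body>\n<ul>\n";
    foreach ($files as $path) {
        $name = basename($path);
        // URL-encode the href (e.g. spaces become %20) and HTML-escape the link text.
        $html .= sprintf(
            "<li><a href=\"%s\">%s</a></li>\n",
            rawurlencode($name),
            htmlspecialchars($name, ENT_QUOTES)
        );
    }
    $html .= "</ul>\n</body></html>\n";
    return $html;
}

// Print the listing for the directory this script is in.
echo pdf_listing_html(__DIR__);
```

Because the page is generated on each request, PDFs added through the admin page show up in it automatically, so the daily spider run only has to fetch this one URL with a depth of one.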

How can I crawl only my PDF listing file from the shell?
$ php [...]/spider.php http://website/fold1/...../foldx/listingfiles.php
(I don't know how to set the depth to one this way...)
Can I put a robots.txt in foldx?

Where can I find help on configuring robots.txt?

Thank you very much.

Charter
02-13-2004, 11:41 AM
Hi. To get a search depth of one when indexing from the shell, set the following in the config.php file:

define('SPIDER_MAX_LIMIT',1); //max recurse levels in spider
define('SPIDER_DEFAULT_LIMIT',1); //default value
define('RESPIDER_LIMIT',1); //recurse limit for update