PhpDig.net

Old 11-25-2003, 07:30 AM   #1
bigals
Orange Mole
 
Join Date: Nov 2003
Posts: 41
index only HTML files

I have indexed my site, and it indexes .html and .swf files.
It also indexes the file directory, i.e. '-'.

But I just want the HTML files to be indexed. Is there a way of setting this? If so, how and where? I can't find it anywhere.

The '-' index links are the biggest problem; the swf files don't really matter.

Can anyone please help me?

cheers,

alex
Old 11-25-2003, 09:17 AM   #2
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. You might try adding a robots.txt file in web root with the following, assuming it's the index.html to the main site that you don't want to crawl:

User-agent: PhpDig
Disallow: index.html
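
As a rough way to sanity-check a rule like this before recrawling, the sketch below feeds a robots.txt body to Python's standard `urllib.robotparser`. It assumes a standards-style path with a leading slash (`/index.html`) and uses `example.com` as a placeholder host. PhpDig's own robots.txt handling may differ from this parser, so treat the result as indicative only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body; the leading slash follows the usual
# robots.txt convention and is an assumption, not PhpDig-specific syntax.
ROBOTS_TXT = """\
User-agent: PhpDig
Disallow: /index.html
"""


def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

Calling `is_allowed(ROBOTS_TXT, "PhpDig", url)` for each page you care about shows which ones this rule would block for that user agent.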

To remove the '-' index links that were crawled, go to the admin panel, click a site, click the update button, click a blue arrow, and on the right side, click a red X for those links you want to delete.

Another option, if you have shell access, would be to crawl via command line using a text file, where only the links you want crawled are in the text file, one per line. There are three options in the config file (SPIDER_MAX_LIMIT, SPIDER_DEFAULT_LIMIT, RESPIDER_LIMIT) that can be set to limit the number of levels crawled when using shell to index.
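
For a site whose pages change often, the text file could be regenerated before each crawl. The sketch below is one way to do that, assuming the site's files are reachable on local disk. `build_url_list`, `write_url_file`, the `site_root` path, and the `base_url` are hypothetical names, not part of PhpDig. It lists only `.html` files, so directory URLs never enter the list.

```python
from __future__ import annotations

from pathlib import Path


def build_url_list(site_root: Path, base_url: str) -> list[str]:
    """Return one URL per .html file found under site_root, sorted."""
    base = base_url.rstrip("/")
    urls = []
    for page in site_root.rglob("*.html"):
        if page.is_file():
            # Convert the on-disk path to a URL path with forward slashes.
            relative = page.relative_to(site_root).as_posix()
            urls.append(f"{base}/{relative}")
    return sorted(urls)


def write_url_file(urls: list[str], out_file: Path) -> None:
    """Write the URLs one per line, the layout the post describes."""
    out_file.write_text("".join(f"{u}\n" for u in urls), encoding="utf-8")
```

The resulting file is what would be handed to the spider from the shell; the exact command depends on the PhpDig install and is not shown here.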
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.
Old 11-26-2003, 12:41 AM   #3
bigals
Orange Mole
 
Join Date: Nov 2003
Posts: 41
not what i meant

Cheers. I meant that within each folder it spiders, there are three results, for example:

-
hello.txt
hello.swf

The top result is just a '-', but it links to the folder itself, so you get a kind of FTP page, not an HTML page. The swf doesn't really matter because I don't think it appears as a result in any searches.

When I said index, I meant the FTP-style listing of the folder in question. Does that make sense? Surely there is a way of telling PhpDig to index ONLY HTML files, and no folders or files without the .html file type.

Hope this makes more sense,

alex.
Old 11-26-2003, 08:16 AM   #4
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. Is there a link from dir/filename.html to just dir/ in the filename.html files? What are the filenames of the html files? You might try setting a .htaccess file in web root with the following as the first line:

Options -Indexes

For the swf files, try adding swf to the FORBIDDEN_EXTENSIONS list in the config file.
Old 11-26-2003, 02:50 PM   #5
bigals
Orange Mole
 
Join Date: Nov 2003
Posts: 41
i dont think there are links from filenames.html

I don't think there are links from filename.html to dir/.
Couldn't I add '-' to the forbidden extensions list, or will that just mess it all up?

The HTML files are named after regions and towns in England, e.g. 'norwich.html'; they are not called 'index.html', if that's what you were thinking.

Do you get these directory indexes in your spider results?

alex
Old 11-26-2003, 03:32 PM   #6
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. I wouldn't add '-' to the forbidden extensions because it isn't an extension; it's just a representation for domain.com. Yes, I do get '-' in my results. Did using the .htaccess file work?
Old 11-26-2003, 03:38 PM   #7
bigals
Orange Mole
 
Join Date: Nov 2003
Posts: 41
I don't know how to create .htaccess files or add them to my site root. Do you get the 'index of blah blah blah' in your search results?

If I type "index of" into my search field and click Go, I get a huge list of search results made up of the pages I don't want listed. Do you get the same?
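
One way to spot such pages outside PhpDig is to look at the page title: Apache's automatic directory listings normally use a title beginning with "Index of". The sketch below checks for that. `is_directory_listing` is a hypothetical helper, and the title check is only a heuristic that a customised server could defeat.

```python
import re

# Matches a <title> whose text starts with "Index of", as produced by
# Apache's default auto-generated directory listings (heuristic only).
_TITLE_RE = re.compile(r"<title>\s*Index of\b", re.IGNORECASE)


def is_directory_listing(html: str) -> bool:
    """Return True if the HTML looks like an auto-generated listing."""
    return _TITLE_RE.search(html) is not None
```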
Old 11-26-2003, 03:58 PM   #8
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. No, I don't get that because I don't allow directory listings. The attached zip file contains a .htaccess file. Just FTP the .htaccess file to your web root in ASCII mode.
Attached Files
File Type: zip htaccess.zip (132 Bytes, 17 views)
Old 11-26-2003, 04:01 PM   #9
bigals
Orange Mole
 
Join Date: Nov 2003
Posts: 41
Cheers. OK, I'll have to ask my domain hosts to tell me what my root is, because they set it up and may have changed things round a bit. I'll reply and tell you how it goes. Cheers!

alex
Old 11-26-2003, 04:04 PM   #10
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. The place to FTP the file is the same place the main index.html file would go for your site. For instance, if your main site page is domain.com/index.html, then the web root is where this index.html file resides.
Old 11-27-2003, 01:32 AM   #11
bigals
Orange Mole
 
Join Date: Nov 2003
Posts: 41
Aargghh!!! If I place the .htaccess file on the server, it restricts access to the PhpDig administration page; even if I rename the index.php page, it still won't allow access.

If that had worked it would've been cool, sorry.
Any other ideas on how to avoid this problem? I'd deal with it normally, but the index directories get in the way of the actual relevant results of the search, you see.

cheers,

alex
Old 11-27-2003, 08:03 AM   #12
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. Another option would be to make one filename.html that links to the files that you want crawled and index filename.html at level one. After the index is done, just go to the admin panel, click a site, click the update button, click a blue arrow, and delete the '-' on the right hand side.
Old 12-01-2003, 10:34 AM   #13
bigals
Orange Mole
 
Join Date: Nov 2003
Posts: 41
Sorry for the long wait.

I have constantly changing HTML pages, with new ones created regularly, so I need to somehow stop the index directories from being indexed. There must be something in PhpDig that tells the engine to search and display these pages, or else they wouldn't be indexed.

Who might know how to find and disable such a function?

Thank you,

alex
Old 12-01-2003, 12:13 PM   #14
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Do your HTML pages link to the index directories?
Old 12-01-2003, 12:25 PM   #15
bigals
Orange Mole
 
Join Date: Nov 2003
Posts: 41
no

The pages I have made do not link to the index directory; I was wondering the same thing.

The link created by the spider/indexer is a link to the directory, not an HTML file. PhpDig is finding the index of a folder and displaying it as a link.

Here is an example I have taken from the site:

http://www.robotstxt.org/wc/

The '-' I get links to pages like the one above.

So if I search for my 'cars' HTML page on my site, I get results that link to addresses like:
(these are made up examples)

'cars/cars.html' and 'cars/'
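
The distinction in those made-up examples can be stated in code: a URL whose path ends in `/` points at a directory, while one ending in `.html` points at a page. A minimal sketch follows, with `is_html_page` and `keep_html_pages` as hypothetical helpers that are not part of PhpDig.

```python
from __future__ import annotations

from urllib.parse import urlparse


def is_html_page(url: str) -> bool:
    """Return True only for URLs whose path ends in .html."""
    path = urlparse(url).path
    return path.lower().endswith(".html")


def keep_html_pages(urls: list[str]) -> list[str]:
    """Drop directory URLs and non-HTML files, preserving order."""
    return [u for u in urls if is_html_page(u)]
```

A filter like this could be applied to a list of candidate URLs before they are handed to any indexer, so that entries such as 'cars/' are never submitted.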