PhpDig.net

Old 03-10-2004, 12:42 AM   #1
bigals
Orange Mole
 
Join Date: Nov 2003
Posts: 41
Remove Index pages

Hi, I have PhpDig set up on a web site and I'm having trouble getting round a problem.

When spidering, PhpDig also indexes the root pages of folders. These appear in the spider directory as '-' entries and are then listed when a user searches. When clicked, they just open folder pages on my web site, such as:

www.domain.com/html/one/

That's a bit confusing, so try this description:

If you have a page called:

www.domain.com/html/one/one.html

the spider indexes this link but also indexes the folder that contains the HTML page, so you get both of these results:

www.domain.com/html/one/one.html
www.domain.com/html/one/

I don't want these folders indexed, I just want the HTML pages indexed.

Does anyone know how to get round this? Your help will be much appreciated.
Old 03-10-2004, 09:03 AM   #2
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. Perhaps create an .htaccess file with the following line at the top of the file:
Code:
Options -Indexes
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.
Old 03-10-2004, 02:21 PM   #3
bigals
Hi, where would I put this file and what do I call it? I'm not familiar with htaccess.

Also, is that the only content that would be needed in the htaccess file?
Old 03-10-2004, 03:49 PM   #4
Charter
Hi. You could try adding something like ^/dir/to/ban/$ to the BANNED constant in the config file. Otherwise, try making a file named .htaccess containing Options -Indexes on one line and put it in your web root directory, assuming your OS/setup allows for .htaccess files.
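
A minimal sketch of how a pattern like that behaves, assuming BANNED is treated as a regular expression and tested against each link path (I have not checked this against the PhpDig source). The helper isBanned() and the broader pattern in $anyFolder are made up for illustration and are not part of PhpDig:

```php
<?php
// Hypothetical helper, not a PhpDig function: reports whether a path
// matches a ban pattern. Assumes the pattern contains no '#' character.
function isBanned(string $pattern, string $path): bool
{
    return preg_match('#' . $pattern . '#i', $path) === 1;
}

$exact     = '^/dir/to/ban/$'; // the example pattern: one specific folder
$anyFolder = '/$';             // assumption: any path that ends in a slash

$paths = ['/dir/to/ban/', '/html/one/', '/html/one/one.html'];
foreach ($paths as $path) {
    printf(
        "%-20s exact=%s anyFolder=%s\n",
        $path,
        var_export(isBanned($exact, $path), true),
        var_export(isBanned($anyFolder, $path), true)
    );
}
```

The exact pattern only matches the one folder it names, so a site with many folders would need either one alternative per folder or a broader pattern. Whether banning a folder URL also stops the spider from following the links inside it is something to test on a small site first.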
Old 03-11-2004, 05:27 AM   #5
bigals
OK, cheers. I added the .htaccess file and it seemed to work, but now my spider only goes one link deep into the site, which means I'm only getting about 15 pages indexed. Any ideas why this is happening?
Old 03-11-2004, 09:48 AM   #6
bigals
Right, well now it's spidering properly, so that's OK, but it's still displaying these index pages in the search results. What was that config file setting all about, and how do I add it? I don't want to cock anything up. I'm not very experienced with this program, so any help on placing that code would be appreciated.

Cheers, BIGALS!
Old 03-11-2004, 02:14 PM   #7
Charter
>> ...getting about 15 pages indexed, any ideas why...

Hi. When directory listing is on, each listing page links to every file in its folder, so the spider finds them all. With directory listing off, only pages that are linked from within your site get crawled.

>> ...it's still displaying these index pages in the search...

Perhaps an easier way would be to go to the admin panel, choose the site, click the update button, click a blue arrow on the left until you see the '-' entry on the right, and then click the red X on the right next to the '-' to delete them.
Old 03-11-2004, 03:30 PM   #8
bigals
I have about 1500 of those little '-' entries, so I'm not going to delete them one by one.

That config setting you gave me means I'd have to ban each folder individually too, is that correct? Any other ideas?

I can't believe no one else has this problem. Who in their right mind wants index pages to be indexed?

Last edited by bigals; 03-11-2004 at 03:33 PM.
Old 03-11-2004, 04:33 PM   #9
Charter
Hi. I didn't realize your site was so big. Yes, that would be a very tedious process. Anyway, this is not tested much, but perhaps try the following.

Make a file called cleanup_dashes.php with the content below, put it in the PhpDig admin directory, and call it from the browser. Once done, run the other cleanup tools in the admin panel to tidy up the engine (remove orphan keywords, etc.).
PHP Code:
<?php
// Deletes every folder ('-') entry from the PhpDig index.
// Relies on the legacy mysql_* functions that PhpDig used at the time.

echo "<html><body>";
$count = 0;
$relative_script_path = '..';

include "$relative_script_path/includes/config.php";
include "$relative_script_path/libs/auth.php";
include "$relative_script_path/admin/robot_functions.php";

// Folder entries are stored with an empty file name.
$query = mysql_query("SELECT spider_id FROM ".PHPDIG_DB_PREFIX."spider WHERE file = '';");

while ($row = mysql_fetch_array($query)) {
  $spider_id = $row['spider_id'];
  // Remove the keyword links, the page record, and the stored text.
  mysql_query("DELETE FROM ".PHPDIG_DB_PREFIX."engine WHERE spider_id=".$spider_id.";");
  mysql_query("DELETE FROM ".PHPDIG_DB_PREFIX."spider WHERE spider_id=".$spider_id.";");
  phpdigDelText($relative_script_path,$spider_id);
  $count++;
  echo $count . "<br>\n";
}

echo "<br>Done. <a href=\"index.php\" target=\"_top\">[Back]</a> to admin interface.";
echo "</body></html>";

?>
Remember to remove any "word" wrapping in the above code.
Old 03-11-2004, 11:37 PM   #10
bigals
Right, I tried that, but it keeps returning the error:

cannot modify header info, headers already sent by auth.php
...followed by this French message:
Vous ne pouvez accéder à cette page ("You cannot access this page")

(The header errors are longer than that; I'm just abbreviating them.)

Also, the '-' symbol is not contained in the script you sent me, so how is the script finding and removing the '-' entries?

$query = mysql_query("SELECT spider_id FROM ".PHPDIG_DB_PREFIX.

In the query it states:
"spider WHERE file = ' ';");

Shouldn't it be:
"spider WHERE file = '-'");

I don't know, I'm just trying to figure out the script.
Old 03-11-2004, 11:57 PM   #11
Charter
Hi. The username and password should be the ones in the config.php file. Otherwise, just comment out the line that includes auth.php, but make sure to protect the admin directory so nobody else can run those scripts. Also, the query should be just file = '' (two single quotes, no space in between). The '-' is what you see onscreen, not what is in the table.
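
To illustrate the empty-string point, here is a minimal sketch of the same select-then-delete logic. It is only a stand-in: it assumes PDO with an in-memory SQLite database instead of PhpDig's MySQL tables, drops the table prefix, and uses simplified tables whose extra columns (path, key_id) are just for illustration.

```php
<?php
// Stand-in database (assumption): in-memory SQLite, not PhpDig's MySQL.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Simplified tables holding only what the cleanup logic needs.
$db->exec("CREATE TABLE spider (spider_id INTEGER PRIMARY KEY, path TEXT, file TEXT)");
$db->exec("CREATE TABLE engine (spider_id INTEGER, key_id INTEGER)");

// One folder entry (empty file name) and one real page.
$db->exec("INSERT INTO spider VALUES (1, 'html/one/', ''), (2, 'html/one/', 'one.html')");
$db->exec("INSERT INTO engine VALUES (1, 10), (1, 11), (2, 10)");

// Folder entries are the rows whose file column is the empty string.
$ids = $db->query("SELECT spider_id FROM spider WHERE file = ''")
          ->fetchAll(PDO::FETCH_COLUMN);

// Delete the keyword links first, then the page record itself.
$delEngine = $db->prepare("DELETE FROM engine WHERE spider_id = ?");
$delSpider = $db->prepare("DELETE FROM spider WHERE spider_id = ?");
foreach ($ids as $id) {
    $delEngine->execute([$id]);
    $delSpider->execute([$id]);
}

$remaining = $db->query("SELECT file FROM spider")->fetchAll(PDO::FETCH_COLUMN);
echo count($ids) . " folder entries removed; remaining: " . implode(', ', $remaining) . "\n";
```

A query for file = ' ' (with a space) or file = '-' would match no rows in this setup, which is why the two single quotes must have nothing between them.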
Old 03-12-2004, 12:16 AM   #12
bigals
My Friend, you are a star!!!

That worked first time, brilliant, I really appreciate all the help you have given me.

My search results are fine now and I'm chuffed to bits. It looks a million times better now.

AMAZING!!!

thanks again,

BIGALS.
All times are GMT -8.

