View Full Version : Remove Index pages


bigals
03-10-2004, 12:42 AM
Hi, I have PhpDig set up on a web site and I'm having trouble getting round a problem.

When spidering, PhpDig also indexes the roots of folders. They appear in the spider directory as '-' entries and are then listed when a user searches. When clicked, they just open folder pages on my web site, such as:

www.domain.com/html/one/

That's a bit confusing, so try this description:

If you have a page called:

www.domain.com/html/one/one.html

The spider indexes this link but also indexes the root folder of that HTML page, so you get both of these results:

www.domain.com/html/one/one.html
www.domain.com/html/one/

I don't want these folders indexed, I just want the HTML pages indexed.

So, does anyone know how to get round this? Your help will be much appreciated.

Charter
03-10-2004, 09:03 AM
Hi. Perhaps set up an .htaccess file with the following line at the top of the file:

Options -Indexes

bigals
03-10-2004, 02:21 PM
Hi, where would I put this file and what do I call it? I'm not familiar with htaccess.

Also, is that the only content that would be needed in the htaccess file?

Charter
03-10-2004, 03:49 PM
Hi. You could try adding something like ^/dir/to/ban/$ to the BANNED constant in the config file. Otherwise, try making a file named .htaccess containing Options -Indexes on one line and stick it in your web root directory, assuming your OS/setup allows for htaccess files.
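
To make the ^/dir/to/ban/$ suggestion concrete, here is a minimal sketch of how such a pattern behaves against URL paths. The helper isBannedPath() is hypothetical and written only for this illustration; how PhpDig itself applies its BANNED setting (and whether it matches against the path or the full URL) is an assumption not confirmed in this thread.

```php
<?php
// Hypothetical helper, not part of PhpDig: returns true when $path
// matches the given ban pattern.
function isBannedPath(string $path, string $pattern): bool
{
    // '#' is used as the delimiter so slashes in the pattern need no escaping.
    return preg_match('#' . $pattern . '#', $path) === 1;
}

$single = '^/html/one/$'; // matches exactly one folder root
$anyDir = '/$';           // matches every path that ends in a slash

var_dump(isBannedPath('/html/one/', $single));         // bool(true)
var_dump(isBannedPath('/html/one/one.html', $single)); // bool(false)
var_dump(isBannedPath('/html/two/', $single));         // bool(false)
var_dump(isBannedPath('/html/two/', $anyDir));         // bool(true)
var_dump(isBannedPath('/html/one/one.html', $anyDir)); // bool(false)
```

A pattern anchored to one folder only covers that folder. A general pattern such as /$ would cover every folder root, but it would also match the site root itself, so it would need testing before use.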

bigals
03-11-2004, 05:27 AM
OK, cheers. I added the .htaccess file and it seemed to work, but now my spider only goes one link deep into the site, which means I'm only getting about 15 pages indexed. Any ideas why this is happening?

bigals
03-11-2004, 09:48 AM
Right, well now it's spidering properly so that's OK, but it's still displaying these index pages in the search results. What was that config file script all about and how do I add it? I don't want to cock anything up. I'm not very experienced with this program, so any help on placing that code would be appreciated.

cheers, BIGALS!

Charter
03-11-2004, 02:14 PM
>> ...getting about 15 pages indexed, any ideas why...

Hi. When directory listing is on, each folder page links to all of the files in it, so the spider can reach every file. With directory listing set to off, only links that appear in your site's own pages are crawled.

>> ...it's still displaying these index pages in the search...

Perhaps an easier way would be to go to the admin panel, choose the site, click the update button, click a blue arrow on the left until you see the '-' entry on the right, and then click the red X on the right next to the '-' to delete them.

bigals
03-11-2004, 03:30 PM
I have about 1500 of those little '-' indexes, so I'm not going to delete them one by one.

That config script you gave me means I'd have to set each index to be banned too, is that correct? Any ideas now?

I can't believe no one else has this problem. Who in their right mind wants index pages to be indexed???

Charter
03-11-2004, 04:33 PM
Hi. I didn't realize your site was so big. :( Yes, that would be a very tedious process. Anyway, not tested much but perhaps try the following.

Make a file called cleanup_dashes.php with the below content, stick it in the PhpDig admin directory, and call it from the browser. Once done, run the other cleans to shore up the engine (remove orphan keywords, etcetera).

<?php

// One-off cleanup: remove spider entries with an empty file name
// (the folder roots shown as '-' in the admin panel).
echo "<html><body>";
$count = 0;
$relative_script_path = '..';

include "$relative_script_path/includes/config.php";
include "$relative_script_path/libs/auth.php";
include "$relative_script_path/admin/robot_functions.php";

// Folder-root entries are stored with an empty file field.
$query = mysql_query("SELECT spider_id FROM ".PHPDIG_DB_PREFIX."spider WHERE file = '';");

while ($row = mysql_fetch_array($query)) {
    // Id of the current folder entry, used by all three cleanup calls.
    $spider_id = (int) $row['spider_id'];
    mysql_query("DELETE FROM ".PHPDIG_DB_PREFIX."engine WHERE spider_id=".$spider_id.";");
    mysql_query("DELETE FROM ".PHPDIG_DB_PREFIX."spider WHERE spider_id=".$spider_id.";");
    phpdigDelText($relative_script_path,$spider_id);
    $count++;
    echo $count . "<br>\n";
}

echo "<br>Done. <a href=\"index.php\" target=\"_top\">[Back]</a> to admin interface.";
echo "</body></html>";

?>

Remember to remove any "word" wrapping in the above code.
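
For anyone who wants to see what the selection and deletes do before touching real data, here is a small stand-alone sketch. It uses an in-memory SQLite database through PDO in place of the PhpDig MySQL tables. The table names spider and engine and the columns spider_id and file come from the script above; the path and key_id columns and all sample rows are assumptions made up for this illustration.

```php
<?php
// Illustration only: in-memory SQLite stands in for the PhpDig MySQL tables.
// The path and key_id columns and the sample rows are assumptions.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$db->exec("CREATE TABLE spider (spider_id INTEGER PRIMARY KEY, path TEXT, file TEXT)");
$db->exec("CREATE TABLE engine (spider_id INTEGER, key_id INTEGER)");

$db->exec("INSERT INTO spider (spider_id, path, file) VALUES
    (1, 'html/one/', 'one.html'),
    (2, 'html/one/', ''),
    (3, 'html/two/', '')");
$db->exec("INSERT INTO engine (spider_id, key_id) VALUES (1, 10), (2, 11), (3, 12)");

// Preview: count the folder entries (empty file name) before deleting.
$toDelete = (int) $db->query("SELECT COUNT(*) FROM spider WHERE file = ''")->fetchColumn();

// Remove the keyword links first, then the spider rows themselves.
$db->exec("DELETE FROM engine WHERE spider_id IN (SELECT spider_id FROM spider WHERE file = '')");
$db->exec("DELETE FROM spider WHERE file = ''");

$remaining  = (int) $db->query("SELECT COUNT(*) FROM spider")->fetchColumn();
$engineLeft = (int) $db->query("SELECT COUNT(*) FROM engine")->fetchColumn();

echo "Removed $toDelete folder entries, $remaining page(s) remain\n";
```

Running the count query on its own first is a cheap way to check that the number matches the '-' entries you expect to lose.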

bigals
03-11-2004, 11:37 PM
Right, I tried that but it keeps returning the error:

cannot modify header info, headers already sent by auth.php
...and this French afterwards -
Vous ne pouvez accéder à cette page (You cannot access this page)

(The header errors are longer than that, it's just that I'm abbreviating them)

Also, the '-' symbol is not contained within the script you sent me, so how is the script finding and removing the '-' entries?

$query = mysql_query("SELECT spider_id FROM ".PHPDIG_DB_PREFIX.

In the query it states:
"spider WHERE file = ' ';");

Shouldn't it be:
"spider WHERE file = '-'");

I don't know, it's just that I'm trying to figure out the script.

Charter
03-11-2004, 11:57 PM
Hi. The username and password should be the ones in the config.php file. Otherwise, just comment out that include auth.php line, but make sure to protect the admin directory so nobody else can run those scripts. Also, the query should be just file = '' (two single quotes, no space in between). The '-' is what you see onscreen, not what is in the table.
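
As a rough picture of the difference between the stored value and the onscreen '-', here is a minimal sketch. The function displayFileName() is hypothetical and is not taken from PhpDig's source; it only illustrates the idea that an empty file field can be rendered as a dash when listed.

```php
<?php
// Hypothetical display helper, not PhpDig code: an empty file name
// (a folder root) is shown as '-' while the stored value stays ''.
function displayFileName(string $file): string
{
    return $file === '' ? '-' : $file;
}

$storedFiles = ['one.html', ''];

foreach ($storedFiles as $file) {
    echo "stored: '" . $file . "' shown: " . displayFileName($file) . "\n";
}
```

Under that assumption, a query looking for file = '-' would find nothing, because no row actually stores a dash.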

bigals
03-12-2004, 12:16 AM
My Friend, you are a star!!!

That worked first time, brilliant, I really appreciate all the help you have given me.

My search results are fine now and I'm chuffed to bits. It looks a million times better now.

AMAZING!!!

thanks again,

BIGALS.