Old 10-04-2004, 06:20 AM   #1
arena75
Green Mole
 
Join Date: Jul 2004
Posts: 5
Excluding only one link

I have been struggling with this problem for too long. I hope someone can help.

I have quite a big site; when I spider it, I get about 8000 pages, but most of them, about 6500, are duplicates.

Those are the pages used to compose a message to a forum poster, like:
.../forum/messagecompose.asp?senduser=pluimenest&topic=1227&recordnum=20

I tried taking out the variable senduser (the others, topic and recordnum, I cannot take out because they are used on other pages as well).

I also tried using phpdigInclude and phpdigExclude to keep that page from being indexed. The page is out of the search results, but it still gets spidered, and 6500 visits to a page that is spidered but not indexed still take 9 hours. (I know I can set the interval time lower, but that's not a solution.)
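To show what I mean, the exclude markers are just HTML comments wrapped around the part of the page you don't want indexed; a simplified sketch (the markup around the link is only an illustration, not my real template):

Code:
<!-- phpdigExclude -->
<!-- everything between these two comments stays out of the index -->
<a href="messagecompose.asp?senduser=pluimenest&topic=1227&recordnum=20">
  Send this poster a message
</a>
<!-- phpdigInclude -->

That keeps the compose pages out of the search results, but the spider still follows and fetches every link it comes across between the markers.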

What I want is a way to make sure the file messagecompose.asp is not spidered at all. The easiest would be an exclude/include tag whose enclosed links are not followed, without breaking the original exclude/include tags.

That way I would only have to update one page to set the tags, drop 6500 indexed pages, and gain 9 hours. Can anyone help me with this?

Thanx
Old 10-04-2004, 10:25 AM   #2
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Try sticking messagecompose.asp in a robots.txt file.
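Something along these lines, adjusted to wherever the script actually lives (the /forum/ path below is only taken from your example URL):

Code:
# keep PhpDig away from the compose pages
User-agent: PhpDig
Disallow: /forum/messagecompose.asp

The file has to be named exactly robots.txt and sit in the web root of the site.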
Old 10-04-2004, 03:16 PM   #3
arena75
Green Mole
 
Join Date: Jul 2004
Posts: 5
This is what is in my robots.txt:

User-agent: PhpDig
Disallow: /pcs/philboard_reply.asp
Disallow: /pcs/messagecompose.asp
Disallow: /pcs/nologin.asp

As you can see, there are more pages I don't want indexed.
But it's not working; I started spidering again, and all those pages still get spidered.

Quote:
levels 2...
Meta Robots = NoIndex, or already indexed : No content indexed
3:http://www.xxxxxxxx.com/pcs/philboar...50&recordnum=0
(time : 00:00:45)
+ + + +
levels 3...
Meta Robots = NoIndex, or already indexed : No content indexed
4:http://www.xxxxxxx.com/pcs/nologin.a...50&recordnum=0
(time : 00:00:59)
+ + +
levels 4...
Meta Robots = NoIndex, or already indexed : No content indexed
5:http://www.xxxxxxx.com/pcs/philboard...50&recordnum=0
(time : 00:01:13)
+ +
levels 5...
Meta Robots = NoIndex, or already indexed : No content indexed
6:http://www.xxxxxx.com/pcs/nologin.as...50&recordnum=0
(time : 00:01:26)
+
levels 6...
Meta Robots = NoIndex, or already indexed : No content indexed
7:http://www.xxxxxx.com/pcs/philboard_...50&recordnum=0
(time : 00:01:39)
So the bot still spiders them all.
Isn't there a way to make phpdigInclude/phpdigExclude not follow links?
Old 10-10-2004, 12:26 AM   #4
arena75
Green Mole
 
Join Date: Jul 2004
Posts: 5
Does anyone have any idea how to stop those links from being spidered?
Old 10-10-2004, 08:56 AM   #5
vinyl-junkie
Purple Mole
 
Join Date: Jan 2004
Posts: 694
This is just a guess at a solution, but go into your admin panel and delete one of those unwanted pages from your index. Then, with your robots.txt in place as discussed above, try indexing just that one page and see what PhpDig does with it.

I'm thinking that perhaps PhpDig isn't clearing out old URLs that you don't want in the index, but with them gone from your database and rules in place to exclude them from future indexing, maybe it won't try to add them back in.
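If it's more convenient than clicking through the admin pages, I think the spider can also be pointed at a single URL from the command line; something like this (the paths and URL are only examples, adjust them to your install):

Code:
# run from the PhpDig admin directory
cd /path/to/phpdig/admin
php -f spider.php "http://www.yoursite.com/pcs/messagecompose.asp?senduser=someuser&topic=1&recordnum=0" > spider.log

With the robots.txt rules in place, the log should show the spider skipping that URL instead of indexing it.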

Let us know if that worked or not.
Old 10-10-2004, 01:46 PM   #6
arena75
Green Mole
 
Join Date: Jul 2004
Posts: 5
I feel so stupid.

Quote:
SITE : http://www.xxxxxxxx.com/
Paths to exclude :
- pcs/philboard_reply.asp
- pcs/messagecompose.asp
- pcs/nologin.asp
- pcs/signupform.asp
- pcs/signinpage.asp
1:http://www.xxxxxxx.com/pcs/philboard_read.asp?id=936
(time : 00:00:16)

No link in temporary table

--------------------------------------------------------------------------------

links found : 1
http://www.xxxxx.com/pcs/philboard_read.asp?id=936
Optimizing tables...
Indexing complete!
It works now.
A single "s" was the problem: I had named my robots file robot.txt.
After changing it to robots.txt, it works great.

So stupid of me.
Thanx for all the help.