PhpDig.net


djavet 01-10-2005 10:53 PM

robots.txt and URL
 
Hello,

I have a robots.txt file which works perfectly with PhpDig.
Note the forum folder in it:
Code:

User-agent: *
Disallow: /cgi-bin/
Disallow: /flash/
Disallow: /forum/
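
A robots.txt rule is a path-prefix match, so for a crawler that honors this file, `Disallow: /forum/` blocks every URL under /forum/, FAQ topics included. The minimal sketch below checks that with Python's standard-library `urllib.robotparser`. It only illustrates the standard rule and is not PhpDig's own parser; the topic ID 101 is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt shown in the post above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /flash/
Disallow: /forum/
"""

def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser

if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    # t=101 is a hypothetical FAQ topic ID, used only for illustration.
    faq = "http://www.john-howe.com/forum/phpbb/viewtopic.php?t=101"
    home = "http://www.john-howe.com/"
    print(rp.can_fetch("PhpDig", faq))   # False: matches "Disallow: /forum/"
    print(rp.can_fetch("PhpDig", home))  # True: no rule matches "/"
```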

In my forum (phpBB), I have a FAQ section which I'd like to index with PhpDig.
Is it possible to force a few URLs (in fact, 19 entries) to be indexed while the forum is excluded from indexing?

Thx for your help and time.
Regards, Dominique

Dave A 01-10-2005 11:27 PM

From what I can see, you are asking how to make PhpDig spider a directory that has been excluded in the robots.txt file.
From my experience with the software, PhpDig is built to read the robots.txt file and obey it. Ethically, if the webmaster has excluded a directory from robots, would it be right to try to index it?
No doubt people with more experience than me can answer your question or know how it's done, but I would ask: "Wouldn't that make the robot spider little more than a hacking device?"
Hopefully someone with more knowledge will help you.

djavet 01-11-2005 12:18 AM

I didn't mean my question in any ambiguous way, but I understand what you mean; I hadn't thought about it until you brought it up.
I have a forum (www.john-howe.com/forum) with a lot of sections, and one of them is a FAQ, which I wish to include in the index.
My question is: How can I do that?
I don't want to list my thousand threads in robots.txt :)

Is it possible to specify in robots.txt which URLs to index? I've found nothing about it. Is "Allow:" supported in PhpDig?

Regards, Dom

Charter 01-11-2005 02:05 AM

First, remove "Disallow: /forum/" from your robots.txt file.

Next, go to the PhpDig admin panel and copy down the "update sites" values for your site.

Then, enter each FAQ-type URL that you want to index, one per line, in the PhpDig textbox like so:
Code:

http://www.john-howe.com/forum/phpbb/viewtopic.php?t=XXXX
http://www.john-howe.com/forum/phpbb/viewtopic.php?t=YYYY
http://www.john-howe.com/forum/phpbb/viewtopic.php?t=ZZZZ
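
With 19 FAQ topics, typing the list by hand is error-prone. A small Python sketch like the one below could build the one-URL-per-line text for the PhpDig textbox from a list of topic numbers. The base URL is the one from the example above; the topic IDs in the usage are hypothetical placeholders for the real FAQ topic numbers.

```python
from typing import Iterable

# Base URL taken from the example list above.
BASE = "http://www.john-howe.com/forum/phpbb/viewtopic.php?t="

def faq_url_list(topic_ids: Iterable[int]) -> str:
    """Return one viewtopic.php URL per line, ready to paste into the textbox."""
    return "\n".join(f"{BASE}{tid}" for tid in topic_ids)

if __name__ == "__main__":
    # Hypothetical topic IDs; replace them with the real FAQ topic numbers.
    print(faq_url_list([101, 102, 103]))
```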

Now, set the radio button to no, search depth to zero, links per to zero, and click the dig button.

Once PhpDig is done, go to "update sites" and edit the values back to their original settings.

Remember to add "Disallow: /forum/" back to your robots.txt file.

PhpDig currently does not understand robots.txt "allow" lines.
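For comparison, the sketch below shows how a parser that does understand `Allow` would read a robots.txt that reopens only selected FAQ topics, and what happens when the `Allow` lines are simply ignored, which is the behavior Charter describes for PhpDig. It uses Python's `urllib.robotparser`, which applies the first matching rule in file order, so the `Allow` lines come before `Disallow: /forum/` (a longest-match crawler reaches the same result with this file). The topic IDs are hypothetical, and because rules are prefix matches, `t=101` would also match `t=1010`. The `honor_allow=False` switch is only a simulation, not PhpDig code.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: two FAQ topics reopened, then the original rules.
ROBOTS_WITH_ALLOW = """\
User-agent: *
Allow: /forum/phpbb/viewtopic.php?t=101
Allow: /forum/phpbb/viewtopic.php?t=102
Disallow: /cgi-bin/
Disallow: /flash/
Disallow: /forum/
"""

def build_parser(text: str, honor_allow: bool = True) -> RobotFileParser:
    """Parse robots.txt text. With honor_allow=False, Allow lines are dropped
    to simulate a crawler that does not understand them."""
    lines = text.splitlines()
    if not honor_allow:
        lines = [ln for ln in lines if not ln.strip().lower().startswith("allow:")]
    parser = RobotFileParser()
    parser.parse(lines)
    return parser

if __name__ == "__main__":
    faq = "http://www.john-howe.com/forum/phpbb/viewtopic.php?t=101"
    other = "http://www.john-howe.com/forum/phpbb/viewtopic.php?t=202"
    aware = build_parser(ROBOTS_WITH_ALLOW)
    naive = build_parser(ROBOTS_WITH_ALLOW, honor_allow=False)
    print(aware.can_fetch("PhpDig", faq))    # True: first match is the Allow line
    print(aware.can_fetch("PhpDig", other))  # False: falls through to Disallow: /forum/
    print(naive.can_fetch("PhpDig", faq))    # False: without Allow, /forum/ blocks it
```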

Also, read this documentation for further information.

djavet 01-11-2005 03:19 AM

Super trick. Thx a lot.
I will try it tonight.

Regards, Dom


All times are GMT -8.

Copyright © 2001 - 2005, ThinkDing LLC. All Rights Reserved.