PhpDig.net

PhpDig is a PHP and MySQL web spider and search engine, released under the GNU General Public License, and may be downloaded here.

What sites has PhpDig.net crawled with PhpDig?
Only small portions of PhpDig.net and some other sites were indexed for the online demo. To see whether PhpDig.net has crawled a portion of your site, search the online demo for apache server and look at the sites listed in the dropdown box on the results page. On rare occasions PhpDig.net has used PhpDig to crawl other sites for testing purposes, but this is usually done when someone posts a problem about the code on the forums.

But why is the PhpDig robot crawling my site?
As PhpDig is released under the GNU GPL, the code is open source, so other individuals may have downloaded the PhpDig script and used the robot to crawl your site for inclusion in their own PhpDig search engine. On rare occasions PhpDig.net does use the PhpDig robot to crawl portions of sites when testing the code. However, if PhpDig.net does crawl portions of other sites, the index is kept to a minimum.

PhpDig, the script, outputs a user-agent [e.g.: PhpDig/1.8.x (+http://www.phpdig.net/robot.php)] but this does not automatically imply PhpDig.net, the site, is using PhpDig, the script, to crawl your website. The user-agent is simply default text, informing you of the robot's presence and giving you a link to this page. In general, if you notice PhpDig crawling your site, it is probably not PhpDig.net performing the index.

What does PhpDig do with my site content?
PhpDig is an open source PHP and MySQL web spider and search engine. The robot portion of PhpDig retrieves site content. The search engine portion of PhpDig searches content and displays results with links to the relevant webpages.

How can I stop PhpDig from crawling my site?
PhpDig should obey a robots.txt file: it should not crawl your site when your robots.txt file includes the following text.
User-agent: PhpDig
Disallow: /
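To check how a standards-following robot would read the two lines above, here is a minimal sketch using Python's standard urllib.robotparser module. The helper name is_allowed and the example.com URL are assumptions made for illustration; they are not part of PhpDig, and PhpDig's own robots.txt handling may differ in detail.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt content shown above.
ROBOTS_TXT = """\
User-agent: PhpDig
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    Hypothetical helper for illustration only.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Default PhpDig user-agent text, as quoted on this page.
phpdig_ua = "PhpDig/1.8.x (+http://www.phpdig.net/robot.php)"

print(is_allowed(ROBOTS_TXT, phpdig_ua, "http://example.com/page.html"))
print(is_allowed(ROBOTS_TXT, "OtherBot/1.0", "http://example.com/page.html"))
```

Under these rules the PhpDig user-agent is refused for every path, while a robot with a different name is unaffected because no rule applies to it.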
Alternatively, or in conjunction with a robots.txt file, you can use a .htaccess file with the following content, assuming you have mod_rewrite capabilities.
RewriteEngine on
RewriteBase /
RewriteCond %{REMOTE_ADDR} ^123\.123\.123\.123$ [OR]
RewriteCond %{REMOTE_ADDR} ^234\.234\.234\.234$ [OR]
RewriteCond %{REQUEST_METHOD} !^(GET|POST) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} PhpDig [NC]
RewriteRule ^.* - [F]
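The conditions above are joined with OR, so a request is refused if any one of them matches. As a rough way to reason about which requests that covers, the sketch below mirrors the same tests in Python. It is only an illustration of the logic, not something Apache runs, and the function name is_forbidden is an assumption.

```python
import re

# Same address patterns as the RewriteCond lines above.
BLOCKED_IP_PATTERNS = [r"^123\.123\.123\.123$", r"^234\.234\.234\.234$"]


def is_forbidden(remote_addr: str, method: str, user_agent: str) -> bool:
    """Mirror the OR-chained conditions; True means the rule would answer 403."""
    # RewriteCond %{REMOTE_ADDR} ... [OR]
    if any(re.search(p, remote_addr) for p in BLOCKED_IP_PATTERNS):
        return True
    # RewriteCond %{REQUEST_METHOD} !^(GET|POST) [NC,OR]
    if not re.search(r"^(GET|POST)", method, re.IGNORECASE):
        return True
    # RewriteCond %{HTTP_USER_AGENT} PhpDig [NC]
    return re.search(r"PhpDig", user_agent, re.IGNORECASE) is not None


ua = "PhpDig/1.8.x (+http://www.phpdig.net/robot.php)"
print(is_forbidden("10.0.0.1", "GET", ua))
print(is_forbidden("10.0.0.1", "GET", "Mozilla/5.0"))
```

Because the user-agent test is a case-insensitive substring match, any client whose user-agent contains "phpdig" in any casing is refused, whatever its address.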
You may also add a META tag to your webpages to prevent robots in general.
<meta name="robots" content="noindex,nofollow">
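For illustration, the sketch below shows how a crawler might read the robots META tag above, using Python's standard html.parser module. The class and function names are assumptions, and this is not how PhpDig itself is implemented.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        # Directives are comma-separated; normalise case and whitespace.
        self.directives.update(
            part.strip().lower() for part in content.split(",") if part.strip()
        )


def robots_directives(html: str) -> set:
    """Return the set of robots directives found in html (hypothetical helper)."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


page = '<html><head><meta name="robots" content="noindex,nofollow"></head></html>'
found = robots_directives(page)
print("noindex" in found, "nofollow" in found)
```

A robot that honours the tag would skip indexing the page when "noindex" is present and would not follow its links when "nofollow" is present.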
Note that while the PhpDig robot attempts to read and follow robots.txt files and META tags, not all robots do the same. Further, as the PhpDig script is open source, it is possible for someone to modify the code such that the PhpDig robot acts differently than originally coded. In the latter case, a .htaccess file is probably your best bet.

Who do I contact if I have any questions?
For specific questions on how to prevent PhpDig from crawling your site, please visit the PhpDig Forum or contact the administrator.

For all other PhpDig support questions, please visit the PhpDig Forum.


Copyright © 2001 - 2005, ThinkDing LLC. All Rights Reserved.