PhpDig.net
What is PhpDig?
PhpDig is a PHP and MySQL web spider and search engine, released under the
GNU General Public License, and may be downloaded here.
What sites has PhpDig.net crawled with PhpDig?
Only small portions of PhpDig.net and some other sites were indexed for
the online demo. To check whether PhpDig.net has crawled a portion of
your site, search the online demo for
apache server
and look at the sites listed in the dropdown box on the results page. On
rare occasions PhpDig.net has used PhpDig to crawl other sites for
testing purposes, usually only after someone reports a
problem with the code on the forums.
But why is the PhpDig robot crawling my site?
Because PhpDig is released under the GNU GPL, the code is open source, and other
individuals may have downloaded
the PhpDig script and used the robot to crawl your site for inclusion in
their own PhpDig search engine. On rare occasions PhpDig.net does use the
PhpDig robot to crawl portions of sites when testing the code. When it
does, the resulting index is kept
to a minimum.
PhpDig, the script, sends a user-agent string
[e.g.: PhpDig/1.8.x (+http://www.phpdig.net/robot.php)]
but this does not imply that PhpDig.net, the site, is using
PhpDig, the script, to crawl your website. The user-agent is simply
default text that informs you of the robot's presence and gives you a link
to this page. In general, if you notice PhpDig crawling your site, it is
probably not PhpDig.net performing the index.
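To see which hosts are actually sending the PhpDig user-agent, you can scan your server's access log. The following is a minimal Python sketch, assuming log lines in the Apache common or combined format, where the client address is the first field. The function name phpdig_hits and the sample log lines are hypothetical and only for illustration.

```python
import re

# Matches the default PhpDig user-agent text, e.g.
# "PhpDig/1.8.x (+http://www.phpdig.net/robot.php)".
PHPDIG_UA = re.compile(r"PhpDig/(?P<version>\S+)", re.IGNORECASE)


def phpdig_hits(log_lines):
    """Return (client_ip, version) for each log line that mentions PhpDig."""
    hits = []
    for line in log_lines:
        match = PHPDIG_UA.search(line)
        if match:
            # Assumption: the client address is the first space-separated field.
            client_ip = line.split(" ", 1)[0]
            hits.append((client_ip, match.group("version")))
    return hits


# Hypothetical sample lines; the addresses are documentation-only ranges.
sample_log = [
    '203.0.113.7 - - [01/Jan/2005:10:00:00 +0000] "GET / HTTP/1.0" 200 512 '
    '"-" "PhpDig/1.8.x (+http://www.phpdig.net/robot.php)"',
    '198.51.100.4 - - [01/Jan/2005:10:00:05 +0000] "GET / HTTP/1.1" 200 512 '
    '"-" "Mozilla/5.0"',
]

print(phpdig_hits(sample_log))
```

The client addresses returned this way identify the machine running the robot, which the user-agent text alone does not.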
What does PhpDig do with my site content?
PhpDig is an open source PHP and MySQL web spider and search engine. The
robot portion of PhpDig retrieves site content. The search engine portion
of PhpDig searches content and displays results with links to the relevant
webpages.
How can I stop PhpDig from crawling my site?
PhpDig should obey a robots.txt file. With the following text in your
robots.txt file, PhpDig should not crawl your site.
User-agent: PhpDig
Disallow: /
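As a quick sanity check of these two lines, the sketch below feeds them to the robots.txt parser in Python's standard library. It shows how that parser interprets the rules, not how PhpDig's own code reads them. The example.com URL and the second user-agent are placeholders.

```python
from urllib.robotparser import RobotFileParser

# The rules from the text above.
ROBOTS_TXT = """\
User-agent: PhpDig
Disallow: /
"""


def can_fetch(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if the given rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


phpdig_agent = "PhpDig/1.8.x (+http://www.phpdig.net/robot.php)"
page = "http://example.com/page.html"  # placeholder URL

print(can_fetch(phpdig_agent, page))        # False: PhpDig is disallowed
print(can_fetch("SomeOtherBot/1.0", page))  # True: no rule names this robot
```

Because the rules name only PhpDig, other robots remain unaffected.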
Alternatively, or in conjunction with a robots.txt file, you can use a
.htaccess file with the following content, assuming mod_rewrite is
available on your server.
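# Enable the rewrite engine (requires mod_rewrite)
RewriteEngine on
RewriteBase /
# Placeholder IP addresses; replace with the addresses you want to block
RewriteCond %{REMOTE_ADDR} ^123\.123\.123\.123$ [OR]
RewriteCond %{REMOTE_ADDR} ^234\.234\.234\.234$ [OR]
# Any request method that does not start with GET or POST (this includes HEAD)
RewriteCond %{REQUEST_METHOD} !^(GET|POST) [NC,OR]
# Any user-agent containing "PhpDig", case-insensitive
RewriteCond %{HTTP_USER_AGENT} PhpDig [NC]
# Answer matching requests with 403 Forbidden
RewriteRule ^.* - [F]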
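The conditions above are joined with OR, so a request is refused if any one of them matches. The Python sketch below models that logic to make its effect easier to see. It is an approximation for illustration, not a replacement for testing against Apache itself, and the function name is_forbidden is hypothetical.

```python
import re

# Same placeholder addresses as in the .htaccess rules.
BLOCKED_ADDRS = [r"^123\.123\.123\.123$", r"^234\.234\.234\.234$"]


def is_forbidden(remote_addr, method, user_agent):
    """Model the OR-chained RewriteCond lines; True means a 403 response."""
    # REMOTE_ADDR matches one of the listed addresses.
    if any(re.search(pattern, remote_addr) for pattern in BLOCKED_ADDRS):
        return True
    # REQUEST_METHOD does not start with GET or POST ([NC] = ignore case).
    if not re.search(r"^(GET|POST)", method, re.IGNORECASE):
        return True
    # HTTP_USER_AGENT contains "PhpDig" anywhere ([NC] = ignore case).
    return re.search(r"PhpDig", user_agent, re.IGNORECASE) is not None


print(is_forbidden("10.0.0.1", "GET", "PhpDig/1.8.x"))   # True
print(is_forbidden("10.0.0.1", "GET", "Mozilla/5.0"))    # False
print(is_forbidden("10.0.0.1", "HEAD", "Mozilla/5.0"))   # True
```

The last call shows a side effect worth knowing: the request-method condition refuses HEAD requests from every client, not only from PhpDig.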
You may also add a META tag to your webpages to keep robots in general from indexing them or following their links.
<meta name="robots" content="noindex,nofollow">
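To confirm that a page actually carries this tag, you can parse its HTML and collect the robots directives. The sketch below uses Python's standard html.parser module. The class name RobotsMetaParser and the helper robots_directives are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        # Split "noindex,nofollow" into separate lowercase directives.
        for part in content.split(","):
            if part.strip():
                self.directives.add(part.strip().lower())


def robots_directives(html):
    """Return the set of robots directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


page = '<html><head><meta name="robots" content="noindex,nofollow"></head></html>'
print(sorted(robots_directives(page)))  # ['nofollow', 'noindex']
```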
Note that while the PhpDig robot attempts to read and follow
robots.txt files and META tags, not all robots do the same. Further, as
the PhpDig script is open source, it is possible for someone to modify the
code so that the PhpDig robot behaves differently than originally
coded. In the latter case, a .htaccess file is probably your best
bet.
Who do I contact if I have any questions?
For specific questions on how to prevent PhpDig from crawling your site,
please visit the PhpDig Forum or contact the administrator.
For all other PhpDig support questions, please visit the PhpDig Forum.