PhpDig.net - View Single Post

Andreas_Wien · 04-24-2004, 12:33 PM

Yes, right, sorry about that. The site is not finished in some respects; nevertheless, it should already be searchable.

I hope that doesn't affect phpdig in any way. I'm not troubled if phpdig doesn't index a page that doesn't exist. It seems difficult enough to get the existing pages indexed ;-) !

Some additional info about that site:
Every page has several modes of appearance, controlled by the S parameter. I intended to hide these apparent duplicate pages from phpdig by dynamically adding the line:
<meta name="robots" content="noindex,nofollow,none">
if and only if an S parameter is passed to the page. So only the plain pages (without any S parameter) should be indexed; they carry the line:
<meta name="robots" content="index,follow">
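To make the rule concrete, here is a minimal sketch of the logic. It is written in Python purely for illustration and is not the site's actual PHP code; the function name `robots_meta` and the example.com URLs are made up, and I am assuming the parameter is spelled with an upper-case `S` and counts as "passed" even when it has no value.

```python
from urllib.parse import urlparse, parse_qs

# The two meta lines quoted above, copied verbatim.
NOINDEX = '<meta name="robots" content="noindex,nofollow,none">'
INDEX = '<meta name="robots" content="index,follow">'


def robots_meta(url: str) -> str:
    """Return the robots meta line a page should carry for the given URL.

    Hypothetical helper: any S parameter in the query string (even an
    empty one) marks the page as an alternate view that should stay
    out of the index.
    """
    # keep_blank_values=True so that "?S=" and a bare "?S" still count.
    query = parse_qs(urlparse(url).query, keep_blank_values=True)
    return NOINDEX if "S" in query else INDEX
```

With this rule, a page requested without any S parameter gets the `index,follow` line, and the same page requested with `?S=2` gets the `noindex,nofollow,none` line.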

And even if phpdig hangs in one branch, why doesn't it finish spidering the other branches of the site? And why does it change its behavior (the number of pages successfully indexed) every time I dig the site?

Still confused ... are my assumptions in the initial posting correct?

And the main point is: portal.node/ is EXCLUDED in the DB and in robots.txt. The URL of uups.php lies under that path. Which precautions do I have to take on such pages so that phpdig spiders the rest of the site that is not explicitly excluded?
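For reference, this is how I would expect a standards-following robots.txt parser to treat that exclusion. The sketch uses Python's standard `urllib.robotparser`, not phpdig's own code, so it says nothing about what phpdig actually does. The robots.txt body, the host www.example.com, and the assumption that portal.node/ sits at the site root are all placeholders for illustration.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content; the real file may differ.
ROBOTS_TXT = """\
User-agent: *
Disallow: /portal.node/
"""


def is_allowed(url: str, agent: str = "*") -> bool:
    """Check a URL against the assumed robots.txt rules above."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical URLs: one under the excluded path, one elsewhere on the site.
excluded = is_allowed("http://www.example.com/portal.node/uups.php")
other = is_allowed("http://www.example.com/index.php")
```

Under these assumptions `excluded` is `False` and `other` is `True`, which is the behavior I was hoping for: the excluded path is skipped and the rest of the site is still crawled.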

Greetings from Vienna, Andreas

Last edited by Andreas_Wien; 04-24-2004 at 12:42 PM.