First, thanks for all your help.
Real Web Host: I can remove that because I have files and directories that cannot be crawled. On the other one, it is crawling now, but even though it has a redirect in it, there are still directories in there for that domain, and it is still not looking at them at all. It just jumped over that domain and went on to the others. So maybe I just have to do those subdirectories manually, like I did before. |
Hi. Add the following to the top of the robots.txt file and then make the code change listed in this thread.
Code:
User-agent: PhpDig |
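For readers unfamiliar with per-crawler sections in robots.txt, here is a rough sketch of how a `User-agent: PhpDig` block placed above the catch-all block is usually interpreted. It uses Python's standard `urllib.robotparser`, not PhpDig's own parser, so PhpDig may behave differently. The rule lines are hypothetical, because the snippet quoted in this thread is cut off after the first line, and `can_crawl` and the example.com URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the PhpDig section comes first and allows
# everything (an empty Disallow), while all other crawlers are kept
# out of /private/. These rules are NOT the ones from the thread.
ROBOTS_TXT = """\
User-agent: PhpDig
Disallow:

User-agent: *
Disallow: /private/
"""


def can_crawl(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://www.example.com/private/page.html"
    print("PhpDig allowed:", can_crawl("PhpDig", page))
    print("OtherBot allowed:", can_crawl("OtherBot", page))
```

A crawler that finds a section naming it uses that section and ignores the `*` section, which is why the named block can open up paths that stay closed to everyone else.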
Is there any way around the other problem, with that domain not being read because it gets redirected?
|
Hi. Post fifteen on the first page of this thread should deal with the redirect.
|
OK, I made that change and put in the main domain. The output looks like this. Notice that it does not even try to get the subdirectories under the main domain: it gets nothing there, then goes to number 2, which is the redirect domain name. So it gets nothing from the main domain name.
SITE : http://www.mansfield-tx.gov/
Exclude paths :
- @NONE@
1:http://www.mansfield-tx.gov/
(time : 00:00:00)
Ok for http://www.ci.mansfield.tx.us/ (site_id:49)
No link in temporary table
--------------------------------------------------------------------------------
links found : 1
http://www.mansfield-tx.gov/
--------------------------------------------------------------------------------
SITE : http://www.ci.mansfield.tx.us/
Exclude paths :
- @NONE@
2:http://www.ci.mansfield.tx.us/

It is still running as we speak, over 50 minutes now, and it is on number 41. |
Hi. PhpDig can't index subdirectories or files if there are no links to them. The only thing PhpDig sees at http://www.mansfield-tx.gov/ is the page below, so with the changes made in this thread, the only place PhpDig can go is http://www.ci.mansfield.tx.us, and it then follows the links from there.
Code:
<html> |
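To illustrate why a redirect-only page gives a link-following crawler nothing else to index, here is a minimal sketch in Python. It is not PhpDig's code. It assumes the redirect is done with a meta refresh tag, which the thread does not confirm, and the `parks/` and `/police/` paths in the usage example are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the URLs a simple link-following crawler could find on one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            # Ordinary hyperlink: resolve it against the page URL.
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "meta" and (attrs.get("http-equiv") or "").lower() == "refresh":
            # Meta refresh content looks like "0; url=http://example.org/".
            content = attrs.get("content") or ""
            _, _, target = content.partition("=")
            if target:
                self.links.append(urljoin(self.base_url, target.strip()))


def discover_links(base_url, html):
    """Return every URL reachable from this page's markup, in document order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


if __name__ == "__main__":
    # Assumed markup for a redirect-only home page.
    redirect_page = (
        '<html><head><meta http-equiv="refresh" '
        'content="0; url=http://www.ci.mansfield.tx.us/"></head></html>'
    )
    print(discover_links("http://www.mansfield-tx.gov/", redirect_page))

    # A temporary index page that links to (hypothetical) subdirectories.
    index_page = '<a href="parks/">Parks</a> <a href="/police/index.html">Police</a>'
    print(discover_links("http://www.mansfield-tx.gov/", index_page))
```

With the redirect-only page, the single discovered URL is on the other domain, so nothing under the main domain is ever queued. A temporary index page with real links gives the crawler something to follow.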
OK, I understand that.
|
Just to follow up: I made the temp index.html file and it works; it is getting pages now. For some reason, when it finished with the domain name, it got the same pages again using the IP.
|
Hi. Are you crawling from the shell or from the browser interface, with FTP on or off? Is there a link somewhere that uses the IP instead of the domain name?
|
From the IE browser, with FTP on.
I figure there are links with the IP in his files. Not sure; we'll let him look at them. |
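One way to check for links that use the IP instead of the domain name is to scan the list of crawled or extracted URLs for hosts that are literal IP addresses. The sketch below is plain Python, independent of PhpDig. The function names are made up for illustration, and 192.0.2.10 is a documentation-only address standing in for the real server IP.

```python
import ipaddress
from urllib.parse import urlsplit


def host_is_ip(url):
    """Return True when the URL's host is a literal IP address, not a name."""
    host = urlsplit(url).hostname
    if host is None:
        # Relative links have no host of their own.
        return False
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return False
    return True


def links_using_ip(urls):
    """Filter a list of URLs down to those addressed by IP."""
    return [u for u in urls if host_is_ip(u)]


if __name__ == "__main__":
    found = links_using_ip([
        "http://www.mansfield-tx.gov/parks/",  # hypothetical path
        "http://192.0.2.10/parks/",            # placeholder IP
        "parks/",
    ])
    print(found)
```

Any URL this reports is a candidate for the duplicate pages: a crawler treats the IP form and the domain form as two different sites unless told otherwise.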