PhpDig.net

12-24-2003, 04:42 PM   #16
rwh
Green Mole
Join Date: Dec 2003
Posts: 16
First, thanks for all your help.
Real Web Host: I can't remove that, because I have files and directories that must not be crawled.

As for the other issue: it is crawling now, but even though that domain has a redirect, there are still directories under it. It still isn't looking at them at all; it just skipped over that domain and went on to the others.
So I guess I may just have to index those subdirectories manually, as I did before.
12-24-2003, 05:32 PM   #17
Charter
Head Mole
Join Date: May 2003
Posts: 2,539
Hi. Add the following to the top of the robots.txt file and then make the code change listed in this thread.
Code:
User-agent: PhpDig
Disallow:
# whatever else below this
This should let PhpDig follow all the links it finds, and then you can go and delete/exclude certain links/directories from the admin panel.
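One way to sanity-check a robots.txt edited like this before recrawling is to run it through a standard robots.txt parser. The sketch below uses Python's `urllib.robotparser` as a stand-in; PhpDig's own robots.txt handling is PHP code and may differ in details. The `User-agent: *` block, the `/private/` path, and `www.example.com` are hypothetical placeholders for "whatever else below this".

```python
from urllib.robotparser import RobotFileParser

# robots.txt with the PhpDig entry at the top, as suggested above.
# The "User-agent: *" block and /private/ path are hypothetical examples.
ROBOTS_TXT = """\
User-agent: PhpDig
Disallow:

User-agent: *
Disallow: /private/
"""


def check(agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    url = "http://www.example.com/private/page.html"
    # An empty "Disallow:" means "allow everything" for the PhpDig entry.
    print("PhpDig:", check("PhpDig", url))
    # Other agents fall through to the "*" block and are kept out of /private/.
    print("OtherBot:", check("OtherBot", url))
```

Because the PhpDig entry matches first, the rules in the `*` block still apply to every other crawler, which is why the file itself does not need to be removed.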
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.
12-24-2003, 05:57 PM   #18
rwh
Green Mole
Join Date: Dec 2003
Posts: 16
Is there any way around the other problem, with that domain not being read because it gets redirected?
12-24-2003, 06:00 PM   #19
Charter
Head Mole
Join Date: May 2003
Posts: 2,539
Hi. Post fifteen on the first page of this thread should deal with the redirect.
12-24-2003, 06:06 PM   #20
rwh
Green Mole
Join Date: Dec 2003
Posts: 16
OK, I made that change and entered the main domain. Here is the output. Notice that it doesn't even try to get the subdirectories under the main domain: it gets nothing, then moves on to number 2, which is the redirect domain name, so nothing at all comes from the main domain name.

SITE : http://www.mansfield-tx.gov/
Exclude paths :
- @NONE@
1:http://www.mansfield-tx.gov/
(time : 00:00:00)
Ok for http://www.ci.mansfield.tx.us/ (site_id:49)

No link in temporary table

--------------------------------------------------------------------------------

links found : 1
http://www.mansfield-tx.gov/

--------------------------------------------------------------------------------
SITE : http://www.ci.mansfield.tx.us/
Exclude paths :
- @NONE@
2:http://www.ci.mansfield.tx.us/

It is still running as we speak: over 50 minutes now, and it's on number 41.
12-24-2003, 06:20 PM   #21
Charter
Head Mole
Join Date: May 2003
Posts: 2,539
Hi. PhpDig can't index subdirectories/files if there are no links to them. The only thing PhpDig sees at http://www.mansfield-tx.gov/ is the code below, so, with the changes made in this thread, the only place PhpDig can go is http://www.ci.mansfield.tx.us, and it then follows the links from there.
Code:
<html>
<head>
<meta http-equiv="refresh" content="0;url=http://www.ci.mansfield.tx.us"> 
</head>
</html>
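The page above contains no `<a href>` links at all; its only outgoing reference is the refresh target. As a rough illustration of why a crawler ends up at the second domain, here is a Python sketch (standard library only, not PhpDig's own PHP parser) that pulls the target URL out of a `<meta http-equiv="refresh">` tag. The class and function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import Optional


class MetaRefreshFinder(HTMLParser):
    """Records the target URL of the first <meta http-equiv="refresh"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.target: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.target is not None:
            return
        attr = {name.lower(): (value or "") for name, value in attrs}
        if attr.get("http-equiv", "").lower() != "refresh":
            return
        # content looks like "0;url=http://...": a delay, then the target URL
        for part in attr.get("content", "").split(";")[1:]:
            key, sep, value = part.strip().partition("=")
            if sep and key.strip().lower() == "url":
                self.target = value.strip().strip("'\"")
                return


def refresh_target(html: str) -> Optional[str]:
    """Return the meta-refresh target of `html`, or None if there is none."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    finder.close()
    return finder.target
```

Fed the page quoted above, `refresh_target` returns `http://www.ci.mansfield.tx.us`, and that single URL is the only place left to go from the first domain.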
You could set up a temporary page with links to the subdirectories/files you want indexed. After the index is done, go to the admin panel, click the site holding the temp page, click the update button, click a blue arrow if needed, and then delete the temp page.
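If there are many subdirectories, the temporary page can be generated rather than typed by hand. A minimal Python sketch follows; the paths in `PATHS` are hypothetical placeholders for the real subdirectories/files, and the output is simply printed so it can be saved as the temp page (rwh later used a temporary `index.html`).

```python
from html import escape

# Hypothetical subdirectories/files to index; replace with the real paths.
PATHS = ["/departments/", "/council/agendas.html", "/parks/"]


def build_temp_page(base_url: str, paths: list[str]) -> str:
    """Return a minimal HTML page with one <a href> link per path."""
    base = base_url.rstrip("/")
    items = "\n".join(
        f'<li><a href="{escape(base + p)}">{escape(p)}</a></li>' for p in paths
    )
    return (
        "<html>\n<head><title>Temporary index page</title></head>\n"
        f"<body>\n<ul>\n{items}\n</ul>\n</body>\n</html>\n"
    )


if __name__ == "__main__":
    print(build_temp_page("http://www.mansfield-tx.gov", PATHS))
```

Plain `<a href>` links are used because, as noted above, PhpDig only reaches subdirectories and files it finds links to.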
12-24-2003, 07:04 PM   #22
rwh
Green Mole
Join Date: Dec 2003
Posts: 16
OK, I understand that.
12-24-2003, 07:26 PM   #23
rwh
Green Mole
Join Date: Dec 2003
Posts: 16
Just to follow up: I made the temp index.html file and it works; it's getting pages now. For some reason, though, when it finished with the domain name it fetched the same pages again using the IP address.
12-26-2003, 12:04 PM   #24
Charter
Head Mole
Join Date: May 2003
Posts: 2,539
Hi. Are you crawling from the shell or from the browser interface, with FTP on or off? Is there a link somewhere that uses the IP address instead of the domain name?
12-26-2003, 12:16 PM   #25
rwh
Green Mole
Join Date: Dec 2003
Posts: 16
From the IE browser, with FTP on.
I figure there are links with the IP address in his files. I'm not sure, so I'll let him look at them.
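To look for such links, a quick scan of the site's HTML files for absolute URLs whose host is a literal IP address can help. This is a rough Python sketch, not a PhpDig feature: the regex is a loose `href`/`src` matcher rather than a full HTML parser, and `scan` plus its `root` argument are hypothetical names for illustration.

```python
import ipaddress
import re
from pathlib import Path
from typing import Iterator
from urllib.parse import urlsplit

# Loose href/src matcher; good enough for a quick audit, not a full parser.
LINK_RE = re.compile(r"""(?:href|src)\s*=\s*["']?([^"'\s>]+)""", re.IGNORECASE)


def ip_links(html: str) -> Iterator[str]:
    """Yield absolute links in `html` whose host is a literal IP address."""
    for url in LINK_RE.findall(html):
        try:
            host = urlsplit(url).hostname
        except ValueError:
            continue  # malformed URL, skip it
        if host is None:
            continue  # relative link: no host to check
        try:
            ipaddress.ip_address(host)
        except ValueError:
            continue  # a domain name, not an IP address
        yield url


def scan(root: str) -> None:
    """Print every IP-address link found in .htm/.html files under `root`."""
    for path in Path(root).rglob("*.htm*"):
        text = path.read_text(encoding="utf-8", errors="replace")
        for url in ip_links(text):
            print(f"{path}: {url}")
```

Any link this reports would give the crawler a second, IP-based address for the same pages, which would match the duplicate results described above.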

All times are GMT -8.


Copyright © 2001 - 2005, ThinkDing LLC. All Rights Reserved.