PhpDig.net

12-24-2003, 08:29 AM   #1
rwh
Need Solution Please

I have installed the new version, 1.6.5, and set the permissions on the folders named in the instructions to 777. In the admin panel, if I enter the IP address of my account in the field, the spider works great; if I enter my domain name, it does not. I have tried an empty robots.txt file and also one that only protects the cgi-bin.
I still get no pages spidered using the domain name. Here is what I get when running the domain name:

SITE : http://www.yourdomain.com/
Exclude paths :
- @NONE@
1:http://www.yourdomain.com/
(time : 00:00:00)

No link in temporary table

--------------------------------------------------------------------------------

links found : 1
http://www.yourdomain.com/
Optimizing tables...
Indexing complete !

We are currently running PHP 4.3.4, Apache, Linux 9.0, and MySQL 4.0.15.

My current path is /home/username/public_html/search/, where search is the directory PhpDig is installed in.

Any ideas, anyone?

12-24-2003, 09:49 AM   #2
Charter
Hi. Perhaps something in this thread might help.
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.

12-24-2003, 09:57 AM   #3
rwh
No, we are not behind any firewall.

12-24-2003, 11:04 AM   #4
rwh
OK, looking at his DNS zone, we have this in place for ftp:
ftp CNAME domain.com.

12-24-2003, 12:04 PM   #5
rwh
Some more information for you.
I opened the search/includes/config.php file and changed these settings as follows:


//---------FTP SETTINGS
define('FTP_ENABLE',1);//enable ftp content for distant PhpDig
define('FTP_HOST','mydomainname.com'); //if distant PhpDig, ftp host;
define('FTP_PORT',21); //ftp port
define('FTP_PASV',1); //passive mode
define('FTP_PATH','/home/username/public_html/'); //distant path from the ftp root
define('FTP_TEXT_PATH','text_content');//ftp path to text-content directory
define('FTP_US

And when we try the domain name now we get this.



Warning: ftp_chdir(): Can't change directory to /home/username/public_html/: No such file or directory in /home/username/public_html/search/admin/robot_functions.php on line 1204
Error : Ftp connect failed !
Warning: Cannot modify header information - headers already sent by (output started at /home/username/public_html/search/admin/robot_functions.php:1204) in /home/username/public_html/search/admin/update_frame.php on line 69



Hoping this will help someone to help us.
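
The comments in the settings above describe FTP as being for a "distant PhpDig" and FTP_PATH as the path "from the ftp root", which would explain why ftp_chdir() cannot find an absolute filesystem path. A minimal sketch of the alternative, assuming the spider and the indexed site share one server (the thread suggests this but does not confirm it); the commented-out FTP_PATH value is hypothetical:

```php
<?php
// Sketch only: assumes PhpDig runs on the same server as the site it indexes,
// so it can write to text_content/ directly and FTP is not needed.
define('FTP_ENABLE', 0); // config.php comment: "enable ftp content for distant PhpDig"

// If FTP really is required, FTP_PATH is described as relative to the FTP root.
// Hypothetical value, assuming the FTP login starts in /home/username:
// define('FTP_PATH', 'public_html/search/');
```

Each constant can only be defined once, so these lines would replace the existing ones in config.php rather than be added alongside them.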

12-24-2003, 12:30 PM   #6
rwh
OK, one step further. In the FTP settings, I changed this one line to:
define('FTP_PATH',''); //distant path from the ftp root

Now when I run, say, http://www.domain.com I still do not get any links, but if I run something like this:

http://www.domain.com/testfolder/

it will grab and record everything under that directory.

So we are getting closer; I think it is just a setting somewhere. Hopefully someone has run into this before and can help out.

12-24-2003, 01:48 PM   #7
Charter
Hi. Are text files showing up in the text_content/ directory? What files are in the / directory? Also, how many text_content directories do you have?

12-24-2003, 02:35 PM   #8
rwh
OK, right now I have one text folder called text_content, and it has one file in it, called keepalive.txt.
Now I go to admin and run the domain name by itself, with no / or subdirectory in the path, and select a depth of 5.

I get this:

SITE : http://www.mansfield-tx.gov/
Exclude paths :
- @NONE@
1:http://www.mansfield-tx.gov/
(time : 00:00:00)

No link in temporary table
--------------------------------------------------------------------------------
links found : 1
http://www.mansfield-tx.gov/
Optimizing tables...
Indexing complete !


Now I look in the text_content folder and I have one file, called 387.txt, and it is empty. Back in the admin panel, it now shows 1 host and 1 page.

Now if I redo this and add a / and a subdirectory to the domain name, it works: it adds files, keywords, and such. It just will not crawl the bare domain, like this: http://www.domain.com

It works fine with http://www.domain.com/subdir or with the IP: http://ip

Last edited by rwh; 12-24-2003 at 02:39 PM.

12-24-2003, 02:49 PM   #9
rwh
OK, found one problem: the client had an index.html file in his directory that was nothing more than a very short redirect file. So, as a test, I uploaded an index.html file with a lot of text in it. I reran the test, and it picked up the keywords for that index.html page and stored them. It only got the one page, though; it did not attempt to get the other files or the files in subdirectories.

12-24-2003, 02:52 PM   #10
Charter
Hi. Is http://www.mansfield-tx.gov/ the domain you are trying to crawl? If so, the redirect has JavaScript in the middle and end of the HTML so try changing define('CHUNK_SIZE',2048); to define('CHUNK_SIZE',200); in the config file. Does this change make it pick up the links?
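
Written out as a config.php fragment, the suggested change might look like the sketch below. Only the constant name and the two values come from the post above; it assumes config.php still contains the define('CHUNK_SIZE',2048); line, which this one would replace.

```php
<?php
// Replaces the existing line: define('CHUNK_SIZE',2048);
// A constant can only be defined once, so edit the line in place
// rather than adding a second define() call.
define('CHUNK_SIZE', 200);
```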

12-24-2003, 02:58 PM   #11
rwh
Same thing, no change. And yes, that is the site; it only gets index.html.

12-24-2003, 03:12 PM   #12
Charter
Hi. What happens if you crawl http://www.ci.mansfield.tx.us directly?

12-24-2003, 03:44 PM   #13
rwh
It crawled it without any problems. I also tried realwebhost.net, and it did not crawl that either, the same as the other one.

12-24-2003, 03:51 PM   #14
rwh
What should be in the robots file to make this work?

12-24-2003, 04:29 PM   #15
Charter
Hi. Temporarily remove the robots.txt file from realwebhost.net and then PhpDig should crawl it. Some tweaks need to be done when PhpDig reads the robots.txt file, as it's too restrictive now, but there isn't a list of tweaks ready.

The deal with the Mansfield site is that PhpDig won't follow the redirect. To fix this, change define('PHPDIG_IN_DOMAIN',false); to define('PHPDIG_IN_DOMAIN',true); in the config.php file, and also make the change listed in this thread.
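
As a config.php fragment, the PHPDIG_IN_DOMAIN change described above might look like this. It is a sketch of that one line only; the second change, from the linked thread, is not reproduced here because its details are not given in this thread.

```php
<?php
// Replaces the existing line: define('PHPDIG_IN_DOMAIN',false);
// Edit the line in place; defining the same constant twice does not override it.
define('PHPDIG_IN_DOMAIN', true);
```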