PhpDig.net

PhpDig.net (http://www.phpdig.net/forum/index.php)
-   Troubleshooting (http://www.phpdig.net/forum/forumdisplay.php?f=22)
-   -   PHPDig indexing certain pages (http://www.phpdig.net/forum/showthread.php?t=1327)

asanad 09-18-2004 11:05 PM

PHPDig indexing certain pages
 
Hi there,

I have a website with more than 65 pages. It is hosted on Server A, a Windows 2000 Advanced Server that sits behind a proxy on the Internet.
PHPDig is currently installed on Server B, a Unix server on the LAN.
Servers A and B have a trust relationship behind the proxy, but for some reason PHPDig indexes only 7 pages of the website.
By the way, PHPDig has no problem converting documents in general: it indexes my intranet website on Server C (900+ documents) perfectly.

Could you please give me a hint on what to do, or on what troubleshooting is needed?

Thanks for your continuous help and support. :confused:

asanad 09-20-2004 01:40 AM

More clarifications
 
Hi there,
My log file shows that the only documents indexed are the ones that received a GET request. All the others received only a HEAD request and no GET, and they were not indexed; for those, the log just shows a HEAD request answered with status 200.

These are the entries I get in my log file:

2004-09-20 06:07:48 HEAD /robots.txt - 404 PhpDig/1.8.3+(+http://www.phpdig.net/robot.php)
2004-09-20 06:07:48 HEAD /internet/College-of-Medicine.doc_cvt.htm - 200 PhpDig/1.8.3+(+http://www.phpdig.net/robot.php)
2004-09-20 06:07:48 GET /internet/College-of-Medicine.doc_cvt.htm - 200 PhpDig/1.8.3+(+http://www.phpdig.net/robot.php)
2004-09-20 06:07:53 HEAD /internet/CSS/style_internet.css - 200 PhpDig/1.8.3+(+http://www.phpdig.net/robot.php)
2004-09-20 06:07:53 HEAD /internet/index.htm - 200 PhpDig/1.8.3+(+http://www.phpdig.net/robot.php)
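To check this pattern across a whole log rather than by eye, a small script can list every path the spider requested with HEAD (status 200) but never followed with a GET. This is a minimal sketch, assuming the log uses the same space-separated field order as the excerpt above (date, time, method, path, query, status, user agent); real IIS logs configured with extra fields (such as client IP) would need the indices adjusted. The function name `head_only_urls` is hypothetical and not part of PhpDig, and the script only reproduces the symptom; it does not explain why the GET was skipped.

```python
from collections import defaultdict


def head_only_urls(lines, agent_prefix="PhpDig/"):
    """List paths the spider checked with HEAD (status 200) but never fetched with GET.

    Assumes W3C-style log lines in the field order shown in the excerpt:
    date time method path query status user-agent (spaces in the agent are '+').
    """
    methods_seen = defaultdict(set)  # path -> HTTP methods the spider used on it
    head_ok = set()                  # paths whose HEAD request returned 200
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and directive lines such as "#Fields:"
        fields = line.split()
        if len(fields) < 7:
            continue  # malformed line
        method, path, status, agent = fields[2], fields[3], fields[5], fields[6]
        if not agent.startswith(agent_prefix):
            continue  # ignore browsers and other crawlers
        methods_seen[path].add(method)
        if method == "HEAD" and status == "200":
            head_ok.add(path)
    return sorted(p for p in head_ok if "GET" not in methods_seen[p])


if __name__ == "__main__":
    excerpt = [
        "2004-09-20 06:07:48 HEAD /robots.txt - 404 PhpDig/1.8.3+(+http://www.phpdig.net/robot.php)",
        "2004-09-20 06:07:48 HEAD /internet/College-of-Medicine.doc_cvt.htm - 200 PhpDig/1.8.3+(+http://www.phpdig.net/robot.php)",
        "2004-09-20 06:07:48 GET /internet/College-of-Medicine.doc_cvt.htm - 200 PhpDig/1.8.3+(+http://www.phpdig.net/robot.php)",
        "2004-09-20 06:07:53 HEAD /internet/CSS/style_internet.css - 200 PhpDig/1.8.3+(+http://www.phpdig.net/robot.php)",
        "2004-09-20 06:07:53 HEAD /internet/index.htm - 200 PhpDig/1.8.3+(+http://www.phpdig.net/robot.php)",
    ]
    for path in head_only_urls(excerpt):
        print(path)
```

On the five log lines quoted in the post, this prints the stylesheet and /internet/index.htm, the two HEAD-only entries; the .doc_cvt.htm page is excluded because it was also fetched with GET.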

dell_10 09-20-2004 01:46 PM

Have the same problem!!!!
 
Hi all,
I have the same problem. Any hints?

vinyl-junkie 09-20-2004 07:23 PM

There's a thread about this same sort of issue that was posted not too long ago. I don't remember the outcome of it though. You might want to do a forum search and see what turns up.

asanad 09-21-2004 01:41 AM

vinyl-junkie,

Thank you for your cooperation. Could I ask you a favour? I tried searching for that thread and had no luck. Could you give me a tip on where to find it?

Thanks

dell_10 09-21-2004 02:00 PM

Thanks vinyl-junkie,
but I still can't find any thread that helps me solve this problem...
Could you give me a link to any thread that talks about the same problem?

dell_10 09-21-2004 02:05 PM

Could you try indexing our website and tell me the result?
The Website
thanks

vinyl-junkie 09-21-2004 06:32 PM

Quote:

Originally Posted by dell_10
Thanks vinyl-junkie,
but I still can't find any thread that helps me solve this problem...
Could you give me a link to any thread that talks about the same problem?

If I knew what keywords to tell you to search for, I'd have posted the link. :) All I can tell you is that the other post was sometime within the last two or three months, and probably in the Troubleshooting forum.

You can speed up your search a little by hovering your mouse over the subject of each post and viewing the first few lines of it. That might give you a clue as to whether you'd want to read that post further, if searching by keywords isn't finding it for you.

vinyl-junkie 09-21-2004 06:55 PM

Quote:

Originally Posted by dell_10
Could you try indexing our website and tell me the result?
The Website
thanks

How many pages does your website have? I've indexed 122 pages in 1 hour 34 minutes! That's awfully slow. Will zip the spider log when I'm done.

Would you like for me to PM that to you or post it here?

dell_10 09-21-2004 09:43 PM

Thanks vinyl-junkie,
yes, it includes around 122 pages.
Could you PM the spider log to me or post it here?
Thanks again

asanad 09-21-2004 11:39 PM

I am really puzzled
 
I am a bit puzzled now,

If PhpDig can spider the website over an Internet connection, why doesn't it index it locally on my machines? On my LAN, PhpDig indexes many intranet websites, but for the Internet website (behind a proxy) it only indexes the documents in the root. For example, "homepage/test.htm" gets indexed, but "homepage/about-us/test.htm" does not.

Why does it only follow links to documents in the root of the website?

Do you have any hints?

vinyl-junkie 09-22-2004 03:37 AM

Quote:

Originally Posted by dell_10
Thanks vinyl-junkie,
yes, it includes around 122 pages.
Could you PM the spider log to me or post it here?
Thanks again

Just a progress report. It's been (time : 10:15:55) since I started trying to spider your site, and it's still going. 434 documents spidered so far. I suppose the discrepancy with the number you gave is due to the dynamic content.

I'm not gonna keep posting like this, but I did want to point out that the reason it's so slow is most likely because you're on a Windows server. Phpdig just doesn't perform very well there, in my experience.

dell_10 09-22-2004 02:48 PM

Thanks vinyl-junkie,
I'm really confused about why you could index it and I could not. As my partner explained, we have two servers, intranet and Internet, both Windows servers. PhpDig already indexes the intranet server (900 pages), but it only indexes 7 pages (only the root folder) on the Internet server.
Maybe it's because the Internet server is behind a proxy!!

vinyl-junkie 09-22-2004 07:31 PM

1 Attachment(s)
Quote:

Originally Posted by dell_10
Thanks vinyl-junkie,
I'm really confused about why you could index it and I could not. As my partner explained, we have two servers, intranet and Internet, both Windows servers. PhpDig already indexes the intranet server (900 pages), but it only indexes 7 pages (only the root folder) on the Internet server.
Maybe it's because the Internet server is behind a proxy!!

I don't know. I'm not very knowledgeable at all concerning server issues. Sorry.

I've attached the zip file for your spider log. Hope it helps. :)

dell_10 09-23-2004 09:02 AM

Thanks vinyl-junkie,
could you help me by posting your config.php and robots_function.php files, so I can compare them with mine?
Thank you


All times are GMT -8.

Powered by vBulletin® Version 3.7.3
Copyright ©2000 - 2025, Jelsoft Enterprises Ltd.
Copyright © 2001 - 2005, ThinkDing LLC. All Rights Reserved.