PHPDig indexing certain pages
asanad
09-18-2004, 11:05 PM
Hi there,
I have a website with more than 65 pages. It is hosted on Server A, a Windows 2000 Advanced Server machine that sits behind a proxy on the Internet.
Currently, PHPDig is installed on Server B, a Unix server on the LAN.
Servers A and B have a trust relationship behind the proxy, but for some reason PHPDig indexes only 7 pages of the website.
By the way, PHPDig has no problem converting documents, since it indexes my intranet website on Server C (900+ documents) perfectly.
Could you please give me a hint on what to do, or what troubleshooting is needed?
Thanks for your continuous help and support. :confused:
asanad
09-20-2004, 01:40 AM
Hi there,
Currently my log file shows that the indexed documents are the ones that were requested with a GET. All the others, which received only a HEAD request and no GET, have not been indexed; the log shows a HEAD request with status 200 for each of them.
These are the entries I get in my log file:
2004-09-20 06:07:48 HEAD /robots.txt - 404 PhpDig/1.8.3+(+http://www.phpdig.net/robot.php)
2004-09-20 06:07:48 HEAD /internet/College-of-Medicine.doc_cvt.htm - 200 PhpDig/1.8.3+(+http://www.phpdig.net/robot.php)
2004-09-20 06:07:48 GET /internet/College-of-Medicine.doc_cvt.htm - 200 PhpDig/1.8.3+(+http://www.phpdig.net/robot.php)
2004-09-20 06:07:53 HEAD /internet/CSS/style_internet.css - 200 PhpDig/1.8.3+(+http://www.phpdig.net/robot.php)
2004-09-20 06:07:53 HEAD /internet/index.htm - 200 PhpDig/1.8.3+(+http://www.phpdig.net/robot.php)
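A quick way to list every page that got a HEAD but never a GET is to scan the log with a short script. The following is a minimal sketch in Python, assuming each entry has the same field order as the excerpt above (date, time, method, path, query, status, user agent). The function name `head_only_paths` is made up for illustration and is not part of PhpDig.

```python
from collections import defaultdict


def head_only_paths(log_lines):
    """Return paths answered with 200 for HEAD but never requested with GET."""
    methods_by_path = defaultdict(set)
    for line in log_lines:
        parts = line.split()
        # Assumed layout: date time method path query status user-agent
        if line.startswith("#") or len(parts) < 6:
            continue  # skip log header lines and malformed entries
        method, path, status = parts[2], parts[3], parts[5]
        if status == "200":
            methods_by_path[path].add(method)
    return sorted(
        path
        for path, methods in methods_by_path.items()
        if "HEAD" in methods and "GET" not in methods
    )


# The entries quoted in the post above
sample_log = [
    "2004-09-20 06:07:48 HEAD /robots.txt - 404 PhpDig/1.8.3+(+http://www.phpdig.net/robot.php)",
    "2004-09-20 06:07:48 HEAD /internet/College-of-Medicine.doc_cvt.htm - 200 PhpDig/1.8.3+(+http://www.phpdig.net/robot.php)",
    "2004-09-20 06:07:48 GET /internet/College-of-Medicine.doc_cvt.htm - 200 PhpDig/1.8.3+(+http://www.phpdig.net/robot.php)",
    "2004-09-20 06:07:53 HEAD /internet/CSS/style_internet.css - 200 PhpDig/1.8.3+(+http://www.phpdig.net/robot.php)",
    "2004-09-20 06:07:53 HEAD /internet/index.htm - 200 PhpDig/1.8.3+(+http://www.phpdig.net/robot.php)",
]

for path in head_only_paths(sample_log):
    print(path)
```

For the five entries above it reports the stylesheet and /internet/index.htm. To run it against a whole log, pass the open file object instead of the sample list.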
dell_10
09-20-2004, 01:46 PM
Hi all
I have the same problem. Any hints about it?
vinyl-junkie
09-20-2004, 07:23 PM
There's a thread about this same sort of issue that was posted not too long ago. I don't remember the outcome of it though. You might want to do a forum search and see what turns up.
asanad
09-21-2004, 01:41 AM
vinyl-junkie,
thank you for your cooperation. Could I please ask you for a favour? I tried to search for that thread but had no luck. Could you give me a tip on where to find it?
Thanks
dell_10
09-21-2004, 02:00 PM
Thanks, vinyl-junkie,
but I still can't find any thread that can help me solve this problem.
Could you give me a link to any thread that talks about the same problem?
dell_10
09-21-2004, 02:05 PM
Could you try indexing our website and tell me about the result?
The Website (http://www.ngha.med.sa/internet/)
Thanks
vinyl-junkie
09-21-2004, 06:32 PM
If I knew what keywords to tell you to search for, I'd have posted the link. :) All I can tell you is that the other post was sometime within the last two or three months, and probably in the Troubleshooting forum.
You can speed up your search a little by hovering your mouse over the subject of each post and viewing the first few lines of it. That might give you a clue as to whether you'd want to read that post further, if searching by keywords isn't finding it for you.
vinyl-junkie
09-21-2004, 06:55 PM
How many pages does your website have? I've indexed 122 pages in 1:34 hours! That's awfully slow. I will zip the spider log when I'm done.
Would you like me to PM that to you or post it here?
dell_10
09-21-2004, 09:43 PM
Thanks, vinyl-junkie.
Yes, it includes around 122 pages.
Could you PM the spider log to me or post it here?
Thanks again
asanad
09-21-2004, 11:39 PM
I am a bit puzzled now.
If PhpDig can spider the website over an Internet connection, why doesn't it index it locally from my machines? Currently, in my LAN, PhpDig indexes many websites on the intranet, but for the Internet website (behind a proxy) it only indexes the documents under the root. For example, "homepage/test.htm" is indexed, but "homepage/about-us/test.htm" is not.
Why does it follow only the links to documents under the root of the website?
Do you have any hints?
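To confirm that only root-level documents are being picked up, it can help to count the indexed paths by directory depth. This is a minimal sketch in Python; the helper names `directory_depth` and `depth_histogram` and the sample list are hypothetical, and the real list of paths would have to come from the spider log or the server log.

```python
from collections import Counter
from urllib.parse import urlparse


def directory_depth(url_or_path):
    """Count the directory segments in a URL or path, ignoring the file name."""
    path = urlparse(url_or_path).path
    segments = [s for s in path.split("/") if s]
    if segments and not path.endswith("/"):
        segments = segments[:-1]  # the last segment is the file name
    return len(segments)


def depth_histogram(paths):
    """Map each directory depth to the number of paths found at that depth."""
    return Counter(directory_depth(p) for p in paths)


# Hypothetical sample; replace with the paths that were actually indexed.
indexed = [
    "homepage/test.htm",
    "homepage/index.htm",
    "homepage/about-us/test.htm",
]

for depth, count in sorted(depth_histogram(indexed).items()):
    print(depth, count)
```

If every indexed path sits at the same depth as the start page, that matches the symptom described above: links into subdirectories are found but never fetched.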
vinyl-junkie
09-22-2004, 03:37 AM
Just a progress report. It's been 10:15:55 since I started trying to spider your site, and it's still going. 434 documents spidered so far. I suppose the discrepancy with the number you gave is because of the dynamic content.
I'm not gonna keep posting like this, but I did want to point out that the reason it's so slow is most likely because you're on a Windows server. Phpdig just doesn't perform very well there, in my experience.
dell_10
09-22-2004, 02:48 PM
Thanks, vinyl-junkie.
I'm really confused about why you could index it and I could not. My partner described our situation: we have two servers, intranet and Internet, both Windows servers. PHPDig already indexes the intranet (900 pages), but it indexes only 7 pages (only the root folder) on the Internet server.
Maybe it's because the Internet server is behind a proxy!
vinyl-junkie
09-22-2004, 07:31 PM
I don't know. I'm not very knowledgeable at all concerning server issues. Sorry.
I've attached the zip file for your spider log. Hope it helps. :)
dell_10
09-23-2004, 09:02 AM
Thanks, vinyl-junkie.
Could you help me by posting your config.php and robot_functions.php files so I can compare them with mine?
Thank You
vinyl-junkie
09-23-2004, 06:25 PM
I haven't modified robot_functions.php, but here is my config.php file (with a .txt file extension):
http://search.napathon.net/includes/config.txt
You'll notice that I have some mods in it. I removed the database connection logic and made a separate file. Same thing with the admin username and password. Both files are in a directory that is not publicly accessible. Everything else should be pretty close to the "out of the box" version.
Hope this helps. :)
dell_10
09-24-2004, 03:14 PM
Thank you very much, vinyl-junkie.
I will compare it with my config file to see what is different.
:)