Spider test for me


DrKamikaze83
02-17-2004, 01:13 AM
Can somebody spider http://www.ebay.com and http://www.dovebid.com and show me the result?

Spidering doesn't work for me.

thanks
Alex

DrKamikaze83
02-17-2004, 05:36 AM
Hi,

I have tried it on many sites on the Internet and it doesn't work.

As a last resort I tried it on my localhost.
On my localhost everything works wonderfully.

I don't know what the problem is.

Charter
02-17-2004, 11:16 AM
Hi. Do you get any errors when trying to index online? Is safe_mode set to on?

DrKamikaze83
02-17-2004, 11:08 PM
In phpinfo safe mode is off, but maybe there is something in the script that I have forgotten to change.

Online there is no spidering possible. For any site on the Internet it only detects the host, like www.ebay.com, and no pages. It is the same regardless of which page I use.

I have read and tried everything in the threads about safe_mode, but I can't get it to work.
Please help me.


thanks
Alex

Charter
02-17-2004, 11:49 PM
Hi. Set up a small three-page demo like the one below, then index the main.html page using a search depth of one, and wait several minutes before touching the browser. What do you see onscreen after several minutes?

http://www.domain/testdir/main.html

<html>
<body>
main page
<a href="page1.html">page1</a>
<a href="page2.html">page2</a>
</body>
</html>

http://www.domain/testdir/page1.html

<html>
<body>
page one
</body>
</html>

http://www.domain/testdir/page2.html

<html>
<body>
page two
</body>
</html>
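
For reference, a depth-one crawl of this demo should find exactly two links on main.html. The Python sketch below is not PhpDig code. It only illustrates what a spider is expected to extract from that page. The names `LinkCollector` and `extract_links` are hypothetical, and `www.domain` is the placeholder host used above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# The main.html page from the demo above.
MAIN_HTML = """<html>
<body>
main page
<a href="page1.html">page1</a>
<a href="page2.html">page2</a>
</body>
</html>"""


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag (hypothetical helper, not part of PhpDig)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(base_url, html):
    """Return absolute URLs for all links in html, resolved against base_url."""
    parser = LinkCollector()
    parser.feed(html)
    return [urljoin(base_url, href) for href in parser.hrefs]


if __name__ == "__main__":
    for url in extract_links("http://www.domain/testdir/main.html", MAIN_HTML):
        print(url)
```

If the spider reports fewer links than this for the demo, the problem is more likely in fetching the page than in parsing it.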

DrKamikaze83
02-18-2004, 12:39 AM
Hi, I tried it, but it didn't work. I started PhpDig from my localhost.


site (spidering): http://maggiv8.funpic/Test/main.html


result:
Spidering in progress...

--------------------------------------------------------------------------------
SITE : http://maggiv8.funpic.de/
Exclude paths :
- @NONE@
No link in temporary table

--------------------------------------------------------------------------------

links found : 0
...Was recently indexed
Optimizing tables...
Indexing complete !
--------------------------------------------------------------------------------
[Back] to admin interface.


What can I try next?


Regards
Alex

Charter
02-18-2004, 12:54 AM
Hi. Did you configure the connect.php file that is online and try to crawl http://maggiv8.funpic.de/Test/main.html from online? The database variables in the online connect.php file need to match the online database.

DrKamikaze83
02-18-2004, 12:58 AM
I don't understand what I should do now.

I have only uploaded the 3 test files. The other things, like the database and PhpDig, are on the localhost on my PC.

Regards
Alex

Charter
02-18-2004, 01:06 AM
Hi. Perhaps try editing your hosts file like in this (http://www.phpdig.net/showthread.php?threadid=514) thread or in this (http://www.phpdig.net/showthread.php?threadid=310) thread.

DrKamikaze83
02-18-2004, 01:28 AM
Hi Charter,

I looked at the two threads.

I think this one is the problem: http://www.phpdig.net/showthread.php?threadid=514
I didn't understand what oscure is mentioning.

Can you give me an exact description of what I have to do?


Thanks
Alex

DrKamikaze83
02-19-2004, 03:53 AM
Hi,

I have now uploaded everything to this site: http://maggiv8.funpic.de/
From that site I spidered www.ebay.com.

Results:

Warning: set_time_limit,getmyuid,getmypid,dl,leak() has been disabled for security reasons in /usr/export/www/vhosts/funnetwork/hosting/maggiv8/admin/spider.php on line 16


Spidering in progress...

Warning: set_time_limit,getmyuid,getmypid,dl,leak() has been disabled for security reasons in /usr/export/www/vhosts/funnetwork/hosting/maggiv8/admin/robot_functions.php on line 97

--------------------------------------------------------------------------------
SITE : http://www.ebay.com/
Exclude paths :
- help/confidence/
- help/policies/
- disney/

Warning: getmypid,dl,leak() has been disabled for security reasons in /usr/export/www/vhosts/funnetwork/hosting/maggiv8/admin/robot_functions.php on line 655
1:http://www.ebay.com/
(time : 00:00:08)
+ +
level 1...

Warning: getmypid,dl,leak() has been disabled for security reasons in /usr/export/www/vhosts/funnetwork/hosting/maggiv8/admin/robot_functions.php on line 655
2:http://www.ebay.com/mainc1.html?ssPageName=VisitorPage
(time : 00:00:20)
+

Warning: getmypid,dl,leak() has been disabled for security reasons in /usr/export/www/vhosts/funnetwork/hosting/maggiv8/admin/robot_functions.php on line 655
3:http://www.ebay.com/PayPal/
(time : 00:00:27)

level 2...

Warning: getmypid,dl,leak() has been disabled for security reasons in /usr/export/www/vhosts/funnetwork/hosting/maggiv8/admin/robot_functions.php on line 655
4:http://www.ebay.com/es/
(time : 00:00:38)

No link in temporary table

--------------------------------------------------------------------------------

links found : 4
http://www.ebay.com/
http://www.ebay.com/mainc1.html?ssPageName=VisitorPage
http://www.ebay.com/PayPal/
http://www.ebay.com/es/
Optimizing tables...
Indexing complete !




Now I need to get it to work on my PC. Please help me.


Thanks
Alex

Charter
02-19-2004, 06:40 AM
Hi. The warnings from your online account are because your host has disabled certain functions. You can remove set_time_limit from line 16 of spider.php and from line 97 of robot_functions.php and remove the commented out line 655 in robot_functions.php.

As to crawling from your PC, perhaps try editing your Hosts file. Just do a search for the Hosts file and then add a line to the file with a text editor, something like the following:

127.0.0.1 localhost
put.the.ip.here maggiv8.funpic.de
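
On Windows 2000 the file is normally found at %SystemRoot%\system32\drivers\etc\hosts. As a rough way to check which address a name would get from such a file, here is a small Python sketch. It is an illustration only and not how the operating system resolver is implemented. The function name `parse_hosts` is hypothetical, the first-entry-wins rule is an assumption, and 192.0.2.10 is a documentation address standing in for the real IP.

```python
def parse_hosts(text):
    """Map each hostname in hosts-file text to its IP address."""
    mapping = {}
    for raw_line in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # an address without a hostname is not a usable entry
        ip, names = fields[0], fields[1:]
        for name in names:
            # Assumption: if a name appears twice, the first entry wins.
            mapping.setdefault(name.lower(), ip)
    return mapping


if __name__ == "__main__":
    sample = (
        "127.0.0.1 localhost\n"
        "192.0.2.10 maggiv8.funpic.de  # placeholder IP, replace with the real one\n"
    )
    for host, ip in parse_hosts(sample).items():
        print(host, "->", ip)
```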

DrKamikaze83
02-19-2004, 06:56 AM
Is the Hosts file one of the

HOST-RESOURCES-(TYPES/MIB)

files, or is it the http_vhost files?

Is it important where in the file the line is written?


Thanks
Alex

Charter
02-19-2004, 07:05 AM
Hi. I've seen it as just Hosts, no extension, but I'm not sure with your OS/setup. The first entry should probably be the localhost one, but again it might depend on your OS/setup.

DrKamikaze83
02-19-2004, 07:13 AM
What do you mean by OS? Operating system? I have Win2000 and Apache Server 1.3.29!

Charter
02-19-2004, 07:33 AM
Hi. Yes, OS = operating system. Perhaps try Google (http://www.google.com/search?q=hosts+file+windows+2000) for help with your OS/setup.

DrKamikaze83
02-20-2004, 04:08 AM
Hi,

I have done this now, but it doesn't work.

Problem Report
There was a communication problem.

Message ID
TCP_ERROR

Problem Description
The system was unable to communicate with the server.

Possible Problem Cause

The Web server may be down.
The Web server may be too busy.
The Web server may be experiencing other problems, preventing it from responding to clients.
The communication path may be experiencing problems.


Possible Solution
Try connecting to this server later.







When I go back to the index page there are now 20 hosts listed, but www.ebay.com is already locked.

I have the right IP. I don't know what the problem is now.

Charter
02-20-2004, 04:30 PM
Hi. Unfortunately I don't see the problem either. :(

Maybe someone who has this kind of OS/setup can provide additional help.

DrKamikaze83
02-22-2004, 11:11 PM
Hi,

I don't know if that is a problem, but I want to try it.

First, I have no robots.txt file. How should I create one?
Second, I have no idea whether the text-content and temp directories are set to chmod 777. How and where can I set this up?
Third, can you spider www.ebay.com for me, so I can see the result?

Are these possible reasons that it doesn't work?

Greetings
Alex

Charter
02-23-2004, 01:45 PM
Hi. For question one, you don't need a robots.txt file to use PhpDig, but if you want a robots.txt file, a tutorial can be found here (http://www.searchengineworld.com/robots/robots_tutorial.htm). For question two, chmod is a *nix command. To check file permissions on a Windows machine, perhaps try using the file manager. However, if you were having permission problems, you would probably see a "Can't open directory: Permission denied" error. For question three, the output looks like what you posted, except there are no warning messages. For question four, I don't think so.

DrKamikaze83
02-24-2004, 06:34 AM
Hi Charter,

I think I found the problem. I looked at my phpinfo and found that thread safety was enabled instead of disabled.

Secondly, register_globals is on too.

How can I change this? I looked, but I found nothing useful.

I have Windows 2000.


Greetings
Alex

Charter
02-24-2004, 12:43 PM
Hi. Information about configuring PHP can be found here (http://www.php.net/configuration).

DrKamikaze83
02-27-2004, 04:33 AM
Hi,

I have found my problem. My settings, configuration and so on were all right.

The problem is that I am working for a firm. On the intranet I can spider all hosts. But when I spider a site on the Internet, I can't get out of the local network. I think there is no way to change something in PhpDig to make that possible?

Greetings
Alex

Charter
02-28-2004, 04:03 PM
Hi. If the Intranet is preventing access to the Internet, then PhpDig shouldn't access the Internet.
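
To confirm this kind of network restriction independently of PhpDig, one option is to test from the same machine whether a direct TCP connection to port 80 can be opened at all. The Python sketch below is only an illustration. The function name `can_connect` is hypothetical. If the company network only allows web access through a proxy, a direct connection like this would probably fail even though the browser works.

```python
import socket


def can_connect(host, port=80, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    # An intranet or local host versus an Internet host.
    for host in ("localhost", "www.ebay.com"):
        status = "reachable" if can_connect(host) else "NOT reachable"
        print(host, status)
```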