User Submitted URLs Mod Version 1


jerrywin5
03-30-2004, 09:28 AM
This mod allows users to submit URLs to be indexed. URLs are partially verified upon submission. The URLs are stored in a new table titled 'newsites' for approval. The admin can review all submissions and choose which to delete and which to index. URLs selected for indexing are placed in a text file for the spider to index either from a shell session or via a cron job.

Tested on Linux only but should work on a Windows server as well.

If you have any problems, suggestions, and/or questions regarding this mod, please post them in this thread. All feedback welcome.

jerrywin5
03-30-2004, 09:33 AM
File wasn't attached for some reason. Hopefully it will work this time.

firestarter
03-31-2004, 05:48 AM
Hmm, looks good, but I receive the following error when I try to add a URL:

Unknown column 'date_added' in 'field list'

??

Edit: And I get the same error when I try to access newurls.php!

jerrywin5
03-31-2004, 07:50 AM
Sorry about that. I added a field to the table and forgot to update the DDL. Here is an updated version.

firestarter
03-31-2004, 08:14 AM
No problem ;)

I'll check it and let you know. Thanks very much!

Frank

davey147
04-26-2004, 12:09 PM
Has anybody managed to get this working? I've installed it but can't get anything to go into the newsites table.

snorkpants
04-26-2004, 03:22 PM
Originally posted by davey147
Has anybody managed to get this working? I've installed it but can't get anything to go into the newsites table.

I got it working straight away...

Dumb question... :) Did you submit a site using the addurl page? Could you be a bit more specific about the problem? Then perhaps we can solve it.


snorkpants.

davey147
04-27-2004, 12:41 AM
I did use the addurl page, but the newsites table in the database always stays empty. Any ideas?

jerrywin5
04-30-2004, 08:31 AM
The mod uses the same connection file as the rest of the install. Are you able to add records to the table manually? If so, it could be a matter of how PHP is set up on your server.

ChadK
05-12-2004, 02:11 PM
This doesn't work if you have anything other than the default table prefix. My tables use the phpdig_ prefix, so when adding a URL I get an error that dbname.sites doesn't exist. Of course it doesn't, because the table is phpdig_sites.
In short, it fails because it doesn't use the configured prefix.

Pulsar-san
05-13-2004, 01:18 AM
Great, but it needs some changes:

1- Replace the short tag on the first line with the standard tag:
replace <? with <?php

2- Replace $HTTP_SERVER_VARS with $_SERVER
3- Replace $HTTP_POST_VARS with $_POST
4- Replace $HTTP_GET_VARS with $_GET

The $HTTP_*_VARS arrays are not superglobals, so they are not visible inside functions unless declared global; that's also why some of you have problems with the mod.
Since PHP 4.1.0, the $_xxx arrays should be used instead of $HTTP_*_VARS.
Also, in recent PHP versions register_globals is "off" by default for security reasons, so any code that relies on automatically registered variables gets empty values.
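
As a rough illustration of the replacements listed above, the shell function below applies them with sed. It is only a sketch: the name modernize_php is made up for this example, it reads PHP source on standard input and writes the converted source to standard output, and it rewrites the first line only when that line is a bare <? tag. Review the result before replacing any file.

```shell
# modernize_php: hypothetical helper, not part of the mod.
# Reads PHP source on stdin and writes the converted source to stdout.
modernize_php() {
    # 1st expression: turn a bare "<?" on line 1 into "<?php".
    # Remaining expressions: swap the long arrays for the superglobals.
    sed -e '1s/^<?[[:space:]]*$/<?php/' \
        -e 's/\$HTTP_SERVER_VARS/$_SERVER/g' \
        -e 's/\$HTTP_POST_VARS/$_POST/g' \
        -e 's/\$HTTP_GET_VARS/$_GET/g'
}
```

For example: modernize_php < addurl.php > addurl.new.php, then compare the two files before swapping them.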

ChadK
08-19-2004, 12:06 PM
Does this work on 1.8.3?

chrisoverly
08-23-2004, 02:04 PM
I need to use my ISP's SMTP server and I can't figure out how to set that in the PHP file.

Here are the warnings it gives me:

Warning: Failed to connect to mailserver, verify your "SMTP" setting in php.ini in c:\appserv\www\search\urlfiles\addurl.php on line 14

Warning: Cannot add header information - headers already sent by (output started at c:\appserv\www\search\urlfiles\addurl.php:14) in c:\appserv\www\search\urlfiles\addurl.php on line 15

rispbiz
08-27-2004, 01:34 PM
Due to many problems trying to index URLs from a text file, I made this little script based on JWSmythe's build.searchimages.pl.

This script pulls each URL from the database, indexes it, deletes it from the database, and then optimizes the tables.

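#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Connection details are placeholders; substitute your own database, user and password.
my $db = DBI->connect("DBI:mysql:database:localhost", 'username', 'password')
    || die "Cannot connect: $DBI::errstr\n";

my $source = $db->prepare("SELECT new_site_url FROM newsites")
    || die "Error on source prepare: " . $db->errstr . "\n";
$source->execute || die "Error on source execute: " . $source->errstr . "\n";

while (my ($stored_url) = $source->fetchrow_array) {
    # Strip semicolons and line breaks before handing the URL to the spider.
    (my $req_url = $stored_url) =~ s/[;\r\n]//g;
    print "Indexing $req_url -> .....\n";
    # The list form of system() runs php directly, without a shell, so the URL
    # cannot inject extra commands. (The earlier system(`...`) ran the command
    # in backticks and then tried to execute its output.)
    my $status = system('php', '-f', '/path/to/admin/spider.php', $req_url);
    warn "spider.php did not exit cleanly for $req_url (status $status)\n" if $status != 0;
    print "Finished Indexing $req_url -> ...Complete\n";
    # The placeholder avoids quoting problems in the DELETE.
    $db->do("DELETE FROM newsites WHERE new_site_url = ?", undef, $stored_url);
    $db->do("OPTIMIZE TABLE newsites");
    $db->do("OPTIMIZE TABLE tempspider");
}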



Then I can run this script from a shell or from cron with this command, with no problem:

perl /path/to/cgi-bin/newurls.pl

Note: If running from cron, be sure to allow enough time to index the new URLs before the next cron run starts. So don't set your cron up for every minute.
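
One way to guard against overlapping cron runs, sketched here under assumptions rather than taken from the script above, is to wrap the command in a simple lock. The function name run_locked, the lock path, and the exit code 75 are all arbitrary choices for this example. It relies on mkdir being atomic: only one run can create the lock directory at a time.

```shell
# run_locked LOCKDIR COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND only if LOCKDIR can be created.
# A second invocation started while the first is still running is skipped.
run_locked() {
    lockdir=$1
    shift
    if mkdir "$lockdir" 2>/dev/null; then
        "$@"
        status=$?
        rmdir "$lockdir"      # release the lock once the command has finished
        return "$status"
    fi
    echo "previous run still active, skipping" >&2
    return 75                 # arbitrary code meaning "skipped"
}
```

In a wrapper script called from cron this might be used as: run_locked /tmp/newurls.lock perl /path/to/cgi-bin/newurls.pl. If a run is killed part-way, the lock directory stays behind and has to be removed by hand.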

davey147
08-29-2004, 03:56 PM
Would somebody please help me?

I am struggling to get the script to automatically spider the sites I have added. Could someone provide me with an idiot's guide?

Thanks

davey147
08-29-2004, 04:19 PM
I got the above working, well, kind of.

I set the cron job up and set it to send me the results, which come back to me and say it was successful.

But when I look to see if the sites have actually been added, they haven't, so I have no extra sites in my search engine.

Any ideas?

rispbiz
08-29-2004, 06:58 PM
What happens if you try to run your cron command in a shell?

The best way to make sure any cron job is going to work is to first try the command in a shell. If that works, then use the command in cron.

I have been running the above code in a shell for about 3 days non-stop with no problem. I am still unsure why I cannot do the same thing from a txt file, but this seems to work well. There is no doubt that every server has its own personality; it's all a matter of finding out what works best on your server, which can sometimes be nerve-racking.

Thank You
2-surf.net

davey147
08-30-2004, 03:13 AM
Hi,

Unfortunately I don't think I have shell access; I don't really know what it is.

But the cron job is working, as I get the report email saying that it has done what it should be doing.

But never mind, thanks anyway.

cr0bar
08-31-2004, 09:15 AM
Hey, I'm having problems with both of these scripts. At the moment I'm trying to get the .pl one to work.
It executes and says it has completed with no error, but it finishes instantly. Does this mean it's not working? Also, the URLs aren't being added to the site.

cr0bar
08-31-2004, 09:15 AM
PS: I have shell access.

rispbiz
08-31-2004, 10:35 AM
The pl above does not check what the spider actually did. Conditions such as a document already being indexed, or phpdig not completely indexing a site because of your config file, will not show up as errors from the pl.

It shows "Finished Indexing" when spider.php is finished with the URL, regardless of what the spider did.

One thing you can do to test the script is to run this command: php -f /path/to/admin/spider.php http://www.somedomain.com

If the results are good then the script should be working fine.
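
The manual test above can also be wrapped in a small check. This is a sketch, not part of phpdig or the pl: the function name check_run and its messages are invented here, and treating an empty result as suspicious is an assumption based on the spider normally printing its progress.

```shell
# check_run COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND, captures everything it prints, and
# reports whether the run looks healthy.
check_run() {
    output=$("$@" 2>&1)
    status=$?
    if [ "$status" -ne 0 ]; then
        echo "FAILED: exit status $status"
        return 1
    fi
    if [ -z "$output" ]; then
        # A clean exit with nothing printed usually means nothing was done.
        echo "SUSPICIOUS: no output"
        return 2
    fi
    echo "OK"
    return 0
}
```

For example: check_run php -f /path/to/admin/spider.php http://www.somedomain.com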

Just for reference, I am currently indexing 1 level and 1 link, and it takes an average of 30 seconds to complete one site.

Thanks
2-surf.net

cr0bar
08-31-2004, 10:53 AM
OK, it takes about 0.5 of a second. lol

Something is up, but I don't have a clue what. I've added sites to the db, but when it runs it just deletes them and doesn't add them to the site =/

rispbiz
08-31-2004, 11:04 AM
Yes,

There is a problem with the indexing if it finishes in that small amount of time.

It sounds like the pl is working if it is deleting the URLs.

What happens when you index from the shell with this command?

php -f /path/to/admin/spider.php http://www.somedomain.com

Thanks
2-Surf.net

cr0bar
08-31-2004, 11:16 AM
aphrodite# php -f /usr/nfs/domains/f/****MASKED****/html/admin/spider.php http://www.url.com
aphrodite#

It returns straight away.

cr0bar
08-31-2004, 11:25 AM
My provider installed PHP on that box for me, just the basic package, because this shell server is NFS-mounted to another box that serves the web site.

So that line is executed on one machine, but the website runs off another.

Do I need any extensions added to PHP on that machine?
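
One way to look into the extensions question is to list the modules compiled into the command-line PHP on the shell machine with php -m and search that list. The helper below is a sketch; the name module_in_list is made up, and checking for mysql assumes the spider needs the MySQL extension to reach the database.

```shell
# module_in_list NAME
# Hypothetical helper: reads a module list (one name per line, as printed
# by "php -m") on stdin and succeeds if NAME is in it, ignoring case.
module_in_list() {
    grep -qixF -- "$1"
}
```

For example: php -m | module_in_list mysql && echo "mysql extension present"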

rispbiz
08-31-2004, 01:41 PM
That response does not look correct.

When you index in the shell, the output should look the same as when you index through the web browser.


Thank you,
2-surf.net