User Submitted URLs Mod Version 1
This mod allows users to submit URLs to be indexed. URLs are partially verified upon submission. The URLs are stored in a new table titled 'newsites' for approval. The admin can review all submissions and choose which to delete and which to index. URLs selected for indexing are placed in a text file for the spider to index either from a shell session or via a cron job.
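For reference, the submissions table can be sketched roughly like this. This is only a sketch: the real DDL ships with the mod, and the column names (new_site_url, date_added) are inferred from the spider script and error reports elsewhere in this thread.

```sql
-- Sketch only; check the DDL in the mod package for the authoritative
-- definition. Column names are inferred from this thread.
CREATE TABLE newsites (
    new_site_url VARCHAR(255) NOT NULL,
    date_added   DATETIME NOT NULL,
    PRIMARY KEY (new_site_url)
);
```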
Tested on Linux only but should work on a Windows server as well. If you have any problems, suggestions, and/or questions regarding this mod, please post them in this thread. All feedback welcome. |
1 Attachment(s)
File wasn't attached for some reason. Hopefully it will work this time.
|
Hmm, looks good, but I receive the following error when I try to add a URL:
Unknown column 'date_added' in 'field list' ?? Edit: And I get the same error when I access newurls.php! |
User Submitted URLs Mod Version 1.1
1 Attachment(s)
Sorry about that. I added a field to the table and forgot to update DDL. Here is an updated version.
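For anyone who already installed version 1 and only needs the schema fix rather than a full reinstall, the missing column can presumably be added by hand. The column name comes from the error message above; the type is an assumption, so check it against the DDL in the 1.1 package:

```sql
-- Assumed type; adjust to match the DDL shipped with version 1.1.
ALTER TABLE newsites ADD COLUMN date_added DATETIME NOT NULL;
```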
|
No problem ;)
I'll check it and let you know - thanks very much! Frank |
Has anybody managed to get this working? I've installed it but can't get anything to go into the newsites table.
|
Quote:
Dumb question.. :) did you submit a site using the addurl page? Could you be a bit more specific about the problem and perhaps we can solve it. snorkpants. |
I did use the addurl page, but the newsites table in the database always stays empty. Any ideas?
|
The mod uses the same connection file. Are you able to add records to the table manually? If so, it could be a matter of how php is set up on your server.
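A quick way to check the manual route from the mysql command line, assuming the column names used elsewhere in this thread (new_site_url, date_added):

```sql
-- If this row shows up in the SELECT, the table and your privileges are
-- fine, and the problem is on the PHP side of the mod.
INSERT INTO newsites (new_site_url, date_added)
VALUES ('http://www.example.com/', NOW());

SELECT * FROM newsites;
```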
|
This doesn't work if you have anything other than the default table prefix. My tables are prefixed with phpdig, so when adding a URL I get the error that dbname.sites doesn't exist.. well, of course not, because it's phpdig_sites.
Anyway.. it doesn't work because it doesn't use the prefix. |
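A minimal sketch of the fix, assuming your phpDig config defines the PREFIX_TABLE constant (stock phpDig does; adjust if your install differs). The define() here exists only so the snippet stands alone; in the mod the value would come from the connection/config include:

```php
<?php
// Stand-in for the phpDig config include, which normally defines this.
if (!defined('PREFIX_TABLE')) { define('PREFIX_TABLE', 'phpdig_'); }

// Build table names from the prefix instead of hardcoding "newsites".
$table = PREFIX_TABLE . 'newsites';   // e.g. "phpdig_newsites"
$url   = 'http://www.example.com/';   // placeholder submitted URL
$query = "INSERT INTO $table (new_site_url, date_added) VALUES ('$url', NOW())";
echo $query, "\n";
?>
```

Every query in the mod that names sites or newsites would need the same treatment.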
Great, but needs some changes:
1. Replace the short open tag on the first line with the standard tag: replace <? with <?php
2. Replace $HTTP_SERVER_VARS with $_SERVER
3. Replace $HTTP_POST_VARS with $_POST
4. Replace $HTTP_GET_VARS with $_GET
The $HTTP_*_VARS arrays are not global by default, which is also why some of you are having problems with them. Since PHP 4.1.0 the $_xxx superglobals should be used instead of $HTTP_*_VARS. Also, in recent PHP releases register_globals is set to "off" by default for security reasons; that's why $HTTP_*_VARS return empty values. |
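A small before/after sketch of the superglobal change described above. The $_POST assignment is only a stand-in for a real form submission so the snippet runs on its own; new_site_url is an assumed field name:

```php
<?php
// Before (needs register_globals / long arrays, off in modern PHP):
//   $url = $HTTP_POST_VARS['new_site_url'];

// After (superglobals, available since PHP 4.1.0):
$_POST['new_site_url'] = ' http://www.example.com/ ';  // stand-in for a form post
$url = isset($_POST['new_site_url']) ? trim($_POST['new_site_url']) : '';
echo $url, "\n";
?>
```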
Does this work on 1.8.3?
|
How do I get the email working?
I need to use my ISP's SMTP server and I can't figure out how to set it in the PHP file.
Here are the errors it gives me: Warning: Failed to connect to mailserver, verify your "SMTP" setting in php.ini in c:\appserv\www\search\urlfiles\addurl.php on line 14 Warning: Cannot add header information - headers already sent by (output started at c:\appserv\www\search\urlfiles\addurl.php:14) in c:\appserv\www\search\urlfiles\addurl.php on line 15 |
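On Windows, PHP's mail() function reads its SMTP relay from php.ini rather than from the script, so the setting goes there (the hostname below is a placeholder for your ISP's server). The second warning is just a consequence of the first: once the failed-connect warning is printed, headers can no longer be sent. A sketch of the relevant php.ini lines:

```ini
; php.ini (Windows only): mail() relays through this SMTP server.
; Replace the placeholder values with your ISP's details, then restart
; the web server so PHP rereads the file.
SMTP = smtp.your-isp.example
smtp_port = 25
sendmail_from = you@example.com
```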
Problem with newurls.txt
Due to many problems trying to index URLs from a text file, I made this little script based on JWSmythe's build.searchimages.pl.
This script will pull each URL from the database and index it, delete it from the db, and then optimize the tables.

#!/usr/bin/perl
use DBI;

# Connect to the phpDig database; adjust database name, host, and credentials.
my $db = DBI->connect("DBI:mysql:database:localhost", 'username', 'password')
    || die "$!";

my $source_query = "SELECT new_site_url FROM newsites";
my $source = $db->prepare($source_query) || die "$!, error on source prepare\n";
$source->execute || print "Error on source execute\n";

while (my @curarray = $source->fetchrow_array) {
    my $req_url = $curarray[0];
    $req_url =~ s/;//g;     # strip stray semicolons
    $req_url =~ s/\n//g;    # strip newlines

    print "Indexing $req_url -> .....\n";
    my $sysstring = "php -f /path/to/admin/spider.php $req_url";
    system($sysstring);
    print "Finished indexing $req_url -> ...Complete\n";

    $db->do("DELETE FROM newsites WHERE new_site_url = ?", undef, $curarray[0]);
    $db->do("OPTIMIZE TABLE newsites");
    $db->do("OPTIMIZE TABLE tempspider");
}

Then I can run this script from shell or cron with this command with no problem:
perl /path/to/cgi-bin/newurls.pl
Note: If running from cron, be sure to allow enough time to index the new URLs before the next run starts. So don't set your cron up for every minute. |
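For the cron route, a sketch of a crontab entry, assuming a nightly run leaves enough headroom between indexing passes (paths and log file are placeholders):

```
# Run the indexer once a night at 02:30; redirect output to a log so
# failures are visible. Adjust paths to your install.
30 2 * * * perl /path/to/cgi-bin/newurls.pl >> /var/log/newurls.log 2>&1
```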
Would somebody please help me?
I am struggling to get the script to spider the sites I have added automatically. Could someone provide me with an idiot's guide? Thanks |
Powered by vBulletin® Version 3.7.3
Copyright ©2000 - 2024, Jelsoft Enterprises Ltd.
Copyright © 2001 - 2005, ThinkDing LLC. All Rights Reserved.