PhpDig.net

Old 01-22-2006, 01:23 AM   #1
Dave A
Purple Mole
 
Dave A's Avatar
 
Join Date: Aug 2004
Location: North Island New Zealand
Posts: 170
Grep command for finding Saga zero-length files

Charter, I am still having problems with zero-content descriptions for websites that have been indexed.
It appears the site_id and spider_id are out of sequence; would you know a way of bringing them back into sequence?

I have tried using the grep command to look for files named SAGA, or containing "Saga", within the text_content directory, but so far nothing has shown up.
When listing the files in the text_content directory, a few have a file length of one.
Shall I delete the files that are just one byte long, or is there an easier way to get the database running properly again?
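For what it's worth, zero- and one-byte files can be listed directly with find rather than grep. This is only a sketch, assuming the command is run from the PhpDig directory that contains text_content; the sample file names are made up for the demo:

```shell
# Demo: recreate a tiny text_content with a zero-byte and a one-byte file,
# then list every file at most one byte long (-size -2c = smaller than 2 bytes).
mkdir -p text_content
: > text_content/12.txt            # zero-byte file (e.g. a failed spider write)
printf 'x' > text_content/13.txt   # one-byte file
printf 'real page text' > text_content/14.txt
find text_content -type f -size -2c -print
```

Against a real installation, only the final find line is needed; it prints just the suspect files without touching anything.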

Heaps of regards
Dave A
Old 02-03-2006, 02:27 AM   #2
Charter
Head Mole
 
Charter's Avatar
 
Join Date: May 2003
Posts: 2,539
Try running the following query:
Code:
$query = "
	SELECT ".PHPDIG_DB_PREFIX."spider.spider_id AS spider_spider_id,
	".PHPDIG_DB_PREFIX."spider.site_id AS spider_site_id,
	".PHPDIG_DB_PREFIX."sites.site_id AS site_site_id
	FROM ".PHPDIG_DB_PREFIX."spider
	LEFT JOIN ".PHPDIG_DB_PREFIX."sites
	ON (".PHPDIG_DB_PREFIX."sites.site_id = ".PHPDIG_DB_PREFIX."spider.site_id)
	ORDER BY ".PHPDIG_DB_PREFIX."spider.spider_id ASC
";
The output should look like the following:
Code:
+------------------+----------------+--------------+
| spider_spider_id | spider_site_id | site_site_id |
+------------------+----------------+--------------+
|                1 |              1 |            1 |
|                2 |              1 |            1 |
|                3 |              2 |            2 |
|                4 |              2 |            2 |
|                5 |              2 |            2 |
|                6 |              1 |            1 |
|                7 |              1 |            1 |
|                8 |              1 |            1 |
|                9 |              1 |            1 |
|               10 |              1 |            1 |
|               11 |              1 |            1 |
|               12 |              1 |            1 |
|               13 |              1 |            1 |
|               14 |              1 |            1 |
|               15 |              1 |            1 |
|               16 |              1 |            1 |
+------------------+----------------+--------------+
16 rows in set (0.00 sec)
The files in the text_content directory match the spider_id values:
Code:
text_content> ls

1.txt   11.txt  13.txt  15.txt  2.txt  4.txt  6.txt  8.txt  keepalive.txt
10.txt  12.txt  14.txt  16.txt  3.txt  5.txt  7.txt  9.txt
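If the two id columns ever disagree, a follow-up query in the same style should make the orphans visible. This is only a sketch, not part of Charter's reply: it assumes a table prefix of "phpdig_" (substitute whatever your PHPDIG_DB_PREFIX expands to), and any row it returns is a spider entry whose site_id no longer exists in the sites table:

```sql
-- Spider rows whose site_id has no matching row in the sites table
-- (prefix "phpdig_" assumed; substitute your PHPDIG_DB_PREFIX).
SELECT phpdig_spider.spider_id, phpdig_spider.site_id
FROM phpdig_spider
LEFT JOIN phpdig_sites
  ON phpdig_sites.site_id = phpdig_spider.site_id
WHERE phpdig_sites.site_id IS NULL
ORDER BY phpdig_spider.spider_id ASC;
```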
__________________
Responses are offered on a voluntary, as-time-permits basis, with no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email. Thank you for your understanding.
Old 02-06-2006, 02:00 PM   #3
Dave A
Purple Mole
 
Dave A's Avatar
 
Join Date: Aug 2004
Location: North Island New Zealand
Posts: 170
Thanks Charter!

Thanks for that, Charter.

That was brilliant support in reply to my question.
It did take an age to process, but then the database is quite large now.
The problem is now resolved.
I also upgraded the server with an extra gigabyte of RAM, and the speed of searches has increased by around 30%, which is really good.

I now run a cron job on the server every 60 minutes that reports back the state of the file system; often a few temp files need removing, after which the speed comes back up.
PhpDig's speed issues can be helped by adding memory: under Linux with a RAID system, large temp files can build up if memory usage climbs sharply.
Memory usage on my server was often around 90% or higher, which forced it to write temp files to the hard disk, and the extra RAM has helped its speed a great deal.
Linux Strike needed its kernel changed and updated before the extra free memory was visible, but it is now flying.
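A report script along those lines might look like the sketch below; the paths, script name, and cron schedule are all hypothetical, not taken from my actual setup:

```shell
#!/bin/sh
# Hourly filesystem report (hypothetical paths); installed via cron with:
#   0 * * * * /usr/local/bin/fs_report.sh >> /var/log/fs_report.log 2>&1
date
df -h                                      # disk usage per filesystem
# List temp files older than a day so they can be reviewed and removed.
find /tmp -type f -mtime +1 2>/dev/null || true
```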

Thanks again for all your help.
All the best
Dave A


Powered by vBulletin® Version 3.7.3
Copyright ©2000 - 2025, Jelsoft Enterprises Ltd.
Copyright © 2001 - 2005, ThinkDing LLC. All Rights Reserved.