Grep command for finding Saga zero length


Dave A
01-22-2006, 01:23 AM
Charter, I am still having problems with zero-content descriptions for websites that have been indexed.
It appears the site_id and spider_id are out of sequence; would you know a way of bringing them back into sequence?

I have tried to use the grep command to look for files within the text_content directory that are named SAGA or that contain Saga, but as yet nothing has shown itself.
When listing the files in the text_content directory, a few have a file length of one byte.
Shall I delete the files that are just one byte long, or is there an easier way to get the database running okay again?
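
To give an idea of what I mean, these are roughly the commands I have been trying (just a sketch; the text_content path is the default one, so it may need adjusting for other installs):

# files whose contents mention saga, case-insensitive
grep -ril "saga" text_content/
# files with saga in the file name
find text_content/ -iname "*saga*"
# .txt files smaller than two bytes (the one-byte files)
find text_content/ -name "*.txt" -size -2c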

Heaps of regards
Dave A

Charter
02-03-2006, 02:27 AM
Try running the following query:

$query = "
SELECT ".PHPDIG_DB_PREFIX."spider.spider_id AS spider_spider_id,
".PHPDIG_DB_PREFIX."spider.site_id AS spider_site_id,
".PHPDIG_DB_PREFIX."sites.site_id AS site_site_id
FROM ".PHPDIG_DB_PREFIX."spider
LEFT JOIN ".PHPDIG_DB_PREFIX."sites
ON (".PHPDIG_DB_PREFIX."sites.site_id = ".PHPDIG_DB_PREFIX."spider.site_id)
ORDER BY ".PHPDIG_DB_PREFIX."spider.spider_id ASC
";

The output should look like the following:

+------------------+----------------+--------------+
| spider_spider_id | spider_site_id | site_site_id |
+------------------+----------------+--------------+
|                1 |              1 |            1 |
|                2 |              1 |            1 |
|                3 |              2 |            2 |
|                4 |              2 |            2 |
|                5 |              2 |            2 |
|                6 |              1 |            1 |
|                7 |              1 |            1 |
|                8 |              1 |            1 |
|                9 |              1 |            1 |
|               10 |              1 |            1 |
|               11 |              1 |            1 |
|               12 |              1 |            1 |
|               13 |              1 |            1 |
|               14 |              1 |            1 |
|               15 |              1 |            1 |
|               16 |              1 |            1 |
+------------------+----------------+--------------+
16 rows in set (0.00 sec)

The files in the text_content directory correspond to the spider_id values:

text_content> ls

1.txt 11.txt 13.txt 15.txt 2.txt 4.txt 6.txt 8.txt keepalive.txt
10.txt 12.txt 14.txt 16.txt 3.txt 5.txt 7.txt 9.txt
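
If you want to cross-check which text_content files still line up with a spider_id, something along these lines should do it (a sketch only; it assumes the default phpdig_ table prefix, a database named phpdig, and mysql login details already configured, so adjust to your setup):

# dump the spider ids and the numeric file names, then compare the two lists
mysql -N -e "SELECT spider_id FROM phpdig_spider ORDER BY spider_id" phpdig > /tmp/spider_ids.txt
ls text_content | grep -v keepalive | sed 's/\.txt$//' | sort -n > /tmp/file_ids.txt
diff /tmp/spider_ids.txt /tmp/file_ids.txt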

Dave A
02-06-2006, 02:00 PM
Thanks for that, Charter.

You gave brilliant support with that reply to my question.
The query did take an age to process, but then the database is quite large now.
The problem is now resolved.
I did upgrade the server with an extra gigabyte of RAM, and the speed of the searches has increased by around 30%, which is really good.

I now run a cron job on the server every 60 minutes which reports back to me the state of the file system on the server; often a few TMP files need removing, and the speed comes back up.
The speed issue with Phpdig can be helped with an increase in memory: with the RAID system under Linux, it can produce large TMP files if memory usage rises a heap.
Memory usage on my server was often around 90% or higher, which made it write temp files to the hard disk, and the increase in available RAM has assisted its speed a heap.
Linux Strike needed to have its kernel changed and updated to see the free memory increase, but it is now flying.
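
For anyone interested, the cron entry is along these lines (a rough sketch; the mail address and the exact commands are placeholders for what I actually run):

# hourly: mail a summary of disk, memory and /tmp usage to the server owner
0 * * * * ( df -h ; free -m ; ls -lh /tmp ) | mail -s "hourly server status" admin@example.com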

Thanks for your great help and assistance.
All the best
Dave A