PhpDig.net

PhpDig.net (http://www.phpdig.net/forum/index.php)
-   Troubleshooting (http://www.phpdig.net/forum/forumdisplay.php?f=22)
-   -   Grep command for finding Saga zero length (http://www.phpdig.net/forum/showthread.php?t=2347)

Dave A 01-22-2006 01:23 AM

Grep command for finding Saga zero length
 
Charter, I am still having problems with zero-content descriptions for websites that have been indexed.
It appears the site id and spider id are out of sequence; would you know a way of bringing them back into sequence?

I have tried to use the grep command to look for files named SAGA, or that contain Saga, within the text_content directory, but as yet nothing has shown itself.
When listing the files in the text_content directory, a few have a file length of one byte.
Shall I delete the files that are just one byte long, or is there an easier way to get the database running okay again?
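(For anyone hunting those tiny files directly: `find` with a byte-size test will list them without needing grep. The snippet below is only a sketch, run against a temporary directory standing in for text_content; on a real server you would point it at your PhpDig install's text_content path instead.)

```shell
# Sketch: locate zero- or one-byte files, the kind Dave describes.
# A temporary directory stands in for the real text_content directory.
tmp=$(mktemp -d)
printf 'real content' > "$tmp/1.txt"   # normal cached page
: > "$tmp/2.txt"                       # zero-byte file
printf 'x' > "$tmp/3.txt"              # one-byte file

# -size -2c matches files smaller than 2 bytes, i.e. 0 or 1 byte
find "$tmp" -type f -size -2c
```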

Heaps of regards
Dave A

Charter 02-03-2006 02:27 AM

Try running the following query:
Code:

$query = "
        SELECT ".PHPDIG_DB_PREFIX."spider.spider_id AS spider_spider_id,
        ".PHPDIG_DB_PREFIX."spider.site_id AS spider_site_id,
        ".PHPDIG_DB_PREFIX."sites.site_id AS site_site_id
        FROM ".PHPDIG_DB_PREFIX."spider
        LEFT JOIN ".PHPDIG_DB_PREFIX."sites
        ON (".PHPDIG_DB_PREFIX."sites.site_id = ".PHPDIG_DB_PREFIX."spider.site_id)
        ORDER BY ".PHPDIG_DB_PREFIX."spider.spider_id ASC
";

The output should look like this:
Code:

+------------------+----------------+--------------+
| spider_spider_id | spider_site_id | site_site_id |
+------------------+----------------+--------------+
|                1 |              1 |            1 |
|                2 |              1 |            1 |
|                3 |              2 |            2 |
|                4 |              2 |            2 |
|                5 |              2 |            2 |
|                6 |              1 |            1 |
|                7 |              1 |            1 |
|                8 |              1 |            1 |
|                9 |              1 |            1 |
|               10 |              1 |            1 |
|               11 |              1 |            1 |
|               12 |              1 |            1 |
|               13 |              1 |            1 |
|               14 |              1 |            1 |
|               15 |              1 |            1 |
|               16 |              1 |            1 |
+------------------+----------------+--------------+
16 rows in set (0.00 sec)

The files in the text_content directory match with spider_id:
Code:

text_content> ls

1.txt  11.txt  13.txt  15.txt  2.txt  4.txt  6.txt  8.txt  keepalive.txt
10.txt  12.txt  14.txt  16.txt  3.txt  5.txt  7.txt  9.txt
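(A small loop can check that correspondence mechanically and flag any spider_id whose cache file is missing or suspiciously small. This is only a sketch, demonstrated on a temporary directory standing in for text_content and ids 1..3 instead of the full set from the query above.)

```shell
# Sketch: flag spider_ids whose text_content file is missing or under
# 2 bytes. A temporary directory stands in for the real text_content path.
dir=$(mktemp -d)
for i in 1 2 3; do printf 'cached page %s\n' "$i" > "$dir/$i.txt"; done
: > "$dir/2.txt"      # simulate a zero-length cache file
rm "$dir/3.txt"       # simulate a missing cache file

for id in 1 2 3; do
    f="$dir/$id.txt"
    if [ ! -f "$f" ]; then
        echo "spider_id $id: cache file missing"
    elif [ "$(wc -c < "$f")" -lt 2 ]; then
        echo "spider_id $id: cache file under 2 bytes"
    fi
done
```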


Dave A 02-06-2006 02:00 PM

Thanks Charter!
 
Thanks for that Charter,

You gave brilliant support with that reply to my question.
It did take an age to process but then the database is quite large now.
The problem is now resolved.
I did upgrade the server with an extra gig of RAM, and the speed of the searches has increased by around 30%, which is really good.

I now run a cron job on the server every 60 minutes which reports back to me the state of the file system; often a few tmp files need removing, and the speed comes back up.
The speed issue with PhpDig can be helped with an increase in memory; with the RAID system under Linux, it can produce large tmp files if memory usage rises a heap.
Memory usage on my server was often around 90% or higher, which made it write temp files to the hard disk, and the increase in available RAM has assisted its speed a heap.
Linux Strike needed its kernel changed and updated to see the free memory increase, but it is now flying.
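(A rough sketch of the kind of hourly file-system report such a cron job could produce is below; the script path in the crontab comment and the exact commands are assumptions, not taken from Dave's setup.)

```shell
# Hypothetical file-system report for an hourly cron job. To schedule it,
# a crontab entry like the following could be used (the path is an assumption):
#   0 * * * * /usr/local/bin/fs_report.sh
df -h /                                   # overall disk usage
du -sh /tmp                               # total size of the tmp area
find /tmp -maxdepth 1 -type f -mmin +60   # tmp files untouched for over an hour
```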

Thanks for your great help and assistance.
All the best
Dave A


All times are GMT -8. The time now is 07:14 PM.

Powered by vBulletin® Version 3.7.3
Copyright ©2000 - 2025, Jelsoft Enterprises Ltd.
Copyright © 2001 - 2005, ThinkDing LLC. All Rights Reserved.