SLES 11 SP1 x86_64
My customer is using a NAS device for backups. It is mounted over NFS:
192.168.0.y:/mnt/HD_a2 on /external_disk type nfs (rw,addr=192.168.0.y)
The directory they back up is on a local file system, “/backup”:
/dev/cciss/c0d0p8 on /backup type ext3 (rw,acl,user_xattr)
This /backup file system contains some very large files, around 129 GB.
The problem is that when we try to copy a very large file, “/backup/19sep/large-file” (about 129 GB), to the NAS, we find that:
1 - blocks in (bi) and blocks out (bo) remain very low
[HTML]
procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 1 0 11230940 91200 19138368 0 0 102 7 18 156 0 0 98 2 0
0 1 0 11231312 91200 19139952 0 0 0 0 1116 1321 0 0 96 4 0
0 1 0 11231296 91200 19140744 0 0 0 0 350 475 0 0 96 4 0
2 1 0 11231296 91200 19141592 0 0 0 0 767 876 0 0 95 5 0
0 1 0 11231544 91208 19143008 0 0 0 12 801 961 0 0 97 3 0
0 1 0 11231544 91208 19144184 0 0 0 268 727 990 0 0 95 5 0
1 1 0 11232108 91208 19145680 0 0 0 16 1020 1498 0 0 96 3 0
0 1 0 11231744 91208 19148480 0 0 0 0 1125 1311 1 0 93 5 0
0 1 0 11231496 91208 19150044 0 0 0 0 863 1034 0 0 96 4 0
0 1 0 11231744 91216 19152192 0 0 32 0 1318 1965 0 0 94 5 0
1 1 0 11231480 91216 19154592 0 0 0 944 1385 1563 0 0 96 4 0
0 1 0 11231736 91216 19155932 0 0 0 0 1050 1091 0 0 96 4 0
0 1 0 11231116 91308 19157936 0 0 516 136 1176 2495 0 0 93 6 0
2 1 0 11231116 91308 19158368 0 0 0 0 217 343 0 0 95 5 0
0 1 0 11227396 91308 19164352 0 0 0 0 1504 1717 1 0 95 4 0
0 2 0 11227396 91324 19166280 0 0 80 372 1373 1706 0 0 95 5 0
0 1 0 11227520 91324 19167488 0 0 8 0 885 1121 0 0 96 4 0
1 1 0 11227528 91324 19169596 0 0 0 0 1254 1473 0 0 95 4 0
0 1 0 11226784 91324 19173440 0 0 0 0 1479 2022 1 0 96 4 0
0 1 0 11225372 91324 19176268 0 0 0 0 1265 1545 0 0 94 6 0
1 1 0 11225372 91336 19178540 0 0 12 468 1067 1220 0 0 96 4 0
[/HTML]
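To put a number on "very low", the bi and bo columns can be averaged with something like the sketch below. It assumes the 17-column vmstat layout shown above, and it skips the first data row because vmstat reports averages since boot on that row. The function name `average_io` and the `SAMPLE` text are only for illustration; the sample rows are three of the rows pasted above.

```python
def average_io(vmstat_text):
    """Return (avg_bi, avg_bo) over the samples, skipping the first data row."""
    rows = []
    for line in vmstat_text.splitlines():
        fields = line.split()
        # Data rows have 17 numeric columns; the header rows do not.
        if len(fields) == 17 and all(f.isdigit() for f in fields):
            rows.append([int(f) for f in fields])
    samples = rows[1:]  # the first row is the average since boot
    if not samples:
        return (0.0, 0.0)
    # Column order: r b swpd free buff cache si so bi bo in cs us sy id wa st
    avg_bi = sum(r[8] for r in samples) / len(samples)
    avg_bo = sum(r[9] for r in samples) / len(samples)
    return (avg_bi, avg_bo)


# Three rows copied from the vmstat output above (not consecutive samples).
SAMPLE = """\
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 1 0 11230940 91200 19138368 0 0 102 7 18 156 0 0 98 2 0
0 1 0 11231544 91208 19143008 0 0 0 12 801 961 0 0 97 3 0
0 1 0 11231744 91216 19152192 0 0 32 0 1318 1965 0 0 94 5 0
"""

print(average_io(SAMPLE))
```

Feeding it the full output of a longer `vmstat 1` run during the copy should give a fairer picture than these few rows.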
2 - then, after about 2 hours, we found that
[HTML]
procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 4 0 11230940 91200 19138368 0 0 108 12 40 131 0 0 95 5 0
1 3 0 11231312 91200 19139952 0 0 576 288 1571 2068 0 0 85 14 0
1 3 0 11231296 91200 19140744 0 0 216 344 1585 3430 1 0 86 13 0
[/HTML]
i.e. blocked processes (b), interrupts (in), context switches (cs), and I/O wait (wa) are all high, while bi and bo remain low. Also, the copy process goes into uninterruptible sleep (state D+ in “ps aux”), and we have to reboot the server to recover.
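To record which processes are stuck before the next reboot, a small filter like the sketch below might help. It assumes the input is the text printed by `ps -eo pid,stat,wchan:32,args`. The function name `find_uninterruptible` is only for illustration, and the rows in `SAMPLE` are made up (including the wait-channel name and the exact cp command line), not real output from this server.

```python
def find_uninterruptible(ps_text):
    """Return (pid, stat, wchan, command) tuples for processes in state D."""
    hits = []
    for line in ps_text.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 3)
        if len(parts) < 4:
            continue  # blank or truncated line
        pid, stat, wchan, command = parts
        # A STAT value starting with "D" means uninterruptible sleep.
        if stat.startswith("D"):
            hits.append((int(pid), stat, wchan, command))
    return hits


# Hypothetical sample in the layout of `ps -eo pid,stat,wchan:32,args`.
SAMPLE = """\
  PID STAT WCHAN                            COMMAND
 4321 Ss   -                                /usr/sbin/sshd
 5678 D+   example_wait_fn                  cp /backup/19sep/large-file /external_disk/
"""

for pid, stat, wchan, command in find_uninterruptible(SAMPLE):
    print(pid, stat, wchan, command)
```

The WCHAN column of the real output shows where in the kernel the process is waiting, which may help tell whether the wait is on the NFS side or the local disk side.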
My questions:
1 - Why does the copy start out so slow? (The ‘bi’ and ‘bo’ values in vmstat remain very low while we copy the 129 GB file from the local disk to the NAS.)
2 - Why does the copy operation then hang/freeze and go into state D+?
Where is the problem? Is there something wrong with the NAS device, or with our local disk/file system (/backup)?
Please help.