Hi,
This is what I get from the server:
[root@cs3839 home]# smartctl -a /dev/sda | head -20
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.49.1.el7.x86_64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Device Model: XA1920ME10063
Serial Number: HKS022NC
LU WWN Device Id: 5 000c50 03ea270a6
Firmware Version: SF442147
User Capacity: 1,920,383,410,176 bytes [1.92 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: Solid State Device
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-4, ACS-2 T13/2015-D revision 3
SATA Version is: SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Sat Aug 20 03:39:43 2022 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
[root@cs3839 home]# smartctl -a /dev/sdb | head -20
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.49.1.el7.x86_64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Device Model: XA1920ME10063
Serial Number: HKS0201P
LU WWN Device Id: 5 000c50 03ea2528d
Firmware Version: SF442147
User Capacity: 1,920,383,410,176 bytes [1.92 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: Solid State Device
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-4, ACS-2 T13/2015-D revision 3
SATA Version is: SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Sat Aug 20 03:39:53 2022 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
[root@cs3839 home]# cat /proc/mdstat
Personalities : [raid0] [raid1]
md0 : active raid1 sdb2[1] sda2[0]
1048512 blocks super 1.0 [2/2] [UU]
bitmap: 0/1 pages [0KB], 65536KB chunk
md1 : active raid1 sdb3[1] sda3[0]
1865803776 blocks super 1.2 [2/2] [UU]
bitmap: 14/14 pages [56KB], 65536KB chunk
md2 : active raid0 sda5[0] sdb5[1]
16766976 blocks super 1.2 512k chunks
unused devices: <none>
Here is the I/O test result:
[root@cs3839 /]# fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75
test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.7
Starting 1 process
test: Laying out IO file (1 file / 4096MiB)
Jobs: 1 (f=1): [m(1)][100.0%][r=225MiB/s,w=74.1MiB/s][r=57.7k,w=18.0k IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=27818: Sat Aug 20 03:59:12 2022
read: IOPS=55.8k, BW=218MiB/s (229MB/s)(3070MiB/14087msec)
bw ( KiB/s): min=46192, max=283408, per=100.00%, avg=223220.93, stdev=43123.84, samples=28
iops : min=11548, max=70854, avg=55805.21, stdev=10781.05, samples=28
write: IOPS=18.6k, BW=72.8MiB/s (76.4MB/s)(1026MiB/14087msec)
bw ( KiB/s): min=15896, max=94632, per=100.00%, avg=74599.25, stdev=14309.26, samples=28
iops : min= 3974, max=23658, avg=18649.79, stdev=3577.31, samples=28
cpu : usr=12.33%, sys=82.11%, ctx=6514, majf=0, minf=597
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued rwts: total=785920,262656,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
READ: bw=218MiB/s (229MB/s), 218MiB/s-218MiB/s (229MB/s-229MB/s), io=3070MiB (3219MB), run=14087-14087msec
WRITE: bw=72.8MiB/s (76.4MB/s), 72.8MiB/s-72.8MiB/s (76.4MB/s-76.4MB/s), io=1026MiB (1076MB), run=14087-14087msec
Disk stats (read/write):
md1: ios=785792/266033, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=397174/264090, aggrmerge=34/4934, aggrticks=86736/29942, aggrin_queue=116572, aggrutil=94.20%
sda: ios=416314/264089, merge=24/4935, ticks=88723/31225, in_queue=119823, util=94.20%
sdb: ios=378035/264091, merge=44/4933, ticks=84750/28659, in_queue=113321, util=93.47%
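As a quick sanity check, the bandwidth fio reports is consistent with the IOPS at the 4 KiB block size (a sketch; the IOPS values are rounded from the fio summary above):

```shell
# Sanity check: bandwidth should be roughly IOPS * block size.
read_iops=55800
write_iops=18600
bs=4096                                    # bytes per I/O (--bs=4k)
read_mib=$(( read_iops * bs / 1024 / 1024 ))
write_mib=$(( write_iops * bs / 1024 / 1024 ))
echo "read: ~${read_mib} MiB/s, write: ~${write_mib} MiB/s"
```

This matches the reported 218 MiB/s read and 72.8 MiB/s write, so the numbers are internally consistent and the array really is sustaining those rates.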
The disks themselves are fine, and performance is fine as well. The high disk utilization is being produced by a large number of php-fpm processes:
[root@cs3839 /]# ps -waux | grep php-fpm | wc -l
7992
I don't know why the PHP processes cause such high I/O; maybe KVS could help debug this behavior. Perhaps they could check/optimize the file caching?
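To see how much I/O the php-fpm workers are actually generating, the per-process counters in /proc can be summed (a sketch assuming Linux procfs and root privileges; `total_write_bytes` is just an illustrative helper name):

```shell
# Sum cumulative write_bytes across all php-fpm workers (requires root,
# since /proc/<pid>/io is only readable by the process owner or root).
total_write_bytes() {
    # Adds up the write_bytes field of every /proc/<pid>/io file given.
    awk '/^write_bytes:/ { s += $2 } END { print s + 0 }' "$@" 2>/dev/null
}

files=""
for pid in $(pgrep php-fpm); do
    files="$files /proc/$pid/io"
done
if [ -n "$files" ]; then
    echo "php-fpm cumulative write_bytes: $(total_write_bytes $files)"
fi
```

Running this twice a few seconds apart gives the write rate; `pidstat -d` from the sysstat package shows the same breakdown per process, which would help pin the I/O on specific workers before tuning any caching.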