
High IO disk utilization + website unavailable at the moment


Rocco


Hello,

I have multiple KVS projects on a single server (16 cores / 32 threads, 256 GB RAM) with combined traffic of around 50-60k visitors daily.

I have a huge load average (35-40), but the server guys told me this is due to high disk IO utilization.

I increased the caching on some blocks as the guys from KVS suggested, but I'm still getting this issue.

The server uses Nginx with a PHP-FPM pool.

Would a change from MyISAM to InnoDB allow faster query execution, or is the issue in fact with the disk, i.e. slow read/write speed creating a bottleneck?

I'm not a tech guy, so I'm doing my best to explain the issue.

Cheers


Caching won't help with high disk utilization, and I don't know why our support recommended increasing the cache. I also don't think that changing from MyISAM to InnoDB would reduce disk load.

Please go back to your support ticket and ask why you were advised to increase the cache for a disk IO issue.


Hello,

This was the email sent to support:

We have an issue with the server where we run multiple KVS projects

05:40:02 PM     CPU     %user     %nice   %system   %iowait    %steal     %idle
05:50:01 PM     all     26.90      0.00     22.46     33.12      0.00     17.52
06:00:01 PM     all     22.19      0.00     22.81     38.44      0.00     16.57
06:10:01 PM     all     20.73      0.00     22.65     37.88      0.00     18.74
Average:        all     22.93      0.00     19.02     25.01      0.00     33.04
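For context, %iowait is the share of CPU time spent waiting on disk, and sustained values above roughly 20% usually point to an IO bottleneck. A small sketch for flagging the bad intervals (the column positions assume the `sar -u` layout shown above; the sample data is copied from it):

```shell
#!/bin/sh
# Flag sar -u intervals where %iowait (7th whitespace-separated column
# in the layout above) exceeds a 20% threshold.
sar_sample='05:50:01 PM     all     26.90      0.00     22.46     33.12      0.00     17.52
06:00:01 PM     all     22.19      0.00     22.81     38.44      0.00     16.57
06:10:01 PM     all     20.73      0.00     22.65     37.88      0.00     18.74'

# Prints every interval here, e.g. "05:50:01 PM iowait: 33.12"
printf '%s\n' "$sar_sample" | awk '$7 > 20 { print $1, $2, "iowait:", $7 }'
```

In this sample all three intervals exceed the threshold, which matches what MojoHost flagged.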


This is what I got from the MojoHost guys:

Here, the reason is MySQL and, I think, queries like this one:

| 1125696 | pornbrb_db      | localhost       | pornbrb_db      | Query   |    0 | Creating sort index | select SQL_CALC_FOUND_ROWS ktvs_videos.video_id from ktvs_videos inner join (select distinct video_id from ktvs_categories_videos where category_id in (148,466,990,228,525,358)) table1 on table1.video_id=ktvs_videos.video_id where ktvs_videos.status_id=1 and ktvs_videos.relative_post_date<=10000 and ktvs_videos.post_date<='2022-08-12 09:29:41'  and ktvs_videos.video_id<>34179 order by random1 limit 120            |    0.000 |

There are tons of such queries. Why they are here and why there are so many is a question for the app developers.
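A side note on quantifying this: MySQL's slow query log can show how often such queries run and how long each one takes. A minimal my.cnf sketch (the log file path and the 1-second threshold are illustrative, not a recommendation):

```ini
# my.cnf fragment — capture slow statements and full-scan queries
[mysqld]
slow_query_log                = 1
slow_query_log_file           = /var/log/mysql/slow.log
long_query_time               = 1
log_queries_not_using_indexes = 1
```

Running `EXPLAIN` on the captured statements would then show whether the `Creating sort index` step seen in the process list is spilling to disk.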

 


Hi,

 

A %iowait of 20-30% is very high...

MySQL is also slow because the disks are overloaded. The only good solution for this is to use SSD/NVMe drives or RAID 10 SATA HDD arrays. If everything runs from a single SATA HDD, it will slow down very quickly.

 


3 hours ago, Tech Support said:

I see, so your email clearly indicates the issue is MySQL, which means the recommendation to increase caching was correct.

But now you are saying this is due to high disk IO utilization? Which is it, then?

Hi,

 

That was said by the MojoHost guys, but as you can see in the email there is high iowait, so I assumed that was obvious once I attached these stats as well:

05:40:02 PM     CPU     %user     %nice   %system   %iowait    %steal     %idle
05:50:01 PM     all     26.90      0.00     22.46     33.12      0.00     17.52
06:00:01 PM     all     22.19      0.00     22.81     38.44      0.00     16.57
06:10:01 PM     all     20.73      0.00     22.65     37.88      0.00     18.74
Average:        all     22.93      0.00     19.02     25.01      0.00     33.04

 

Sorry for the misunderstanding.


Sort indexes barely use any IO. It seems the problem is either that your storage solution just sucks, or your SQL config lacks the memory to process the queries in RAM. Get SSDs in that thing.

InnoDB and MyISAM have different config needs; they don't use the same cache settings.

I use a 6-core/12-thread server with 32 GB of RAM, and I can pull 400k daily users at a 4-7 load average. Something is clearly screwed in your setup.
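To illustrate the point about cache settings: MyISAM caches only index blocks (via `key_buffer_size`; data pages rely on the OS page cache), while InnoDB caches both data and index pages in its buffer pool (`innodb_buffer_pool_size`). A sketch of the two knobs (the sizes are purely illustrative, not tuning advice for this server):

```ini
[mysqld]
# MyISAM: only index blocks are cached by MySQL itself
key_buffer_size         = 4G

# InnoDB: data and index pages are cached together; on a dedicated
# DB host this is commonly sized to a large fraction of RAM
innodb_buffer_pool_size = 64G
```

This is why a MyISAM-tuned config carried over to InnoDB (or vice versa) can leave most of the RAM unused and push the working set onto disk.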

Edited by hbarnetworks

Hi,

This is what I got from the server:

[root@cs3839 home]# smartctl -a /dev/sda | head -20
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.49.1.el7.x86_64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     XA1920ME10063
Serial Number:    HKS022NC
LU WWN Device Id: 5 000c50 03ea270a6
Firmware Version: SF442147
User Capacity:    1,920,383,410,176 bytes [1.92 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    Solid State Device
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-4, ACS-2 T13/2015-D revision 3
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Aug 20 03:39:43 2022 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
[root@cs3839 home]# smartctl -a /dev/sdb | head -20
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.49.1.el7.x86_64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     XA1920ME10063
Serial Number:    HKS0201P
LU WWN Device Id: 5 000c50 03ea2528d
Firmware Version: SF442147
User Capacity:    1,920,383,410,176 bytes [1.92 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    Solid State Device
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-4, ACS-2 T13/2015-D revision 3
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Aug 20 03:39:53 2022 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
[root@cs3839 home]# cat /proc/mdstat
Personalities : [raid0] [raid1]
md0 : active raid1 sdb2[1] sda2[0]
      1048512 blocks super 1.0 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md1 : active raid1 sdb3[1] sda3[0]
      1865803776 blocks super 1.2 [2/2] [UU]
      bitmap: 14/14 pages [56KB], 65536KB chunk

md2 : active raid0 sda5[0] sdb5[1]
      16766976 blocks super 1.2 512k chunks

unused devices: <none>

Here is the IO test result:

[root@cs3839 /]# fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75
test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.7
Starting 1 process
test: Laying out IO file (1 file / 4096MiB)
Jobs: 1 (f=1): [m(1)][100.0%][r=225MiB/s,w=74.1MiB/s][r=57.7k,w=18.0k IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=27818: Sat Aug 20 03:59:12 2022
   read: IOPS=55.8k, BW=218MiB/s (229MB/s)(3070MiB/14087msec)
   bw (  KiB/s): min=46192, max=283408, per=100.00%, avg=223220.93, stdev=43123.84, samples=28
   iops        : min=11548, max=70854, avg=55805.21, stdev=10781.05, samples=28
  write: IOPS=18.6k, BW=72.8MiB/s (76.4MB/s)(1026MiB/14087msec)
   bw (  KiB/s): min=15896, max=94632, per=100.00%, avg=74599.25, stdev=14309.26, samples=28
   iops        : min= 3974, max=23658, avg=18649.79, stdev=3577.31, samples=28
  cpu          : usr=12.33%, sys=82.11%, ctx=6514, majf=0, minf=597
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=785920,262656,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: bw=218MiB/s (229MB/s), 218MiB/s-218MiB/s (229MB/s-229MB/s), io=3070MiB (3219MB), run=14087-14087msec
  WRITE: bw=72.8MiB/s (76.4MB/s), 72.8MiB/s-72.8MiB/s (76.4MB/s-76.4MB/s), io=1026MiB (1076MB), run=14087-14087msec

Disk stats (read/write):
    md1: ios=785792/266033, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=397174/264090, aggrmerge=34/4934, aggrticks=86736/29942, aggrin_queue=116572, aggrutil=94.20%
  sda: ios=416314/264089, merge=24/4935, ticks=88723/31225, in_queue=119823, util=94.20%
  sdb: ios=378035/264091, merge=44/4933, ticks=84750/28659, in_queue=113321, util=93.47%

The disks themselves are fine, and performance is fine as well. The high disk utilization is being produced by tons of php-fpm processes:

[root@cs3839 /]# ps -waux | grep php-fpm | wc -l
7992

I don't know why the PHP processes cause such high IO; maybe KVS could help debug this behavior. Maybe they could check/optimize the file caching?
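One way to narrow this down is the per-process IO counters Linux exposes in /proc, which show which php-fpm workers are actually hitting the disk. A sketch (needs root to read other users' processes; the counters are cumulative since each process started):

```shell
#!/bin/sh
# Print cumulative read/write bytes for each php-fpm worker
# from /proc/<pid>/io.
for pid in $(pgrep php-fpm); do
  awk -v pid="$pid" '/^(read_bytes|write_bytes):/ { print pid, $1, $2 }' \
      "/proc/$pid/io" 2>/dev/null
done
```

`pidstat -d 5` (from the same sysstat package that provides sar) gives the same picture live, per process, every 5 seconds.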

