
High IO disk utilization + website unavailable at the moment


Rocco


Hello,

I have multiple KVS projects on a single server (16 cores / 32 threads, 256 GB RAM) with combined traffic of around 50-60k visitors daily.

I have a huge load average (35-40), but the server guys told me this is due to high disk IO utilization.

I increased the caching on some blocks as the guys from KVS suggested, but I'm still getting this issue.

The server uses Nginx with a PHP-FPM pool.

Would a change from MyISAM to InnoDB allow faster query execution, or is the issue in fact with the disk, i.e. slow read/write speed creating a bottleneck?

I'm not a tech guy, so I'm doing my best to explain the issue.

Cheers


Caching won't help with high disk utilization, and I don't know why our support recommended increasing the cache. I also don't think that changing from MyISAM to InnoDB would reduce disk load.

Please go back to your support ticket and ask why you were advised to increase the cache for a disk IO issue.


Hello,

This was the email sent to support:

We have an issue with the server where we run multiple KVS projects

05:40:02 PM     CPU     %user     %nice   %system   %iowait    %steal     %idle
05:50:01 PM     all     26.90      0.00     22.46     33.12      0.00     17.52
06:00:01 PM     all     22.19      0.00     22.81     38.44      0.00     16.57
06:10:01 PM     all     20.73      0.00     22.65     37.88      0.00     18.74
Average:        all     22.93      0.00     19.02     25.01      0.00     33.04
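For context, %iowait is the share of CPU time spent waiting on disk, and sustained values above roughly 20% usually point to an IO bottleneck. A small sketch for flagging the bad intervals (the column positions assume the `sar -u` layout shown above; the sample data is copied from it):

```shell
#!/bin/sh
# Flag sar -u intervals where %iowait (7th whitespace-separated column
# in the layout above) exceeds a 20% threshold.
sar_sample='05:50:01 PM     all     26.90      0.00     22.46     33.12      0.00     17.52
06:00:01 PM     all     22.19      0.00     22.81     38.44      0.00     16.57
06:10:01 PM     all     20.73      0.00     22.65     37.88      0.00     18.74'

# Prints every interval here, e.g. "05:50:01 PM iowait: 33.12"
printf '%s\n' "$sar_sample" | awk '$7 > 20 { print $1, $2, "iowait:", $7 }'
```

In this sample all three intervals exceed the threshold, which matches what MojoHost flagged.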


This is what I got from the MojoHost guys:

Here, the reason is MySQL and, I think, queries like this one:

| 1125696 | pornbrb_db      | localhost       | pornbrb_db      | Query   |    0 | Creating sort index | select SQL_CALC_FOUND_ROWS ktvs_videos.video_id from ktvs_videos inner join (select distinct video_id from ktvs_categories_videos where category_id in (148,466,990,228,525,358)) table1 on table1.video_id=ktvs_videos.video_id where ktvs_videos.status_id=1 and ktvs_videos.relative_post_date<=10000 and ktvs_videos.post_date<='2022-08-12 09:29:41'  and ktvs_videos.video_id<>34179 order by random1 limit 120            |    0.000 |

There are tons of such queries. Why they are here and why there are so many is a question for the app developers.
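A side note on quantifying this: MySQL's slow query log can show how often such queries run and how long each one takes. A minimal my.cnf sketch (the log file path and the 1-second threshold are illustrative, not a recommendation):

```ini
# my.cnf fragment — capture slow statements and full-scan queries
[mysqld]
slow_query_log                = 1
slow_query_log_file           = /var/log/mysql/slow.log
long_query_time               = 1
log_queries_not_using_indexes = 1
```

Running `EXPLAIN` on the captured statements would then show whether the `Creating sort index` step seen in the process list is spilling to disk.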

 


Hi,

 

A %iowait of 20-30% is very high...

MySQL is also slow because the disks are overloaded. The only good solution for this is to use SSD/NVMe drives or RAID 10 SATA HDD arrays. If everything runs from a single SATA HDD, it will slow down very quickly.

 


3 hours ago, Tech Support said:

I see, so your email clearly indicates the issue is MySQL, which means the recommendation to increase caching was correct.

But now you are saying this is due to high disk IO utilization? Which is it, then?

Hi,

 

That was said by the MojoHost guys, but as you can see in the email there is high iowait, so I assumed that was obvious once I attached these stats as well:

05:40:02 PM     CPU     %user     %nice   %system   %iowait    %steal     %idle
05:50:01 PM     all     26.90      0.00     22.46     33.12      0.00     17.52
06:00:01 PM     all     22.19      0.00     22.81     38.44      0.00     16.57
06:10:01 PM     all     20.73      0.00     22.65     37.88      0.00     18.74
Average:        all     22.93      0.00     19.02     25.01      0.00     33.04

 

Sorry for the misunderstanding.


Sort indexes barely use any IO. It seems the problem is either that your storage solution just sucks, or your SQL config lacks the memory to process the queries in RAM. Get SSDs in that thing.

InnoDB and MyISAM have different config needs; they don't use the same cache settings.

I use a 6-core/12-thread server with 32 GB of RAM, and I can pull 400k daily users at a 4-7 load average. Something is clearly screwed in your setup.
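To illustrate the point about cache settings: MyISAM caches only index blocks (via `key_buffer_size`; data pages rely on the OS page cache), while InnoDB caches both data and index pages in its buffer pool (`innodb_buffer_pool_size`). A sketch of the two knobs (the sizes are purely illustrative, not tuning advice for this server):

```ini
[mysqld]
# MyISAM: only index blocks are cached by MySQL itself
key_buffer_size         = 4G

# InnoDB: data and index pages are cached together; on a dedicated
# DB host this is commonly sized to a large fraction of RAM
innodb_buffer_pool_size = 64G
```

This is why a MyISAM-tuned config carried over to InnoDB (or vice versa) can leave most of the RAM unused and push the working set onto disk.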

Edited by hbarnetworks

Hi,

This is what I got from the server:

[root@cs3839 home]# smartctl -a /dev/sda | head -20
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.49.1.el7.x86_64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     XA1920ME10063
Serial Number:    HKS022NC
LU WWN Device Id: 5 000c50 03ea270a6
Firmware Version: SF442147
User Capacity:    1,920,383,410,176 bytes [1.92 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    Solid State Device
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-4, ACS-2 T13/2015-D revision 3
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Aug 20 03:39:43 2022 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
[root@cs3839 home]# smartctl -a /dev/sdb | head -20
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1160.49.1.el7.x86_64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     XA1920ME10063
Serial Number:    HKS0201P
LU WWN Device Id: 5 000c50 03ea2528d
Firmware Version: SF442147
User Capacity:    1,920,383,410,176 bytes [1.92 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    Solid State Device
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-4, ACS-2 T13/2015-D revision 3
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Aug 20 03:39:53 2022 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
[root@cs3839 home]# cat /proc/mdstat
Personalities : [raid0] [raid1]
md0 : active raid1 sdb2[1] sda2[0]
      1048512 blocks super 1.0 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md1 : active raid1 sdb3[1] sda3[0]
      1865803776 blocks super 1.2 [2/2] [UU]
      bitmap: 14/14 pages [56KB], 65536KB chunk

md2 : active raid0 sda5[0] sdb5[1]
      16766976 blocks super 1.2 512k chunks

unused devices: <none>

Here is the IO test result:

[root@cs3839 /]# fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75
test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.7
Starting 1 process
test: Laying out IO file (1 file / 4096MiB)
Jobs: 1 (f=1): [m(1)][100.0%][r=225MiB/s,w=74.1MiB/s][r=57.7k,w=18.0k IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=27818: Sat Aug 20 03:59:12 2022
   read: IOPS=55.8k, BW=218MiB/s (229MB/s)(3070MiB/14087msec)
   bw (  KiB/s): min=46192, max=283408, per=100.00%, avg=223220.93, stdev=43123.84, samples=28
   iops        : min=11548, max=70854, avg=55805.21, stdev=10781.05, samples=28
  write: IOPS=18.6k, BW=72.8MiB/s (76.4MB/s)(1026MiB/14087msec)
   bw (  KiB/s): min=15896, max=94632, per=100.00%, avg=74599.25, stdev=14309.26, samples=28
   iops        : min= 3974, max=23658, avg=18649.79, stdev=3577.31, samples=28
  cpu          : usr=12.33%, sys=82.11%, ctx=6514, majf=0, minf=597
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=785920,262656,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: bw=218MiB/s (229MB/s), 218MiB/s-218MiB/s (229MB/s-229MB/s), io=3070MiB (3219MB), run=14087-14087msec
  WRITE: bw=72.8MiB/s (76.4MB/s), 72.8MiB/s-72.8MiB/s (76.4MB/s-76.4MB/s), io=1026MiB (1076MB), run=14087-14087msec

Disk stats (read/write):
    md1: ios=785792/266033, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=397174/264090, aggrmerge=34/4934, aggrticks=86736/29942, aggrin_queue=116572, aggrutil=94.20%
  sda: ios=416314/264089, merge=24/4935, ticks=88723/31225, in_queue=119823, util=94.20%
  sdb: ios=378035/264091, merge=44/4933, ticks=84750/28659, in_queue=113321, util=93.47%

The disks themselves are fine, and performance is fine as well. The high disk utilization is being produced by tons of php-fpm processes:

[root@cs3839 /]# ps -waux | grep php-fpm | wc -l
7992

I don't know why the PHP processes cause such high IO; maybe KVS could help debug this behavior. Maybe they could check/optimize the file caching?
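One way to narrow this down is the per-process IO counters Linux exposes in /proc, which show which php-fpm workers are actually hitting the disk. A sketch (needs root to read other users' processes; the counters are cumulative since each process started):

```shell
#!/bin/sh
# Print cumulative read/write bytes for each php-fpm worker
# from /proc/<pid>/io.
for pid in $(pgrep php-fpm); do
  awk -v pid="$pid" '/^(read_bytes|write_bytes):/ { print pid, $1, $2 }' \
      "/proc/$pid/io" 2>/dev/null
done
```

`pidstat -d 5` (from the same sysstat package that provides sar) gives the same picture live, per process, every 5 seconds.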

