Critical performance drop on newly created large file
07-14-2016 01:40 PM
- NVMe drive model: Intel SSD DC P3700 U.2 NVMe SSD
- Capacity: 764G
- FS: XFS
- Other HW:
- AIC SB122A-PH
- 8 x Intel DC P3700 NVMe SSDs (2 on CPU 0, 6 on CPU 1)
- 128 GiB RAM (8 x 16 GiB DDR4 2400 MHz DIMMs)
- 2 x Intel Xeon E5-2620 v3 2.4 GHz CPUs
- 2 x Intel DC S2510 SATA SSDs (one is used as a system drive).
- Note that both are engineering samples provided by Intel NSG, but all have had the latest firmware applied using isdct 3.0.0.
- OS: CentOS Linux release 7.2.1511 (Core)
- Kernel: Linux fs00 3.10.0-327.22.2.el7.x86_64 #1 SMP Thu Jun 23 17:05:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
We have been testing two Intel DC P3700 U.2 800GB NVMe SSDs to see the impact of the emulated sector size (512 vs. 4096 bytes) on throughput. Using fio 2.12, we observed a puzzling collapse of performance. The steps are given below.
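As an aside, the active LBA format (the emulated sector size being compared here) can also be inspected and changed with nvme-cli instead of isdct; a minimal sketch, assuming the namespace is /dev/nvme0n1:
# Show the namespace's LBA formats; the active one is marked "(in use)"
nvme id-ns /dev/nvme0n1 -H | grep "in use"
# Switch to a 4096-byte LBA format (DESTROYS ALL DATA; the index varies by drive, check the listing above first)
nvme format /dev/nvme0n1 --lbaf=3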
Steps:
1. Copy or write sequentially a single large file (300G or larger), for example with dd as sketched below
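A minimal way to produce such a file, using the same path as in the fio job below (the size and block size here are only an example):
# Sequentially write a ~300 GiB file onto the XFS mount, then flush dirty data to the drive
dd if=/dev/zero of=/export/beegfs/data0/file_000000 bs=1M count=307200
sync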
2. Start the fio test with the following config:
[readtest]
thread=1
blocksize=2m
filename=/export/beegfs/data0/file_000000
rw=randread
direct=1
buffered=0
ioengine=libaio
nrfiles=1
gtod_reduce=0
numjobs=32
iodepth=128
runtime=360
group_reporting=1
percentage_random=90
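The options above form a plain fio job file; assuming they are saved as readtest.fio (the file name is arbitrary), the test is launched with:
# Run the job file; fio prints the per-group summary shown in step 3
fio readtest.fio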
3. Observe extremely slow performance:
fio-2.12
Starting 32 threads
readtest: (groupid=0, jobs=32): err= 0: pid=5097: Thu Jul 14 13:00:25 2016
read : io=65536KB, bw=137028B/s, iops=0, runt=489743msec
slat (usec): min=4079, max=7668, avg=5279.19, stdev=662.80
clat (msec): min=3, max=25, avg=18.97, stdev= 6.16
lat (msec): min=8, max=31, avg=24.25, stdev= 6.24
clat percentiles (usec):
| 1.00th=[ 3280], 5.00th=[ 4320], 10.00th=[ 9664], 20.00th=[17536],
| 30.00th=[18816], 40.00th=[20352], 50.00th=[20608], 60.00th=[21632],
| 70.00th=[21632], 80.00th=[22912], 90.00th=[25472], 95.00th=[25472],
| 99.00th=[25472], 99.50th=[25472], 99.90th=[25472], 99.95th=[25472],
| 99.99th=[25472]
lat (msec) : 4=3.12%, 10=9.38%, 20=25.00%, 50=62.50%
cpu : usr=0.00%, sys=74.84%, ctx=792583, majf=0, minf=16427
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=32/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=128
Run status group 0 (all jobs):
READ: io=65536KB, aggrb=133KB/s, minb=133KB/s, maxb=133KB/s, mint=489743msec, maxt=489743msec
Disk stats (read/write):
nvme0n1: ios=0/64317, merge=0/0, ticks=0/1777871, in_queue=925406, util=0.19%
4. Repeat the test
5. Performance is much higher:
fio-2.12
Starting 32 threads
readtest: (groupid=0, jobs=32): err= 0: pid=5224: Thu Jul 14 13:11:58 2016
read : io=861484MB, bw=2389.3MB/s, iops=1194, runt=360564msec
slat (usec): min=111, max=203593, avg=26742.15, stdev=21321.98
clat (msec): min=414, max=5176, avg=3391.05, stdev=522.29
lat (msec): min=414, max=5247, avg=3417.79, stdev=524.75
clat percentiles (msec):
| 1.00th=[ 1614], 5.00th=[ 2376], 10.00th=[ 2802], 20.00th=[ 3097],
| 30.00th=[ 3228], 40.00th=[ 3359], 50.00th=[ 3458], 60.00th=[ 3556],
| 70.00th=[ 3654], 80.00th=[ 3785], 90.00th=[ 3949], 95.00th=[ 4080],
| 99.00th=[ 4359], 99.50th=[ 4424], 99.90th=[ 4752], 99.95th=[ 4883],
...
07-15-2016 08:11 AM
Since we are dealing with a similar situation, I tried the above steps and confirmed this issue on our machine. In fact, I also tried it with both XFS and EXT4; the symptom showed up regardless.
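For anyone else reproducing this on both filesystems, the setup can be as simple as the following sketch (an illustration only, not the exact commands used here; the device name and mount point are placeholders):
# XFS run (default mkfs options)
mkfs.xfs -f /dev/nvme0n1
mkdir -p /mnt/test
mount /dev/nvme0n1 /mnt/test
# EXT4 run (default mkfs options), after unmounting the XFS filesystem
umount /mnt/test
mkfs.ext4 /dev/nvme0n1
mount /dev/nvme0n1 /mnt/test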
07-15-2016 08:54 AM
AlexNZ,
Thanks for bringing this situation to our attention. We would like to verify this and provide a solution as fast as possible. Please allow us some time to check on this, and we will keep you all posted.
NC
07-20-2016 06:37 AM
Hello,
After reviewing the settings, we would like to verify the following. For the read test, could you please try:
fio --output=test_result.txt --name=myjob --filename=/dev/nvme0n1 --ioengine=libaio --direct=1 --norandommap --randrepeat=0 --runtime=600 --blocksize=4K --rw=randread --iodepth=32 --numjobs=4 --group_reporting
It is important to note that we normally run the tests with 4 threads and iodepth=32 for blocksize=4K. Please let us know, as we may need to keep researching this.
NC
07-20-2016 07:32 AM
Hello,
With the proposed settings I received the following result:
myjob: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=32
...
fio-2.12
Starting 4 processes
myjob: (groupid=0, jobs=4): err= 0: pid=23560: Wed Jul 20 07:06:08 2016
read : io=1092.2GB, bw=1863.1MB/s, iops=477156, runt=600001msec
slat (usec): min=1, max=63, avg= 2.76, stdev= 1.57
clat (usec): min=14, max=3423, avg=260.81, stdev=90.86
lat (usec): min=18, max=3426, avg=263.68, stdev=90.84
clat percentiles (usec):
| 1.00th=[ 114], 5.00th=[ 139], 10.00th=[ 157], 20.00th=[ 185],
| 30.00th=[ 207], 40.00th=[ 229], 50.00th=[ 251], 60.00th=[ 274],
| 70.00th=[ 298], 80.00th=[ 326], 90.00th=[ 374], 95.00th=[ 422],
| 99.00th=[ 532], 99.50th=[ 588], 99.90th=[ 716], 99.95th=[ 788],
| 99.99th=[ 1048]
bw (KB /s): min= 5400, max=494216, per=25.36%, avg=484036.11, stdev=14017.77
lat (usec) : 20=0.01%, 50=0.01%, 100=0.23%, 250=49.61%, 500=48.54%
lat (usec) : 750=1.55%, 1000=0.06%
lat (msec) : 2=0.01%, 4=0.01%
cpu : usr=15.00%, sys=41.78%, ctx=77056567, majf=0, minf=264
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
issued : total=r=286294132/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=32
Run status group 0 (all jobs):
READ: io=1092.2GB, aggrb=1863.1MB/s, minb=1863.1MB/s, maxb=1863.1MB/s, mint=600001msec, maxt=600001msec
Disk stats (read/write):
nvme0n1: ios=286276788/29109, merge=0/0, ticks=72929877/10859607, in_queue=84848144, util=99.33%
But in this case the test ran against the raw device (/dev/nvme0n1), whereas in our case it was a file on XFS on the NVMe drive.
Also, during the latest tests we determined that flushing the page cache (echo 1 > /proc/sys/vm/drop_caches) solves the problem.
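Spelled out, the flush that restores performance is just the following (run as root; the preceding sync is a common precaution, not something we verified to be required):
sync                              # flush dirty data first (precaution)
echo 1 > /proc/sys/vm/drop_caches # drop clean page-cache pages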
Why the page cache affects direct I/O is still the question.
Could it be something specific to the NVMe driver?
AlexNZ
