07-14-2016 01:40 PM
We have been testing two Intel DC P3700 U.2 800GB NVMe SSDs to measure the impact of the emulated sector size (512 vs. 4096 bytes) on throughput. Using fio 2.12, we observed a puzzling collapse in performance. The steps to reproduce it are given below.
Steps:
1. Copy or sequentially write a single large file (300 GB or larger).
2. Start the fio test with the following config:
[readtest]
thread=1
blocksize=2m
filename=/export/beegfs/data0/file_000000
rw=randread
direct=1
buffered=0
ioengine=libaio
nrfiles=1
gtod_reduce=0
numjobs=32
iodepth=128
runtime=360
group_reporting=1
percentage_random=90
3. Observe extremely slow performance:
fio-2.12
Starting 32 threads
readtest: (groupid=0, jobs=32): err= 0: pid=5097: Thu Jul 14 13:00:25 2016
read : io=65536KB, bw=137028B/s, iops=0, runt=489743msec
slat (usec): min=4079, max=7668, avg=5279.19, stdev=662.80
clat (msec): min=3, max=25, avg=18.97, stdev= 6.16
lat (msec): min=8, max=31, avg=24.25, stdev= 6.24
clat percentiles (usec):
| 1.00th=[ 3280], 5.00th=[ 4320], 10.00th=[ 9664], 20.00th=[17536],
| 30.00th=[18816], 40.00th=[20352], 50.00th=[20608], 60.00th=[21632],
| 70.00th=[21632], 80.00th=[22912], 90.00th=[25472], 95.00th=[25472],
| 99.00th=[25472], 99.50th=[25472], 99.90th=[25472], 99.95th=[25472],
| 99.99th=[25472]
lat (msec) : 4=3.12%, 10=9.38%, 20=25.00%, 50=62.50%
cpu : usr=0.00%, sys=74.84%, ctx=792583, majf=0, minf=16427
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=32/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=128
Run status group 0 (all jobs):
READ: io=65536KB, aggrb=133KB/s, minb=133KB/s, maxb=133KB/s, mint=489743msec, maxt=489743msec
Disk stats (read/write):
nvme0n1: ios=0/64317, merge=0/0, ticks=0/1777871, in_queue=925406, util=0.19%
4. Repeat the test.
5. Observe that performance is much higher:
fio-2.12
Starting 32 threads
readtest: (groupid=0, jobs=32): err= 0: pid=5224: Thu Jul 14 13:11:58 2016
read : io=861484MB, bw=2389.3MB/s, iops=1194, runt=360564msec
slat (usec): min=111, max=203593, avg=26742.15, stdev=21321.98
clat (msec): min=414, max=5176, avg=3391.05, stdev=522.29
lat (msec): min=414, max=5247, avg=3417.79, stdev=524.75
clat percentiles (msec):
| 1.00th=[ 1614], 5.00th=[ 2376], 10.00th=[ 2802], 20.00th=[ 3097],
| 30.00th=[ 3228], 40.00th=[ 3359], 50.00th=[ 3458], 60.00th=[ 3556],
| 70.00th=[ 3654], 80.00th=[ 3785], 90.00th=[ 3949], 95.00th=[ 4080],
| 99.00th=[ 4359], 99.50th=[ 4424], 99.90th=[ 4752], 99.95th=[ 4883],
<e...
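As a sanity check on the two runs, the short Python sketch below recomputes fio's headline numbers from the logged io= and runt= values. It assumes fio 2.12's default binary units (kb_base=1024, so 1 KB = 1024 B and 1 MB = 1024 KB) and integer truncation of the printed figures. `BLOCK_SIZE` and `summarize` are hypothetical names used only for this illustration. Under those assumptions it reproduces the logged bw=137028B/s, aggrb=133KB/s, iops=0 and iops=1194, and it shows that the slow run completed only 32 reads of 2 MiB in total, which matches "issued : total=r=32", i.e. one read per thread.

```python
# Recompute fio-style throughput figures from the "read :" summary lines.
# Assumption: fio 2.x default kb_base=1024 (binary KB/MB) and truncating
# integer arithmetic for the printed bw/iops values.

BLOCK_SIZE = 2 * 1024 * 1024  # blocksize=2m from the job file, in bytes


def summarize(io_bytes: int, runtime_ms: int) -> dict:
    """Return completed reads, bandwidth and IOPS for one fio run."""
    reads = io_bytes // BLOCK_SIZE            # number of completed 2 MiB reads
    bw_bps = io_bytes * 1000 // runtime_ms    # bytes per second, truncated
    return {
        "reads": reads,
        "bw_Bps": bw_bps,
        "bw_KiBps": bw_bps // 1024,
        "bw_MiBps": bw_bps / 2**20,
        "iops": reads * 1000 // runtime_ms,   # truncated, as fio prints it
    }


# First (slow) run: io=65536KB, runt=489743msec
slow = summarize(65536 * 1024, 489743)
# Second run: io=861484MB, runt=360564msec
fast = summarize(861484 * 2**20, 360564)

for name, r in (("first run", slow), ("second run", fast)):
    print(f"{name}: {r['reads']} reads, {r['bw_Bps']} B/s "
          f"({r['bw_KiBps']} KiB/s, {r['bw_MiBps']:.1f} MiB/s), {r['iops']} IOPS")
```

With 32 threads at iodepth=128, 32 completed reads in roughly 490 seconds means each thread finished a single 2 MiB read, which is why fio reports iops=0 for the first run.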
08-09-2016 07:10 AM
Hi AlexNZ,
Please share any findings from the kernel community; we will be waiting for your response.
NC
08-09-2016 10:23 AM
Hello
Today I managed to reproduce the same issue, with the same behaviour, on an ordinary SATA SSD (Intel DC S2510). This indirectly supports my theory that the Linux kernel is to blame.
However, I doubt I will escalate this issue to the kernel community, for the reasons I mentioned above.
Thanks,
Alex
08-10-2016 05:36 AM
AlexNZ,
Thanks a lot for sharing all your findings. We encourage other customers to post any other questions as well.Have a nice day.NC