Solidigm

IS2 · ‎01-29-2018

Hello,

I am testing an Intel DC P3600 SSD on Linux (kernel 4.9) with fio-2.16.

I am running performance tests using the sync ioengine with a 4K block size, direct access, and a queue depth of 1. Below is the fio config file:

[global]
iodepth=1
direct=1
ioengine=sync
group_reporting
time_based
blocksize=4k

[job1]
rw=randread
;rw=read
filename=/dev/nvme0n1
name=raw=sequential-read
numjobs=1
runtime=60

I see a huge performance difference between the sequential and random read workloads:

sync sequential: 75K IOPS

sync random: 10K IOPS

With other synchronous ioengines the situation is the same:

pvsync sequential: 75K IOPS

pvsync random: 10K IOPS

pvsync2 sequential: 75K IOPS

pvsync2 random: 10K IOPS

Why is the difference so huge? What makes sequential reads so much faster than random reads?
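One way to read these numbers: with a queue depth of 1 only one request is in flight at a time, so IOPS is simply the inverse of the average end-to-end time per request. The short Python sketch below converts the reported figures into per-request latency. The function name is made up for illustration, and the result includes host-side overhead, not only the time spent in the drive.

```python
def qd1_latency_us(iops):
    """Average time per request in microseconds when exactly one
    request is in flight (queue depth 1, single job)."""
    return 1_000_000 / iops


# IOPS figures reported in the post above
for label, iops in [("sequential", 75_000), ("random", 10_000)]:
    print(f"{label}: {iops} IOPS -> {qd1_latency_us(iops):.1f} us per 4k read")

# sequential: 75000 IOPS -> 13.3 us per 4k read
# random: 10000 IOPS -> 100.0 us per 4k read
```

So the question is really why a sequential 4k read completes in about 13 µs while a random one takes about 100 µs.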

idata · ‎01-30-2018

Hello NFN,

Thank you for your interest in the Intel® SSD DC P3600 Series. I understand that you would like to know why sequential reads are much faster than random reads. To provide a better explanation, I'll first define the terms involved.

Sequential read speeds are usually reported in megabytes per second (MB/s), although they can be converted to IOPS, and indicate how fast the SSD will be at completing tasks like accessing large multimedia files, transcoding, game level loading, some types of gameplay, and watching and editing video. In other words, this is the speed at which the drive can read data from contiguous memory spaces.

Random read speeds are usually reported in input/output operations per second (IOPS), and indicate how fast the SSD will be at completing tasks like antivirus scans, searching for email, web browsing, application loading, PC booting, or working in a word processor. In other words, this is the speed at which the drive can read data from non-contiguous memory spaces.

Basically, the big difference lies in the fact that seeking non-contiguous data will always take considerably more time than seeking data that is adjacent. This seek time grows as the block size in the SSD increases, because there are more places where the system needs to look to find the requested information.

If you have any other questions, don't hesitate to contact us.

Regards, Andres V.
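The conversion between MB/s and IOPS mentioned in this reply is a single multiplication by the block size. A minimal Python sketch, assuming 4k means 4096 bytes and 1 MB means 10^6 bytes (the function names are illustrative only):

```python
BLOCK_BYTES = 4 * 1024  # assumption: fio's "4k" block size is 4096 bytes


def iops_to_mb_per_s(iops, block_bytes=BLOCK_BYTES):
    """Throughput in MB/s (1 MB = 10**6 bytes) for a given IOPS rate."""
    return iops * block_bytes / 1_000_000


def mb_per_s_to_iops(mb_per_s, block_bytes=BLOCK_BYTES):
    """IOPS rate needed to sustain a given MB/s at this block size."""
    return mb_per_s * 1_000_000 / block_bytes


# Figures from the original post
print(f"{iops_to_mb_per_s(75_000):.1f} MB/s")  # 307.2 MB/s
print(f"{iops_to_mb_per_s(10_000):.1f} MB/s")  # 41.0 MB/s
```

Both tests in the thread use the same 4k block size, so the two units carry the same information here and the IOPS figures can be compared directly.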

IS2 · ‎01-30-2018

Hi Andres,

Thanks for the info. I agree that locating large amounts of non-contiguous data takes more time on rotational media, but I don't understand why the time an SSD needs to read a single 4k block synchronously would depend on the history of previous requests.

Also, these are synchronous read requests for a single block. There is no queue of requests; a single request is submitted to the device each time.

Is an internal SSD cache involved? Does the SSD prefetch chunks of blocks into this cache on each read request?
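The prefetching hypothesis in this question can be illustrated with a toy model. The Python sketch below is not the P3600's actual firmware behavior, which is not public. It assumes a purely hypothetical buffer that, on every miss, loads the requested block plus the blocks that follow it, and it counts how often a QD1 workload would be served from that buffer. The window size and device size are made-up parameters.

```python
import random


def hit_rate(blocks, readahead_blocks):
    """Fraction of reads served from a hypothetical read-ahead buffer.

    On a miss the buffer is refilled with the requested block and the
    (readahead_blocks - 1) blocks that follow it.
    """
    buffered = range(0)  # block numbers currently held; empty at start
    hits = 0
    for block in blocks:
        if block in buffered:
            hits += 1
        else:
            buffered = range(block, block + readahead_blocks)
    return hits / len(blocks)


TOTAL_BLOCKS = 1_000_000  # hypothetical device size in 4k blocks
WINDOW = 8                # hypothetical read-ahead window in blocks

rng = random.Random(0)
sequential = range(10_000)
scattered = [rng.randrange(TOTAL_BLOCKS) for _ in range(10_000)]

print("sequential hit rate:", hit_rate(sequential, WINDOW))  # 0.875
print("random hit rate:    ", hit_rate(scattered, WINDOW))
```

In this model a sequential stream misses only once per window, while random offsets almost never land in the buffered range, so even single-block synchronous reads would see very different average latencies.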

idata · ‎01-30-2018

Hello NSN,

One of the components in an SSD is the DRAM buffer, which is used as a cache for certain tasks. Even though this cache is intended to hide the seek time of random writes/reads as well as sequential writes/reads, its effects are more evident when handling contiguous data. Using a DRAM buffer as a cache also turns out to be an excellent way to improve both the endurance and the performance of program/read/erase operations. The algorithm associated with the DRAM cache (which manufacturers usually keep private) aggregates small data blocks into patterns that make the most efficient use of every flash cycle (synchronous writes and reads) by writing and reading optimized data lengths. In this way, caching techniques can reduce the number of program/read cycles, improving both performance and reliability. As you pointed out, this applies to single blocks, which is why the difference in speeds is evident even though there is no request queue.

These kinds of seeking techniques and algorithms tend to make the faster process even faster, which explains the difference in speed between reads of contiguous and non-contiguous data.

Regards, Andres V.

IS2 · ‎01-30-2018

Hi Andres,

Thanks for the clarification. I have another question on this issue.

The disk in question is the SSDPE2ME800G4D, the one that performs better with synchronous sequential reads than with synchronous random reads (QD=1, blocksize=4k, direct, as described in my first post in this thread).

I have another Intel NVMe disk, an SSDPE2MD800G4, and this one is equally slow in both cases: about 14K IOPS for synchronous sequential reads and for synchronous random reads.

I ran the same test on both disks.

So my question is: why doesn't the SSDPE2MD800G4 disk have the same capability for faster sequential reads?

Thanks
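For anyone repeating this comparison, the Python sketch below shows one way to run the same QD1 test from the first post against several devices and collect the read IOPS. It uses only standard fio command-line options. The helper names are made up, the second device path in the usage note is a placeholder, and the JSON field layout (`jobs[0].read.iops`) is assumed from fio's JSON output format and should be checked against the fio version in use.

```python
import json
import subprocess


def fio_command(device, rw, runtime_s=60):
    """Build the fio command line for a 4k, direct, QD1 sync read test."""
    return [
        "fio",
        "--name=qd1-" + rw,
        "--filename=" + device,
        "--rw=" + rw,              # "read" or "randread"
        "--blocksize=4k",
        "--ioengine=sync",
        "--direct=1",
        "--iodepth=1",
        "--numjobs=1",
        "--time_based",
        "--runtime=" + str(runtime_s),
        "--readonly",              # guard against accidental writes
        "--output-format=json",
    ]


def read_iops(fio_json):
    """Extract read IOPS of the first job from parsed fio JSON output."""
    return fio_json["jobs"][0]["read"]["iops"]


def run_test(device, rw):
    """Run fio (usually requires root for raw devices) and return read IOPS."""
    result = subprocess.run(
        fio_command(device, rw), check=True, capture_output=True, text=True
    )
    return read_iops(json.loads(result.stdout))
```

A typical use would be to loop over both devices, for example `/dev/nvme0n1` and a second path such as `/dev/nvme1n1`, and over `"read"` and `"randread"`, calling `run_test(device, rw)` for each combination so that all four results come from identical settings.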


Intel DC P3600 SSD random vs sequential reads performance difference