There are two types of "data" in the pool: the actual data of whatever it is you're storing, and the metadata, which is all the tables, properties, indexes and other "stuff" that defines the pool structure, the datasets, and the pointers that tell ZFS where on disk to find the actual data.
Normally this is all mixed in together on the regular pool vdevs (mirror, raidz, etc.). If you add a special vdev to your pool, ZFS will prefer to store the metadata there and send the data proper to the regular vdevs. The main reason for doing this is if you have "slow" data vdevs; adding a special vdev backed by an SSD mirror can speed up access times, because ZFS can consult the SSDs to learn where on the data vdevs its data lives and go there directly, rather than loading the metadata off the slow vdevs and then needing another access to get the real data.
There's another possible advantage: ZFS can store "small" files on the special vdev (via the special_small_blocks property), leaving larger ones for the regular data vdevs.
One important thing to remember is that special vdevs are a proper part of the pool, not an add-on - just like the regular data vdevs, if the special vdev fails, the pool is lost. An SSD mirror is typical for this vdev.
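As a rough sketch of what that looks like in practice (the pool name "tank", the dataset name, and the device paths below are placeholders), adding a mirrored special vdev and opting a dataset into small-block storage is just:

# add a mirrored special vdev to an existing pool
zpool add tank special mirror /dev/disk/by-id/ssd-A /dev/disk/by-id/ssd-B

# send blocks of 64K and smaller from this dataset to the special vdev
zfs set special_small_blocks=64K tank/mydata

Blocks at or below that size for that dataset land on the special vdev; anything larger stays on the regular data vdevs.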
This is only a rough explanation; see also:
zpoolconcepts(7)
the Level1Techs writeup
zpool list -v -H -P
-v  verbose
-H  script mode: no headings, fields separated by a tab character
-P  show full paths, not just the last component
That will get you a lot closer.
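For example (the pool name "tank" is a placeholder, and the explicit -o column list is just to avoid depending on the default column order), the tab-separated output is easy to script against:

zpool list -v -H -P -o name,size,allocated tank | awk -F'\t' '{ printf "%s: %s used of %s\n", $1, $3, $2 }'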
The zpool-status(8) command supports JSON output in more recent OpenZFS versions.
Here's an example:
$ zpool status --json | jq -r '.. | select(.vdev_type? == "disk").name'
usb-QEMU_QEMU_HARDDISK_1-0000:00:04.0-4.2-0:0
usb-QEMU_QEMU_HARDDISK_1-0000:00:04.0-4.5-0:0
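Something like this should also work for pulling just the pool names, assuming the JSON keeps each pool under a top-level "pools" key:

$ zpool status --json | jq -r '.pools | keys[]'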
Racking my brain trying to figure out its actual behavior. No documentation I can find actually comments on this.
The special vdev stores the pool's metadata, so operations like directory listings run at the speed of an NVMe drive, which also improves the performance of the spinning rust by taking the small IO needed to look up pool metadata off of it. This makes sense.
What confuses me is the behavior of the small-block allocation class when the special vdev is at capacity. There seem to be a few scenarios that aren't covered anywhere in the documentation but would matter for performance.
From how I've seen it discussed, my understanding is that it acts as a write-back cache for small IO based on the dataset's special_small_blocks value, and that when the special vdev is at capacity or needs more space for metadata, small-block allocation is offloaded back to the data vdevs.
However, I've never actually seen this stated anywhere. The closest thing to a confirmation is the TrueNAS documentation on Fusion Pools, which says:
If the special class becomes full, then allocations spill back into the normal class.
By "spill", do they mean that existing small blocks are removed from the special class so new incoming small-block writes can still be added to it? Or that once the special class is full, small IO is sent directly to the normal class, bypassing the special vdev entirely?
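Whatever the answer is, it's at least easy to watch how full the special class is getting: zpool list -v breaks the special vdev out under its own "special" heading with its own SIZE/ALLOC/FREE (the pool name below is a placeholder):

$ zpool list -v tank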
I am planning out my ZFS setup as I'm moving from SnapRAID. I bought a few 118 GB Optane sticks to play around with and am considering using them, or some high-end SSDs, mirrored in a special vdev. I'm considering some 2 TB SN850Xs instead of the Optane so I can also store small blocks on the special vdev. I mostly store video and photos on my server and plan on having a 5x20 TB raidz2 and a 3x8 TB mirror vdev in my pool. I have 64 GB of non-ECC RAM and my server is on a 1 Gbit NIC. The performance improvement I want to see is faster loading of my folders, as it currently takes 10-20 seconds to load the file structure and thumbnails in the worst case. Would a special vdev suit my needs, or would ARC and L2ARC be enough? I would appreciate any advice on my setup.
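For reference, I've been watching cache hit rates while browsing with the arcstat and arc_summary tools that ship with OpenZFS, to see whether the metadata is already being served from ARC (exact columns vary by version):

# print ARC hit/miss statistics every 5 seconds
arcstat 5

# one-shot summary of ARC state, including metadata caching
arc_summary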
I have a 100 TB raidz1 pool that I am about to create, and I want to store the metadata for it on mirrored SSDs.
How do I create the metadata storage for only the 100TB pool (not the OS)?
# recordsize, primarycache and secondarycache are filesystem properties,
# so they take -O (capital O); ashift is a pool property and takes -o
zpool create -f -o ashift=12 \
    -O recordsize=1M \
    -O primarycache=metadata -O secondarycache=none \
    -m /media storage \
    raidz \
    ata-ST3000DM001-9YN166_HWID \
    ata-ST3000DM001-9YN166_HWID \
    ata-ST3000DM001-9YN166_HWID \
    ata-ST3000DM001-9YN166_HWID

# add the mirrored SSD special vdev as a separate step
zpool add -o ashift=12 storage special mirror /dev/ssd0n1 /dev/ssd1n1

# also store blocks of 128K and smaller on the special vdev
zfs set special_small_blocks=128K storage
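Afterwards you can confirm the layout and that the property took effect:

zpool status storage        # the special mirror shows up under its own "special" section
zpool list -v storage       # per-vdev size/alloc/free, including the special mirror
zfs get special_small_blocks storage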