There are two types of "data" in the pool: the actual data of whatever it is you're storing, and the metadata - all the tables, properties, indexes and other "stuff" that defines the pool structure and the datasets, plus the pointers that tell ZFS where on disk to find the actual data.

Normally this is all mixed in together on the regular pool vdevs (mirror, raidz, etc). If you add a special vdev to your pool, ZFS will prefer to store the metadata there and send the data proper to the regular vdevs. The main reason for doing this is if you have "slow" data vdevs: adding a special vdev backed by an SSD mirror can speed up access times, because ZFS can ask the SSDs where on the data vdevs its data lives and go straight there, rather than loading the metadata off the slow vdevs and then needing another access to get the real data.
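
A minimal sketch of what adding one looks like, assuming a pool called "tank" and two spare SSDs (the pool name and device paths are placeholders, not a recommendation):

    # add a mirrored special vdev to an existing pool
    zpool add tank special mirror /dev/disk/by-id/ssd-A /dev/disk/by-id/ssd-B

    # the new vdev shows up under its own "special" heading
    zpool status tank
    zpool list -v tank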

There's another possible advantage: ZFS can also store "small" file blocks on the special vdev - anything at or below a dataset's special_small_blocks threshold - leaving the larger blocks for the regular data vdevs.
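
That part is opt-in and set per dataset. A rough example (the dataset name is made up; note that if the threshold is set as high as the recordsize, effectively all of that dataset's data ends up on the special vdev):

    # send blocks of 64K and smaller to the special vdev for this dataset
    zfs set special_small_blocks=64K tank/data

    # check what is currently in effect
    zfs get special_small_blocks,recordsize tank/data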

One important thing to remember is that a special vdev is a proper part of the pool, not an add-on: just like the regular data vdevs, if the special vdev fails, the pool is lost. That's why it needs redundancy of its own; an SSD mirror is typical for this vdev.

This is only a rough explanation, see also:

  • zpoolconcepts(7)
  • Level1Techs writeup

More results and threads on special vdevs:

VDEVs — OpenZFS documentation
openzfs.github.io › openzfs-docs › Basic Concepts › VDEVs.html (December 2, 2025)
When there is still enough redundancy ... failed vdev with a new one and ZFS will automatically resilver (rebuild) the data onto the new vdev to return the pool to a healthy state. Vdevs are managed using the zpool(8) command-line utility....

OpenZFS - Understanding ZFS vdev Types - Klara Systems
klarasystems.com › home › openzfs – understanding zfs vdev types (May 9, 2023)
Confused about how to set up your ZFS pool? This in-depth guide breaks down the building blocks of a zpool—explaining vdev types like mirror, RAIDz, dRAID, and support classes such as LOG, CACHE, and SPECIAL. Learn how each configuration affects performance, fault tolerance, and scalability—so ...

zpoolconcepts.7 — OpenZFS documentation
openzfs.github.io › openzfs-docs › man › master › 7 › zpoolconcepts.7.html
But, when an active device fails, it is automatically replaced by a hot spare. To create a pool with hot spares, specify a spare vdev with any number of devices. For example, ... Spares can be shared across multiple pools, and can be added with the zpool add command and removed with the zpool ...

ZFS Metadata Special Device | Proxmox Support Forum
forum.proxmox.com › home › forums › proxmox virtual environment › proxmox ve: installation and configuration (June 16, 2023)
It is more or less not much more than:
    zpool add POOLNAME special mirror /dev/sdX /dev/sdY
and for the blocksize:
    zfs set special_small_blocks=1M POOLNAME

r/zfs on Reddit: Guidance on how the special vdev performs.
reddit.com › r/zfs › guidance on how the special vdev performs. (November 18, 2021)

Racking my brain trying to figure out its actual behavior. No documentation I can find actually comments on this.

The special vdev is used to store metadata for the pool, so operations like directory listings run at the speed of an NVMe drive, which will also improve the performance of the spinning rust by reducing the small IO load needed to look up the pool metadata. This makes sense.

What confuses me is the behavior of the small block allocation class when the special vdev is at capacity. There seem to be a few scenarios that aren't talked about anywhere in the documentation but would be important for performance.

From how I've seen it talked about, my understanding is that it acts as a write-back cache for small IO, based on the special_small_blocks value of the dataset, and that when the special vdev is at capacity or needs more space for metadata, small block allocation is offloaded back to the data vdevs.

However, I've never actually seen this stated anywhere. The closest thing to a confirmation is in the TrueNAS documentation on fusion pools, which says:

If the special class becomes full, then allocations spill back into the normal class.

By "spill", does that mean blocks are removed from the special class so that new incoming small block writes can be added to the cache? Or does it mean that when the special class is full, small IO is sent directly to the normal class, bypassing the special vdev?

Top answer (1 of 2, 11 points):

I eventually answered my own question, partially from some old forum posts and partially from practical testing. Once the special vdev is full it stays full. The standard small block allocation limit is 75% of the drive space (the default, but adjustable); after that, further small blocks are sent directly to the backing storage.

The special vdev can be expanded after the fact, and another special vdev can be striped alongside it if space becomes constrained, but there is no way to rebalance. Furthermore, if the special vdev hits its allocation limit and is later expanded, the small IO has to be re-written to move it back to its allocation class.

As long as you're using all striped mirrors in your pool with the same ashift (I am), you can "flush" the metadata and small IO back to your main pool by removing the special vdev, i.e. zpool remove pool mirror-x, with mirror-x being the mirrored SSDs' vdev name. This writes all metadata and small blocks back to the main pool. But it also means re-adding the special vdev later won't help until you re-write all your data, as new metadata is only added on write.
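
A rough sketch of the commands that answer is describing (the pool name "tank" and vdev name "mirror-2" are just examples; top-level device removal only works when the pool layout allows it, e.g. all-mirror data vdevs with matching ashift, no raidz):

    # see how full each vdev is, including the special mirror
    zpool list -v tank

    # evacuate the special vdev back onto the data vdevs, then watch progress
    zpool remove tank mirror-2
    zpool status tank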

Second answer (2 of 2, 2 points):

Quoting the question: "What confuses me is the behavior of the small block allocation class when the special vdev is at capacity."

When it reaches 75% full, all further small block writes go to the regular vdevs like normal. Current blocks stay where they are. The remaining space is reserved for just metadata. You can adjust the percentage in /etc/modprobe.d/zfs.conf by adding zfs_special_class_metadata_reserve_pct=10% (for example) and rebooting. There might be a way to do it live (only lasts until reboot) I think, but I forget right now. You can search the OpenZFS source code for "zfs_special_class_metadata_reserve_pct" and find what it touches; sometimes there are comments you might find helpful. Searching the GitHub issues and pull requests can also be helpful. You're probably thinking that all of this is horribly documented and scattered around like marbles. You're right.

On the other answer's point that "as long as you're using all striped mirrors in your pool with the same ashift, you can 'flush' the metadata and small IO back to your main pool by removing the special vdev": make absolutely certain you have things backed up and confirmed good (scrubbed) before you try this. It has rarely resulted in problems, but mercenary_sysadmin saw corruption, I believe, when he did some testing when special vdevs first came out. The issue was never resolved and I don't know if he's revisited it. I personally won't trust vdev removal for a long time.

Also, I highly recommend a triple mirror as the smallest you consider, preferably on old enterprise SSDs with real PLP (visible capacitors), which are cheap enough to find on eBay.

Here are some other references for those curious about special vdevs:

  • Various findings: https://forum.level1techs.com/t/zfs-metadata-special-device-z/159954
  • Generate a histogram of block sizes
  • Calculate data from ZDB output
  • Clarifies behavior of what goes where when setting small block size: https://github.com/openzfs/zfs/issues/9131#issuecomment-528562601
  • Small blocks can now be set to anything you can set recordsize to (512B-1M), and even bigger if you enable larger recordsizes (up to 16M): https://github.com/openzfs/zfs/pull/9355
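
For what it's worth, a hedged sketch of that tunable as a persistent module option (the parameter name comes from the answer above; the value appears to be a plain whole-number percentage, and a reboot or zfs module reload is needed for modprobe.d changes to take effect):

    # /etc/modprobe.d/zfs.conf
    # reserve only the last 10% of the special vdev for metadata
    # (so small blocks are accepted until ~90% full) instead of the 25% default
    options zfs zfs_special_class_metadata_reserve_pct=10

The same parameter should also appear under /sys/module/zfs/parameters/ for temporary, until-reboot changes, though that's worth verifying on your own system.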

r/zfs on Reddit: Advice on the special VDEV in my ZFS setup
reddit.com › r/zfs › advice on the special vdev in my zfs setup (March 6, 2023)

I am planning out my ZFS setup as I'm moving from SnapRAID. I bought a few sticks of 118GB Optane to play around with, and am considering using them or some high-end SSDs mirrored in a special vdev. I'm considering using some 2TB SN850Xs instead of Optane to be able to store small blocks on the special vdev. I store mostly video and photos on my server, and plan on having a 5x20TB raidz2 and a 3x8TB mirror vdev in my pool. I have 64GB of non-ECC RAM and my server is on a 1Gbit NIC. The performance improvement I want to see is faster loading of my folders, as it currently takes 10-20 seconds to load the file structure and thumbnails in the worst case. Would a special vdev suit my needs, or would ARC and L2ARC be fine enough? I would appreciate any advice on my setup.

Redundancy necessary for special Metadata vdev? | TrueNAS Community
truenas.com › forums › developer's corner (June 11, 2020)
Is there a way to set a different ashift for each vdev? ... TrueNAS should default to ashift=12 for all devices including the special mirror, and honestly you shouldn't use anything less than that with how common 512e drives are. Are you looking to increase the ashift value? ...
    zpool create library raidz2 /zfs/disk[1-8] -o ashift=12          # PLATTER DISKS
    zpool add library special mirror /zfs/meta[1-2] -o ashift=13 -f  # Mirrored SSDs
    zpool add library cache /zfs/cache1 -o ashift=13                 # NVME
    zpool add library log /zfs/slog11 -o ashift=13                   # NVME
And then add the below to put small blocks on the faster SSDs?

Fusion Pools | TrueNAS Documentation Hub
truenas.com › docs › core › 13.0 › coretutorials › storage › pools › fusionpool (April 25, 2025)
A special VDEV can store metadata such as file locations and allocation tables. The allocations in the special class are dedicated to specific block types. By default, this includes all metadata, the indirect blocks of user data, and any deduplication tables. The class can also be provisioned to accept small file blocks. This is a great use case for high performance but smaller sized solid-state storage. Using a special vdev drastically speeds up random I/O and cuts the average spinning-disk I/Os needed to find and access a file by up to half.

Aaron’s ZFS Guide: VDEVs
tadeubento.com › 2024 › aarons-zfs-guide-vdevs
In Linux software RAID, you might have a “/dev/md0” device that represents a RAID-5 array of 4 disks. In this case, “/dev/md0” would be your “VDEV”. ... It’s important to note that VDEVs are always dynamically striped. This will make more sense as we cover the commands below. However, suppose there are 4 disks in a ZFS ...

r/zfs on Reddit: How do I make a metadata special device vdev?
reddit.com › r/zfs › how do i make a metadata special device vdev? (December 30, 2022)

I have a 100 TB raidz1 pool that I am about to create, and I want to store the metadata for it on mirrored SSDs.

How do I create the metadata storage for only the 100TB pool (not the OS)?

# create the raidz data pool (dataset properties need -O at creation time, not -o)
zpool create -f -o ashift=12 -m /media storage \
    -O recordsize=1M \
    -O primarycache=metadata -O secondarycache=none \
    raidz \
        ata-ST3000DM001-9YN166_HWID \
        ata-ST3000DM001-9YN166_HWID \
        ata-ST3000DM001-9YN166_HWID \
        ata-ST3000DM001-9YN166_HWID

# add a mirrored SSD special vdev, then opt small blocks into it
zpool add storage -o ashift=12 special mirror /dev/ssd0n1 /dev/ssd1n1
zfs set special_small_blocks=128K storage

ZFS Metadata Special Device: Z - L1 Articles & Video-related - Level1Techs Forums
forum.level1techs.com › l1 articles & video-related (March 24, 2024)
Introduction - ZFS Allocation Classes: it isn’t storage tiers or caching, but gosh darn it, you can really REALLY speed up your ZFS pool. From the manual: "Special Allocation Class - The allocations in the special class ar…"

ZFS Administration - Part I - VDEVs - XWiki
info.quagmyre.com › xwiki › bin › view › Tech-Tips › ZFS-The-Aaron-Topponce-Archive › ZFS-Administration-Part-I-VDEVs
VDEVs can be nested. A perfect example is a standard RAID-1+0 (commonly referred to as "RAID-10"). This is a stripe of mirrors. In order to specify the nested VDEVs, I just put them on the command line in order (emphasis mine):

PBS and ZFS Special Allocation Class VDEV ... aka Fusion Drive | Proxmox Support Forum
forum.proxmox.com › home › forums › proxmox backup server › proxmox backup: installation and configuration (June 14, 2024)
Add a special vdev:
    zpool add rpool -f -o ashift=12 special mirror scsi-<>-part3 scsi-<>-part3 scsi-<>-part3
Configure it:
    zfs set recordsize=1M rpool
    zfs set special_small_blocks=512K rpool
Test results ...

ZFS - ZFS special device on shared drive | The FreeBSD Forums
forums.freebsd.org › base system › storage (September 7, 2022)
With those two slots I have to create mirror vdevs that will host the OS/swap & special device ... What's this "special device" you keep mentioning? Why do you need 7 TB of metadata? Metadata of what exactly? ... SirDice: "ZFS special allocation class" (I think).

Boost ZFS Performance with a Special VDEV in TrueNAS - YouTube
youtube.com › watch (published May 10, 2025)
Curious about how a special metadata VDEV can boost ZFS performance, especially with spinning disks? In this video, I walk through what it is, why it matters...

ZFS Special VDEV | Proxmox Support Forum
forum.proxmox.com › home › forums › proxmox virtual environment › proxmox ve: installation and configuration (January 1, 2025)
Hi, at the beginning of 2024 I set up a new storage server for my work using ZFS and Samba on Proxmox. I added a "special" vdev to the ZFS pool, which gives really good performance when the 40TB (around 50 million files) of data are backed up, as the file metadata can be analysed very quickly...