🌐
Stack Exchange
unix.stackexchange.com › questions › 346713 › zfs-ubuntu-16-04-replace-drive-with-itself
ZFS Ubuntu 16.04 Replace drive with itself - Unix & Linux Stack Exchange

To reuse the same drive in ZFS, if you are sure the disk is not faulty, zeroing the first 10 GB of the drive with dd is a good start, but you also need to do the same at the end of the drive. There is no need to dd a full 10 GB, though; I believe the first and last MB are enough.

I solved it this way:

dd bs=512 if=/dev/zero of=/dev/sdk count=2048 seek=$(($(blockdev --getsz /dev/sdk) - 2048))
dd bs=512 if=/dev/zero of=/dev/sdk count=2048 

Then just add the "new" disk back to the ZFS pool. No need to labelclear, scrub, wipe the disk. DD is all you need. You also should not partition the disk after DD. ZFS will not accept any previously partitioned disk.

Then just zpool replace <pool-name> <old-device> <new-device>

Normally <old-device> is the disk-by-id name and <new-device> is the device listed in lsblk.
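
For example, a minimal sketch of that last step (the pool name tank, the by-id name, and /dev/sdk are placeholders, not values taken from this answer):

zpool replace tank ata-OLDDISK_SERIAL /dev/sdk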

Answer from LincolnP on unix.stackexchange.com
🌐
Ask Ubuntu
askubuntu.com › questions › 305830 › replacing-a-dead-disk-in-a-zpool
server - Replacing a dead disk in a zpool - Ask Ubuntu

After digging endlessly this night I finally found the solution. The short answer is that you can use the disks' GUIDs (which persist even after disconnecting a drive) with the zpool command.

Long answer: I got the disk's GUID using the zdb command, which gave me the following output:

root@zeus:/dev# zdb
hermes:
    version: 28
    name: 'hermes'
    state: 0
    txg: 162804
    pool_guid: 14829240649900366534
    hostname: 'zeus'
    vdev_children: 1
    vdev_tree:
        type: 'root'
        id: 0
        guid: 14829240649900366534
        children[0]:
            type: 'raidz'
            id: 0
            guid: 5355850150368902284
            nparity: 1
            metaslab_array: 31
            metaslab_shift: 32
            ashift: 9
            asize: 791588896768
            is_log: 0
            create_txg: 4
            children[0]:
                type: 'disk'
                id: 0
                guid: 11426107064765252810
                path: '/dev/disk/by-id/ata-ST3300620A_5QF0MJFP-part2'
                phys_path: '/dev/gptid/73b31683-537f-11e2-bad7-50465d4eb8b0'
                whole_disk: 1
                create_txg: 4
            children[1]:
                type: 'disk'
                id: 1
                guid: 15935140517898495532
                path: '/dev/disk/by-id/ata-ST3300831A_5NF0552X-part2'
                phys_path: '/dev/gptid/746c949a-537f-11e2-bad7-50465d4eb8b0'
                whole_disk: 1
                create_txg: 4
            children[2]:
                type: 'disk'
                id: 2
                guid: 7183706725091321492
                path: '/dev/disk/by-id/ata-ST3200822A_5LJ1CHMS-part2'
                phys_path: '/dev/gptid/7541115a-537f-11e2-bad7-50465d4eb8b0'
                whole_disk: 1
                create_txg: 4
            children[3]:
                type: 'disk'
                id: 3
                guid: 17196042497722925662
                path: '/dev/disk/by-id/ata-ST3200822A_3LJ0189C-part2'
                phys_path: '/dev/gptid/760a94ee-537f-11e2-bad7-50465d4eb8b0'
                whole_disk: 1
                create_txg: 4
    features_for_read:

The GUID I was looking for was 15935140517898495532, which enabled me to do

root@zeus:/dev# zpool offline hermes 15935140517898495532
root@zeus:/dev# zpool status
  pool: hermes
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: scrub repaired 0 in 2h4m with 0 errors on Sun Jun  9 00:28:24 2013
config:

        NAME                         STATE     READ WRITE CKSUM
        hermes                       DEGRADED     0     0     0
          raidz1-0                   DEGRADED     0     0     0
            ata-ST3300620A_5QF0MJFP  ONLINE       0     0     0
            ata-ST3300831A_5NF0552X  OFFLINE      0     0     0
            ata-ST3200822A_5LJ1CHMS  ONLINE       0     0     0
            ata-ST3200822A_3LJ0189C  ONLINE       0     0     0

errors: No known data errors

and then

root@zeus:/dev# zpool replace hermes 15935140517898495532 /dev/disk/by-id/ata-ST3500320AS_9QM03ATQ
root@zeus:/dev# zpool status
  pool: hermes
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sun Jun  9 01:44:36 2013
    408M scanned out of 419G at 20,4M/s, 5h50m to go
    101M resilvered, 0,10% done
config:

        NAME                            STATE     READ WRITE CKSUM
        hermes                          DEGRADED     0     0     0
          raidz1-0                      DEGRADED     0     0     0
            ata-ST3300620A_5QF0MJFP     ONLINE       0     0     0
            replacing-1                 OFFLINE      0     0     0
              ata-ST3300831A_5NF0552X   OFFLINE      0     0     0
              ata-ST3500320AS_9QM03ATQ  ONLINE       0     0     0  (resilvering)
            ata-ST3200822A_5LJ1CHMS     ONLINE       0     0     0
            ata-ST3200822A_3LJ0189C     ONLINE       0     0     0

errors: No known data errors

After resilvering completed, everything worked well again. It would have been nice if the zpool manpage mentioned that you can use a disk's GUID, obtained through zdb, with the zpool command.
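
As an aside the answer does not cover: recent OpenZFS releases let zpool status print vdev GUIDs directly via the -g flag, which saves the detour through zdb (treat this as version-dependent; the old pool version shown above predates it):

zpool status -g hermes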

Edit

As pointed out by durval below, the zdb command may not output anything. In that case you can try

zdb -l /dev/<name-of-device>

to explicitly list information about the device (even if it is already missing from the system).
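
For instance, to pull just the GUID fields out of the on-disk label (the device path below is the failing member from the zdb output above; adjust it to your system):

zdb -l /dev/disk/by-id/ata-ST3300831A_5NF0552X-part2 | grep guid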

Answer from Marcus on askubuntu.com
🌐
FreeBSD
forums.freebsd.org › base system › storage
How to replace zfs failed drive? | The FreeBSD Forums
When the pool was healthy, a scrub ... do something regularly to catch corrupt files and fix them? ... do a search on the forum for "zfs scrub"....
🌐
Tritondatacenter
docs.tritondatacenter.com › private-cloud › troubleshooting › disk-replacement
Understanding and Resolving ZFS Disk Failure
Because scrubbing and resilvering ... scrub" command returns an error. By enabling ZFS autoreplace on a pool (a property disabled by default) you will enable your system to automatically use a spare drive to replace FAULTED/UNAVAIL drives....
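
Per that note, the autoreplace property is enabled with a one-liner (the pool name here is illustrative):

zpool set autoreplace=on tank
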
🌐
Oracle
docs.oracle.com › cd › E19253-01 › 819-5461 › gazgd › index.html
Oracle
On some systems, such as the Sun Fire x4500, you must unconfigure a disk before you can take it offline. If you are replacing a disk in the same slot position on this system, then you can just run the zpool replace command as described in the first example in this section.
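
Spelled out, the same-slot case is a single command; c1t3d0 is the Solaris-style example device Oracle's documentation uses (see the quote further down), and tank is a placeholder pool name:

zpool replace tank c1t3d0
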
🌐
Server Fault
serverfault.com › questions › 1147733 › zfs-on-linux-disk-replacement-best-practices
zfsonlinux - ZFS on Linux disk replacement best practices - Server Fault

There's no difference between the options when all is said and done — drives are interchangeable. "Just add sdi as the new spare" requires fewer steps and minimizes the amount of time that you spend in a resyncing state, so it's the natural choice.
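
A minimal sketch of that route, assuming a pool named tank and a new disk visible as /dev/sdi (both placeholders; a /dev/disk/by-id name is usually the safer reference, as later answers here argue):

zpool add tank spare /dev/sdi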

Answer from hobbs on serverfault.com
🌐
Archlinux
bbs.archlinux.org › viewtopic.php
[SOLVED] How to replace a failed drive in a ZFS pool / Kernel & Hardware / Arch Linux Forums
Sufficient replicas exist for the pool to continue functioning in a degraded state. action: Replace the device using 'zpool replace'. see: http://zfsonlinux.org/msg/ZFS-8000-4J scan: resilvered 1.36M in 0h19m with 0 errors on Wed Oct 1 03:59:16 2014 config: NAME STATE READ WRITE CKSUM zfsdatapool ...
🌐
Oracle
docs.oracle.com › cd › E19253-01 › 819-5461 › gbcet › index.html
Replacing a Device in a ZFS Storage Pool
Physically replace the disk (c1t3d0). Ensure that the blue Ready to Remove LED is illuminated before you physically remove the faulted drive.
🌐
Reddit
reddit.com › r/zfs › help! faulted drive. what do i do now?
r/zfs on Reddit: Help! Faulted drive. What do I do now?

Probably have to look into the details. I haven't used whatever produces that UI. But somewhere you should be able to get the SMART values for the drive. Or, more generally, you have to dig into the details to figure out what is actually wrong.
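
One way to get at those SMART values from a shell, assuming smartmontools is installed (the device name is a placeholder):

smartctl -a /dev/sda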

🌐
Dlford
dlford.io › linux-zfs-raid-disk-replacement-procedure
5 Steps to Safely Replace a Drive in a Linux ZFS Array | dlford.io
Let’s lay out an example scenario - say we have a mirrored (RAID1) array, and I just got an Email alert from smartmontools telling me that a drive /dev/sdf is failing in my ZFS RAID 10 array. Before I configure an array, I like to make sure all drive bays are labelled with the corresponding ...
🌐
GitHub
github.com › openzfs › zfs › issues › 2076
RAIDZ1: unable to replace a drive with itself · Issue #2076 · openzfs/zfs
Trying to simulate failure scenarios with a 3+1 RAIDZ1 array in order to prepare for eventualities. # create spfstank raidz1 -o ashift=12 sda sdb sdc sdd # zfs create spfstank/part # dd if=/dev/ran...
🌐
Reddit
reddit.com › r/zfs › drive become faulted and then unavailable after a successful replacement
r/zfs on Reddit: Drive become FAULTED and then UNAVAILABLE after a successful replacement

export the pool and import it with zpool import -d /dev/disk/by-id/ instead. If using an sdX name is all that was wrong, it should find the correct disk and use that.
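
Spelled out, that suggestion looks like this (the pool name is a placeholder; make sure nothing is using the pool's datasets before exporting):

zpool export tank
zpool import -d /dev/disk/by-id tank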

🌐
Stack Exchange
unix.stackexchange.com › questions › 273620 › replacing-a-failed-disk-in-a-zfs-pool
centos - Replacing a failed disk in a ZFS pool - Unix & Linux Stack Exchange

ZFS detects disks not by their name in the filesystem, but by their UUID that is written onto the disk (or at least something similar -- not 100% sure that it's actually a UUID). When zpool import runs, the disks are enumerated, ZFS rebuilds all the pools, and then uses the device name (without including any directory; IME it's usually something like sda rather than /dev/sda) in the zpool status output. As such, if you move the drives around (or if the kernel moves the drives around, which can happen with modern kernels on modern hardware), zpool will still detect the disks in the same order as it did before; disks that appeared first in the output will again appear first, even if the kernel doesn't enumerate them in that order anymore.

What probably happened here is that because the original zpool import didn't work, the kernel could complete its boot, udev did a lot more work, and by the time you did the manual zpool import, the default enumeration of all your disks turned out to have the serial-number-based names first rather than the sdX-based ones. Most likely, the next time you reboot the machine, the names used will be back to the sdX scheme.

Luckily, resolving the names from one naming scheme to the other is fairly straightforward:

wouter@gangtai:/dev/disk/by-id$ ls -l
total 0
lrwxrwxrwx. 1 root root  9 Mar 31 18:15 ata-SAMSUNG_MZ7TE256HMHP-00004_S1RKNSAFC04685 -> ../../sda
lrwxrwxrwx. 1 root root 10 Mar 31 18:15 ata-SAMSUNG_MZ7TE256HMHP-00004_S1RKNSAFC04685-part1 -> ../../sda1
lrwxrwxrwx. 1 root root  9 Mar 31 18:15 wwn-0x50025388a089e89c -> ../../sda
lrwxrwxrwx. 1 root root 10 Mar 31 18:15 wwn-0x50025388a089e89c-part1 -> ../../sda1

There are multiple naming schemes (by-id, by-uuid, and by-path), all of which can be found under /dev/disk.
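
For example, to find the stable names that point at a given kernel device (sda is just a placeholder here):

ls -l /dev/disk/by-id/ | grep -w sda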

Having said all that, I must say I don't agree with your claim that it would be easier to figure out which disk is which by looking at the sdX names. Modern kernels no longer assign static device names to particular devices; this is why modern distributions use UUID-based fstab files, rather than sdX-based ones. The serial number, in fact, is a far more reliable way to figure out which is the broken disk; after all, it's written on the actual disk, in contrast to the sdX name, which may differ from boot to boot (I've actually encountered that on a ZFS box with sixteen hard disks). Any one of the other methods (by-uuid, by-id, and especially by-path in the enterprise-level multi-disk enclosures) is much more reliable than that.

Answer from Wouter Verhelst on unix.stackexchange.com
🌐
Reddit
reddit.com › r/zfs › replacing possible failing drives
r/zfs on Reddit: replacing possible failing drives

If you have failing drives, I would not attempt to replace them one-by-one. Buy a new disk shelf, build a new pool with the new disks and copy your data over ONCE.

🌐
Proxmox
forum.proxmox.com › home › forums › proxmox virtual environment › proxmox ve: installation and configuration
[SOLVED] - Need help replacing disk in ZFS | Proxmox Support Forum
Hi guys. I run PBS with ZFS on a pool named RPOOL that contains 4 drives of 4 TB. /dev/sdb was phasing out and gave tons of errors. I did a "ls -a /dev/disk/by-id/", and here is the output concerning the serial number K4KJ220L: ata-HGST_HUS726040ALA610_K4KJ220L...
🌐
45drives
knowledgebase.45drives.com › home › kb450412 – replacing drives in zfs pool on ubuntu 20.04
KB450412 - Replacing Drives in ZFS Pool on Ubuntu 20.04 - 45Drives Knowledge Base
September 10, 2021 - You are here: KB Home Ubuntu KB450412 – Replacing Drives in ZFS Pool on Ubuntu 20.04 Table of Contents Scope/DescriptionPrerequisitesStepsThrough Houston UIThrough Command LineVerificationTroubleshooting Scope/Description This article will walk through the steps to replace a failed drive ...
🌐
45drives
knowledgebase.45drives.com › home › kb450265 – replacing drives in zfs pool on rocky linux
KB450265 - Replacing Drives in ZFS Pool on Rocky Linux - 45Drives Knowledge Base
April 28, 2022 - You are here: KB Home Rocky Linux KB450265 – Replacing Drives in ZFS Pool on Rocky Linux Table of Contents Scope/DescriptionPrerequisitesThrough HoustonThrough Command LineVerificationTroubleshooting Scope/Description This article will walk through the steps to replace a failed drive in a ...
🌐
Reddit
reddit.com › r/zfs › need help with first time replacing a drive.
r/zfs on Reddit: Need help with first time replacing a drive.

Why don't you just use zpool replace as suggested on the screen? Replace the faulty drive with the new one and let ZFS create the partitions and resilver the drive? Maybe I didn't get the question, sorry.

🌐
Nodinrogers
nodinrogers.com › post › 2022-12-12-replace-drive-in-zfs-pool
Replacing a drive in a ZFS array in Ubuntu | No D in Rogers
Sooner or later, every system will have a drive failure. ZFS was designed with this in mind. I noticed in my /var/log/kern.log file error messages about one of …