Best products from r/zfs

We found 30 comments on r/zfs discussing the most recommended products. We ran sentiment analysis on each of these comments to determine how redditors feel about different products. We found 51 products and ranked them based on the number of positive reactions they received. Here are the top 20.

Top comments mentioning products on r/zfs:

u/txgsync · 6 pointsr/zfs

Linking OP's problem here...

Chances are 9/10 that the CPU is not "busy", but instead bumping up against a mutex lock. Welcome to the world of high-performance ZFS, where pushing forward the state-of-the-art is often a game of mutex whac-a-mole!

Here's the relevant CPU note from the post:

> did a perf top and it shows most of the kernel time spent in _raw_spin_unlock_irqrestore in z_wr_int_4 and osq_lock in z_wr_iss.

Seeing "lock" in the name of any kernel process is often a helpful clue. So let's do some research: what is "z_wr_iss"? What is "osq_lock"?

I decided to pull down the OpenZFS source code and learn by searching/reading. Lots more reading than I can outline here.

txgsync: ~/devel$ git clone https://github.com/openzfs/openzfs.git
txgsync: ~/devel$ cd openzfs/
txgsync: ~/devel/openzfs$ grep -ri z_wr_iss
txgsync: ~/devel/openzfs$ grep -ri osq_lock


Well, that was a bust. It's not in the upstream OpenZFS code. What about the zfsonlinux code?

txgsync: ~/devel$ git clone https://github.com/zfsonlinux/zfs.git
txgsync: ~/devel$ cd zfs
txgsync: ~/devel/zfs$ grep -ri z_wr_iss
txgsync: ~/devel/zfs$ grep -ri osq_lock


Still no joy. OK, time for the big search: is it in the Linux kernel source code?

txgsync: ~/devel$ cd linux-4.4-rc8/
txgsync: ~/devel/linux-4.4-rc8$ grep -ri osq_lock

Time for a cup of coffee; even on a pair of fast, read-optimized SSDs, digging through millions of lines of code with "grep" takes several minutes.

include/linux/osq_lock.h:#ifndef LINUX_OSQ_LOCK_H
include/linux/osq_lock.h:#define LINUX_OSQ_LOCK_H
include/linux/osq_lock.h:#define OSQ_LOCK_UNLOCKED { ATOMIC_INIT(OSQ_UNLOCKED_VAL) }
include/linux/osq_lock.h:static inline void osq_lock_init(struct optimistic_spin_queue *lock)
include/linux/osq_lock.h:extern bool osq_lock(struct optimistic_spin_queue *lock);
include/linux/rwsem.h:#include <linux/osq_lock.h>
include/linux/rwsem.h:#define __RWSEM_OPT_INIT(lockname) , .osq = OSQ_LOCK_UNLOCKED, .owner = NULL
include/linux/mutex.h:#include <linux/osq_lock.h>
kernel/locking/Makefile:obj-$(CONFIG_LOCK_SPIN_ON_OWNER) += osq_lock.o
kernel/locking/rwsem-xadd.c:#include <linux/osq_lock.h>
kernel/locking/rwsem-xadd.c: osq_lock_init(&sem->osq);
kernel/locking/rwsem-xadd.c: if (!osq_lock(&sem->osq))
kernel/locking/mutex.c:#include <linux/osq_lock.h>
kernel/locking/mutex.c: osq_lock_init(&lock->osq);
kernel/locking/mutex.c: if (!osq_lock(&lock->osq))
kernel/locking/osq_lock.c:#include <linux/osq_lock.h>
kernel/locking/osq_lock.c:bool osq_lock(struct optimistic_spin_queue *lock)

For those who don't read C well -- and I number myself among that distinguished group! -- here's a super-quick primer: if you see a file with ".h" at the end of the name, that's a "Header" file. Basically, it defines variables that are used elsewhere in the code. It's really useful to look at headers, because often they have helpful comments to tell you what the purpose of the variable is. If you see a file with ".c" at the end, that's the code that does the work rather than just defining stuff.

It's z_wr_iss that's driving the mutex lock; there's a good chance I can ignore the locking code itself (which is probably fine; at least I hope it is, because ZFS on Linux is probably easier to push through a fix than core kernel IO locking semantics) if I can figure out why we're competing over the lock (which is the actual problem). Back to grep...

txgsync: ~/devel/linux-4.4-rc8$ grep -ri z_wr_iss

MOAR COFFEE! This takes forever. Next hobby project: grok up my source code trees in ~devel; grep takes way too long.

...

...

And the search came up empty. Hmm. Maybe _iss is a structure that's created only when it's running, and doesn't actually exist in the code? I probably should understand what I'm pecking at a little better. Let's go back to the ZFS On Linux code:

mbarnson@txgsync: ~/devel/zfs$ grep -r z_wr

module/zfs/zio.c: "z_null", "z_rd", "z_wr", "z_fr", "z_cl", "z_ioctl"

Another clue! We've figured out the Linux Kernel name of the mutex we're stuck on, and that z_wr is a structure in "zio.c". Now this code looks pretty familiar to me. Let's go dive into the ZFS On Linux code and see why z_wr might be hung up on a mutex lock of type "_iss".

txgsync: ~/devel/zfs$ cd module/zfs/
txgsync: ~/devel/zfs/module/zfs$ vi zio.c

z_wr is a type of IO descriptor:

/*
 * ==========================================================================
 * I/O type descriptions
 * ==========================================================================
 */
const char *zio_type_name[ZIO_TYPES] = {
    "z_null", "z_rd", "z_wr", "z_fr", "z_cl", "z_ioctl"
};

What about that z_wr_iss thing? And competition with z_wr_int_4? I've gotta leave that unanswered for now, because it's Saturday and I have a lawn to mow.

It seems there are a few obvious -- if tentative -- conclusions:

  1. You're hung up on a mutex lock. This is probably not something that "tuning" will eliminate; double-check that you're not using compression, encryption, deduplication, or other obvious resource hogs.
  2. The name of the mutex lock is osq_lock in the Linux kernel. The name seems obvious: it's a queue of some sort. Could it be a write queue to access the device? A parallel test to all your devices -- without ZFS, just simultaneous writes across the stripe in some kind of raw fashion -- might turn up if this mutex is being held due to IO in general, or if it is specific to ZFS.
  3. The mutex competition appears to be between z_wr_int_4 (the write queue for 4k blocks, perhaps?) and z_wr_iss. You might be able to determine if z_wr_int_4 is what I described by re-running your test to see if the new competition is between z_wr_iss with something like z_wr_int_8 for 8k blocks instead.
  4. If I were the OP, I'd evaluate the disks one by one. Create a zpool of just one drive, and run the IO test on just that drive first. If performance is good with a single-drive zpool, nuke the pool and use two drives in a stripe. Try again. See what the scale tipping point is with three drives, four drives, etc.; there's a rough sketch of this after the list. Xen historically had challenging IO queueing when managing more than four block devices; I wonder if some legacy of this remains?
  5. You really need to see if you can reproduce this on bare metal. It seems likely that this is an artifact of virtualization under Xen. Even with paravirtualization of IO, any high-performance filesystem is really sensitive to latency in the data path. Seems more a Xen bug than a ZFS bug, but it might be work-around-able.
  6. Xen -- if I understand correctly -- uses a shared, fixed-size ring buffer and notification mechanism for I/O, just one per domU. So although you're throwing more drives at it, this moves the bottleneck from the drives to the ring buffer. If I were to pursue this further, I'd look to competition for this shared ring buffer resource as a likely candidate imposing a global throttle on all IO to the domU under your hypervisor:
    • you've filled the ring buffer,
    • Xen has to empty it and make room for more data before the lock can clear,
    • this suggests that the real governor is how long the Linux kernel mutex has to wait for Xen to poll the ring buffer again.
    • You might not observe this with forked processes in a paravirtualized kernel. ZFS is a multithreaded kernel process, so I wonder if it's being forced to use a single ring buffer for I/O in a Xen environment.

It's just a hypothesis, but I think it may have some legs and needs to be ruled out before other causes can be ruled in.
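Going back to point 4: here's a rough sketch of the stepwise test I have in mind. It's only a sketch -- the device names and the fio job are placeholders, and it destroys the test pool on every pass, so only run it against scratch disks:

    # Hypothetical per-drive scaling test: add one device at a time to a throwaway stripe.
    DEVS="/dev/xvdb /dev/xvdc /dev/xvdd /dev/xvde"
    SELECTED=""
    for dev in $DEVS; do
        SELECTED="$SELECTED $dev"
        zpool create -f testpool $SELECTED        # plain stripe across the devices so far
        zfs create testpool/bench
        # Watch 'perf top' in another shell while this runs to see when osq_lock shows up.
        fio --name=bench --directory=/testpool/bench \
            --rw=write --bs=128k --size=4G --numjobs=4 --group_reporting
        zpool destroy testpool
    done

If the contention only appears past a certain number of devices, that points at the queueing/ring-buffer side of things rather than at any single disk.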

I was willing to dive into this a bit because I'm in the midst of some similar tests myself, and am also puzzled why the IO performance of Solaris zones so far out-strips ZFSoL under Xen; even after reading Brendan Gregg's explanation of Zones vs. KVM vs. Xen I obviously don't quite "get it" yet. I probably need to spend more time with my hands in the guts of things to know what I'm talking about.

TL;DR: You're probably tripping over a Linux kernel mutex lock that is waiting on a Xen ring buffer polling cycle; this might not have much to do with ZFS per se. Debugging Xen I/O scheduling is hard. Please file a bug.

ADDENDUM: The Oracle Cloud storage is mostly on the ZFS Storage Appliances. Why not buy a big IaaS instance from Oracle instead and know that it's ZFS under the hood at the base of the stack? The storage back-end systems have 1.5TB RAM, abundant L2ARC, huge & fast SSD SLOG, and lots of 10K drives as the backing store. We've carefully engineered our storage back-ends for huge IOPS. We're doubling-down on that approach with Solaris Zones and Docker in the Cloud with Oracle OpenStack for Solaris and Linux this year, and actively disrupting ourselves to make your life better. I administer the architecture & performance of this storage for a living, so if you're not happy with performance in the Oracle Cloud, your problem is right in my wheelhouse.

Disclaimer: I'm an Oracle employee. My opinions do not necessarily reflect those of Oracle or its affiliates.
u/fryfrog · 3 pointsr/zfs

Good old, nothing special Seagate [ST8000AS0002](https://www.amazon.com/gp/product/B00XS423SC/). I've only had the pool online for a couple of months at this point, so I can't comment on reliability... but so far I'm happy.

As long as you know what you're getting into with SMR disks, I think you can live w/ them fairly well. I am glad I have a normal pool too though. All my stuff lands there first and then moves to the SMR pool.

Some things worth mentioning... the disks are 5900 rpm, so they're not going to be great at random IO. They do have ~25G of PMR area on each disk, so if your workload isn't entirely ideal for SMR disks... it can still work. A copy-on-write filesystem seems particularly well paired with SMR disks since they don't need to re-shingle to modify a file. They do streaming, linear writes to the SMR portion of the disk very well, I think.

I wouldn't want them to be my only disks in my NAS, but they make a good write once, read many pool.

u/EchoGecko795 · 1 pointr/zfs

The Tyan S7012 is a good board to build on. I make a few ZFS file servers out of them every year, but a few notes:

-They originally shipped with support for Intel's 5500 series only. To get 5600 support, you will have to upgrade the firmware, so I recommend you buy a pair of L5520 CPUs when you get the board; they are super cheap quad-core CPUs, and a pair sells for around $5 now.

-The southbridge gets hot. Some boards come with high-profile heatsinks, but most do not. If your build is not in a high-airflow case, consider placing a 40mm fan on the heatsink; this will help with system stability.

-It comes with only 5 open-ended PCIe x8 slots (some are only x4 electrically). The open ends are nice since you can fit larger cards into the slots (all PCIe slots should be open-ended), but be careful with slot one: the rear components may stop you from seating a large card there.


>passmark of 26,104

No wonder you need water cooling; a 150W CPU needs it. Overall, nice. Far more than I would spend, but I'm a bit on the cheap side.

I purchased https://www.amazon.com/Asus-Hyper-M-2-x16-Card/dp/B0753JTJTG for the heck of it. If it does not suit my needs, I figured I can return or resell it and get most of my money back. My main issue is going to be installing it: both my PCIe 3.0 x16 slots are full of video cards, and my only other option is the x4 slot that my NVMe card is currently sitting in. I am guessing I may end up upgrading my board soon, or I will most likely pull out my sound card and end up installing a PCI one instead. I have a few LGA 2011 boards, but I just spent a week building my current setup and I don't want to rip it apart. -_-

u/zfsbest · 2 pointsr/zfs

I did a bit of searching on your behalf. Obviously I haven't tested it (so please don't hold me responsible), but this looks like 99% the same thing as the Probox:

https://www.amazon.co.uk/RaidSonic-ICY-BOX-IB-3640SU3-drive/dp/B009DH5Q2S/ref=sr_1_35?ie=UTF8&qid=1504622540&sr=8-35&keywords=4+bay+esata

RaidSonic ICY BOX IB-3640SU3 - hard drive array
The Icy Box external 4-bay JBOD enclosure for 4x 3.5" SATA I/II/III HDDs: easy assembly thanks to the tray-less design, unlimited HDD capacity, supports Windows XP/Vista/Win7 and Mac OS X, Plug & Play and hot swap. JBOD (Just a Bunch of Discs), USB 3.0, eSATA.

The reviews aren't too bad either from what I saw, so please let us know if you get one and it works well for you. :)

u/killmasta93 · 1 pointr/zfs

So after a few hours of benchmarking, here are a few questions:

  1. What is the rule of thumb when sizing a SLOG? Does it depend on the amount of RAM, the size of the disks, or the size of the pool? If I have 4 disks of 4 TB, what size SSD do I need to get?
  2. Would this 58 GB SSD be sufficient? https://www.amazon.com/Intel-Optane-800P-58GB-XPoint/dp/B078ZJSD6F
  3. Currently benchmarking with fio, and yes, I have pretty bad results, but I was reading that it also depends on the physical disk's sector size (512 bytes in my case) and on the VM storage volblocksize, which is 8k by default. Not sure if changing that would help?

    cat /sys/block/sda/queue/hw_sector_size
    512

  4. Currently I have ARC max set to 2 GB, which I think might be too low, but I currently have 32 GB of RAM with 26 GB in use by the VMs (I probably need to add more RAM). What is the rule of thumb for the ARC max? (There's a rough sketch of checking/setting it after this list.)
  5. Can setting compression off help?
  6. Would setting atime to off also help with writes, since the VMs are RAW inside of Proxmox?


    zfs set atime=off rpool
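Since a couple of these questions are really "how do I check or change this", here's a rough sketch -- the pool/dataset path and the 8 GiB value are placeholders, not recommendations:

    # Check the current ARC cap (0 means ZFS picks its default, roughly half of RAM):
    cat /sys/module/zfs/parameters/zfs_arc_max

    # Raise it at runtime, e.g. to 8 GiB (value is in bytes):
    echo 8589934592 > /sys/module/zfs/parameters/zfs_arc_max

    # Make the setting persistent across reboots:
    echo "options zfs zfs_arc_max=8589934592" >> /etc/modprobe.d/zfs.conf

    # Sync-heavy fio run against a test dataset, since sync writes are what a SLOG actually absorbs:
    fio --name=slogtest --directory=/rpool/benchmark --rw=randwrite --bs=8k \
        --size=2G --ioengine=libaio --sync=1 --runtime=60 --time_based

As for sizing: a SLOG only ever holds a few seconds' worth of incoming synchronous writes (a transaction group is committed every ~5 seconds by default), so a small, low-latency device like that 58 GB Optane is generally more than enough.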

    Thank you
u/Fiberton · 2 pointsr/zfs

Best thing to do is to buy a new case. Either this https://www.amazon.com/SilverStone-Technology-Mini-Itx-Computer-DS380B-USA/dp/B07PCH47Z2/ref=sr_1_15?keywords=silverstone+hotswap&qid=1566943919&s=gateway&sr=8-15 -- quite a lot of the mini-ITX folks I know are using something like it, with 8 hot-swap 3.5" bays and 4x 2.5" (https://www.silverstonetek.com/product.php?pid=452) -- or, if you want to use ALL your drives, a cheaper alternative: https://www.amazon.com/dp/B0091IZ1ZG/ref=twister_B079C7QGNY?_encoding=UTF8&th=1 You can fit 15x 3.5" drives in that, or get some 2x 2.5"-to-1x 3.5" adapters to shove some SSDs in there too: https://www.amazon.com/Inateck-Internal-Mounting-Included-ST1002S/dp/B01FD8YJB4/ref=sr_1_11?keywords=2.5+x+3.5&qid=1566944571&s=electronics&sr=1-11 There are various companies making those; I looked quickly on Amazon. That way you can have 12 drives rather than just 6. The cheap SATA cards will fix you up, or shove this in there: https://www.amazon.com/Crest-Non-RAID-Controller-Supports-FreeNAS/dp/B07NFRXQHC/ref=sr_1_1?keywords=I%2FO+Crest+8+Port+SATA+III+Non-RAID+PCI-e+x4+Controller+Card+Supports+FreeNAS+and+ZFS+RAID&qid=1566944762&s=electronics&sr=1-1 Hope this helps :)

u/mercenary_sysadmin · 1 pointr/zfs

Can you link me to a good example? Preferably one suited for a homelab, ie not ridicu-enterprise-priced to the max? This is something I'd like to play with.

edit: is something like this a good example? How is the initial configuration done - BIOS-style interface accessed at POST, or is a proprietary application needed in the OS itself to configure it, or...?

u/hab136 · 1 pointr/zfs

Current: (6-1) x 4 TB = 20 TB

New:
(3-1) x 6 TB = 12 TB
(3-1) x 4 TB = 8 TB
20 TB total

You don't gain any space by doing this, though you do prepare for the future.

Are you able to add more drives to your system, perhaps externally? I've personally used these Mediasonic 4-bay enclosures along with an eSATA controller (though the enclosures also support USB3). Get some black electrical tape though, because the blue lights on the enclosure are brighter than the sun. The only downside with port-splitter enclosures is that if one drive fails and knocks out the SATA bus, the other 3 drives will drop offline too. The infamous 3 TB Seagates did that, but I had other drives (both 3 TB WD and 2 TB Seagates) fail without interfering with the other drives. Nothing was permanently damaged; just had to remove the failed drive before the other 3 started working again. Also, the enclosure is not hot-swap; you have to power down to replace drives. But hey, it's $99 for 4 drive bays.

6 TB Red drives are $200 right now ($33/TB); 8 TB are $250 ($31/TB), and 10 TB are $279 ($28/TB).

Instead of spending $600 (three 6 TB drives) and getting nothing, spend about $692 ($558 for two 10 TB drives, $100 for enclosure, $30 for controller, $4 for black electrical tape) and get +10 TB by adding a pair of 10 TB drives in a mirror in an enclosure, and have another 2 bays free for future expansion.

(6-1) x 4 TB = 20 TB
(2-1) x 10 TB = 10 TB
30 TB total, ~$692 for +10 TB
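Adding that mirrored pair as a second vdev is a one-liner. A rough sketch -- pool and device names are placeholders, and double-check them, because "zpool add" is permanent:

    # Add the two 10 TB drives to the existing pool as a new mirror vdev.
    # Using /dev/disk/by-id paths keeps the pool stable if devices get renumbered.
    zpool add tank mirror /dev/disk/by-id/ata-10TB_DRIVE_A /dev/disk/by-id/ata-10TB_DRIVE_B

    # Confirm the new capacity and layout:
    zpool list tank
    zpool status tank

New writes will then be spread across both vdevs automatically.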

Later buy another two 10 TB drives and put them in the two empty slots:

(6-1) x 4 TB = 20 TB
(2-1) x 10 TB = 10 TB
(2-1) x 10 TB = 10 TB
40 TB total, $558 for +10 TB

Then in the future you only have to upgrade two drives at a time, and you can replace your smallest drives with the now-replaced drives.

You can repeat this with a second enclosure, of course. :)

Don't forget that some of your drives will fail outside of warranty, which can speed your replacement plans. If a 4 TB drive fails, go ahead and replace it with a 10 TB drive. You won't see any immediate effect, but you'll turn that 20 TB RAIDz1 into 50 TB that much quicker.
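When you do swap a drive for a bigger one, the usual pattern looks something like this (pool and device names are placeholders). Keep in mind the extra space only appears once every drive in that vdev has been upgraded and autoexpand is on:

    # Let the vdev grow automatically once all of its drives are larger:
    zpool set autoexpand=on tank

    # Replace the failed 4 TB drive with the new 10 TB drive:
    zpool replace tank /dev/disk/by-id/ata-OLD_4TB_DRIVE /dev/disk/by-id/ata-NEW_10TB_DRIVE

    # Let the resilver finish before touching any other disks:
    zpool status tank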

Oh, and make sure you've set your recordsize to save some space! For datasets where you're mainly storing large video files, set your recordsize to 1 MB: "zfs set recordsize=1M poolname/datasetname". This only takes effect on new writes, so you'd have to re-write your existing files to see any difference. You can rewrite files in place with "cp -a filename tmpfile; mv tmpfile filename", but a much easier way is to just create a new dataset with the proper recordsize, move all the files over, then delete the old dataset and rename the new one.
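A rough sketch of that new-dataset approach -- pool/dataset names are placeholders, and you'll want a verified backup before destroying anything:

    # Create a replacement dataset with the larger recordsize:
    zfs create -o recordsize=1M poolname/media-new

    # Copy everything over (rsync preserves permissions and can resume if interrupted):
    rsync -aHAX --info=progress2 /poolname/media/ /poolname/media-new/

    # After verifying the copy, swap the datasets:
    zfs destroy -r poolname/media
    zfs rename poolname/media-new poolname/media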

See this spreadsheet. With 6 disks in RAIDz1 and the default 128K record size (16 sectors on the chart) you're losing 20% to parity. With 1M record size (256 sectors on the chart) you're losing only 17% to parity. 3% for free!

https://www.reddit.com/r/zfs/comments/9pawl7/zfs_space_efficiency_and_performance_comparison/
https://www.reddit.com/r/zfs/comments/b931o0/zfs_recordsize_faq/

u/zeblods · 1 pointr/zfs

Do you know if there's a list of supported chipsets for the ROCKPro64 PCIe slot?

With something like this I could connect up to 16 SATA discs and run ZFS with it.

I currently have an old server: Tyan motherboard, Intel Core 2 Duo, 8GB DDR2 RAM, PCI-X (not Express, the old PCI-X format), 10x 3TB hard drives in RAID-Z2... It's massive, puts out a lot of heat for not much performance, and I'd like to replace it with a lighter config like a ROCKPro64...

Performance-wise, I only need gigabit speed, about 100MB/s top.

u/kohenkatz · 1 pointr/zfs

The card I got is "LSI LSI00244 (9201-16i) PCI-Express 2.0 x8 SATA / SAS Host Bus Adapter Card". I got it from NewEgg.
The backplane and the card both use standard (multi-lane SAS) SFF-8087 connectors. The backplane requires right-angle connectors due to the tight fit, so I got these cables: https://www.amazon.com/Internal-SFF-8087-Right-Angle-Cable/dp/B0769FWJJP/

Note that the backplane uses port multipliers to allow 12 drives to be hooked up to 8 SAS/SATA lanes using two SFF-8087 cables (each cable is four lanes). The only issue with this is that the order your system lists the drives in may not match the physical order of the slots, but I have not found that to be a problem so far. It just means you should record which drive serial number sits in which slot so that you can still do hot-swap replacements. It's really just a minor inconvenience.
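A quick sketch of building that serial-to-slot map (assumes smartmontools is installed; device names will differ on your system):

    # One-line overview of every disk with its serial number:
    lsblk -d -o NAME,SIZE,MODEL,SERIAL

    # Or per device via SMART:
    for dev in /dev/sd?; do
        printf '%s  ' "$dev"
        smartctl -i "$dev" | awk -F': *' '/Serial Number/ {print $2}'
    done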

Also, I have, and highly recommend getting, the extra two 2.5" rear drive bays. ZFS allows you to install SSDs to operate as cache drives, and those two slots are a great place for them. If using the Dell built-in connections, the rear bays hook up through the front backplane's port multiplier, but I ran a third SFF-8087 cable directly from the LSI card because there's no reason to waste the two extra ports on this four-port card.

My data drives are 10x 8TB Seagate Enterprise ST8000NM0055 in a raidz2 (double-parity) configuration. I chose them from the Backblaze quarterly Hard Drive Reliability data. I highly recommend looking there for the most up-to-date data on the drives they use and their failure rates. I have 2x 512GB Samsung 860 PRO SSDs in the rear slots, configured as L2ARC cache for ZFS. The OS (I'm using Ubuntu 18.04) is on 2x 1TB Western Digital Black drives in a raidz1, chosen just because they were cheap.

I am pretty happy with this setup, but there are two changes I plan to see if I can make. First, I am going to see if I can move the OS raidz1 onto two more SSDs, likely M.2 SSDs on PCIe adapters. Second, if I do that, I'll put in two more 8TB drives and add them as hot spares so ZFS will automatically start rebuilding onto them if other drives fail.
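For what it's worth, adding hot spares later is straightforward; a rough sketch with placeholder pool and device names:

    # Register the two extra drives as hot spares for the pool:
    zpool add tank spare /dev/disk/by-id/ata-SPARE_DRIVE_A /dev/disk/by-id/ata-SPARE_DRIVE_B

    # They should show up as AVAIL under a "spares" section:
    zpool status tank

Automatic spare activation on Linux is handled by the ZFS event daemon (zed), so make sure it's running.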

u/mayhempk1 · 1 pointr/zfs

Hi,

I had a power outage yesterday, and my UPS shut down at 75% battery remaining with no warning (I don't think I was writing to my ZFS array). I have a ZFS RAID 0 array with 3 WD Red 8TBs (yes, I understand it is 100% temporary storage and it WILL fail; two of the disks have 6k power-on hours and the other has like 600, but I expect it to be good for at least a bit longer and not have a triple failure) with this device: https://www.amazon.com/Mediasonic-ProBox-HF2-SU3S2-SATA-Enclosure/dp/B003X26VV4/ and it failed. I destroyed the zpool and created it from scratch, copied all data over, and there were STILL a ton of checksum errors (not sure if I did zpool clear at any point, or if I needed to?)... Does that mean all my drives are bad, or my enclosure is bad, or is it possible it was just a temporary issue? I turned that box off, rebooted my computer, turned it back on, and created my zpool.

I am going to replace my UPS because I didn't trust it before and now I definitely don't, but I'm hoping my disks are okay? I don't have ANY SMART issues; offline_uncorrectable, reallocated_event_count, current_pending_sector, reallocated_sector_ct, etc. are all fine and all have a raw_value of 0.
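For reference, the usual way to tell whether checksum errors like that are ongoing or were a one-off (pool name is a placeholder):

    # Scrub the pool so ZFS re-reads and verifies every block:
    zpool scrub tank
    zpool status -v tank    # per-device READ/WRITE/CKSUM counters, plus any damaged files

    # If the scrub comes back clean, reset the old error counters:
    zpool clear tank

    # Long SMART self-test on each member disk (repeat per device):
    smartctl -t long /dev/sda
    smartctl -a /dev/sda    # check the self-test log afterwards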

I would certainly like to hear your thoughts as always.

u/ChrisOfAllTrades · 1 pointr/zfs

No, the SAS back panel will also have the single SFF-8087 port - it will look the same as on the Dell H200.

You just need a regular cable like this:

https://www.amazon.com/Cable-Matters-Internal-Mini-Mini-SAS/dp/B011W2F626/