Fix minor information leakage in gfss secret sharing #10284

Merged
andrewjstone merged 2 commits into oxidecomputer:main from trail-of-forks:fix_gfss_leading_zero
Apr 23, 2026

@tjade273
Contributor

@tjade273 tjade273 commented Apr 17, 2026

The existing implementation leaks about 0.005 bits of information about the 8-bit secret.

In the Oxide use case, this is not a critical security issue, as a single secret is not re-split multiple times. However, future use cases might allow re-sharing of the same secret, which would reveal the secret to an attacker holding only k-1 shares after about 2000 re-splits.

See for example
https://www.zkdocs.com/docs/zkdocs/protocol-primitives/shamir/
https://privy.io/blog/shamir-secret-sharing-deep-dive
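As a rough illustration of where a leak of this size comes from, here is a minimal sketch for the k = 2 case. It assumes the AES reduction polynomial for GF(2^8), which may not match the field representation gfss actually uses, and the names `gf_mul`, `candidate_secrets`, and `leaked_bits` are hypothetical. It enumerates which secret bytes remain consistent with a single observed share when the leading coefficient is forced to be nonzero.

```rust
/// Multiply in GF(2^8), reducing by x^8 + x^4 + x^3 + x + 1 (the AES
/// polynomial; an assumption, not necessarily what gfss uses).
fn gf_mul(mut a: u8, mut b: u8) -> u8 {
    let mut product = 0u8;
    while b != 0 {
        if b & 1 != 0 {
            product ^= a;
        }
        let carry = a & 0x80 != 0;
        a <<= 1;
        if carry {
            a ^= 0x1b;
        }
        b >>= 1;
    }
    product
}

/// For a 2-of-n sharing f(x) = s + a*x, list every secret byte `s` that is
/// consistent with the single observed share (x, y). When zero leading
/// coefficients are rejected, `a = 0` is not an allowed explanation.
fn candidate_secrets(x: u8, y: u8, allow_zero_lead: bool) -> Vec<u8> {
    (0..=255u8)
        .filter(|&s| {
            (0..=255u8)
                .any(|a| (allow_zero_lead || a != 0) && (s ^ gf_mul(a, x)) == y)
        })
        .collect()
}

/// Information revealed by shrinking 256 candidates to 255, in bits.
fn leaked_bits() -> f64 {
    (256.0_f64 / 255.0).log2()
}
```

With zero coefficients allowed, all 256 secret bytes stay possible. With them rejected, exactly one value (the share's own `y`) is ruled out, which is the per-byte leak under discussion.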

@tjade273 tjade273 force-pushed the fix_gfss_leading_zero branch from e7edcb4 to 0cd2b64 April 17, 2026 20:17
@andrewjstone andrewjstone self-requested a review April 20, 2026 14:43
@andrewjstone
Contributor

Hi @tjade273. Thanks for the contribution. This is interesting. I had actually read the original report from cure53 for privy's implementation before writing this code, which is why I ended up making this change.

I read the linked articles and have spent a bit of time thinking about this this morning. You are correct that from a purely information-theoretic perspective, ensuring the high order coefficient is non-zero does leak information. However, we are not operating in a purely information-theoretic model. An attacker doesn't have to guess if K-1 shares can recompute the secret, they can try directly in our model.

Specifically, the data being protected lives on U.2 drives in an Oxide rack. The key shares live on M.2 drives on the same rack. The M.2 drives require removal of the sleds from the rack for access, but are easily stolen. If we made the change suggested here then an attacker would only have to steal k-1 M.2 drives in the event that all polynomials of the secret had a high-order coefficient of 0. This practically weakens our security for a 32 byte secret, although again, not by much as the chance of randomly generating all zeroes for that leading coefficient is (1/256)^32.

Importantly, the attacker doesn't have to guess that the leading coefficients are all zero. If for some reason they only had time to steal k-1 M.2s rather than k M.2s (again unlikely), they could attempt with our open source code to recompute the shared secret, derive keys and see if they decrypt the zfs dataset on the stolen U.2 drive. They have a built in Oracle here.

Without this change the attacker always has to steal at least k M.2 drives, which is currently a security guarantee we make to our customers. They of course know that the high-order coefficients of the 32 polynomials are non-zero, but we aren't really trying to protect against brute-force attacks here. There's no online mechanism to perform an attack and so we are only dealing with theft of M.2s and U.2s right now.

I realize that differentiating the threat from stealing k vs k-1 M.2 drives for an attacker with physical access sounds kind of silly. However, it does provide us with an explicit numerical guarantee that we can make about our security.

For these reasons, I'm heavily leaning against making this change. I am open to further discussion though, and really appreciate you taking a look.

One big question I had for you is: Why would you ever want to split the same secret multiple times?

@tjade273
Contributor Author

Hey @andrewjstone - thanks for taking the time to think about this! For what it's worth, this is a bit of a nerd snipe and for the application to one-time sharing of U.2 keys the issue causes no security harm. If you don't want to read the following blob of text, nothing bad will happen so long as you do not at some point allow for re-sharing keys.

> One big question I had for you is: Why would you ever want to split the same secret multiple times?

We see this occasionally in systems which allow for adding participants or changing thresholds. For example, you have a 3-of-5 secret and want to add a new server to the rack without re-encrypting all the data. So you collect all the shares, recover the secret, then re-split the secret with a fresh polynomial. Or if a server dies and you want to reshare the secret (I know you don't do this, and instead recover the original share - other systems do it differently).

If an adversary is able to get k-1 shares each time, it learns a little more data about the underlying value of the secret for each resharing.


Now I'll address your objections to allowing zeros as leading coefficients.

> I read the linked articles and have spent a bit of time thinking about this this morning. You are correct that from a purely information-theoretic perspective, ensuring the high order coefficient is non-zero does leak information. However, we are not operating in a purely information-theoretic model. An attacker doesn't have to guess if K-1 shares can recompute the secret, they can try directly in our model.

There are two notions of "secret" that could be conflated here. There is the 1-byte secret which is the constant of a single SSS polynomial. Then there is the 32-byte secret that is used to derive the disk encryption keys.

While an attacker can directly check if a 32-byte key is correct, they cannot easily determine if a single byte secret is correct - if they could, the system would be broken since the time to brute-force the encryption key would be ~32*2^8 rather than 2^(32*8).

On the other hand, the lack of zero coefficients leaks data about the value of each individual byte. This is a much larger advantage than being able to check a whole 32-byte key at a time.

> Specifically, the data being protected lives on U.2 drives in an Oxide rack. The key shares live on M.2 drives on the same rack. The M.2 drives require removal of the sleds from the rack for access, but are easily stolen. If we made the change suggested here then an attacker would only have to steal k-1 M.2 drives in the event that all polynomials of the secret had a high-order coefficient of 0. This practically weakens our security for a 32 byte secret, although again, not by much as the chance of randomly generating all zeroes for that leading coefficient is (1/256)^32.
>
> Importantly, the attacker doesn't have to guess that the leading coefficients are all zero. If for some reason they only had time to steal k-1 M.2s rather than k M.2s (again unlikely), they could attempt with our open source code to recompute the shared secret, derive keys and see if they decrypt the zfs dataset on the stolen U.2 drive. They have a built in Oracle here.

By the same reasoning, you should ban the value 1 as a leading coefficient. If the secret has a high order coefficient of 1, then k-1 shares suffice to recover the secret - just modify each of the known shares by (x, y) -> (x, y - x^{k-1}) and interpolate the result.

The attacker need not guess that the leading coefficients are all 1, they can assume that they are and check if the resulting key opens the drive. The same holds for any other fixed value c - just subtract off c * x^{k-1} and now all your points must lie on a degree k-2 polynomial.
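The subtraction trick above can be sketched for a single byte as follows. This is an illustration under stated assumptions, not the gfss implementation: it uses the AES reduction polynomial for GF(2^8), and the names `gf_mul`, `gf_pow`, `gf_inv`, `eval`, `interpolate_at_zero`, and `recover_with_guessed_lead` are hypothetical. In a field of characteristic 2, subtraction is XOR.

```rust
/// Multiply in GF(2^8) with the AES polynomial (an assumption).
fn gf_mul(mut a: u8, mut b: u8) -> u8 {
    let mut product = 0u8;
    while b != 0 {
        if b & 1 != 0 {
            product ^= a;
        }
        let carry = a & 0x80 != 0;
        a <<= 1;
        if carry {
            a ^= 0x1b;
        }
        b >>= 1;
    }
    product
}

/// Raise `base` to `exp` by repeated multiplication.
fn gf_pow(base: u8, exp: u32) -> u8 {
    (0..exp).fold(1u8, |acc, _| gf_mul(acc, base))
}

/// Inverse of a nonzero element: a^254, since a^255 = 1.
fn gf_inv(a: u8) -> u8 {
    gf_pow(a, 254)
}

/// Evaluate a polynomial at `x` (Horner's rule). `coeffs[0]` is the secret
/// and the last entry is the leading coefficient.
fn eval(coeffs: &[u8], x: u8) -> u8 {
    coeffs.iter().rev().fold(0u8, |acc, &c| gf_mul(acc, x) ^ c)
}

/// Lagrange-interpolate the points and return the value at x = 0.
/// The x-coordinates must be distinct and nonzero.
fn interpolate_at_zero(points: &[(u8, u8)]) -> u8 {
    let mut secret = 0u8;
    for (i, &(xi, yi)) in points.iter().enumerate() {
        let mut num = 1u8;
        let mut den = 1u8;
        for (j, &(xj, _)) in points.iter().enumerate() {
            if i != j {
                num = gf_mul(num, xj); // (0 - xj) equals xj in characteristic 2
                den = gf_mul(den, xi ^ xj); // (xi - xj)
            }
        }
        secret ^= gf_mul(yi, gf_mul(num, gf_inv(den)));
    }
    secret
}

/// Given k-1 shares and a guess `c` for the leading coefficient, strip the
/// c * x^(k-1) term from each share and interpolate the remaining
/// degree k-2 polynomial.
fn recover_with_guessed_lead(shares: &[(u8, u8)], c: u8) -> u8 {
    let k = shares.len() as u32 + 1;
    let adjusted: Vec<(u8, u8)> = shares
        .iter()
        .map(|&(x, y)| (x, y ^ gf_mul(c, gf_pow(x, k - 1))))
        .collect();
    interpolate_at_zero(&adjusted)
}
```

For a 3-of-n sharing with coefficients `[0x42, 0x17, 0x99]`, two shares plus the guess `c = 0x99` recover `0x42`, and a wrong guess produces a different byte. Zero is just one of 256 equally good guesses.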


The point here is that any single choice of value for the leading coefficient is clearly insecure, but it's the distribution of values that matters. Similarly, why do you allow the possibility that the overall secret is 0xAAAAAAA...? If that happens to be chosen, the attacker can decrypt the drive, while holding none of the shares!

Security is a distributional property; it doesn't make sense to claim that any particular value is insecure, only that a method for selecting a value is insecure.

In practice, it often is a good idea to ban zero values - there are roughly three reasons for this:

  1. The value is chosen by a potentially malicious counterparty. A good example is rejecting the point at infinity as the other party's public key in an ECDH operation.
  2. A zero value is likely to be indicative of an error, since it should only appear with negligible probability when things are working, but may appear with substantial probability when your RNG is broken.
  3. Zero values break the correctness of future operations, and it's nicer for analysis to avoid partial functions / imperfect correctness. E.g. it's nice to require ECDSA public keys to be nonzero rather than to come up with a wire format for the point at infinity.

None of these hold for the SSS case with GF_256 - in particular, number 2 does not hold because the probability of a zero leading coefficient appearing in correct operation is 1/256, which is not negligible. If the field were a 256-bit prime field, on the other hand, rejecting zero values might be a reasonable choice.


Here's an informal analysis which gets at the distributional way of thinking about security. First you set up a game, then you decide a desired probability for how often the attacker wins.

Let "Method A" be the technique which allows 0s, and "Method B" be the technique which rejects 0s.

The overall security of this system is best phrased as a game: the honest dealer generates a random secret x and (WLOG) two shares of the 32-byte secret. The adversary receives one share, then can use an oracle to make up to n attempts at guessing the secret, where n is some number << 2^256. The scheme is secure if the probability that the attacker successfully guesses the secret is ~ n/2^256.

In the one-shot game, both methods are secure. Under Method A, it can be proven that the attacker's best strategy is to simply guess the secret or, equivalently, to guess the other share. Since for every true value of the secret x there is exactly one hidden share that would, in combination with the attacker's share, produce x, the two strategies are equivalent. In your example, the attacker guesses that the counterparty's share is all zeroes - this guess wins with probability 1/2^256, as we would like.

Under Method B, there is a slightly better strategy for the attacker: for each byte, compute the value the secret would take if the leading coefficient of that SSS polynomial were zero, rule that value out, and guess any of the other 255 values. This wins with probability ~ 1/(255^32) ~ 1/2^256. The loss is negligible, though positive.
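To put a number on "negligible, though positive", the one-shot success probability under Method B can be written as a power of two (a quick worked calculation, with values rounded):

$$
\frac{1}{255^{32}} = 2^{-32\log_2 255} \approx 2^{-255.82},
\qquad
256 - 32\log_2 255 = 32\log_2\frac{256}{255} \approx 0.18 \text{ bits}.
$$

So rejecting zero coefficients costs about 0.18 bits across the whole 32-byte secret in the one-shot game, or roughly 0.0056 bits per byte.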

However, in the repeated game, the attacker against Method B has a much stronger approach: for each resharing, note the value that was eliminated for each byte. After ~2000 re-sharings the attacker will likely have eliminated every incorrect value for each byte (https://en.wikipedia.org/wiki/Coupon_collector%27s_problem) and can guess the full secret. No such technique exists against Method A - the attacker will have won with probability ~2000/2^256, which is tiny.
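The coupon collector estimate can be checked with a few lines. This is a sketch for a single byte only, assuming each resharing eliminates one of the 255 incorrect values uniformly at random, and `expected_draws` is a hypothetical helper name.

```rust
/// Expected number of uniform draws needed to see all `n` distinct coupons
/// at least once: n * H_n, where H_n is the n-th harmonic number.
fn expected_draws(n: u32) -> f64 {
    let harmonic: f64 = (1..=n).map(|i| 1.0 / f64::from(i)).sum();
    f64::from(n) * harmonic
}
```

Calling `expected_draws(255)` gives roughly 1,560 resharings to eliminate every incorrect value of one byte in expectation. All 32 bytes make progress in parallel on each resharing, so recovering the full secret takes the maximum over 32 such collectors, which is somewhat more than the single-byte figure.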


Thanks for reading :)

@tjade273
Contributor Author

Oh! @andrewjstone I just realized that I linked you the old Privy blog and not the one where they realized their mistake...

https://privy.io/blog/zero-leading-coefficients-cryptography

@andrewjstone
Contributor

Thanks for the details @tjade273. You've convinced me. Especially since one of my other colleagues here suggested breaking this out into a separate crate that others can use arbitrarily.

I think I was indeed conflating the per byte polynomial and the full secret of 32 bytes. I somewhat realized this as I was writing when I did the probability calculation that you would need all 32 bytes to be 0 (1 / 2^256 chance) for the attack to actually work. Sure they could just try to combine k-1 shares and see if it works, but that has the same overall probability of succeeding as just guessing the value of the last share.

I really appreciate you taking the time to lay this out. I feel like I should have gotten here sooner, but this is indeed tricky. The O(n log n) from the coupon problem also shows how you arrived at the ~2000 splits number.

One other thing: CI is failing due to an extra newline. Can you run cargo fmt on your change so I can merge it in? I could also do this for you, but no need to squash and include my name for the contribution on GitHub. Thanks!

@andrewjstone andrewjstone enabled auto-merge (squash) April 23, 2026 19:45
@andrewjstone andrewjstone merged commit 6d341d1 into oxidecomputer:main Apr 23, 2026
16 checks passed