interpret: properly check for inhabitedness of nested references#156977
interpret: properly check for inhabitedness of nested references#156977RalfJung wants to merge 1 commit into
Conversation
| ty::Coroutine(..) => { | ||
| true // FIXME should these really be trivially inhabited? | ||
| } | ||
| ty::CoroutineClosure(..) => { | ||
| true // FIXME should these really be trivially inhabited? | ||
| } |
There was a problem hiding this comment.
I wasn't sure how to recurse into these. Are coroutines always inhabited via a trivial start state, or can they be uninhabited due to capturing ! as an "upvar"? How do coroutine closures work?
There was a problem hiding this comment.
Looks like yes they can be uninhabited
Coroutine(DefId(0:13 ~ diverges[9ce3]::async_let::{closure#0}), [(), std::future::ResumeTy, (), !, (Void,)]) is ABI-uninhabited but not opsem-uninhabited?
There was a problem hiding this comment.
I now made them all check the upvar_tys, but I am not entirely sure if that is enough.
There was a problem hiding this comment.
For coroutines, checking upvars is enough. The coroutine state is pretty much struct { upvars, enum { Unresumed, Returned, Panicked, Suspend0 { .. }, Suspent1 { .. }, .. } }
Note that #135527 proposes to change this to enum { Unresumed { upvars }, Returned, Panicked, Suspend0 { .. }, Suspent1 { .. }, .. }. Starting from that PR, the enum state will be inhabited in all cases.
Coroutine closures work exactly like closures.
| } | ||
| ty::UnsafeBinder(..) => { | ||
| true // FIXME should these really be trivially inhabited? | ||
| } |
There was a problem hiding this comment.
I have no idea what the UnsafeBinder implementation status is but their opsem has not even been discussed yet AFAIK.
There was a problem hiding this comment.
in limbo, but so far I think it was "except for lifetimes, they behave the same as their bound value", and this is how we check them in validation
There was a problem hiding this comment.
How do I get the bound value out?
this is how we check them in validation
I just see
ty::UnsafeBinder(_) => todo!("FIXME(unsafe_binder)"),There was a problem hiding this comment.
Thanks, I added the recursion for that.
| len.try_to_target_usize(tcx).unwrap() == 0 | ||
| || is_opsem_inhabited_recursor(elem, tcx, root, adt_handler) | ||
| } | ||
| ty::Pat(inner, _pat) => is_opsem_inhabited_recursor(inner, tcx, root, adt_handler), |
There was a problem hiding this comment.
I guess in theory the pattern could make a type uninhabited... so technically if we ever want to use that for the opsem we have to add it has a check here before pattern types get stabilized.
There was a problem hiding this comment.
none of the current patterns can, but yes, that may be possible in the future
| /// | ||
| /// When we git an ADT, we call `adt_handler`, giving it as its last argument a closure that it | ||
| /// can invoke to continue the recursion. | ||
| fn is_opsem_inhabited_recursor<'tcx>( |
There was a problem hiding this comment.
Do we need to do something to bound this recursion? We already stop when encountering the same ADT again, so recursion is bounded by the depth of ADT field types until it comes back to the original type.
Do we need to do some stack growing magic?
There was a problem hiding this comment.
it's possible this can't be hit due to other stack growth protections, but theoretically you can create a very nested tuple, or just &&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&T with a lot of repetitions of the reference. I'd honestly just wait for an issue
This comment has been minimized.
This comment has been minimized.
4e90bd5 to
e6d1440
Compare
This comment has been minimized.
This comment has been minimized.
e6d1440 to
33dc892
Compare
This comment has been minimized.
This comment has been minimized.
33dc892 to
846fa4f
Compare
This comment has been minimized.
This comment has been minimized.
846fa4f to
8d272fd
Compare
This comment has been minimized.
This comment has been minimized.
8d272fd to
06e6db9
Compare
This comment has been minimized.
This comment has been minimized.
06e6db9 to
f328265
Compare
| /// | ||
| /// When we git an ADT, we call `adt_handler`, giving it as its last argument a closure that it | ||
| /// can invoke to continue the recursion. | ||
| fn is_opsem_inhabited_recursor<'tcx>( |
There was a problem hiding this comment.
it's possible this can't be hit due to other stack growth protections, but theoretically you can create a very nested tuple, or just &&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&T with a lot of repetitions of the reference. I'd honestly just wait for an issue
| | ty::FnPtr(..) | ||
| | ty::FnDef(..) => true, | ||
| ty::Dynamic(..) => true, // We can't reason about traits, assume they are inhabited | ||
| ty::Slice(..) => true, // Slices can always be empty |
There was a problem hiding this comment.
so slices of uninhabited types must have zero length? Is this properly checked?
There was a problem hiding this comment.
so slices of uninhabited types must have zero length?
Yes.
Is this properly checked?
What do you mean by this? I can add a test that transmutes &[()] as &[()] into &[!] if that's what you mean.
There was a problem hiding this comment.
I guess I'm just confused about which things this open check treats as inhabited while it can't actually guarantee they are.
Feels like it should be a three-state enum, but that doesn't bring any value to the call sites
There was a problem hiding this comment.
This is meant to be an upper bound to anything we could ever consider uninhabited on the ABI/Layout level. Eventually it will have to be stably documented in the Reference.
| len.try_to_target_usize(tcx).unwrap() == 0 | ||
| || is_opsem_inhabited_recursor(elem, tcx, root, adt_handler) | ||
| } | ||
| ty::Pat(inner, _pat) => is_opsem_inhabited_recursor(inner, tcx, root, adt_handler), |
There was a problem hiding this comment.
none of the current patterns can, but yes, that may be possible in the future
| } | ||
| ty::UnsafeBinder(..) => { | ||
| true // FIXME should these really be trivially inhabited? | ||
| } |
There was a problem hiding this comment.
in limbo, but so far I think it was "except for lifetimes, they behave the same as their bound value", and this is how we check them in validation
f328265 to
bffa3cd
Compare
| // Bailing out here is unfortuante as it means that the recursion limit affects the | ||
| // operational semantics... but what else could we do? | ||
| return true; | ||
| } |
There was a problem hiding this comment.
I pushed a proper recursion check including a recursion limit check.
But... I am not sure it's sound. This means that depending on the recursion limit, different crates might consider the same type inhabited or not.
OTOH if we just hard-code e.g. 256 here, then with a sufficiently high query recursion limit one can write a 257-level-nested type (Wrap<Wrap<...<Wrap<!>>...>>) that layout will still consider uninhabited but this check will give up.
There was a problem hiding this comment.
Why exactly is this check soundness-critical? I don't quite follow...
As far as I can tell, it seems like this is just a courtesy best-effort check for UB in consteval, and a sanity check after computing layout?
There was a problem hiding this comment.
This is also used by Miri so it is supposed to define what is and isn't UB.
Also our query system relies on cross-crate-consistent results, doesn't it?
There was a problem hiding this comment.
@camsteffen proposed to do something more like inhabited_predicate_adt. I must admit that I do not understand how that query works -- I can see that it produces a predicate and ships it to the trait solver, but how is that helpful for recursive types?
There was a problem hiding this comment.
Sorry, that was kind of a red herring.
More to the point is check_representability. That query uses query cycle detection to detect an infinite sized type and abort compilation.
So I think we could have a check_opsem_inhabited query which also has query cycle detection, but instead of aborting, return OpsemInhabited::False or something like that for the query result.
check_representability uses params_in_repr, so check_opsem_inhabited could use something similar like params_in_repr_or_ref to include references.
There was a problem hiding this comment.
Another idea, that lcnr proposed, is that we can error on overflow. I.e. when computing inhabitedness of a type, if we reach the recursion limit, emit a hard error.
Since an error prevents the code from being compiled, there is no problem with linking two different crates with different recursion limits -- they either both get the same result, or one of them doesn't compile.
We can still support (assume inhabited) types which refer to themselves directly like
struct A(&'static A);And error only if the same type is mentioned with different generic parameters and overflows:
struct B<T: 'static>(&'static B<(T, T)>);This is technically a breaking change, but it seems fine not to support types which infinitely grow like this (or am I missing something?).
There was a problem hiding this comment.
But this seems fine, considering how much of an edge case this is?
Given,
struct Wrap<T>(T);I believe that the algorithm you proposed would say that &Wrap<Wrap<!>> is inhabited, which doesn't seem like a particularly strange edge case. @WaffleLapkin
There was a problem hiding this comment.
This is technically a breaking change, but it seems fine not to support types which infinitely grow like this (or am I missing something?).
Given that #138599 fixed multiple issues, it seems that there exist some use cases where people want such strange recursive types. We can't know how much that is before we run crater though.
There was a problem hiding this comment.
I believe that the algorithm you proposed would say that
&Wrap<Wrap<!>>is inhabited, which doesn't seem like a particularly strange edge case. @WaffleLapkin
I personally think that it's "fine", especially given that this improves over the status quo. But I suppose we could also allow recursing through the same definition a set number of times (i.e. setting a recursion limit higher than 1)...
There was a problem hiding this comment.
@WaffleLapkin I like that first version, and I think I can combine it with an idea I had to avoid the problem @theemathas mentioned. This avoids any dependency on the recursion limit while also still satisfying the desired implication "layout inhabited => opsem inhabited". I pushed that now.
This comment has been minimized.
This comment has been minimized.
bffa3cd to
2552ff6
Compare
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
interpret: properly check for inhabitedness of nested references
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
Finished benchmarking commit (68787dc): comparison URL. Overall result: ❌ regressions - no action neededBenchmarking means the PR may be perf-sensitive. Consider adding rollup=never if this change is not fit for rolling up. @rustbot label: -S-waiting-on-perf -perf-regression Instruction countOur most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)Results (primary -3.9%)A less reliable metric. May be of interest, but not used to determine the overall result above.
CyclesResults (secondary -4.6%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary sizeThis perf run didn't have relevant results for this metric. Bootstrap: 511.673s -> 534.284s (4.42%) |
|
Hm, somehow cranelift-assembler-x64 is still blowing the stack. Is my recursion check broken or does that crate have a ridiculously nested but not recursive type? |
b60b5f7 to
6a1ac87
Compare
|
Do we need to crater it now that we are introducing a new error? |
6a1ac87 to
5f63b50
Compare
|
Yeah once this works it needs cratering. But previously cranelift failed to build somehow. I did a small change to the logic which might fix that if cranelift hit a really weird edge case... if not I think I'll have to sprinkle |
This comment has been minimized.
This comment has been minimized.
|
Well, now docker.io is down... |
| is_opsem_inhabited_recursor(self, tcx, &mut (), /* stop_at_ref */ false, &|ty, _, _| { | ||
| tcx.is_opsem_inhabited_raw(typing_env.as_query_input(ty)) | ||
| }) |
There was a problem hiding this comment.
Does the fact that this closure discards the seen argument mean that we keep creating a new HashSet when we shouldn't?
(Sorry, but I'm very confused by the recursive code 😅)
There was a problem hiding this comment.
Yeah the recursion is quite gnarly. Happy to hear suggestions for improving it. This is where Rust isn't quite as much a functional language as I'd like it to be. ;)
We're only calling HashSet::default() once so I don't see how we could possibly create new HashSet where we don't want that.
There was a problem hiding this comment.
Note that this here is the non-recursive case. That why the ADT case is a fallback:
- when invoked from
Ty::is_opsem_inhabited, when we encounter an ADT, we invoke the query (as then we do want the cache) - when invoked in the query, when we encounter an ADT, we want to recurse
|
@bors try |
interpret: properly check for inhabitedness of nested references
This comment has been minimized.
This comment has been minimized.
5f63b50 to
72bf43a
Compare
72bf43a to
d1a8249
Compare
|
Okay, this should be ready for review now while the crater run is ongoing. That said, technically all code this breaks already had UB. |
|
👌 Experiment ℹ️ Crater is a tool to run experiments across parts of the Rust ecosystem. Learn more |
|
@rustbot ready |
View all comments
This implements the opsem from the ongoing FCP in rust-lang/unsafe-code-guidelines#413. The bit we were previously missing is that transmuting a
&&!into existence was not caught as being immediate UB -- only the&!case behaved as expected.I did not adjust the layout computation because when we compute the layout of
&T, we cannot know the layout ofT(as that might be recursive).r? @oli-obk