-
Notifications
You must be signed in to change notification settings - Fork 401
Explicitly spec out native origins #621
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is trending in the right direction, thanks!
Addressed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for addressing the feedback! LGTM!
Squashed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm really happy with the direction of these changes. It adds a lot of clarity! Thanks!
index.bs
Outdated
@@ -736,6 +736,16 @@ Each {{XRSpace}} has a <dfn for="XRSpace">native origin</dfn> that is tracked by | |||
|
|||
The [=effective origin=] of an {{XRSpace}} can only be observed in the coordinate system of another {{XRSpace}} as an {{XRPose}}, returned by an {{XRFrame}}'s {{XRFrame/getPose()}} method. The spatial relationship between {{XRSpace}}s MAY change between {{XRFrame}}s. | |||
|
|||
Each {{XRSpace}} has an <dfn for="XRSpace">internal type</dfn>, used to internally distinguish the various kinds of spaces. This value can be <dfn for="XRSpace">viewer space</dfn>, <dfn for="XRSpace">input target ray</dfn>, <dfn for="XRSpace">input grip</dfn>, or one of the variants of {{XRReferenceSpaceType}}. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
input target ray, input grip
I'm a bit worried about creating too many different types... and honestly I was sorta hoping we'd have only one input XRSpace type and treat the gripSpace as having an offset from the targetRaySpace. What's your thought on that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't sure if this was always going to be the case (that there's a fixed offset between the ray space and the grip space). The language around this would become a bit indirect, though (having to say that the origin offset is such that the effective origin matches up with the different axes).
Overall I kinda like having different types for different spaces even if they can be related to each other. They become easier to talk about overall, which is kinda the point of introducing the internal type. If we go for this, should we keep the distinction between eye-level and floor-level?
index.bs
Outdated
|
||
The [=native origin=] of an {{XRSpace}} with [=XRSpace/internal type=] of [=XRSpace/input target ray=] is the position and orientation of the preferred pointing ray of its [=XRSpace/input source=], as defined by the {{targetRayMode}}. | ||
|
||
The [=native origin=] of an {{XRSpace}} with [=XRSpace/internal type=] of [=XRSpace/input grip=] is the position and orientation that should be used to render virtual objects corresponding to its [=XRSpace/input source=] such that they appear to be held in the user's hand. If the user were to hold an [=XRSpace/input source=] that is straight rod, this [=native origin=] has a position at the centroid of their curled fingers. The <code>-Z</code> axis of the orientation points along the length of the rod towards their thumb. The <code>X</code> axis is perpendicular to the back of the hand being described, with back of the users right hand pointing towards <code>+X</code> and the back of the user's left hand pointing towards <code>-X</code>. The <code>Y</code> axis is implied by the relationship between the <code>X</code> and <code>Z</code> axis, with <code>+Y</code> roughly pointing in the direction of the user's arm. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we were to define this relative to a common "native origin" for an XRInputSource, most of this paragraph would be describing the originOffset
index.bs
Outdated
@@ -736,6 +736,16 @@ Each {{XRSpace}} has a <dfn for="XRSpace">native origin</dfn> that is tracked by | |||
|
|||
The [=effective origin=] of an {{XRSpace}} can only be observed in the coordinate system of another {{XRSpace}} as an {{XRPose}}, returned by an {{XRFrame}}'s {{XRFrame/getPose()}} method. The spatial relationship between {{XRSpace}}s MAY change between {{XRFrame}}s. | |||
|
|||
Each {{XRSpace}} has an <dfn for="XRSpace">internal type</dfn>, used to internally distinguish the various kinds of spaces. This value can be <dfn for="XRSpace">viewer space</dfn>, <dfn for="XRSpace">input target ray</dfn>, <dfn for="XRSpace">input grip</dfn>, or one of the variants of {{XRReferenceSpaceType}}. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or one of the variants of {{XRReferenceSpaceType}}.
Are we sure we want to be mixing the types like this? Should we instead be having a single internal type of "reference space"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could, but that just makes everything more verbose -- I wasn't very sure what to do here. I'm fine with doing this if you still think we should, though.
index.bs
Outdated
|
||
An {{XRReferenceSpace}} is most frequently obtained by calling {{XRSession/requestReferenceSpace()}}, which creates an instance of an {{XRReferenceSpace}} or an interface extending it, determined by the {{XRReferenceSpaceType}} enum value passed into the call. The type indicates the tracking behavior that the reference space will exhibit: | ||
|
||
- Passing a type of <dfn enum-value for="XRReferenceSpaceType">identity</dfn> creates an {{XRReferenceSpace}} instance. It represents a reference space where the [=viewer=] is always at the [=native origin=]. Only the {{XRSession/viewerSpace}} and {{XRInputSource/targetRaySpace}}'s of {{XRInputSource}}'s with an {{XRInputSource/targetRayMode}} of {{XRTargetRayMode/screen}} can be spatially related to an {{identity}} reference space. | ||
- Passing a type of <dfn enum-value for="XRReferenceSpaceType">identity</dfn> creates an {{XRReferenceSpace}} instance. Its [=native origin=] is the position and orientation of the [=viewer=]. Only the {{XRSession/viewerSpace}} and {{XRInputSource/targetRaySpace}}'s of {{XRInputSource}}'s with an {{XRInputSource/targetRayMode}} of {{XRTargetRayMode/screen}} can be spatially related to an {{identity}} reference space. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its [=native origin=] is the position and orientation of the [=viewer=].
Is... is this indicating that the viewerSpace shouldn't be an XRSpace but an XRReferenceSpace with the type "identity"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be, but we have stronger restrictions on identity spaces.
I'm mostly trying to add types without changing the intent of the spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, sorry I didn't mean you needed to change it in this PR. More that the clarity created via this PR is raising an issue that we might consider addressing separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll file something
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed. |
index.bs
Outdated
|
||
An {{XRSpace}} of type [=XRSpace/input target ray=], [=XRSpace/input grip=] must have an associated {{XRInputSource}} <dfn for="XRSpace">input source</dfn>. | ||
|
||
The [=native origin=] of an {{XRSpace}} with [=XRSpace/internal type=] of [=XRSpace/viewer space=] is the position and orientation of the [=viewer=]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we perhaps use the term "physical position and orientation" or "physical pose" consistently throughout for these definitions, to emphasize when a WebXR concept is being defined in terms of physical reality, rather than the logical position and orientation of another WebXR construct within some space?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm ambivalent about this, happy to add it if people agree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't object to it, and I do feel like it helps add clarity.
index.bs
Outdated
|
||
An {{XRSpace}} of type [=XRSpace/input target ray=], [=XRSpace/input grip=] must have an associated {{XRInputSource}} <dfn for="XRSpace">input source</dfn>. | ||
|
||
The [=native origin=] of an {{XRSpace}} with [=XRSpace/internal type=] of [=XRSpace/viewer space=] is the position and orientation of the [=viewer=]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this definition in particular, we should perhaps be more specific and define the "viewer space" origin to explicitly coincide with the midpoint of XRView
origins that will provide optimal rendering for a given XR device.
If not, we may see some UAs place XRViewerPose.transform
between the view transforms and some UAs place it at another logical viewer position, such as the back of the neck, top of the head, etc., especially given the following language elsewhere:
NOTE: The {{XRViewerPose}}'s {{XRPose/transform}} can be used to position graphical representations of the [=viewer=] for spectator views of the scene or multi-user interaction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about devices with more than one view? (This might be something that is better split out into a different issue, I'm trying to clarify the state of the spec as it currently exists without changing behavior)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
define the "viewer space" origin to explicitly coincide with the midpoint of XRView
I'm not sure it's correct to say the viewer space's origin is the midpoint of the views. I think in some native systems it's actually set as the centroid of the logical head, and if you have more than two views it could start to skew it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The key question is what we expect apps to do with "viewer space", and if that usage can handle definitions that vary widely among UAs or underlying XR platforms (e.g. some placing it in the center of the head, others placing it at the base of the neck, others placing it between the eyes, etc.).
For example, one reason to expose "viewer space" is to allow apps to raycast from a view-relative position to render a cursor or target with the user's head or device.
Specifically, the input explainer mentions that the targetRaySpace
for an XRInputSource
with a targetRayMode
of 'gaze' will match the pose of viewerSpace
.
For apps designed only for gaze-based targeting that just use viewerSpace
directly, they will expect viewer space to match the target ray of gaze-targeted input sources as described - for headsets, pointing forward between the eyes (i.e. between the two XRView
s. However, if one UA starts its viewerSpace
ray between the eyes and another UA is free to start its viewerSpace
ray in the center of the user's head, that target ray will sometimes shoot out from the user's nose instead, producing the wrong app behavior. The app would instead be required to manually average the two XRView
rays it receives to produce its gaze targeting ray, which I don't believe is what we intended here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This definition will be even more important to nail down across UAs if we proceed with #623's idea of a "viewer" reference space. (which I believe we should!)
I discussed this with Nell earlier today and we've both agreed that there's a couple of things that we'd like to iterate on in this PR prior to landing. As a result, we're going to bump this PR and the associated issue back by one milestone. That's not a big deal, and has no bearing on the priority of landing these changes, it's just us sticking to the milestone process that we'd agreed on. Thanks, @Manishearth, for continuing to work with us on this PR and related issues! It's been extremely valuable. |
And it also doesn't mean we're taking our foot off the gas in terms of working to get this landed asap! |
Oh, I understand, I'd actually assumed this was in the May milestone already because in the call we said that #581 was going to slip 😄 Thanks for clarifying, though! |
index.bs
Outdated
|
||
- Passing a type of <dfn enum-value for="XRReferenceSpaceType">position-disabled</dfn> creates an {{XRReferenceSpace}} instance. It represents a tracking space where orientation is tracked but the [=viewer=]s position is always at the [=native origin=]. | ||
- Passing a type of <dfn enum-value for="XRReferenceSpaceType">position-disabled</dfn> creates an {{XRReferenceSpace}} instance. Its [=native origin=] has a position that always remains equal to the position of the [=viewer=], and a constant orientation initialized based on the conventions of the underlying [=/XR device=]. It can be used to create experiences where the movement of the user is disregarded, as it will not report changes in position when used with {{XRFrame/getViewerPose()}}. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that we're settling in on clean physically-based reference space definitions (which is great, especially for ensuring the associativity of space relationships!), I wonder if "position-disabled" reference spaces will operate as we intend for their 360-degree video scenarios when on stereo headsets.
Under this definition, getViewerPose
would still return non-identity eye positions for each XRView
relative to this reference space, since the eyes are indeed in physically different places. That would cause the app to render each eye from some pose other than (0, 0, 0), leading to weird parallax artifacts relative to its finite-sized video sphere, which is the primary problem that "position-disabled" was trying to avoid. I'd really hate to see us backpedal on the physically-based definitions and introduce some special case to getViewerPose
for "position-disabled" spaces to magically zero out the XRView
poses, since we should otherwise be fully clean on reference spaces behaving identically, just with differently-behaving origins.
Alternatively, what if we just replaced the "position-disabled" reference space with a disablePosition
option on getViewerPose
? 360-video apps could then just use the "eye-level" reference space like other seated experiences, with their one difference being a parameter they pass to one function whose behavior they actually wish to impact. By adding the option to the affected method, all other use of reference spaces stays clean and easy to understand.
The simplest option here would just be to require that 360-video apps use a big enough video sphere (e.g. 1000m) that these parallax issues from neck modeling and stereo separation in "eye-level" spaces would fall away. If we do that, we could just cut the "position-disabled" space entirely. Perhaps in the interest of getting quickly to WebXR 1.0, that's the best answer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filed #625 to track this issue, which is broader than this particular change.
Apologies for not having additional feedback sooner, but I wanted to wrap my head around this in the context of some of the tangent conversations that are sprouting from it, like consolidating I feel the native origins portion of this PR is really solid. Getting that nailed down for every space the spec describes is an important step, and one that I want to see landed ASAP. Where I still have concerns (as does Nell, as she indicated to me in an earlier conversation) is the way that the XRSpace internal types are defined and presented. Especially as we talk about simplifying the available spaces, I wonder if we can actually remove some of the complexity here too. One of the things that has a "code smell" (spec smell?) in this regard is the way that the reference space types are lumped into the potential list of internal type values. That seems to be a conflation of concepts that may cause us some heartache down the road. Additionally, I'm not clear on the value of having the internal type simply repeat the "name" of the space outside of allowing some algorithms to explicitly branch off that name. If that IS the intended benefit, it seems we'd be better off making the internal type differentiate between spaces based on their behavior rather than their source. (ie: "reference space" vs. "anchor space".) And in fact, we might not even need them at all at the moment, if we consider the proposed space simplifications? The newly defined logic about how spaces relate to eachother via their native origin covers the basic behavior quite well as long as the native origin is well defined for each space. From there, if we assume that (For the privacy portion of that, I think we'd want to take a different direction entirely and state that certain types of spaces simply shouldn't be exposed by the system unless user consent has been given, a concept which pairs nicely with some incoming privacy docs from the Google team.) The only other part of this PR that seems to be under discussion is whether the So I guess the TL;DR: recommendation here is that we should push to land the native origin portions of this ASAP, and then address #625 and #624 prior to revisiting the internal type question, because they may render it unnecessary. |
Alright, in that case I'll get rid of the notion of internal type, and specify internal origins somewhere near the source of each type of space. |
Done. This PR is much smaller now 😄 This PR doesn't assume that #626 will land, however if it is going to land then we can remove the last two commits. I can also rebase over that PR if you'd like to land it first. Yeah, the notion of internal type is mostly useful for defining native origins (which we can do directly) and to do checks in getPose() (which I've explained we don't need if we gate on space creation) |
Looks like some commits you intended to push didn't make it into this PR? Latest commit is still from 2 days ago. |
Done, sorry, the branch was named differently. |
Yeah, that is a lot smaller. 😁 LGTM! Do Nell or Alex want to take a look? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for doing this!
index.bs
Outdated
@@ -780,9 +780,9 @@ Each {{XRReferenceSpace}} has a <dfn for="XRReferenceSpace">type</dfn>, which is | |||
|
|||
An {{XRReferenceSpace}} is most frequently obtained by calling {{XRSession/requestReferenceSpace()}}, which creates an instance of an {{XRReferenceSpace}} or an interface extending it, determined by the {{XRReferenceSpaceType}} enum value passed into the call. The type indicates the tracking behavior that the reference space will exhibit: | |||
|
|||
- Passing a type of <dfn enum-value for="XRReferenceSpaceType">identity</dfn> creates an {{XRReferenceSpace}} instance. It represents a reference space where the [=viewer=] is always at the [=native origin=]. Only the {{XRSession/viewerSpace}} and {{XRInputSource/targetRaySpace}}'s of {{XRInputSource}}'s with an {{XRInputSource/targetRayMode}} of {{XRTargetRayMode/screen}} can be spatially related to an {{identity}} reference space. | |||
- Passing a type of <dfn enum-value for="XRReferenceSpaceType">identity</dfn> creates an {{XRReferenceSpace}} instance. It represents a tracking space with a [=native origin=] that tracks the position and orientation of the [=viewer=]. Only the {{XRSession/viewerSpace}} and {{XRInputSource/targetRaySpace}}'s of {{XRInputSource}}'s with an {{XRInputSource/targetRayMode}} of {{XRTargetRayMode/screen}} can be spatially related to an {{identity}} reference space. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're keeping this type, it might be useful to add something (but probably not exactly) like:
The [=native origin=] stays constant for {{XRSession}}s without any tracking data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, thanks!
Fixes #597 and #581
Moved from https://github.com/immersive-web/editor-collab/pull/38, totally rewritten
r? @toji @NellWaliczek