Skip to content

Explicitly spec out native origins #621

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
May 7, 2019

Conversation

Manishearth
Copy link
Contributor

@Manishearth Manishearth commented May 1, 2019

@cwilso cwilso requested review from NellWaliczek and toji May 1, 2019 16:07
@cwilso cwilso added this to the April 2019 milestone May 1, 2019
Copy link
Member

@toji toji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is trending in the right direction, thanks!

@Manishearth
Copy link
Contributor Author

Addressed

Copy link
Member

@toji toji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing the feedback! LGTM!

@Manishearth
Copy link
Contributor Author

Squashed.

Copy link
Member

@NellWaliczek NellWaliczek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm really happy with the direction of these changes. It adds a lot of clarity! Thanks!

index.bs Outdated
@@ -736,6 +736,16 @@ Each {{XRSpace}} has a <dfn for="XRSpace">native origin</dfn> that is tracked by

The [=effective origin=] of an {{XRSpace}} can only be observed in the coordinate system of another {{XRSpace}} as an {{XRPose}}, returned by an {{XRFrame}}'s {{XRFrame/getPose()}} method. The spatial relationship between {{XRSpace}}s MAY change between {{XRFrame}}s.

Each {{XRSpace}} has an <dfn for="XRSpace">internal type</dfn>, used to internally distinguish the various kinds of spaces. This value can be <dfn for="XRSpace">viewer space</dfn>, <dfn for="XRSpace">input target ray</dfn>, <dfn for="XRSpace">input grip</dfn>, or one of the variants of {{XRReferenceSpaceType}}.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

input target ray, input grip

I'm a bit worried about creating too many different types... and honestly I was sorta hoping we'd have only one input XRSpace type and treat the gripSpace as having an offset from the targetRaySpace. What's your thought on that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't sure if this was always going to be the case (that there's a fixed offset between the ray space and the grip space). The language around this would become a bit indirect, though (having to say that the origin offset is such that the effective origin matches up with the different axes).

Overall I kinda like having different types for different spaces even if they can be related to each other. They become easier to talk about overall, which is kinda the point of introducing the internal type. If we go for this, should we keep the distinction between eye-level and floor-level?

index.bs Outdated

The [=native origin=] of an {{XRSpace}} with [=XRSpace/internal type=] of [=XRSpace/input target ray=] is the position and orientation of the preferred pointing ray of its [=XRSpace/input source=], as defined by the {{targetRayMode}}.

The [=native origin=] of an {{XRSpace}} with [=XRSpace/internal type=] of [=XRSpace/input grip=] is the position and orientation that should be used to render virtual objects corresponding to its [=XRSpace/input source=] such that they appear to be held in the user's hand. If the user were to hold an [=XRSpace/input source=] that is straight rod, this [=native origin=] has a position at the centroid of their curled fingers. The <code>-Z</code> axis of the orientation points along the length of the rod towards their thumb. The <code>X</code> axis is perpendicular to the back of the hand being described, with back of the users right hand pointing towards <code>+X</code> and the back of the user's left hand pointing towards <code>-X</code>. The <code>Y</code> axis is implied by the relationship between the <code>X</code> and <code>Z</code> axis, with <code>+Y</code> roughly pointing in the direction of the user's arm.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we were to define this relative to a common "native origin" for an XRInputSource, most of this paragraph would be describing the originOffset

index.bs Outdated
@@ -736,6 +736,16 @@ Each {{XRSpace}} has a <dfn for="XRSpace">native origin</dfn> that is tracked by

The [=effective origin=] of an {{XRSpace}} can only be observed in the coordinate system of another {{XRSpace}} as an {{XRPose}}, returned by an {{XRFrame}}'s {{XRFrame/getPose()}} method. The spatial relationship between {{XRSpace}}s MAY change between {{XRFrame}}s.

Each {{XRSpace}} has an <dfn for="XRSpace">internal type</dfn>, used to internally distinguish the various kinds of spaces. This value can be <dfn for="XRSpace">viewer space</dfn>, <dfn for="XRSpace">input target ray</dfn>, <dfn for="XRSpace">input grip</dfn>, or one of the variants of {{XRReferenceSpaceType}}.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or one of the variants of {{XRReferenceSpaceType}}.

Are we sure we want to be mixing the types like this? Should we instead be having a single internal type of "reference space"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could, but that just makes everything more verbose -- I wasn't very sure what to do here. I'm fine with doing this if you still think we should, though.

index.bs Outdated

An {{XRReferenceSpace}} is most frequently obtained by calling {{XRSession/requestReferenceSpace()}}, which creates an instance of an {{XRReferenceSpace}} or an interface extending it, determined by the {{XRReferenceSpaceType}} enum value passed into the call. The type indicates the tracking behavior that the reference space will exhibit:

- Passing a type of <dfn enum-value for="XRReferenceSpaceType">identity</dfn> creates an {{XRReferenceSpace}} instance. It represents a reference space where the [=viewer=] is always at the [=native origin=]. Only the {{XRSession/viewerSpace}} and {{XRInputSource/targetRaySpace}}'s of {{XRInputSource}}'s with an {{XRInputSource/targetRayMode}} of {{XRTargetRayMode/screen}} can be spatially related to an {{identity}} reference space.
- Passing a type of <dfn enum-value for="XRReferenceSpaceType">identity</dfn> creates an {{XRReferenceSpace}} instance. Its [=native origin=] is the position and orientation of the [=viewer=]. Only the {{XRSession/viewerSpace}} and {{XRInputSource/targetRaySpace}}'s of {{XRInputSource}}'s with an {{XRInputSource/targetRayMode}} of {{XRTargetRayMode/screen}} can be spatially related to an {{identity}} reference space.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its [=native origin=] is the position and orientation of the [=viewer=].

Is... is this indicating that the viewerSpace shouldn't be an XRSpace but an XRReferenceSpace with the type "identity"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be, but we have stronger restrictions on identity spaces.

I'm mostly trying to add types without changing the intent of the spec.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, sorry I didn't mean you needed to change it in this PR. More that the clarity created via this PR is raising an issue that we might consider addressing separately.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll file something

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Manishearth
Copy link
Contributor Author

Addressed.

index.bs Outdated

An {{XRSpace}} of type [=XRSpace/input target ray=], [=XRSpace/input grip=] must have an associated {{XRInputSource}} <dfn for="XRSpace">input source</dfn>.

The [=native origin=] of an {{XRSpace}} with [=XRSpace/internal type=] of [=XRSpace/viewer space=] is the position and orientation of the [=viewer=].
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we perhaps use the term "physical position and orientation" or "physical pose" consistently throughout for these definitions, to emphasize when a WebXR concept is being defined in terms of physical reality, rather than the logical position and orientation of another WebXR construct within some space?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ambivalent about this, happy to add it if people agree.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't object to it, and I do feel like it helps add clarity.

index.bs Outdated

An {{XRSpace}} of type [=XRSpace/input target ray=], [=XRSpace/input grip=] must have an associated {{XRInputSource}} <dfn for="XRSpace">input source</dfn>.

The [=native origin=] of an {{XRSpace}} with [=XRSpace/internal type=] of [=XRSpace/viewer space=] is the position and orientation of the [=viewer=].
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this definition in particular, we should perhaps be more specific and define the "viewer space" origin to explicitly coincide with the midpoint of XRView origins that will provide optimal rendering for a given XR device.

If not, we may see some UAs place XRViewerPose.transform between the view transforms and some UAs place it at another logical viewer position, such as the back of the neck, top of the head, etc., especially given the following language elsewhere:

NOTE: The {{XRViewerPose}}'s {{XRPose/transform}} can be used to position graphical representations of the [=viewer=] for spectator views of the scene or multi-user interaction.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about devices with more than one view? (This might be something that is better split out into a different issue, I'm trying to clarify the state of the spec as it currently exists without changing behavior)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

define the "viewer space" origin to explicitly coincide with the midpoint of XRView

I'm not sure it's correct to say the viewer space's origin is the midpoint of the views. I think in some native systems it's actually set as the centroid of the logical head, and if you have more than two views it could start to skew it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The key question is what we expect apps to do with "viewer space", and if that usage can handle definitions that vary widely among UAs or underlying XR platforms (e.g. some placing it in the center of the head, others placing it at the base of the neck, others placing it between the eyes, etc.).

For example, one reason to expose "viewer space" is to allow apps to raycast from a view-relative position to render a cursor or target with the user's head or device.

Specifically, the input explainer mentions that the targetRaySpace for an XRInputSource with a targetRayMode of 'gaze' will match the pose of viewerSpace.

For apps designed only for gaze-based targeting that just use viewerSpace directly, they will expect viewer space to match the target ray of gaze-targeted input sources as described - for headsets, pointing forward between the eyes (i.e. between the two XRViews. However, if one UA starts its viewerSpace ray between the eyes and another UA is free to start its viewerSpace ray in the center of the user's head, that target ray will sometimes shoot out from the user's nose instead, producing the wrong app behavior. The app would instead be required to manually average the two XRView rays it receives to produce its gaze targeting ray, which I don't believe is what we intended here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This definition will be even more important to nail down across UAs if we proceed with #623's idea of a "viewer" reference space. (which I believe we should!)

@toji
Copy link
Member

toji commented May 2, 2019

I discussed this with Nell earlier today and we've both agreed that there's a couple of things that we'd like to iterate on in this PR prior to landing. As a result, we're going to bump this PR and the associated issue back by one milestone. That's not a big deal, and has no bearing on the priority of landing these changes, it's just us sticking to the milestone process that we'd agreed on.

Thanks, @Manishearth, for continuing to work with us on this PR and related issues! It's been extremely valuable.

@toji toji modified the milestones: April 2019, May 2019 May 2, 2019
@NellWaliczek
Copy link
Member

And it also doesn't mean we're taking our foot off the gas in terms of working to get this landed asap!

@Manishearth
Copy link
Contributor Author

Oh, I understand, I'd actually assumed this was in the May milestone already because in the call we said that #581 was going to slip 😄

Thanks for clarifying, though!

index.bs Outdated

- Passing a type of <dfn enum-value for="XRReferenceSpaceType">position-disabled</dfn> creates an {{XRReferenceSpace}} instance. It represents a tracking space where orientation is tracked but the [=viewer=]s position is always at the [=native origin=].
- Passing a type of <dfn enum-value for="XRReferenceSpaceType">position-disabled</dfn> creates an {{XRReferenceSpace}} instance. Its [=native origin=] has a position that always remains equal to the position of the [=viewer=], and a constant orientation initialized based on the conventions of the underlying [=/XR device=]. It can be used to create experiences where the movement of the user is disregarded, as it will not report changes in position when used with {{XRFrame/getViewerPose()}}.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that we're settling in on clean physically-based reference space definitions (which is great, especially for ensuring the associativity of space relationships!), I wonder if "position-disabled" reference spaces will operate as we intend for their 360-degree video scenarios when on stereo headsets.

Under this definition, getViewerPose would still return non-identity eye positions for each XRView relative to this reference space, since the eyes are indeed in physically different places. That would cause the app to render each eye from some pose other than (0, 0, 0), leading to weird parallax artifacts relative to its finite-sized video sphere, which is the primary problem that "position-disabled" was trying to avoid. I'd really hate to see us backpedal on the physically-based definitions and introduce some special case to getViewerPose for "position-disabled" spaces to magically zero out the XRView poses, since we should otherwise be fully clean on reference spaces behaving identically, just with differently-behaving origins.

Alternatively, what if we just replaced the "position-disabled" reference space with a disablePosition option on getViewerPose? 360-video apps could then just use the "eye-level" reference space like other seated experiences, with their one difference being a parameter they pass to one function whose behavior they actually wish to impact. By adding the option to the affected method, all other use of reference spaces stays clean and easy to understand.

The simplest option here would just be to require that 360-video apps use a big enough video sphere (e.g. 1000m) that these parallax issues from neck modeling and stereo separation in "eye-level" spaces would fall away. If we do that, we could just cut the "position-disabled" space entirely. Perhaps in the interest of getting quickly to WebXR 1.0, that's the best answer?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed #625 to track this issue, which is broader than this particular change.

@toji
Copy link
Member

toji commented May 3, 2019

Apologies for not having additional feedback sooner, but I wanted to wrap my head around this in the context of some of the tangent conversations that are sprouting from it, like consolidating identity/viewer spaces and removing position-disabled.

I feel the native origins portion of this PR is really solid. Getting that nailed down for every space the spec describes is an important step, and one that I want to see landed ASAP.

Where I still have concerns (as does Nell, as she indicated to me in an earlier conversation) is the way that the XRSpace internal types are defined and presented. Especially as we talk about simplifying the available spaces, I wonder if we can actually remove some of the complexity here too.

One of the things that has a "code smell" (spec smell?) in this regard is the way that the reference space types are lumped into the potential list of internal type values. That seems to be a conflation of concepts that may cause us some heartache down the road. Additionally, I'm not clear on the value of having the internal type simply repeat the "name" of the space outside of allowing some algorithms to explicitly branch off that name. If that IS the intended benefit, it seems we'd be better off making the internal type differentiate between spaces based on their behavior rather than their source. (ie: "reference space" vs. "anchor space".)

And in fact, we might not even need them at all at the moment, if we consider the proposed space simplifications? The newly defined logic about how spaces relate to eachother via their native origin covers the basic behavior quite well as long as the native origin is well defined for each space. From there, if we assume that identity and viewerSpace will merge and that we won't be placing any restrictions on what spaces can be related to them (as was suggested in #623) then the baseline space behavior should be sufficient. There's no other scenarios that I can see where the internal type is used to dictate behavior in this PR.

(For the privacy portion of that, I think we'd want to take a different direction entirely and state that certain types of spaces simply shouldn't be exposed by the system unless user consent has been given, a concept which pairs nicely with some incoming privacy docs from the Google team.)

The only other part of this PR that seems to be under discussion is whether the targetRaySpace and gripSpace should be wholly separate spaces or defined via an originOffset from one to the other. On that count I'll admit that I kind of don't care. The origin offset approach seems conceptually clean if they're assumed to have a generally static relationship, and I wouldn't object to an implementation representing them as such internally, but that DOES feel like an implementation detail.

So I guess the TL;DR: recommendation here is that we should push to land the native origin portions of this ASAP, and then address #625 and #624 prior to revisiting the internal type question, because they may render it unnecessary.

@Manishearth
Copy link
Contributor Author

Alright, in that case I'll get rid of the notion of internal type, and specify internal origins somewhere near the source of each type of space.

@Manishearth
Copy link
Contributor Author

Done. This PR is much smaller now 😄

This PR doesn't assume that #626 will land, however if it is going to land then we can remove the last two commits. I can also rebase over that PR if you'd like to land it first.

Yeah, the notion of internal type is mostly useful for defining native origins (which we can do directly) and to do checks in getPose() (which I've explained we don't need if we gate on space creation)

@Manishearth Manishearth changed the title Add internal type for XRSpaces, explicitly spec out native origins Explicitly spec out native origins May 3, 2019
@toji
Copy link
Member

toji commented May 3, 2019

Looks like some commits you intended to push didn't make it into this PR? Latest commit is still from 2 days ago.

@Manishearth
Copy link
Contributor Author

Done, sorry, the branch was named differently.

@toji
Copy link
Member

toji commented May 3, 2019

Yeah, that is a lot smaller. 😁 LGTM! Do Nell or Alex want to take a look?

@toji toji requested a review from NellWaliczek May 3, 2019 22:13
Copy link
Member

@NellWaliczek NellWaliczek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for doing this!

index.bs Outdated
@@ -780,9 +780,9 @@ Each {{XRReferenceSpace}} has a <dfn for="XRReferenceSpace">type</dfn>, which is

An {{XRReferenceSpace}} is most frequently obtained by calling {{XRSession/requestReferenceSpace()}}, which creates an instance of an {{XRReferenceSpace}} or an interface extending it, determined by the {{XRReferenceSpaceType}} enum value passed into the call. The type indicates the tracking behavior that the reference space will exhibit:

- Passing a type of <dfn enum-value for="XRReferenceSpaceType">identity</dfn> creates an {{XRReferenceSpace}} instance. It represents a reference space where the [=viewer=] is always at the [=native origin=]. Only the {{XRSession/viewerSpace}} and {{XRInputSource/targetRaySpace}}'s of {{XRInputSource}}'s with an {{XRInputSource/targetRayMode}} of {{XRTargetRayMode/screen}} can be spatially related to an {{identity}} reference space.
- Passing a type of <dfn enum-value for="XRReferenceSpaceType">identity</dfn> creates an {{XRReferenceSpace}} instance. It represents a tracking space with a [=native origin=] that tracks the position and orientation of the [=viewer=]. Only the {{XRSession/viewerSpace}} and {{XRInputSource/targetRaySpace}}'s of {{XRInputSource}}'s with an {{XRInputSource/targetRayMode}} of {{XRTargetRayMode/screen}} can be spatially related to an {{identity}} reference space.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're keeping this type, it might be useful to add something (but probably not exactly) like:

The [=native origin=] stays constant for {{XRSession}}s without any tracking data.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, thanks!

@toji toji merged commit 8c7a85c into immersive-web:master May 7, 2019
@AdaRoseCannon AdaRoseCannon added the ed:spec Include in newsletter, spec change label May 8, 2019
@Manishearth Manishearth deleted the space-defs branch August 13, 2019 05:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ed:spec Include in newsletter, spec change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Explicitly spec out "native origin" behind each XRSpace
6 participants