
Single-canvas inline, drop XRPresentationContext #656


Merged
merged 10 commits into from Jun 13, 2019
Updated the explainer
toji committed Jun 13, 2019
commit 771b48a5fe312eba79bda66fac0179595f64f06d
69 changes: 10 additions & 59 deletions explainer.md
@@ -365,41 +365,7 @@ function checkARSupport() {

The UA may choose to present the immersive AR session's content via any type of display, including dedicated XR hardware (for devices like HoloLens or Magic Leap) or 2D screens (for APIs like [ARKit](https://developer.apple.com/arkit/) and [ARCore](https://developers.google.com/ar/)). In all cases the session takes exclusive control of the display, hiding the rest of the page if necessary. On a phone screen, for example, this would mean that the session's content should be displayed in a mode that is distinct from standard page viewing, similar to the transition that happens when invoking the `requestFullscreen` API. The UA must also provide a way of exiting that mode and returning to the normal view of the page, at which point the immersive AR session must end.

## Rendering to the Page

There are a couple of scenarios in which developers may want to present content rendered with the WebXR Device API on the page instead of (or in addition to) a headset: mirroring and inline rendering. Both methods display WebXR content on the page via a Canvas element with an `XRPresentationContext`. As with a `WebGLRenderingContext`, developers acquire an `XRPresentationContext` by calling the `HTMLCanvasElement` or `OffscreenCanvas` `getContext()` method with the context id of "xrpresent". The returned `XRPresentationContext` is permanently bound to the canvas.

An `XRPresentationContext` can only be supplied imagery by an `XRSession`, though the exact behavior depends on the scenario in which it's being used. The context is associated with a session by setting the `XRRenderState`'s `outputContext` to the desired `XRPresentationContext` object. An `XRPresentationContext` cannot be used with multiple `XRSession`s simultaneously, so when an `XRPresentationContext` is set as the `outputContext` for a session's `XRRenderState`, any session it was previously associated with will have its `renderState.outputContext` set to `null`.

### Mirroring

On desktop devices, or any device which has an external display connected to it, it's frequently desirable to show what the user in the headset is seeing on the external display. This is usually referred to as mirroring.

In order to mirror WebXR content to the page, the session's `renderState.outputContext` must be set to an `XRPresentationContext`. Once a valid `outputContext` has been set, any content displayed on the headset will then be mirrored into the canvas associated with the `outputContext`.

When mirroring, only one eye's content will be shown, and it should be shown without any distortion to correct for headset optics. The UA may choose to crop the image shown or display it at a lower resolution than originally rendered, and the mirror may be multiple frames behind the image shown in the headset. The mirror may include or exclude elements added by the underlying XR system (such as visualizations of room boundaries) at the UA's discretion. Pages should not rely on a particular timing or presentation of mirrored content; it's really just for the benefit of bystanders or demo operators.

The UA may also choose to ignore the `outputContext` on systems where mirroring is inappropriate, such as devices without an external display like mobile or all-in-one systems.

```js
function beginXRSession() {
  let mirrorCanvas = document.createElement('canvas');
  let mirrorCtx = mirrorCanvas.getContext('xrpresent');
  document.body.appendChild(mirrorCanvas);

  navigator.xr.requestSession('immersive-vr')
      .then((session) => {
        // A mirror context isn't required to render, so it's not necessary to
        // wait for the updateRenderState promise to resolve before continuing.
        // It may mean that a frame is rendered which is not mirrored.
        session.updateRenderState({ outputContext: mirrorCtx });
        onSessionStarted(session);
      })
      .catch((reason) => { console.log("requestSession failed: " + reason); });
}
```

### Inline sessions
## Inline sessions

There are several scenarios where it's beneficial to render a scene whose view is controlled by device tracking within a 2D page. For example:

@@ -411,31 +377,25 @@ These scenarios can make use of inline sessions to render tracked content to the

The [`RelativeOrientationSensor`](https://w3c.github.io/orientation-sensor/#relativeorientationsensor) and [`AbsoluteOrientationSensor`](https://w3c.github.io/orientation-sensor/#absoluteorientationsensor) interfaces (see [Motion Sensors Explainer](https://w3c.github.io/motion-sensors/)) can be used to polyfill the first case.

Member

Not this PR, but boy does this sentence seem strange now that we're committed to inline.


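As a rough sketch of that polyfill approach (not from the explainer; `drawSceneWithViewMatrix` is a hypothetical application helper, and a real renderer would still invert this rotation and supply its own projection matrix):

```js
// Drive a "magic window" style view from device orientation alone.
const sensor = new RelativeOrientationSensor({ frequency: 60 });
const rotationMatrix = new Float32Array(16);

sensor.addEventListener('reading', () => {
  // Writes a column-major rotation matrix for the latest orientation reading.
  sensor.populateMatrix(rotationMatrix);
});
sensor.addEventListener('error', (event) => {
  console.log('Orientation sensor unavailable: ' + event.error.name);
});
sensor.start();

function onFrame(time) {
  window.requestAnimationFrame(onFrame);
  drawSceneWithViewMatrix(rotationMatrix);  // hypothetical app-side helper
}
window.requestAnimationFrame(onFrame);
```
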
Similar to mirroring, to make use of this mode the `XRRenderState`'s `outputContext` must be set. At that point content rendered to the `XRRenderState`'s `baseLayer` will be rendered to the canvas associated with the `outputContext`. The UA is also allowed to composite in additional content if desired. (In the future, if multiple layers are used their composited result will be what is displayed in the `outputContext`.)
To make use of this mode an `XRWebGLLayer` must be created with the `useDefaultFramebuffer` option set to `true`. This instructs the layer not to allocate a new WebGL framebuffer but instead set the `framebuffer` attribute to `null`. That way, when `framebuffer` is bound, all WebGL commands will naturally execute against the WebGL context's default framebuffer and display on the page like any other WebGL content. When that layer is set as the `XRRenderState`'s `baseLayer` the inline session is able to render its output to the page.

Member

In the future, I presume we're going to want inline sessions to participate in new XRLayer types that require composition. How is that going to work in light of the fact that inline sessions are required to use useDefaultFramebuffer=true? Are any layers besides XRWebGLLayer going to be ignored in that case?

I worry about the compatibility aspects of these types of flags. The goal of WebXR is to be a write once, run anywhere deal. When it comes to drawing stuff, the spec should ideally say "You render your stuff to this framebuffer, we'll take care of composing your layers together with things the browser wants to draw and everything will just work." The more special canvas contexts, useXXXOnlyWorksInInline flags, etc., the more complexity we add for web developers and the more time we'll spend debugging sites that are broken on some hardware but not others.

Member Author

In the future, I presume we're going to want inline sessions to participate in new XRLayer types that require composition.

Yes, absolutely!

How is that going to work in light of the fact that inline sessions are required to use useDefaultFramebuffer=true? Are any layers besides XRWebGLLayer going to be ignored in that case?

Once we have new layer types and/or multi-layer support, the restriction that inline session layers must use that flag would be lifted (and I expect most layers wouldn't have the flag at all). At that point, if a layer IS constructed with the flag, it would be an error to use it with any other layers.

I worry about the compatibility aspects of these types of flags... The more special canvas contexts, useXXXOnlyWorksInInline flags, etc., the more complexity we add for web developers and the more time we'll spend debugging sites that are broken on some hardware but not others.

Completely agree, but in that regard I feel like this is a lateral move rather than a regression. Previously we were asking developers to use a special secondary canvas for inline sessions that was necessary in order to support features that didn't actually exist yet, so we were already making users jump through hoops without any clear benefit to doing so. With this approach we scale the hoop jumping back to setting a single boolean option, at which point everything works the way developers already expect WebGL to work for inline content. Then, when those eventual advanced features ARE ready, we can introduce the more complex route with a much more tractable explanation of why it's necessary.

To address a comment that I remember from the call, it looks like we could say "turn this flag on automatically for inline" and be done with it. But then we're in a weird position down the road: when layer compositing IS a thing and we expect most content will eventually want to use it, inline has to flip the same damn flag in reverse for the majority of content just to get the same default behavior as immersive. I'd rather have the default be that compositing is used everywhere and, for this first release, live with this weird little required opt-out wart.

Member

@RafaelCintron Does that address your concern? If so, I think we'd like to try to get this merged today or tomorrow...

Member

Since we're removing XRPresentationContext (which I'm fully supportive of), now is an opportune time to eliminate any lingering immersive vs. inline differences as much as we can.

The way the spec is written today, setting the flag to true means content will not work in immersive sessions. This creates compatibility problems for developers who only have access to inline session hardware. I understand that the API cannot hide all differences between hardware types but having this flag, in my view, does not meet the bar for introducing compatibility problems.

Why can't we eliminate the flag and have the returned framebuffer always be non-null?

Member Author

I see what you're saying now, but I'm not sure how to achieve that in a way that actually makes developers' lives simpler (and doesn't turn into a really sticky spec issue).

If the returned framebuffer is always non-null, then content rendered to it is going somewhere other than the default backbuffer for the WebGL canvas, and as such needs an explicit mechanism to display it on the page. There are a couple of ways that could happen:

  • We have it bound to some sort of explicit output surface. This is what xrpresent did for us, and what we're trying to move away from.
  • We declare that we are taking ownership of the WebGL canvas when a layer is created like this and start displaying our content in favor of the default framebuffer. This seems pretty messy to me, would probably require coordination with the WebGL WG to achieve, and at the very least Ken has expressed that he's less than thrilled with the idea when I've floated it with him in the past. There are also ownership lifetime issues to worry about in this model, and it would likely force something like spectator mode to use an OffscreenCanvas or a second context altogether.
  • We can require the developer to manually blit the rendered content from our framebuffer (probably surfaced as a texture in that case) into the WebGL default framebuffer. This now requires WAY more work on the developer's part to support inline content, and increases the divergence of the two rendering paths, which I think is what we're both trying to reduce.
  • A variant of the previous option: we could provide a way to surface the framebuffer content as an ImageBitmap, which could then be shown directly via a bitmaprenderer context or easily converted into a texture for WebGL use. Again, though, this is significantly more work in order to get inline to display properly. And we wouldn't want to expose the ImageBitmap producer to immersive sessions because it would either hold references to swap chain surfaces in an environment where we want to control their flow or necessitate a copy. And we still run into the core issue that the code paths between immersive and inline are now more divergent than ever.

In light of the above, it seemed to me that flipping a single boolean flag and then having everything else "just work" was an attractive option. I've coded up a couple of tests that use this route already and the code ends up looking like this:

`let layer = new WebGLLayer(session, gl, { compositionDisabled: mode == 'inline' });`

Member

We have it bound to some sort of explicit output surface. This is what xrpresent did for us, and what we're trying to move away from.

This is along the lines of what I had in mind, but without the extra IDL of XRPresentationContext.

What is your plan for introducing composition to inline sessions in the future? Won't that require "taking ownership of the WebGL canvas" like you described? If so, it seems like we should have a discussion of the implications sooner rather than later, especially if the flag is going to end up in the final, ratified spec. Or will inline sessions never be able to take advantage of composition?

My high order bit is minimizing the number of differences between inline and immersive for core scenarios. Without this PR, developers who create xrpresent contexts will at least have the 3D portion of their content work correctly in immersive. From that standpoint, this change is more of a regression than a lateral move.

Member Author

This is along the lines of what I had in mind, but without the extra IDL of XRPresentationContext.

I'm really curious what you have in mind for the output surface in that case, without introducing additional IDL?

What is your plan for introducing composition to inline sessions in the future?

Essentially to re-introduce XRPresentationContext or something like it, though I think there are still reasonable design discussions to be had around the shape of that API (hence my curiosity about what you had in mind above). I do strongly feel there's value in having composited output from inline sessions, but it's very hard to justify to developers the need to juggle additional canvases and new interfaces when we don't yet support any scenarios that would actually benefit from compositing (multilayer, etc.).

Also, I missed emphasizing previously that one of the explicit goals of this change is to make the polyfill more performant, and I don't see a way to do that if a secondary output surface is involved, whereas the proposal in this PR makes it trivial. (To be clear: I'm also OK with the polyfill staying limited to the core API)

Member

Essentially to re-introduce XRPresentationContext or something like it, though I think there's still reasonable design discussions to be had around the shape of that API (hence my curiosity about what you had in mind above).

I see. I didn't realize that you were already counting on inline developers having to change their code in other ways in order to get composition in the future.

When we add XRPresentationContext in the future, could we detect "use composition" intent for inline sessions by virtue of developers using XRPresentationContext in the WebGL layer's init parameters? This way, we can remove the flag now instead of having awkward "we may remove this flag in the future" verbiage or get stuck having to support XRPresentationContext use without composition down the road.

Member Author

We don't want the XRPresentationContext to be added to the WebGL layer directly, because the entire point is that it accepts the composited results, not the results from a single layer. That's why it was on the session's XRRenderState previously, which is where I would be inclined to put it again in the future.

Member

OK. Why can't we use the presence of XRPresentationContext on XRRenderState to determine whether the developer has opted into composition for inline sessions?

Immersive and inline sessions can use the same render loop, but there are some differences in behavior to be aware of. Most importantly, inline sessions will not pump their render loop if they do not have a valid `outputContext`. Instead the session acts as though it has been [suspended](#handling-suspended-sessions) until a valid `outputContext` has been assigned.
Immersive and inline sessions can use the same render loop, but there are some differences in behavior to be aware of. Most importantly, inline sessions will not pump their render loop if they do not have a `baseLayer` with `useDefaultFramebuffer` set. (This restriction may be lifted in the future to enable more advanced effects.) Instead the session acts as though it has been [suspended](#handling-suspended-sessions) until a valid `baseLayer` has been assigned.

Immersive and inline sessions may run their render loops at different rates. During immersive sessions the UA runs the rendering loop at the XR device's native refresh rate. During inline sessions the UA runs the rendering loop at the refresh rate of the page (aligned with `window.requestAnimationFrame`). The method of computing `XRView` projection and view matrices also differs between immersive and inline sessions, with inline sessions taking into account the output canvas dimensions and possibly the position of the user's head in relation to the canvas if that can be determined.

Most instances of inline sessions will only provide a single `XRView` to be rendered, but the UA may request multiple views be rendered if, for example, it's detected that the output medium of the page supports stereo rendering. As a result pages should always draw every `XRView` provided by the `XRFrame` regardless of what type of session has been requested.
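
A frame callback that honors that requirement might look roughly like the following sketch (not taken from the explainer; `xrRefSpace`, `gl`, and `drawView` are assumed to be application-side state and helpers):

```js
function onXRFrame(time, frame) {
  let session = frame.session;
  session.requestAnimationFrame(onXRFrame);

  let pose = frame.getViewerPose(xrRefSpace);
  if (pose) {
    let glLayer = session.renderState.baseLayer;
    // For inline layers created with useDefaultFramebuffer the framebuffer
    // attribute is null, so this binds the WebGL context's default framebuffer.
    gl.bindFramebuffer(gl.FRAMEBUFFER, glLayer.framebuffer);

    // Draw every view the UA provides, whether that's one (typical inline)
    // or several (immersive or stereo-capable output).
    for (let view of pose.views) {
      let viewport = glLayer.getViewport(view);
      gl.viewport(viewport.x, viewport.y, viewport.width, viewport.height);
      drawView(view);  // assumed to consume view.projectionMatrix and view.transform
    }
  }
}
```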

UAs may have different restrictions on inline sessions that don't apply to immersive sessions. For instance, the UA does not have to guarantee the availability of tracking data to inline sessions, and even when it does a different set of `XRReferenceSpace` types may be available to inline sessions versus immersive sessions.
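
UAs differ here, so a page that prefers a tracked reference space but can live without one might fall back along these lines (sketch only; `'viewer'` is the most broadly available type):

```js
function getPreferredReferenceSpace(session) {
  // Prefer a tracked space, but fall back to 'viewer' if this session
  // (often an inline one) can't provide it.
  return session.requestReferenceSpace('local')
      .catch(() => session.requestReferenceSpace('viewer'));
}
```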

```js
let inlineCanvas = document.createElement('canvas');
let inlineCtx = inlineCanvas.getContext('xrpresent');
document.body.appendChild(inlineCanvas);

function beginInlineXRSession() {
  // Request an inline session in order to render to the page.
  navigator.xr.requestSession('inline')
      .then((session) => {
        // Inline sessions must have an output context prior to rendering, so
        // it's a good idea to wait until the outputContext is confirmed to have
        // taken effect before rendering.
        session.updateRenderState({ outputContext: inlineCtx }).then(() => {
          onSessionStarted(session);
        });
        // Inline sessions must have an appropriately constructed WebGL layer
        // set as the baseLayer prior to rendering. (This code assumes the WebGL
        // context has already been made XR compatible.)
        let glLayer = new XRWebGLLayer(session, gl, { useDefaultFramebuffer: true });
        session.updateRenderState({ baseLayer: glLayer });
        onSessionStarted(session);
      })
      .catch((reason) => { console.log("requestSession failed: " + reason); });
}
```
@@ -543,7 +503,7 @@ function drawScene()

Whenever possible the matrices given by `XRView`'s `projectionMatrix` attribute should make use of physical properties, such as the headset optics or camera lens, to determine the field of view to use. Most inline content, however, won't have any physically based values from which to infer a field of view. In order to provide a unified render pipeline for inline content an arbitrary field of view must be selected.

By default a vertical field of view of 0.5π radians (90 degrees) is used for inline sessions. The horizontal field of view can be computed from the vertical field of view based on the width/height ratio of the `outputContext`'s canvas.
By default a vertical field of view of 0.5π radians (90 degrees) is used for inline sessions. The horizontal field of view can be computed from the vertical field of view based on the width/height ratio of the `XRWebGLLayer`'s associated canvas.

If a different default field of view is desired, it can be specified by passing a new `inlineVerticalFieldOfView` value, in radians, to the `updateRenderState` method:
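
The snippet itself is collapsed in this diff view, but the call being described is roughly of this shape (the value here is illustrative, not from the explainer):

```js
// Request a wider-than-default vertical field of view for this inline session.
// Immersive sessions don't use this value.
session.updateRenderState({ inlineVerticalFieldOfView: 0.6 * Math.PI });
```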

@@ -647,15 +607,13 @@ dictionary XRRenderStateInit {
```
double depthFar;
double inlineVerticalFieldOfView;
XRWebGLLayer? baseLayer;
XRPresentationContext? outputContext
};

[SecureContext, Exposed=Window] interface XRRenderState {
readonly attribute double depthNear;
readonly attribute double depthFar;
readonly attribute double? inlineVerticalFieldOfView;
readonly attribute XRWebGLLayer? baseLayer;
readonly attribute XRPresentationContext? outputContext;
};

//
@@ -748,11 +706,4 @@ partial dictionary WebGLContextAttributes {
partial interface WebGLRenderingContextBase {
Promise<void> makeXRCompatible();
};

//
// RenderingContext
//
[SecureContext, Exposed=Window] interface XRPresentationContext {
readonly attribute HTMLCanvasElement canvas;
};
```
2 changes: 1 addition & 1 deletion input-explainer.md
@@ -16,7 +16,7 @@ Tracked pointers are input sources able to be tracked separately from the viewer
#### Screen
Screen based input is driven by mouse and touch interactions on a 2D screen that are then translated into a 3D targeting ray. The targeting ray originates at the interacted point on the screen as mapped into the input `XRSpace` and extends out into the scene along a line from the screen's viewer pose position through that point. The specific mapped depth of the origin point depends on the user agent. It SHOULD correspond to the actual 3D position of the point on the screen where available, but MAY also be projected onto the closest clipping plane (defined by the smaller of the `depthNear` and `depthFar` attributes of the `XRSession`) if the actual screen placement is not known.

To accomplish this, pointer events over the relevant screen regions are monitored and temporary input sources are generated in response to allow unified input handling. For inline sessions with an `outputContext`, the monitored region is the `outputContext`'s canvas. For immersive sessions (e.g. hand-held AR), the entire screen is monitored.
To accomplish this, pointer events over the relevant screen regions are monitored and temporary input sources are generated in response to allow unified input handling. For inline sessions the monitored region is the canvas associated with the `baseLayer`. For immersive sessions (e.g. hand-held AR), the entire screen is monitored.
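
One consequence is that pages can consume screen-driven input through the same code path as any other input source. A minimal sketch (not from the explainer; `xrRefSpace` and `handleSelection` are assumed application-side state and helpers):

```js
session.addEventListener('select', (event) => {
  // The same handler works for tracked controllers and for the transient
  // input sources synthesized from screen taps or clicks.
  let rayPose = event.frame.getPose(event.inputSource.targetRaySpace, xrRefSpace);
  if (rayPose) {
    handleSelection(rayPose.transform);  // hypothetical app-side helper
  }
});
```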

### Selection styles
In addition to a targeting ray, all input sources provide a mechanism for the user to perform a "select" action. This user intent is communicated to developers through events, which are discussed in detail in the [Input events](#input-events) section. The physical action which triggers this selection will differ based on the input type. For example (though this is hardly an exhaustive list):