Input API

The OpenXR Input API stands apart from the rest of the OpenXR specification, and it differs in key ways from older APIs such as VrApi, CAPI, and so on. This topic discusses the core philosophy of the OpenXR Input API and walks through how it works mechanically.

Overview

Consider the following scenario: a Native app developer designs a user experience where users can press the same Trigger button of a controller to “teleport” within the experience, interact with a menu, and resize a VR object within the app. Assuming that this developer has a limited number of controllers to test the app against, the following should hold:
  • Existing Meta Quest or Rift controllers should support the app.
  • Future Meta Quest or Rift controllers should support the app.
  • Other vendor systems and controllers should support the app.
Also, users might be able to perform rebinding at a system level to perform these same functions by using different buttons or combinations of buttons across different devices.
Before OpenXR, this presented challenges such as:
  • Developers would be responsible for compatibility and maintenance of the app across many runtimes and devices.
  • Developers would make assumptions across different systems.
  • Developers would test these assumptions against limited physical devices.
  • Developers wouldn’t have critical knowledge about future devices, so the app would need to be recompiled against them.
  • Runtimes would not collect enough information from the app to allow for fully featured rebinding. The best case scenario would normally be to allow rebinding from one button to another.
Previous APIs, like VrApi, allowed developers to consider the hardware state by following this process:
  1. The app would ask the system what devices are available to the user by enumerating these devices.
  2. The system would respond to the app about the devices that are available along with their capabilities.
  3. The app could start asking about the state of the hardware per frame, for example, checking each button state.
This old process leaves the app developer with the responsibility of device compatibility and makes it challenging for the runtime to know whether the app was built with the current device in mind or if it needs to run in compatibility mode.
OpenXR inverts this process by letting you specify bindings for every device (“interaction profile”) the app was tested against, allowing the runtime to simulate those devices. This means apps will never get unexpected input. The runtime receives crucial information about what the app was built for, enabling old apps to run smoothly with new controllers.
The OpenXR input system ensures that apps provide the runtime with enough information about how the input is used, enabling it to build a system rebinding UI. Depending on context, the same button can be used for multiple actions, leading to the description of actions and action sets, along with localized descriptions.

Actions and Action Sets

OpenXR shifts from a computer-centric approach to a more human-centric one, so the developer’s focus is no longer on the state of the device an app runs on. This concept applies to all devices, even to the ones that are not even developed yet. The key to this approach is that OpenXR code defines both the user intent (XrAction) and the context of actions (XrActionSet). As an OpenXR Native developer, you do not work directly with the hardware state of the user’s input device. Rather, you must focus on the action state and supply bindings that bind these actions to physical input for every tested device (interaction profile).
The following example illustrates this idea. Assuming that an imaginary app uses a Trigger button to support navigation to different rooms within the experience, interaction with a tool that spawns a cube object, and interaction with a menu UI, the following actions and action sets may exist:
(Figure: Actions and Action Sets)
Action sets 1, 2, and 3 comprise actions that relate to a certain context within the app (that is, actions when using a toolset to manipulate a cube, user navigation within the experience, or interaction with a menu UI). Each action belongs to only one action set. By using the same Trigger button, the user can still spawn cubes, enable the user to move to the next room of the experience, or select a highlighted option from a menu. However, the associated actions are different: their context is different because they belong to different action sets. So, it is the cross-section between context and intent of use that defines each action. All these define two important OpenXR concepts.
OpenXR Input API Concept #1. OpenXR is an action-based system, so you must expose the usage intent (actions) and the context (action sets) to the runtime.
You must define action sets and actions in your app to provide this information to the runtime. Depending on the capabilities of the system the app runs on, the user might also choose to rebind these actions against the default app bindings, even at the system level. For example, a controller might support rebinding these three actions to three different buttons, which is not as simple as rebinding from button to button.
OpenXR Input API Concept #2. Instead of the system telling the app what devices are available, the app provides the system with a list of physical devices it was built for and tested with.
This way, the runtime can make decisions about supporting backwards compatibility, as the app provides valuable information about its original design. In OpenXR, it is then up to the runtime to determine the best way to drive the app and support the user experience across different systems. So, you must offer:
  • A list of the devices the app was built and tested against
  • A list of how you would set up bindings for these specific devices
Both happen through the same mechanism. You suggest bindings for the devices you built the app for and tested it against. By suggesting bindings for a device, the app implicitly states that it was built and tested for that device.
Important: The app is expected to suggest bindings for every device it supports, so do not try to guess the single “right” device and suggest bindings only for it.
For example, the runtime can see that the developer wrote, built, and tested an app against a certain Meta Quest controller. Because OpenXR has knowledge of the architecture on which Meta devices rely (even future ones), it supports backward and forward compatibility across different devices. In that sense, listing the devices the app was compiled and tested against gives the runtime a list of “suggested bindings”. These are hints to the runtime to better support the app and its users across different devices.

Defining and Creating Actions and Action Sets

In an imaginary app where the user places cubes in the world and manipulates them, here is what definitions of actions and action sets look like:
XrActionSet cubeToolActionSet;
XrAction cubeSpawnAction;

XrActionSetCreateInfo actionSetInfo{XR_TYPE_ACTION_SET_CREATE_INFO};
actionSetInfo.next = nullptr;
strcpy(actionSetInfo.actionSetName, "cube_tool"); // Machine friendly name
strcpy(actionSetInfo.localizedActionSetName, "Cube tool"); // Human friendly name
xrCreateActionSet(instance, &actionSetInfo, &cubeToolActionSet);

XrActionCreateInfo actionInfo{XR_TYPE_ACTION_CREATE_INFO};
actionInfo.next = nullptr;
strcpy(actionInfo.actionName, "spawn_cube"); // Machine friendly name
strcpy(actionInfo.localizedActionName, "Place cube"); // Human friendly name
actionInfo.actionType = XR_ACTION_TYPE_BOOLEAN_INPUT;
xrCreateAction(cubeToolActionSet, &actionInfo, &cubeSpawnAction);
Where XrActionType is an enum that associates return value types with action states, defined in the OpenXR specification as:
typedef enum XrActionType {
    XR_ACTION_TYPE_BOOLEAN_INPUT = 1,
    XR_ACTION_TYPE_FLOAT_INPUT = 2,
    XR_ACTION_TYPE_VECTOR2F_INPUT = 3,
    XR_ACTION_TYPE_POSE_INPUT = 4,
    XR_ACTION_TYPE_VIBRATION_OUTPUT = 100,
    XR_ACTION_TYPE_MAX_ENUM = 0x7FFFFFFF
} XrActionType;
Declare all available actions and action sets upfront: you must create every action and action set the app might need for a VR session through calls to xrCreateActionSet and xrCreateAction. The localizedActionName and localizedActionSetName fields store suitable names for each action and action set respectively. The runtime may wish to display these to the user, for example, in a rebinding menu UI. (“Localized” means text in the natural language used in the user’s system UI.)

Suggesting Bindings

After creating actions and action sets, the app must inform the system how these bind to real devices. In the OpenXR Input API, this is the only point where you leave the “actions” abstraction and think about actual devices. The following code snippet demonstrates how it works.
XrPath touchInteractionProfile{XR_NULL_PATH};
xrStringToPath(instance, "/interaction_profiles/oculus/touch_controller", &touchInteractionProfile);

std::vector<XrActionSuggestedBinding> bindings{};

XrPath inputRightAClick{XR_NULL_PATH};
xrStringToPath(instance, "/user/hand/right/input/a/click", &inputRightAClick);
bindings.emplace_back(XrActionSuggestedBinding{cubeSpawnAction, inputRightAClick});

XrPath inputRightSqueeze{XR_NULL_PATH};
xrStringToPath(instance, "/user/hand/right/input/squeeze/value", &inputRightSqueeze);
// cubeRotateAction is another action, created the same way as cubeSpawnAction
bindings.emplace_back(XrActionSuggestedBinding{cubeRotateAction, inputRightSqueeze});

XrInteractionProfileSuggestedBinding suggestedBindings{XR_TYPE_INTERACTION_PROFILE_SUGGESTED_BINDING};
// These bindings are for the Meta Quest Touch interaction profile
suggestedBindings.interactionProfile = touchInteractionProfile;
suggestedBindings.suggestedBindings = bindings.data();
suggestedBindings.countSuggestedBindings = static_cast<uint32_t>(bindings.size());

// Suggest all the bindings for the Meta Quest Touch interaction profile
xrSuggestInteractionProfileBindings(instance, &suggestedBindings);

// Repeat for other controllers, like vive_controller, xbox controller, and so on
The concept of actual devices in OpenXR is called “interaction profiles”. In this example, the profile is "/interaction_profiles/oculus/touch_controller", so this hypothetical app suggests bindings for the Touch controller. It then supplies a list of pairs, like (cubeSpawnAction, XrPath("/user/hand/right/input/a/click")), that bind each action to a path. This path is a special string, defined in the spec per interaction profile, that refers to a physical button or any other part of the input device. In this example, the spawn cube action binds to the right-hand controller’s A button on the Meta Quest Touch controller. You must do the same for all other actions.
You must call xrSuggestInteractionProfileBindings and repeat the process for all interaction profiles (that is, every input device you have tested the app against and intend to support), for example, the Meta Quest Touch controller, the Vive controller, the Xbox controller, and so on.
You should also go through and bind all actions to valid paths for these specific interaction profiles. For lists of available paths per interaction profile, see Interaction Profile Paths in the OpenXR specification.
Important: It is explicitly called “suggest” interaction profile bindings because this is not a request to set the final bindings; rather, it is a hint to the system that the app was designed with these bindings in mind. The runtime can then change these bindings if the app runs on a device you have never seen, or if the user rebinds these settings.

Binding Immutability

The final step before using the actions is to call the xrAttachSessionActionSets function which attaches a group of action sets (therefore, their associated actions as well) to the session that is running.
XrSessionActionSetsAttachInfo attachInfo{XR_TYPE_SESSION_ACTION_SETS_ATTACH_INFO};
attachInfo.countActionSets = 1;
attachInfo.actionSets = &cubeToolActionSet;
xrAttachSessionActionSets(session, &attachInfo);
When an action set is attached to the session, it becomes immutable, and so do its bindings. The xrAttachSessionActionSets function can be called only once per session, and apps can only use action sets that were part of this call. As a result, apps cannot add new actions as the user progresses through an experience. Allowing that would be problematic, for example, if the user’s system supports a rebinding UI and the user attempts a complete rebinding while in session. This way, the system has all the binding-related information upfront, regardless of whether the user has reached the specific part of the experience that uses a given action, such as a menu UI that displays at a certain stage of the app.

Syncing Actions

The app has to explicitly ask for updates of the input state, once per frame (generally, at the beginning of the frame loop). To collect information about actions, the app must call xrSyncActions once per frame, for example:
std::vector<XrActiveActionSet> activeActionSets = { {cubeToolActionSet}, {navigationActionSet} };

XrActionsSyncInfo syncInfo = {XR_TYPE_ACTIONS_SYNC_INFO};
syncInfo.countActiveActionSets = static_cast<uint32_t>(activeActionSets.size());
syncInfo.activeActionSets = activeActionSets.data();

xrSyncActions(session, &syncInfo);
This is where all action states are updated, atomically. If you call xrSyncActions once and then query the state of the cubeSpawnAction action twice, you are guaranteed to get exactly the same result as long as xrSyncActions is not called in between. This gives you a robust mechanism for controlling when new input states arrive.
In each xrSyncActions call, you must define which action sets are active. An action set can be active for the world in general, but when, for example, the user opens a specific menu UI, the app can switch to the action set for this menu interaction and the user will not be able to teleport by using the same Trigger button.
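The active-set switch described above can be sketched as follows. ActionSet and activeSetsForFrame are hypothetical stand-ins for the app’s real XrActionSet handles and frame logic, used only to make the sketch self-contained:

```cpp
#include <vector>

// Hypothetical stand-ins for the app's action set handles; in a real app these
// would be XrActionSet values created with xrCreateActionSet.
enum class ActionSet { CubeTool, Navigation, Menu };

// Choose which action sets to mark active in this frame's xrSyncActions call.
// While the menu is open, only the menu set is active, so the same Trigger
// button drives menu selection instead of teleportation.
std::vector<ActionSet> activeSetsForFrame(bool menuOpen) {
    if (menuOpen) {
        return {ActionSet::Menu};
    }
    return {ActionSet::CubeTool, ActionSet::Navigation};
}
```

Each frame, the resulting list would feed the XrActiveActionSet array that the app passes to xrSyncActions.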

Receiving Input State of an Action

The following slightly abstract snippet calls the xrGetActionStateBoolean function, so it relates to a boolean action.
XrActionStateGetInfo getInfo = {XR_TYPE_ACTION_STATE_GET_INFO};
getInfo.action = cubeSpawnAction;

// Can be used to distinguish between hands, but optional:
getInfo.subactionPath = XR_NULL_PATH;

// Output struct
XrActionStateBoolean spawnCubeState = {XR_TYPE_ACTION_STATE_BOOLEAN};

xrGetActionStateBoolean(session, &getInfo, &spawnCubeState);
This returns the following values (fields of the XrActionStateBoolean output struct):
  • isActive — a boolean representing whether the action is active, that is, whether its action set was passed to the latest xrSyncActions call. It is also inactive if the device is powered off or if the app loses focus to a system menu. The input should be ignored while it is inactive.
  • currentState — the current state of the action (in this case, true or false).
  • changedSinceLastSync — a boolean representing whether the state has changed since the last time xrSyncActions was called. This is useful because it allows detecting edges on a signal in a robust way so that, for example, the user can trigger an event only once.
  • lastChangeTime — the estimated time at which the physical button state last changed. This can provide extra precision since the last time you called xrSyncActions.
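As a sketch of how changedSinceLastSync enables edge detection, the following uses a minimal stand-in struct for the relevant XrActionStateBoolean fields (the real struct comes from the OpenXR headers); shouldSpawnCube is a hypothetical helper:

```cpp
// Minimal stand-in for the fields of XrActionStateBoolean used here
// (assumption: the real struct comes from <openxr/openxr.h>).
struct BooleanState {
    bool isActive;
    bool currentState;
    bool changedSinceLastSync;
};

// Spawn a cube only on the rising edge of the button press: the action must be
// active, currently pressed, and freshly changed since the last xrSyncActions.
// Without the changedSinceLastSync check, holding the button down would spawn
// a cube on every frame.
bool shouldSpawnCube(const BooleanState& s) {
    return s.isActive && s.currentState && s.changedSinceLastSync;
}
```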

Pose Actions

A pose action is an action type that returns a pose of something in the world, for example, the position and rotation (pose) of a controller. To receive such pose data, you must create a space by using xrCreateActionSpace for a particular action.
XrPath leftHandPath;
xrStringToPath(instance, "/user/hand/left", &leftHandPath);

// Init:
XrActionSpaceCreateInfo createInfo{XR_TYPE_ACTION_SPACE_CREATE_INFO};
createInfo.action  = gripAction;
// poseInActionSpace is a fixed offset to the returned pose
createInfo.poseInActionSpace = poseIdentity;
createInfo.subactionPath = leftHandPath;
xrCreateActionSpace(session, &createInfo, &leftGripActionSpace);

// Frame loop:

// Input struct
XrActionStateGetInfo getInfo = {XR_TYPE_ACTION_STATE_GET_INFO};
getInfo.action = gripAction;
getInfo.subactionPath = leftHandPath;

// Output struct
XrActionStatePose state{XR_TYPE_ACTION_STATE_POSE};

// Call getActionStatePose to see if the pose is active
xrGetActionStatePose(session, &getInfo, &state);


if (state.isActive){
    // Output struct
    XrSpaceLocation leftGripLocation{XR_TYPE_SPACE_LOCATION};

    xrLocateSpace(leftGripActionSpace, localReferenceSpace, predictedDisplayTime, &leftGripLocation);
    // Returns pose + valid + tracked bits (Note: not the same as state.isActive)
}
For example, if you have previously created an action named gripAction that is bound to both the left and right hands, the subactionPath allows selecting only the left hand. Then, create an action space (leftGripActionSpace). This must happen before the frame loop. During the frame loop, in each frame the app can locate this space by calling the xrLocateSpace function on leftGripActionSpace. This returns the pose (position + rotation) of the controller relative to the provided reference space.
Important: There are three reference spaces relative to which the app can get the result: VIEW, LOCAL, and STAGE. These have well-defined meanings, specifying the origin position and axis orientation. For details, see XrSpace in the OpenXR specification. It is also important that the app uses the correct timestamp, because timestamps are used to predict poses into the future; a wrong timestamp can lead to higher perceived latency. Use the predictedDisplayTime returned from the last call to xrWaitFrame. For details, see xrWaitFrame in the OpenXR specification.
You must call the xrGetActionStatePose function to receive an additional state for the pose action, which indicates whether the pose action is active. For example, this returns false if the user opens the dash menu on their Quest device: the underlying app loses focus and the runtime forces all of its actions to inactive. To avoid rendering floating controllers on top of the dash’s controllers, check whether the pose state is active before using it.
When the app locates the controller, there are two additional returned values: valid and tracked:
  • The valid bit represents the validity of the pose. It becomes false if the app completely loses tracking of the controller or if isActive is false. There can also be cases where the action itself is active but the valid bit is false.
  • The tracked bit records whether the pose is currently tracked. The controller might be in a valid position but not currently tracked, for example, when the runtime uses dead reckoning to estimate the controller’s position; the app can use the data, but it should be aware that the tracking accuracy has degraded.
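The decision logic for the valid and tracked bits can be sketched as follows. The constants mirror the XrSpaceLocationFlags bit values from the specification but are local stand-ins so the sketch is self-contained; classifyPose is a hypothetical helper, not part of the API:

```cpp
#include <cstdint>

// Stand-in constants with the same values as the XrSpaceLocationFlags bits in
// the OpenXR specification; the real definitions live in <openxr/openxr.h>.
constexpr uint64_t ORIENTATION_VALID_BIT   = 0x00000001;
constexpr uint64_t POSITION_VALID_BIT      = 0x00000002;
constexpr uint64_t ORIENTATION_TRACKED_BIT = 0x00000004;
constexpr uint64_t POSITION_TRACKED_BIT    = 0x00000008;

enum class PoseUse { Ignore, UseWithCaution, UseNormally };

// Interpret the locationFlags returned by xrLocateSpace: an invalid pose
// should be ignored, while a valid-but-untracked pose (for example, one
// estimated by dead reckoning) is usable with degraded accuracy.
PoseUse classifyPose(uint64_t locationFlags) {
    const uint64_t valid   = ORIENTATION_VALID_BIT | POSITION_VALID_BIT;
    const uint64_t tracked = ORIENTATION_TRACKED_BIT | POSITION_TRACKED_BIT;
    if ((locationFlags & valid) != valid) {
        return PoseUse::Ignore;        // Pose unusable: hide the controller
    }
    if ((locationFlags & tracked) != tracked) {
        return PoseUse::UseWithCaution; // Valid but extrapolated
    }
    return PoseUse::UseNormally;
}
```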

General Flow

To sum up, managing input occurs through this flow:
// For each ActionSet:
xrCreateActionSet(...)

// For each Action
xrCreateAction(...)

// For each ActionSpace
xrCreateActionSpace(...)

// For each supported and tested device:
xrSuggestInteractionProfileBindings(...)

xrAttachSessionActionSets(...)

while (frameLoop) {
    // Only once per frame:
    xrSyncActions(...)

    // As many times as needed, can be multiple times for the same action
    xrGetActionStateTYPE(...)
    xrLocateSpace(...)
}