Set up cross-linking into the WebGPU and WGSL specs. [Issue #tabatkins/bikeshed#2006]
Complete the planned sections. [Issue #gpuweb/gpuweb#1321]
1. Introduction
WebGPU is a proposed Web API to enable webpages to use the system’s GPU (Graphics Processing Unit) to perform computations and draw complex images that can be presented inside the page. This goal is similar to the WebGL family of APIs, but WebGPU enables access to more advanced features of GPUs. Whereas WebGL is mostly for drawing images but can be repurposed (with great effort) to do other kinds of computations, WebGPU has first-class support for performing general computations on the GPU.
1.1. Use cases
Example use cases for WebGPU that aren’t addressed by WebGL 2 are:
- Drawing images with highly-detailed scenes with many different objects (such as CAD models). WebGPU’s drawing commands are individually cheaper than WebGL’s.
- Executing advanced algorithms for drawing realistic scenes. Many modern rendering techniques and optimizations cannot execute on WebGL 2 due to the lack of support for general computations.
- Executing machine learning models efficiently on the GPU. It is possible to do general-purpose GPU (GPGPU) computation in WebGL, but it is sub-optimal and much more difficult.
Concrete examples are:
- Improving existing JavaScript 3D libraries like Babylon.js and Three.js with new rendering techniques (compute-based particles, fancier post-processing, ...) and offloading to the GPU expensive computations currently done on the CPU (culling, skinned model transformation, ...).
- Porting newer game engines to the Web, and enabling engines to expose more advanced rendering features. For example, Unity’s WebGL export uses the lowest feature set of the engine, but WebGPU could use a higher feature set.
- Porting new classes of applications to the Web: many productivity applications offload computations to the GPU and need WebGPU’s support for general computations.
- Improving existing Web teleconferencing applications. For example, Google Meet uses machine learning to separate the user from the background. Running the machine learning in WebGPU would make it faster and more power-efficient, allowing (1) these capabilities to reach cheaper, more accessible user devices and (2) more complex and robust models.
1.2. Goals
Goals:
- Enable rendering of modern graphics both onscreen and offscreen.
- Enable general purpose computations to be executed efficiently on the GPU.
- Support implementations targeting various native GPU APIs: Microsoft’s D3D12, Apple’s Metal, and Khronos' Vulkan.
- Provide a human-authorable language to specify computations to run on the GPU.
- Be implementable in the multi-process architecture of browsers and uphold the security of the Web.
- As much as possible, have applications work portably across different user systems and browsers.
- Interact with the rest of the Web platform in useful but carefully-scoped ways (essentially sharing images one way or another).
- Provide a foundation to expose modern GPU functionality on the Web. WebGPU is structured similarly to all current native GPU APIs, even if it doesn’t provide all their features. There are plans to later extend it to have more modern functionality. See also § 1.3 Why not "WebGL 3"?
Non-goals:
- Expose support for hardware that’s not programmable at all, or much less flexible, like DSPs or specialized machine learning hardware.
- Expose support for hardware that can’t do general-purpose computations (like older mobile phone GPUs or even older desktop GPUs).
- Exhaustively expose all functionality available on native GPU APIs (some functionality is only available on GPUs from a single vendor, or is too niche to be added to WebGPU).
- Allow extensive mixing and matching of WebGL and WebGPU code.
- Tightly integrate with the page rendering flow like CSS Houdini.
1.3. Why not "WebGL 3"?
WebGL 1.0 and WebGL 2.0 are JavaScript projections of the OpenGL ES 2.0 and OpenGL ES 3.0 APIs, respectively. WebGL’s design traces its roots back to the OpenGL 1.0 API released in 1992 (which further traces its roots back to IRIS GL from the 1980s). This lineage has many advantages, including the vast available body of knowledge and the relative ease of porting applications from OpenGL ES to WebGL.
However, this also means that WebGL doesn’t match the design of modern GPUs, causing CPU performance and GPU performance issues. It also makes it increasingly hard to implement WebGL on top of modern native GPU APIs. WebGL 2.0 Compute was an attempt at adding general compute functionality to WebGL but the impedance mismatch with native APIs made the effort incredibly difficult. Contributors to WebGL 2.0 Compute decided to focus their efforts on WebGPU instead.
2. Additional Background
2.1. Sandboxed GPU Processes in Web Browsers
A major design constraint for WebGPU is that it must be implementable and efficient in browsers that use a GPU-process architecture. GPU drivers need access to more kernel syscalls than are otherwise used for Web content, and many GPU drivers are prone to hangs or crashes. To improve stability and sandboxing, browsers use a special process that contains the GPU driver and talks with the rest of the browser through asynchronous IPC. GPU processes are (or will be) used in Chromium, Gecko, and WebKit.
GPU processes are less sandboxed than content processes, and they are typically shared between multiple origins. Therefore, they must validate all messages, for example to prevent a compromised content process from being able to look at the GPU memory used by another content process. Most of WebGPU’s validation rules are necessary to ensure it is secure to use, so all the validation needs to happen in the GPU process.
Likewise, all GPU driver objects only live in the GPU process, including large allocations (like buffers and textures) and complex objects (like pipelines).
In the content process, WebGPU types (GPUBuffer, GPUTexture, GPURenderPipeline, ...) are mostly just "handles" that identify objects that live in the GPU process.
This means that the CPU and GPU memory used by a WebGPU object isn’t necessarily known in the content process.
A GPUBuffer object can use maybe 150 bytes of CPU memory in the content process but hold a 1 GB allocation of GPU memory.
See also the description of the content and device timelines in the specification.
2.2. Memory Visibility with GPUs and GPU Processes
The two major types of GPUs are called "integrated GPUs" and "discrete GPUs". Discrete GPUs are separate from the CPU; they usually come as PCI-e cards that you plug into the motherboard of a computer. Integrated GPUs live on the same die as the CPU and don’t have their own memory chips; instead, they use the same RAM as the CPU.
When using a discrete GPU, it’s easy to see that most GPU memory allocations aren’t visible to the CPU because they are inside the GPU’s RAM (or VRAM for Video RAM).
For integrated GPUs, most memory allocations are in the same physical places, but are not made visible to the CPU for various reasons (for example, the CPU and GPU can have separate caches for the same memory, so accesses are not cache-coherent).
Instead, for the CPU to see the content of a GPU buffer, it must be "mapped", making it available in the virtual memory space of the application (think of mapped as in mmap()).
GPUBuffers must be specially allocated in order to be mappable, which can make them less efficient to access from the GPU (for example if they need to be allocated in RAM instead of VRAM).
All this discussion was centered around native GPU APIs, but in browsers, the GPU driver is loaded in the GPU process, so native GPU buffers can be mapped only in the GPU process’s virtual memory. In general, it is not possible to map the buffer directly inside the content process (though some systems can do this, providing optional optimizations). To work with this architecture an extra "staging" allocation is needed in shared memory between the GPU process and the content process.
The table below recapitulates which type of memory is visible where:
| | Regular ArrayBuffer | Shared Memory | Mappable GPU buffer | Non-mappable GPU buffer (or texture) |
|---|---|---|---|---|
| CPU, in the content process | Visible | Visible | Not visible | Not visible |
| CPU, in the GPU process | Not visible | Visible | Visible | Not visible |
| GPU | Not visible | Not visible | Visible | Visible |
3. JavaScript API
This section goes into details on important and unusual aspects of the WebGPU JavaScript API. Generally, each subsection can be considered its own "mini-explainer", though some require context from previous subsections.
3.1. Adapters and Devices
A WebGPU "adapter" (GPUAdapter) is an object which identifies a particular WebGPU
implementation on the system (e.g. a hardware accelerated implementation on an integrated or
discrete GPU, or software implementation).
Two different GPUAdapter objects on the same page could refer to the same underlying
implementation, or to two different underlying implementations (e.g. integrated and discrete GPUs).
The set of adapters visible to the page is at the discretion of the user agent.
A WebGPU "device" (GPUDevice) represents a logical connection to a WebGPU adapter.
It is called a "device" because it abstracts away the underlying implementation (e.g. video card)
and encapsulates a single connection: code that owns a device can act as if it is the only user
of the adapter.
As part of this encapsulation, a device is the root owner of all WebGPU objects created from it
(textures, etc.), which can be (internally) freed whenever the device is lost or destroyed.
Multiple components on a single webpage can each have their own WebGPU device.
All WebGPU usage is done through a WebGPU device or objects created from it.
In this sense, it serves a subset of the purpose of WebGLRenderingContext; however, unlike
WebGLRenderingContext, it is not associated with a canvas object, and most commands are
issued through "child" objects.
3.1.1. Adapter Selection and Device Init
To get an adapter, an application calls navigator.gpu.requestAdapter(), optionally passing
options which may influence what adapter is chosen, like a powerPreference ("low-power" or
"high-performance") or forceFallbackAdapter to force a software implementation.
requestAdapter() never rejects, but may resolve to null if an adapter can’t be returned with
the specified options.
A returned adapter exposes info (vendor/architecture/etc., implementation-defined), a boolean
isFallbackAdapter so
applications with fallback paths (like WebGL or 2D canvas) can avoid slow software implementations,
and the § 3.1.2 Optional Capabilities available on the adapter.
    const adapter = await navigator.gpu.requestAdapter(options);
    if (!adapter) return goToFallback();
To get a device, an application calls adapter.requestDevice(), optionally passing a descriptor
which enables additional optional capabilities - see § 3.1.2 Optional Capabilities.
requestDevice() will reject (only) if the request is invalid,
i.e. it exceeds the capabilities of the adapter.
If anything else goes wrong in creation of the device,
it will resolve to a GPUDevice which has already been lost - see § 3.4 Device Loss.
(This reduces the number of different situations an app must handle
by avoiding an extra possible return value like null or another exception type.)
    const device = await adapter.requestDevice(descriptor);
    device.lost.then(recoverFromDeviceLoss);
An adapter may become unavailable, e.g. if it is unplugged from the system, disabled to save
power, or marked "stale" ([[current]] becomes false).
From then on, such an adapter can no longer vend valid devices,
and always returns already-lost GPUDevices.
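The two steps above can be combined into a single initialization routine. The following is a minimal sketch, not a definitive implementation: the helper name initWebGPU and the onDeviceLost callback are hypothetical, and the gpu parameter stands for navigator.gpu (it is passed in only so that the sketch can be exercised with a stand-in object outside a browser).

```javascript
// Hypothetical helper: `gpu` stands for navigator.gpu.
// Returns null when no adapter is available so that callers can fall back
// (for example to WebGL or a 2D canvas).
async function initWebGPU(gpu, { adapterOptions, deviceDescriptor, onDeviceLost } = {}) {
  // requestAdapter() never rejects; it resolves to null on failure.
  const adapter = await gpu.requestAdapter(adapterOptions);
  if (!adapter) return null;

  // requestDevice() rejects only for invalid requests. Any other failure
  // yields a device that is already lost, so `lost` should always be observed.
  const device = await adapter.requestDevice(deviceDescriptor);
  if (onDeviceLost) device.lost.then(onDeviceLost);

  return { adapter, device };
}
```

A caller would typically treat a null result and a later device loss as the same kind of event: both lead to the fallback or recovery path.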
3.1.2. Optional Capabilities
Each adapter may have different optional capabilities called "features" and "limits". These are the maximum possible capabilities that can be requested when a device is created.
The set of optional capabilities exposed on each adapter is at the discretion of the user agent.
A device is created with an exact set of capabilities, specified in the arguments to adapter.requestDevice() (see above).
When any work is issued to a device, it is strictly validated against the capabilities of the device - not the capabilities of the adapter. This eases development of portable applications by avoiding implicit dependence on the capabilities of the development system.
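As an illustration, an application can derive its device request from what the adapter exposes instead of hard-coding it. The sketch below assumes the requiredFeatures and requiredLimits descriptor members and the set-like adapter.features and adapter.limits attributes from the WebGPU specification; the helper name buildDeviceDescriptor is hypothetical, and the clamping logic is only valid for "maximum"-style limits.

```javascript
// Hypothetical helper: builds a requestDevice() descriptor that asks only for
// capabilities the adapter actually exposes.
// ASSUMPTION: every entry in `wantedLimits` is a "maximum"-style limit, where
// a larger value is better (alignment limits work the other way around).
function buildDeviceDescriptor(adapter, wantedFeatures = [], wantedLimits = {}) {
  // adapter.features is set-like: keep only the features it contains.
  const requiredFeatures = wantedFeatures.filter((name) => adapter.features.has(name));

  // Clamp each wanted limit to what the adapter supports; skip unknown names.
  const requiredLimits = {};
  for (const [name, wanted] of Object.entries(wantedLimits)) {
    const supported = adapter.limits[name];
    if (supported !== undefined) requiredLimits[name] = Math.min(wanted, supported);
  }
  return { requiredFeatures, requiredLimits };
}
```

Because the device is validated against exactly what was requested, the application should afterwards consult the device's own capabilities rather than the adapter's when choosing code paths.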
3.2. Object Validity and Destroyed-ness
3.2.1. WebGPU’s Error Monad
A.k.a. Contagious Internal Nullability. A.k.a. transparent promise pipelining.
WebGPU is a very chatty API, with some applications making tens of thousands of calls per frame to render complex scenes. We have seen that the GPU process needs to validate the commands to satisfy its security properties. To avoid the overhead of validating commands twice, in both the GPU process and the content process, WebGPU is designed so JavaScript calls can be forwarded directly to the GPU process and validated there. See the error section for more details on what’s validated where and how errors are reported.
At the same time, during a single frame WebGPU objects can be created that depend on one another.
For example a GPUCommandBuffer can be recorded with commands that use temporary GPUBuffers created in the same frame.
In this example, because of the performance constraint of WebGPU, it is not possible to send the message to create the GPUBuffer to the GPU process and synchronously wait for its processing before continuing JavaScript execution.
Instead, in WebGPU all objects (like GPUBuffer) are created immediately on the content timeline and returned to JavaScript.
The validation is almost all done asynchronously on the "device timeline".
In the good case, when no errors occur, everything looks to JS as if it is synchronous.
However, when an error occurs in a call, it becomes a no-op (except for error reporting).
If the call returns an object (like createBuffer), the object is tagged as "invalid" on the GPU process side.
Since validation and allocation occur asynchronously, errors are reported asynchronously. By itself, this can make for challenging debugging - see § 3.3.1.1 Debugging.
All WebGPU calls validate that all their arguments are valid objects. As a result, if a call takes an invalid WebGPU object and returns a new one, the new object is also invalid (hence the term "contagious").
    const srcBuffer = device.createBuffer({ size: 4, usage: GPUBufferUsage.COPY_SRC });
    const dstBuffer = ...;

    const encoder = device.createCommandEncoder();
    encoder.copyBufferToBuffer(srcBuffer, 0, dstBuffer, 0, 4);

    const commands = encoder.finish();
    device.queue.submit([commands]);
    // The size of the buffer is too big, this causes an OOM and srcBuffer is invalid.
    const srcBuffer = device.createBuffer({ size: BIG_NUMBER, usage: GPUBufferUsage.COPY_SRC });
    const dstBuffer = ...;

    // The encoder starts as a valid object.
    const encoder = device.createCommandEncoder();
    // Special case: an invalid object is used when encoding commands, so the encoder
    // becomes invalid.
    encoder.copyBufferToBuffer(srcBuffer, 0, dstBuffer, 0, 4);

    // Since the encoder is invalid, encoder.finish() is invalid and returns
    // an invalid object.
    const commands = encoder.finish();
    // The command references an invalid object so it becomes a no-op.
    device.queue.submit([commands]);
3.2.1.1. Mental Models
One way to interpret WebGPU’s semantics is that every WebGPU object is actually a Promise internally and that all WebGPU methods are async and await each of the WebGPU objects they receive as arguments before using them.
However the execution of the async code is outsourced to the GPU process (where it is actually done synchronously).
Another way, closer to actual implementation details, is to imagine that each GPUFoo JS object maps to a gpu::InternalFoo C++/Rust object on the GPU process that contains a bool isValid.
Then during the validation of each command on the GPU process, the isValid flags are all checked and a new, invalid object is returned if validation fails.
On the content process side, the GPUFoo implementation doesn’t know if the object is valid or not.
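The second mental model can be made concrete with a toy simulation of the GPU-process side. Everything in this sketch is hypothetical (InternalObject, validatedCall, MAX_SIZE and the three wrapper functions are illustration-only names, not WebGPU APIs); it only demonstrates how a single isValid flag makes invalidity contagious.

```javascript
// Toy model of the GPU-process side. All names here are hypothetical;
// they are not part of WebGPU.
class InternalObject {
  constructor(kind, isValid) {
    this.kind = kind;
    this.isValid = isValid;
  }
}

const errors = []; // stands in for asynchronous error reporting

// Every call first checks that all object arguments are valid. On failure it
// reports an error and returns a new, invalid object instead of throwing.
function validatedCall(kind, args, extraCheck = () => true) {
  const ok = args.every((arg) => arg.isValid) && extraCheck();
  if (!ok) errors.push(`${kind} failed validation`);
  return new InternalObject(kind, ok);
}

const MAX_SIZE = 1024; // hypothetical allocation limit for the model
const createBuffer = (size) => validatedCall('buffer', [], () => size <= MAX_SIZE);
const encodeCopy = (src, dst) => validatedCall('commandBuffer', [src, dst]);
const submit = (commands) => validatedCall('submit', [commands]).isValid;

const src = createBuffer(MAX_SIZE * 2); // too big: the buffer is invalid
const dst = createBuffer(4);            // valid
const commands = encodeCopy(src, dst);  // invalid, because src is invalid
const submitted = submit(commands);     // a no-op apart from error reporting
```

In this model the content process would hold only opaque handles to src, dst and commands, which is why it cannot tell which of them are valid.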
3.2.2. Early Destruction of WebGPU Objects
Most of the memory usage of WebGPU objects is in the GPU process: it can be GPU memory held by objects like GPUBuffer and GPUTexture, serialized commands held in CPU memory by GPURenderBundles, or complex object graphs for the WGSL AST in GPUShaderModule.
The JavaScript garbage collector (GC) is in the content process and doesn’t know about the memory usage in the GPU process.
Browsers have many heuristics to trigger GCs, but a common one is that a GC should be triggered in memory-pressure scenarios.
However, a single WebGPU object can hold on to MBs or GBs of memory without the GC knowing, and never trigger the memory pressure event.
It is important for WebGPU applications to be able to directly free the memory used by some WebGPU objects without waiting for the GC.
For example, applications might create temporary textures and buffers each frame, and without the explicit .destroy() call they would quickly run out of GPU memory.
That’s why WebGPU has a .destroy() method on those object types which can hold on to an arbitrary amount of memory.
It signals that the application doesn’t need the content of the object anymore and that it can be freed as soon as possible.
Of course, it becomes a validation error to use the object after the call to .destroy().
    const dstBuffer = device.createBuffer({ size: 4, usage: GPUBufferUsage.COPY_DST });

    // The buffer is not destroyed (and valid), success!
    device.queue.writeBuffer(dstBuffer, 0, myData);

    dstBuffer.destroy();

    // The buffer is now destroyed; commands that would use its
    // content produce validation errors.
    device.queue.writeBuffer(dstBuffer, 0, myData);
Note that, while this looks somewhat similar to the behavior of an invalid buffer, it is distinct.
Unlike invalidity, destroyed-ness can change after creation, is not contagious, and is validated only when work is actually submitted (e.g. queue.writeBuffer() or queue.submit()), not when creating dependent objects (like command encoders, see above).
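For per-frame temporaries, one possible pattern is to scope the lifetime of a buffer to a callback so that .destroy() is never forgotten. This is a minimal sketch; the helper name withTemporaryBuffer is hypothetical, and it assumes that all work using the buffer is submitted inside the callback, since destroyed-ness is checked at submission time.

```javascript
// Hypothetical helper: creates a temporary buffer, hands it to `use`, and
// destroys it afterwards, even if `use` throws.
// ASSUMPTION: `use` submits all work that touches the buffer before returning.
function withTemporaryBuffer(device, descriptor, use) {
  const buffer = device.createBuffer(descriptor);
  try {
    return use(buffer);
  } finally {
    // Releases the GPU-process memory without waiting for the JavaScript GC.
    buffer.destroy();
  }
}
```

The same shape works for temporary textures, since they expose the same .destroy() method.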
3.3. Errors
In a simple world, error handling in apps would be synchronous with JavaScript exceptions. However, for multi-process WebGPU implementations, this is prohibitively expensive.
See § 3.2 Object Validity and Destroyed-ness, which also explains how the browser handles errors.
3.3.1. Problems and Solutions
Developers and applications need error handling for a number of cases:
- Debugging: Getting errors synchronously during development, to break into the debugger.
- Fatal Errors: Handling device/adapter loss, either by restoring WebGPU or by fallback to non-WebGPU content.
- Fallible Allocation: Making fallible GPU-memory resource allocations (detecting out-of-memory conditions).
- Fallible Validation: Checking success of WebGPU calls, for applications' unit/integration testing, WebGPU conformance testing, or detecting errors in data-driven applications (e.g. loading glTF models that may exceed device limits).
- App Telemetry: Collecting error logs in web app deployment, for bug reporting and telemetry.
The following sections go into more details on these cases and how they are solved.
3.3.1.1. Debugging
Solution: Dev Tools.
Implementations should provide a way to enable synchronous validation, for example via a "break on WebGPU error" option in the developer tools.
This can be achieved with a content-process⇆gpu-process round-trip in every validated WebGPU call, though in practice this would be very slow. It can be optimized by running a "predictive" mirror of the validation steps in the content process, which either ignores out-of-memory errors (which it can’t predict), or uses round-trips only for calls that can produce out-of-memory errors.
3.3.1.2. Fatal Errors: Adapter and Device Loss
Solution: § 3.4 Device Loss.
3.3.1.3. Fallible Allocation, Fallible Validation, and Telemetry
Solution: Error Scopes.
For important context, see § 3.2 Object Validity and Destroyed-ness. In particular, all errors (validation and out-of-memory) are detected asynchronously, in a remote process. In the WebGPU spec, we refer to the thread of work for each WebGPU device as its "device timeline".
As such, applications need a way to instruct the device timeline on what to do with any errors that occur. To solve this, WebGPU uses Error Scopes.
3.3.2. Error Scopes
WebGL exposes errors using a getError function which returns the first error since the last getError call.
This is simple, but has two problems:
- It is synchronous, incurring a round-trip and requiring all previously issued work to be finished. We solve this by returning errors asynchronously.
- Its flat state model composes poorly: errors can leak to/from unrelated code, possibly in libraries/middleware, browser extensions, etc. We solve this with a stack of error "scopes", allowing each component to hermetically capture and handle its own errors.
In WebGPU, each device [1] maintains a persistent "error scope" stack state.
Initially, the device’s error scope stack is empty. GPUDevice.pushErrorScope('validation') or GPUDevice.pushErrorScope('out-of-memory') begins an error scope and pushes it onto the stack.
This scope captures only errors of a particular type depending on the type of error the application
wants to detect.
It is rare to need to detect both, so two nested error scopes are needed to do so.
GPUDevice.popErrorScope() ends an error scope, popping it from the stack and returning a Promise<GPUError?>, which resolves once enclosed operations have completed and reported back.
This includes exactly all fallible operations that were issued between the push and pop calls.
It resolves to null if no errors were captured, and otherwise resolves to an object describing
the first error that was captured by the scope - either a GPUValidationError or a GPUOutOfMemoryError.
Any device-timeline error from an operation is passed to the top-most error scope on the stack at the time it was issued.
- If an error scope captures an error, the error is not passed down the stack. Each error scope stores only the first error it captures; any further errors it captures are silently ignored.
- If not, the error is passed down the stack to the enclosing error scope.
- If an error reaches the bottom of the stack, it may [2] fire the uncapturederror event on GPUDevice [3] (and could issue a console warning as well).
[1] In the plan to add § 3.6 Multithreading, error scope state would actually be per-device, per-realm. That is, when a GPUDevice is posted to a Worker for the first time, the error scope stack for that device+realm is always empty. (If a GPUDevice is copied back to an execution context it already existed on, it shares its error scope state with all other copies on that execution context.)
[2] The implementation may choose not to fire the event for every error, for example if it has fired too many times, too many times in rapid succession, or with too many errors of the same kind. This is similar to how Dev Tools console warnings work today for WebGL. In poorly-formed applications, this mechanism can prevent the events from having a significant performance impact on the system.
[3] More specifically, with § 3.6 Multithreading, this event would only exist on the originating GPUDevice (the one that came from requestDevice, not one received via a posted message); a distinct interface would be used for non-originating device objects.
    enum GPUErrorFilter {
        "out-of-memory",
        "validation"
    };

    interface GPUOutOfMemoryError {
        constructor();
    };

    interface GPUValidationError {
        constructor(DOMString message);
        readonly attribute DOMString message;
    };

    typedef (GPUOutOfMemoryError or GPUValidationError) GPUError;

    partial interface GPUDevice {
        undefined pushErrorScope(GPUErrorFilter filter);
        Promise<GPUError?> popErrorScope();
    };
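The capture rules described above can be illustrated with a small synchronous model of one device's error scope stack. This is a sketch under simplifying assumptions, not the real API: the ErrorScopeStack class and its report method are hypothetical, pop returns the error directly instead of a Promise, and errors are routed using the stack at the time they are reported rather than the time the operation was issued.

```javascript
// Toy, synchronous model of one device's error scope stack.
// ErrorScopeStack and report() are hypothetical illustration-only names.
class ErrorScopeStack {
  constructor(onUncaptured) {
    this.scopes = [];
    this.onUncaptured = onUncaptured;
  }

  push(filter) {
    this.scopes.push({ filter, firstError: null });
  }

  // Returns the first error captured by the top-most scope, or null.
  pop() {
    if (this.scopes.length === 0) throw new Error('error scope stack is empty');
    return this.scopes.pop().firstError;
  }

  // Walks from the top-most scope down to the first one whose filter matches.
  report(type, message) {
    for (let i = this.scopes.length - 1; i >= 0; i--) {
      const scope = this.scopes[i];
      if (scope.filter === type) {
        // Only the first captured error is kept; later ones are dropped.
        if (scope.firstError === null) scope.firstError = { type, message };
        return;
      }
    }
    // The error reached the bottom of the stack.
    this.onUncaptured({ type, message });
  }
}

const uncaptured = [];
const stack = new ErrorScopeStack((error) => uncaptured.push(error));
stack.push('out-of-memory');
stack.push('validation');
stack.report('validation', 'first');   // captured by the inner scope
stack.report('validation', 'second');  // silently ignored
stack.report('out-of-memory', 'oom');  // passes down to the outer scope
const validationError = stack.pop();
const oomError = stack.pop();
stack.report('validation', 'late');    // no scope left: uncaptured
```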
3.3.2.1. How this solves Fallible Allocation
If a call that fallibly allocates GPU memory (e.g. createBuffer or createTexture) fails, the
resulting object is invalid (same as if there were a validation error), but an 'out-of-memory'
error is generated.
An 'out-of-memory' error scope can be used to detect it.
Example: tryCreateBuffer
    async function tryCreateBuffer(device: GPUDevice, descriptor: GPUBufferDescriptor): Promise<GPUBuffer | null> {
      device.pushErrorScope('out-of-memory');
      const buffer = device.createBuffer(descriptor);
      if (await device.popErrorScope() !== null) {
        return null;
      }
      return buffer;
    }
This interacts with buffer mapping error cases in subtle ways due to numerous possible out-of-memory situations in implementations, but they are not explained here. The principle used to design the interaction is that app code should need to handle as few different edge cases as possible, so multiple kinds of situations should result in the same behavior.
In addition, there are (will be) rules on the relative ordering of most promise resolutions, to prevent non-portable browser behavior or flaky races between async code.
3.3.2.2. How this solves Fallible Validation
A 'validation' error scope can be used to detect validation errors, as above.
Example: Testing
    device.pushErrorScope('out-of-memory');
    device.pushErrorScope('validation');

    {
      // (Do stuff that shouldn't produce errors.)

      {
        device.pushErrorScope('validation');
        device.doOperationThatIsExpectedToError();
        device.popErrorScope().then(error => { assert(error !== null); });
      }

      // (More stuff that shouldn't produce errors.)
    }

    // Detect unexpected errors.
    device.popErrorScope().then(error => { assert(error === null); });
    device.popErrorScope().then(error => { assert(error === null); });
3.3.2.3. How this solves App Telemetry
As mentioned above, if an error is not captured by an error scope, it may fire the
originating device’s uncapturederror event.
Applications can either watch for that event, or encapsulate parts of their application with
error scopes, to detect errors for generating error reports.
uncapturederror is not strictly necessary to solve this, but has the benefit of providing a
single stream for uncaptured errors from all threads.
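A telemetry hook based on this event might look like the sketch below. It assumes the event carries the error in an error attribute, as GPUUncapturedErrorEvent does in the WebGPU specification; the helper name installErrorTelemetry and the report callback are hypothetical and stand for whatever reporting mechanism the application already has.

```javascript
// Hypothetical helper: forwards uncaptured errors to an app-defined `report`
// callback (for example one that batches them for upload).
function installErrorTelemetry(device, report) {
  const listener = (event) => {
    const error = event.error;
    report({
      kind: error.constructor.name,
      // In the IDL above, only GPUValidationError carries a message.
      message: error.message ?? '',
    });
  };
  device.addEventListener('uncapturederror', listener);
  // Returns a function that stops reporting.
  return () => device.removeEventListener('uncapturederror', listener);
}
```

Since the implementation may throttle the event, such a hook should be treated as a best-effort signal rather than a complete error log.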
3.3.2.4. Error Messages and Debug Labels
Every WebGPU object has a read-write attribute, label, which can be set by the application to
provide information for debugging tools (error messages, native profilers like Xcode, etc.).
Every WebGPU object creation descriptor has a member label which sets the initial value of the
attribute.
Additionally, parts of command buffers can be labeled with debug markers and debug groups. See § 3.7.1 Debug Markers and Debug Groups.
For both debugging (dev tools messages) and app telemetry (uncapturederror),
implementations can choose to report some kind of "stack trace" in their error messages,
taking advantage of object debug labels.
For example, a debug message string could be:

    <myQueue>.submit failed:
    - commands[0] (<mainColorPass>) was invalid:
    - in the debug group <environment>:
    - in the debug group <tree 123>:
    - in setIndexBuffer, indexBuffer (<mesh3.indices>) was invalid:
    - in createBuffer, desc.usage (0x89) was invalid
3.3.3. Alternatives Considered
- Synchronous getError, like WebGL. Discussed at the beginning of § 3.3.2 Error Scopes.
- Callback-based error scope: device.errorScope('out-of-memory', async () => { ... }). Since it’s necessary to allow asynchronous work inside error scopes, this formulation is actually largely equivalent to the one shown above, as the callback could never resolve. Application architectures would be limited by the need to conform to a compatible call stack, or they would remap the callback-based API into a push/pop-based API. Finally, it’s generally not catastrophic if error scopes become unbalanced, though the stack could grow unboundedly resulting in an eventual crash (or device loss).
3.4. Device Loss
Any situation that prevents further use of a GPUDevice results in a device loss.
These can arise due to WebGPU calls or external events; for example: device.destroy(),
an unrecoverable out-of-memory condition, a GPU process crash, a long
operation resulting in GPU reset, a GPU reset caused by another application, a discrete GPU being
switched off to save power, or an external GPU being unplugged.
Design principle: There should be as few different-looking error behaviors as possible. This makes it easier for developers to test their app’s behavior in different situations, improves robustness of applications in the wild, and improves portability between browsers.
Finish this explainer (see ErrorHandling.md).
3.5. Buffer Mapping
A GPUBuffer represents a memory allocation usable by other GPU operations.
This memory can be accessed linearly, contrary to GPUTexture, for which the actual memory layout of sequences of texels is unknown. Think of GPUBuffers as the result of gpu_malloc().
CPU→GPU: When using WebGPU, applications need to transfer data from JavaScript to GPUBuffer very often and potentially in large quantities.
This includes mesh data, drawing and computation parameters, ML model inputs, etc.
That’s why an efficient way to update GPUBuffer data is needed. GPUQueue.writeBuffer is reasonably efficient but includes at least an extra copy compared to the buffer mapping used for writing buffers.
GPU→CPU: Applications also often need to transfer data from the GPU to JavaScript, though usually less often and in smaller quantities. This includes screenshots, statistics from computations, simulation or ML model results, etc. This transfer is done with buffer mapping for reading buffers.
See § 2.2 Memory Visibility with GPUs and GPU Processes for additional background on the various types of memory that buffer mapping interacts with.
3.5.1. CPU-GPU Ownership Transfer
In native GPU APIs, when a buffer is mapped, its content becomes accessible to the CPU. At the same time the GPU can keep using the buffer’s content, which can lead to data races between the CPU and the GPU. This means that using mapped buffers is simple but leaves the synchronization to the application.
In contrast, WebGPU prevents almost all data races in the interest of portability and consistency.
In WebGPU there is even more risk of non-portability with races on mapped buffers because of the additional "shared memory" step that may be necessary on some drivers.
That’s why GPUBuffer mapping is done as an ownership transfer between the CPU and the GPU.
At each instant, only one of the two can access it, so no race is possible.
When an application requests to map a buffer, it initiates a transfer of the buffer’s ownership to the CPU. At this time, the GPU may still need to finish executing some operations that use the buffer, so the transfer doesn’t complete until all previously-enqueued GPU operations are finished. That’s why mapping a buffer is an asynchronous operation (we’ll discuss the other arguments below):
    typedef [EnforceRange] unsigned long GPUMapModeFlags;
    namespace GPUMapMode {
        const GPUFlagsConstant READ  = 0x0001;
        const GPUFlagsConstant WRITE = 0x0002;
    };

    partial interface GPUBuffer {
        Promise<undefined> mapAsync(GPUMapModeFlags mode,
                                    optional GPUSize64 offset = 0,
                                    optional GPUSize64 size);
    };
    // Mapping a buffer for writing. Here offset and size are defaulted,
    // so the whole buffer is mapped.
    const myMapWriteBuffer = ...;
    await myMapWriteBuffer.mapAsync(GPUMapMode.WRITE);

    // Mapping a buffer for reading. Only the first four bytes are mapped.
    const myMapReadBuffer = ...;
    await myMapReadBuffer.mapAsync(GPUMapMode.READ, 0, 4);
Once the application has finished using the buffer on the CPU, it can transfer ownership back to the GPU by unmapping it.
This is an immediate operation that makes the application lose all access to the buffer on the CPU (i.e. detaches ArrayBuffers):
    partial interface GPUBuffer {
        undefined unmap();
    };
    const myMapReadBuffer = ...;
    await myMapReadBuffer.mapAsync(GPUMapMode.READ, 0, 4);
    // Do something with the mapped buffer.
    myMapReadBuffer.unmap();
When transferring ownership to the CPU, a copy may be necessary from the underlying mapped buffer to shared memory visible to the content process.
To avoid copying more than necessary, the application can specify which range it is interested in when calling GPUBuffer.mapAsync
.
GPUBuffer.mapAsync
’s mode
argument controls which type of mapping operation is performed.
At the moment its values are redundant with the buffer creation’s usage flags, but it is present for explicitness and future extensibility.
While a GPUBuffer
is owned by the CPU, it is not possible to submit any operations on the device timeline that use it; otherwise, a validation error is produced.
However it is valid (and encouraged!) to record GPUCommandBuffer
s using the GPUBuffer
.
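The ownership rule can be pictured with a toy model. The sketch below is not the WebGPU API: `OwnershipModel` and all of its method names are hypothetical, and it only mimics the transitions described above (GPU-owned, map pending, CPU-owned). In particular it throws synchronously where a real implementation would report a validation error.

```javascript
// Toy model of buffer ownership. NOT the real GPUBuffer implementation;
// every name here is hypothetical and exists only for illustration.
class OwnershipModel {
  constructor() {
    this.state = "gpu-owned"; // "gpu-owned" | "map-pending" | "cpu-owned"
    this.pendingGpuWork = 0;  // enqueued GPU operations that use the buffer
  }

  // Models submitting work that uses the buffer.
  submitWorkUsingBuffer() {
    if (this.state !== "gpu-owned") {
      throw new Error("validation error: buffer is not owned by the GPU");
    }
    this.pendingGpuWork += 1;
  }

  // Models the GPU finishing one previously enqueued operation.
  gpuFinishesOneOperation() {
    if (this.pendingGpuWork > 0) this.pendingGpuWork -= 1;
    this.completeMapIfIdle();
  }

  // Models mapAsync(): the transfer starts now but completes only
  // once all previously enqueued GPU work has finished.
  requestMap() {
    if (this.state !== "gpu-owned") {
      throw new Error("buffer is already mapped or map pending");
    }
    this.state = "map-pending";
    this.completeMapIfIdle();
  }

  // Models unmap(): ownership returns to the GPU immediately.
  unmap() {
    this.state = "gpu-owned";
  }

  completeMapIfIdle() {
    if (this.state === "map-pending" && this.pendingGpuWork === 0) {
      this.state = "cpu-owned";
    }
  }
}
```

The model makes the asymmetry visible: the transfer to the CPU may have to wait for the GPU, while the transfer back to the GPU is immediate.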
3.5.2. Creation of Mappable Buffers
The physical memory location for a GPUBuffer
’s underlying buffer depends on whether it should be mappable and whether it is mappable for reading or writing (native APIs give some control on the CPU cache behavior for example).
At the moment mappable buffers can only be used to transfer data (so they can only have the correct COPY_SRC
or COPY_DST
usage in addition to a MAP_*
usage).
That’s why applications must specify that buffers are mappable when they are created using the (currently) mutually exclusive GPUBufferUsage.MAP_READ
and GPUBufferUsage.MAP_WRITE
flags:
const myMapReadBuffer = device.createBuffer({
    usage: GPUBufferUsage.MAP_READ | GPUBufferUsage.COPY_DST,
    size: 1000,
});
const myMapWriteBuffer = device.createBuffer({
    usage: GPUBufferUsage.MAP_WRITE | GPUBufferUsage.COPY_SRC,
    size: 1000,
});
3.5.3. Accessing Mapped Buffers
Once a GPUBuffer
is mapped, it is possible to access its memory from JavaScript.
This is done by calling GPUBuffer.getMappedRange
, which returns an ArrayBuffer
called a "mapping".
These are available until GPUBuffer.unmap
or GPUBuffer.destroy
is called, at which point they are detached.
These ArrayBuffer
s typically aren’t new allocations, but instead pointers to some kind of shared memory visible to the content process (IPC shared memory, mmap
ped file descriptor, etc.).
When transferring ownership to the GPU, a copy may be necessary from the shared memory to the underlying mapped buffer. GPUBuffer.getMappedRange
takes an optional range of the buffer to map (for which offset
0 is the start of the buffer).
This way the browser knows which parts of the underlying GPUBuffer
have been "invalidated" and need to be updated from the memory mapping.
The range must be within the range requested in mapAsync()
.
partial interface GPUBuffer {
    ArrayBuffer getMappedRange(optional GPUSize64 offset = 0,
                               optional GPUSize64 size);
};
const myMapReadBuffer = ...;
await myMapReadBuffer.mapAsync(GPUMapMode.READ);
const data = myMapReadBuffer.getMappedRange();
// Do something with the data
myMapReadBuffer.unmap();
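Because mappings are detached on unmap, any data that must outlive the mapping has to be copied out first. A minimal sketch, assuming the buffer has already been mapped for reading; `copyOutMappedRange` is a hypothetical helper name, not part of the API:

```javascript
// Copies `size` bytes starting at `offset` out of an already-mapped buffer,
// then unmaps it. `buffer` is expected to behave like a mapped GPUBuffer.
function copyOutMappedRange(buffer, offset, size) {
  const mapping = buffer.getMappedRange(offset, size);
  // slice() creates an independent copy that stays valid after unmap().
  const copy = new Uint8Array(mapping).slice();
  buffer.unmap(); // detaches `mapping`; `copy` is unaffected
  return copy;
}
```

Keeping a reference to the `ArrayBuffer` returned by `getMappedRange()` instead of a copy would leave the application holding a detached, zero-length buffer after `unmap()`.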
3.5.4. Mapping Buffers at Creation
A common need is to create a GPUBuffer
that is already filled with some data.
This could be achieved by creating a final buffer, then a mappable buffer, filling the mappable buffer, and then copying from the mappable to the final buffer, but this would be inefficient.
Instead this can be done by making the buffer CPU-owned at creation: we call this "mapped at creation".
All buffers can be mapped at creation, even if they don’t have the MAP_WRITE
buffer usage.
The browser will just handle the transfer of data into the buffer for the application.
Once a buffer is mapped at creation, it behaves like a regularly mapped buffer: GPUBuffer.getMappedRange()
is used to retrieve ArrayBuffer
s, and ownership is transferred to the GPU with GPUBuffer.unmap()
.
Mapping at creation is done by passing mappedAtCreation: true
in the buffer descriptor on creation:
const buffer = device.createBuffer({
    usage: GPUBufferUsage.UNIFORM,
    size: 256,
    mappedAtCreation: true,
});
const data = buffer.getMappedRange();
// write to data
buffer.unmap();
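The same pattern can be wrapped in a small helper that uploads an existing typed array. This is a sketch under stated assumptions: `createBufferWithData` is a hypothetical name, `device` only needs to behave like a `GPUDevice`, and rounding the size up to a multiple of 4 bytes reflects our understanding of the size requirement for buffers mapped at creation.

```javascript
// Creates a buffer pre-filled with the bytes of `data` (any TypedArray).
// `usage` is a GPUBufferUsage bitmask chosen by the caller.
function createBufferWithData(device, data, usage) {
  // Assumption: the size of a buffer mapped at creation must be a multiple of 4.
  const size = Math.ceil(data.byteLength / 4) * 4;
  const buffer = device.createBuffer({ usage, size, mappedAtCreation: true });

  // View the source as raw bytes so any TypedArray type can be uploaded.
  const src = new Uint8Array(data.buffer, data.byteOffset, data.byteLength);
  new Uint8Array(buffer.getMappedRange()).set(src);

  buffer.unmap(); // hand ownership to the GPU
  return buffer;
}
```

In a page this might be called as `createBufferWithData(device, new Float32Array([0, 1, 2]), GPUBufferUsage.VERTEX)`.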
When using advanced methods to transfer data to the GPU (with a rolling list of buffers that are mapped or being mapped), mapping a buffer at creation can be used to immediately create additional space for the data to be transferred.
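One possible shape for such a rolling list is sketched below. `StagingPool` and its methods are hypothetical names; the usage and map-mode values are passed in by the caller so the sketch does not depend on browser globals.

```javascript
// Sketch of a rolling pool of staging buffers (hypothetical helper class).
class StagingPool {
  constructor(device, size, usage, mapModeWrite) {
    this.device = device;
    this.size = size;
    this.usage = usage;               // e.g. MAP_WRITE | COPY_SRC
    this.mapModeWrite = mapModeWrite; // e.g. GPUMapMode.WRITE
    this.ready = [];                  // buffers that are mapped and writable now
  }

  // Returns a mapped buffer: a recycled one if available, otherwise a new
  // buffer mapped at creation so the caller never has to wait.
  acquire() {
    if (this.ready.length > 0) return this.ready.pop();
    return this.device.createBuffer({
      usage: this.usage,
      size: this.size,
      mappedAtCreation: true,
    });
  }

  // Call after the buffer was unmapped and the work using it was submitted.
  // Re-maps the buffer and makes it available again once mapping completes.
  recycle(buffer) {
    return buffer.mapAsync(this.mapModeWrite).then(() => {
      this.ready.push(buffer);
    });
  }
}
```

When the pool runs dry (every buffer is still being re-mapped), `acquire()` falls back to mapping a new buffer at creation, which is exactly the "additional space" case described above.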
3.5.5. Examples
const dracoDecoder = ...;

const buffer = device.createBuffer({
    usage: GPUBufferUsage.VERTEX | GPUBufferUsage.INDEX,
    size: dracoDecoder.decompressedSize,
    mappedAtCreation: true,
});

dracoDecoder.decodeIn(buffer.getMappedRange());
buffer.unmap();
const texture = getTheRenderedTexture();

const readbackBuffer = device.createBuffer({
    usage: GPUBufferUsage.COPY_DST | GPUBufferUsage.MAP_READ,
    size: 4 * textureWidth * textureHeight,
});

// Copy data from the texture to the buffer.
const encoder = device.createCommandEncoder();
encoder.copyTextureToBuffer(
    { texture },
    { buffer: readbackBuffer, bytesPerRow: textureWidth * 4 },
    [textureWidth, textureHeight],
);
device.queue.submit([encoder.finish()]);

// Get the data on the CPU.
await readbackBuffer.mapAsync(GPUMapMode.READ);
saveScreenshot(readbackBuffer.getMappedRange());
readbackBuffer.unmap();
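The readback example uses `textureWidth * 4` directly as `bytesPerRow`. As far as we know, copies between buffers and textures require `bytesPerRow` to be a multiple of 256, so for arbitrary widths the rows need padding. The helpers below are a sketch with hypothetical names, and the alignment constant is an assumption to check against the specification.

```javascript
// Assumed row alignment for buffer<->texture copies; verify against the spec.
const BYTES_PER_ROW_ALIGNMENT = 256;

// Smallest multiple of the alignment that fits one row of pixels.
function paddedBytesPerRow(width, bytesPerPixel) {
  const unpadded = width * bytesPerPixel;
  return Math.ceil(unpadded / BYTES_PER_ROW_ALIGNMENT) * BYTES_PER_ROW_ALIGNMENT;
}

// Removes the per-row padding from data read back with a padded bytesPerRow.
// `padded` is a Uint8Array holding `height` rows of `paddedBytesPerRow` bytes.
function removeRowPadding(padded, width, height, bytesPerPixel) {
  const rowBytes = width * bytesPerPixel;
  const stride = paddedBytesPerRow(width, bytesPerPixel);
  const tight = new Uint8Array(rowBytes * height);
  for (let y = 0; y < height; y++) {
    const row = padded.subarray(y * stride, y * stride + rowBytes);
    tight.set(row, y * rowBytes);
  }
  return tight;
}
```

With these helpers the readback buffer would be sized as `paddedBytesPerRow(textureWidth, 4) * textureHeight`, and the mapped data would be passed through `removeRowPadding` before being saved.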
function frame() {
    // Create a new buffer for our updates. In practice we would
    // reuse buffers from frame to frame by re-mapping them.
    const stagingBuffer = device.createBuffer({
        usage: GPUBufferUsage.MAP_WRITE | GPUBufferUsage.COPY_SRC,
        size: 16 * objectCount,
        mappedAtCreation: true,
    });
    const stagingData = new Float32Array(stagingBuffer.getMappedRange());

    // For each draw we are going to:
    //  - Put the data for the draw in stagingData.
    //  - Record a copy from the stagingData to the uniform buffer for the draw.
    //  - Encode the draw.
    const copyEncoder = device.createCommandEncoder();
    const drawEncoder = device.createCommandEncoder();
    const renderPass = myCreateRenderPass(drawEncoder);
    for (let i = 0; i < objectCount; i++) {
        stagingData[i * 4 + 0] = ...;
        stagingData[i * 4 + 1] = ...;
        stagingData[i * 4 + 2] = ...;
        stagingData[i * 4 + 3] = ...;

        const { uniformBuffer, uniformOffset } = getUniformsForDraw(i);
        copyEncoder.copyBufferToBuffer(
            stagingBuffer, i * 16,
            uniformBuffer, uniformOffset,
            16);

        encodeDraw(renderPass, { uniformBuffer, uniformOffset });
    }
    renderPass.end();

    // We are finished filling the staging buffer, unmap() it so
    // we can submit commands that use it.
    stagingBuffer.unmap();

    // Submit all the copies and then all the draws. The copies
    // will happen before the draws such that each draw will use
    // the data that was filled inside the for-loop above.
    device.queue.submit([
        copyEncoder.finish(),
        drawEncoder.finish()
    ]);
}
3.6. Multithreading
Multithreading is a key part of modern graphics APIs. Unlike OpenGL, newer APIs allow applications to encode commands, submit work, transfer data to the GPU, and so on, from multiple threads at once, alleviating CPU bottlenecks. This is especially relevant to WebGPU, since IDL bindings are generally much slower than C calls.
WebGPU does not yet allow multithreaded use of a single GPUDevice
, but the API has been
designed from the ground up with this in mind.
This section describes the tentative plan for how it will work.
As described in § 2.1 Sandboxed GPU Processes in Web Browsers, most WebGPU objects are actually just "handles" that refer to
objects in the browser’s GPU process.
As such, it is relatively straightforward to allow these to be shared among threads.
For example, a GPUTexture
object can simply be postMessage()
d to another thread, creating a
new GPUTexture
JavaScript object containing a handle to the same (ref-counted) GPU-process object.
Several objects, like GPUBuffer
, have client-side state.
Applications still need to use them from multiple threads without having to postMessage
such
objects back and forth with [Transferable]
semantics (which would also create new wrapper
objects, breaking old references).
Therefore, these objects will also be [Serializable]
but have a small amount of (content-side) shared state, just like SharedArrayBuffer
.
Though access to this shared state is somewhat limited - it can’t be changed arbitrarily quickly
on a single object - it might still be a timing attack vector, like SharedArrayBuffer
,
so it is tentatively gated on cross-origin isolation.
See Timing attacks.
-
Main: const B1 = device.createBuffer(...);.
-
Main: uses postMessage to send B1 to Worker.
-
Worker: receives message → B2.
-
Worker: const mapPromise = B2.mapAsync() → successfully puts the buffer in the "map pending" state.
-
Main: B1.mapAsync() → throws an exception (and doesn’t change the state of the buffer).
-
Main: encodes some command that uses B1, like: encoder.copyBufferToTexture(B1, T); const commandBuffer = encoder.finish(); → succeeds, because this doesn’t depend on the buffer’s client-side state.
-
Main: queue.submit(commandBuffer) → asynchronous WebGPU error, because the CPU currently owns the buffer.
-
Worker: await mapPromise, writes to the mapping, then calls B2.unmap().
-
Main: queue.submit(commandBuffer) → succeeds.
-
Main: B1.mapAsync() → successfully puts the buffer in the "map pending" state.
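The client-side part of this sequence can be mimicked with a toy model in which two wrapper objects share one piece of state. Since the design is tentative, this is purely illustrative: `SharedMapState`, `BufferHandle`, and their methods are hypothetical names and not part of WebGPU.

```javascript
// Toy model: two handles to the same buffer share one client-side state.
// All names are hypothetical; this is not the WebGPU API.
class SharedMapState {
  constructor() {
    this.state = "unmapped"; // "unmapped" | "map pending" | "mapped"
  }
}

class BufferHandle {
  constructor(shared = new SharedMapState()) {
    this.shared = shared;
  }

  // Models postMessage(): a new wrapper object over the same shared state.
  cloneForOtherThread() {
    return new BufferHandle(this.shared);
  }

  // Models the synchronous part of mapAsync().
  beginMap() {
    if (this.shared.state !== "unmapped") {
      throw new Error("buffer is already mapped or map pending");
    }
    this.shared.state = "map pending";
  }

  // Models the map promise resolving.
  finishMap() {
    if (this.shared.state === "map pending") this.shared.state = "mapped";
  }

  // Models unmap().
  unmap() {
    this.shared.state = "unmapped";
  }
}
```

In this model, `B1` and `B2 = B1.cloneForOtherThread()` are distinct objects, yet a `beginMap()` on `B2` is immediately visible to `B1`, which is why the second `mapAsync()` in the sequence fails.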
Further discussion can be found in #354 (note not all of it reflects current thinking).
3.6.1. Unsolved: Synchronous Object Transfer
Some application architectures require objects to be passed between threads without having to asynchronously wait for a message to arrive on the receiving thread.
The most crucial class of such architectures is in WebAssembly applications:
Programs using native C/C++/Rust/etc. bindings for WebGPU will want to assume object handles
are plain-old-data (e.g. typedef struct WGPUBufferImpl* WGPUBuffer;
)
that can be passed between threads freely.
Unfortunately, this cannot be implemented in C-on-JS bindings (e.g. Emscripten) without complex,
hidden, and slow asynchronicity (yielding on the receiving thread, interrupting the sending
thread to send a message, then waiting for the object on the receiving thread).
Some alternatives are mentioned in issue #747:
-
SharedObjectTable, an object with shared-state (like SharedArrayBuffer) containing a table of [Serializable] values. Effectively, a store into the table would serialize once, and then any thread with the SharedObjectTable could (synchronously) deserialize the object on demand.
-
A synchronous MessagePort.receiveMessage() method. This would be less ideal as it would require any thread that creates one of these objects to eagerly send it to every thread, just in case they need it later.
-
Allow "exporting" a numerical ID for an object that can be used to "import" the object on another thread. This bypasses the garbage collector and makes it easy to leak memory.
3.7. Command Encoding and Submission
Many operations in WebGPU are purely GPU-side operations that don’t use data from the CPU.
These operations are not issued directly; instead, they are encoded into GPUCommandBuffer
s
via the builder-like GPUCommandEncoder
interface, then later sent to the GPU with gpuQueue.submit()
.
This design is used by the underlying native APIs as well. It provides several benefits:
-
Command buffer encoding is independent of other state, allowing encoding (and command buffer validation) work to utilize multiple CPU threads.
-
Provides a larger chunk of work at once, allowing the GPU driver to do more global optimization, especially in how it schedules work across the GPU hardware.
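The split between recording and submission can be sketched as follows. `recordCopy` and `submitAll` are hypothetical helper names, and `device` only needs to behave like a `GPUDevice`.

```javascript
// Records a buffer-to-buffer copy. Nothing is executed at this point;
// the result is a command buffer that can be submitted later.
function recordCopy(device, source, destination, size) {
  const encoder = device.createCommandEncoder();
  encoder.copyBufferToBuffer(source, 0, destination, 0, size);
  return encoder.finish();
}

// Sends previously recorded command buffers to the GPU in one call.
function submitAll(device, commandBuffers) {
  device.queue.submit(commandBuffers);
}
```

Because `recordCopy` touches no queue state, several such recordings can be prepared independently and handed to a single `submitAll` call.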
3.7.1. Debug Markers and Debug Groups
For error messages and debugging tools, it is possible to label work inside a command buffer. (See § 3.3.2.4 Error Messages and Debug Labels.)
-
insertDebugMarker(markerLabel) marks a point in a stream of commands.
-
pushDebugGroup(groupLabel)/popDebugGroup() nestably demarcate sub-streams of commands. This can be used e.g. to label which part of a command buffer corresponds to different objects or parts of a scene.
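Since every pushDebugGroup() needs a matching popDebugGroup(), a small wrapper can keep the two balanced. `withDebugGroup` is a hypothetical helper, not part of the API; `encoder` is anything exposing the three methods above.

```javascript
// Runs `record` inside a debug group and always pops the group afterwards,
// even if `record` throws.
function withDebugGroup(encoder, label, record) {
  encoder.pushDebugGroup(label);
  try {
    record(encoder);
  } finally {
    encoder.popDebugGroup();
  }
}

// Example: nested groups with a marker inside the inner group.
function recordScene(encoder) {
  withDebugGroup(encoder, "scene", (e) => {
    withDebugGroup(e, "opaque objects", (inner) => {
      inner.insertDebugMarker("draw cube");
      // ... draw commands for the cube would be encoded here ...
    });
  });
}
```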
3.7.2. Passes
3.8. Pipelines
3.9. Image, Video, and Canvas input
Exact API still in flux as of this writing.
WebGPU is largely isolated from the rest of the Web platform, but has several interop points.
One of these is image data input into the API.
Aside from the general data read/write mechanisms (writeTexture
, writeBuffer
, and mapAsync
),
data can also come from <img>
/ImageBitmap
, canvases, and videos.
There are many use-cases that require these, including:
-
Initializing textures from encoded images (JPEG, PNG, etc.)
-
Rendering text with 2D canvas for use in WebGPU.
-
Video element and video camera input for image processing, ML, 3D scenes, etc.
There are two paths:
-
copyExternalImageToTexture() copies color data from a sub-rectangle of an image/video/canvas object into an equally-sized sub-rectangle of a GPUTexture. The input data is captured at the moment of the call.
-
importExternalTexture() takes a video or canvas and creates a GPUExternalTexture object which can provide direct read access to an underlying resource if it exists on the (same) GPU already, avoiding unnecessary copies or CPU-GPU bandwidth. This is typically true of hardware-decoded videos and most canvas elements.
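The first path might look like the sketch below. Since the exact API is still in flux, treat every detail as an assumption: `textureFromImage` is a hypothetical helper, the `rgba8unorm` format and the set of usage flags are our guesses at what the destination needs, and `textureUsage` stands in for the `GPUTextureUsage` namespace so the sketch has no global dependencies.

```javascript
// Creates a texture the size of `image` and copies the image into it.
// `image` is expected to expose `width` and `height` (e.g. an ImageBitmap).
function textureFromImage(device, textureUsage, image) {
  const size = [image.width, image.height];
  const texture = device.createTexture({
    size,
    format: "rgba8unorm", // assumed destination format
    // Assumed usages: sampled later, and writable by the copy.
    usage: textureUsage.TEXTURE_BINDING |
           textureUsage.COPY_DST |
           textureUsage.RENDER_ATTACHMENT,
  });
  // The image contents are captured at the moment of this call.
  device.queue.copyExternalImageToTexture({ source: image }, { texture }, size);
  return texture;
}
```

In a page this would be called as `textureFromImage(device, GPUTextureUsage, imageBitmap)`.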
3.9.1. GPUExternalTexture
A GPUExternalTexture
is a sampleable texture object which can be used in similar ways to normal
sampleable GPUTexture
objects.
In particular, it can be bound as a texture resource to a shader and used directly from the GPU:
when it is bound, additional metadata is attached that allows WebGPU to "automagically"
transform the data from its underlying representation (e.g. YUV) to RGB sampled data.
A GPUExternalTexture
represents a particular imported image, so the underlying data must not
change after import, either from internal (WebGPU) or external (Web platform) access.
Describe how this is achieved for video element, VideoFrame, canvas element, and OffscreenCanvas.
3.10. Canvas Output
Historically, drawing APIs (2d canvas, WebGL) are initialized from canvases using getContext()
.
However, WebGPU is more than a drawing API, and many applications do not need a canvas.
WebGPU is initialized without a canvas - see § 3.1.1 Adapter Selection and Device Init.
Following this, WebGPU has no "default" drawing buffer. Instead, a WebGPU device may be connected to any number of canvases (zero or more) and render to any number of them each frame.
Canvas context creation and WebGPU device creation are decoupled.
Any GPUCanvasContext
may be dynamically used with any GPUDevice
.
This makes device switches easy (e.g. after recovering from a device loss).
(In comparison, WebGL context restoration is done on the same WebGLRenderingContext
object,
even though context state does not persist across loss/restoration.)
In order to access a canvas, an app gets a GPUTexture
from the GPUCanvasContext
and then writes to it, as it would with a normal GPUTexture
.
3.10.1. Canvas Configuration
Canvas GPUTexture
s are vended in a very structured way:
-
canvas.getContext('webgpu') provides a GPUCanvasContext.
-
GPUCanvasContext.configure({ device, format, usage }) modifies the current configuration, invalidating any previous texture object, attaching the canvas to the provided device, and setting options for vended textures and canvas behavior.
-
Resizing the canvas also invalidates previous texture objects.
-
GPUCanvasContext.getCurrentTexture() provides a GPUTexture.
-
GPUCanvasContext.unconfigure() returns the context to its initial, unconfigured state.
This structure provides maximal compatibility with optimized paths in native graphics APIs. In these, typically, a platform-specific "surface" object can produce an API object called a "swap chain" which provides, possibly up-front, a possibly-fixed list of 1-3 textures to render into.
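The configuration steps can be sketched as three small helpers. The function names are hypothetical; `canvas`, `context`, and `device` only need to behave like their WebGPU counterparts, and `format` is whatever canvas format the application has chosen.

```javascript
// Gets the WebGPU context of a canvas and attaches it to a device.
function attachCanvas(canvas, device, format) {
  const context = canvas.getContext("webgpu");
  context.configure({ device, format });
  return context;
}

// Re-attaches an existing context to another device, e.g. after a device
// loss. Textures vended under the previous configuration become invalid.
function switchDevice(context, newDevice, format) {
  context.configure({ device: newDevice, format });
}

// Returns the context to its initial, unconfigured state.
function detachCanvas(context) {
  context.unconfigure();
}
```

Because the context and the device are decoupled, `switchDevice` reuses the same context object; only the textures it vends change.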
3.10.2. Current Texture
A GPUCanvasContext
provides a "current texture" via getCurrentTexture()
.
For canvas
elements, this returns a texture for the current frame:
-
On getCurrentTexture(), a new [[drawingBuffer]] is created if one doesn’t exist for the current frame, wrapped in a GPUTexture, and returned.
-
During the "Update the rendering" step, the [[drawingBuffer]] becomes readonly. Then, it is shared by the browser compositor (for display) and the page’s canvas (readable using drawImage/toDataURL/etc.).
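In practice this means an application asks for the current texture again on every frame instead of caching it. A minimal sketch that clears the canvas; `clearCanvas` is a hypothetical helper, and the render pass descriptor fields reflect our understanding of the current API shape.

```javascript
// Clears the canvas for the current frame. `context` must already be
// configured with `device`; `clearValue` is e.g. { r: 0, g: 0, b: 0, a: 1 }.
function clearCanvas(device, context, clearValue) {
  // Ask for the texture of the *current* frame; do not reuse last frame's.
  const view = context.getCurrentTexture().createView();

  const encoder = device.createCommandEncoder();
  const pass = encoder.beginRenderPass({
    colorAttachments: [
      { view, clearValue, loadOp: "clear", storeOp: "store" },
    ],
  });
  pass.end();

  device.queue.submit([encoder.finish()]);
}
```

A page would typically call this from a `requestAnimationFrame` callback so that each call lands in its own frame.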
3.10.3. getPreferredCanvasFormat()
Due to framebuffer hardware differences, different devices have different preferred byte layouts
for display surfaces.
Every allowed canvas format works on all systems, but applications may save power by using the
preferred format.
The exact format cannot be hidden, because the format is observable - e.g.,
in the behavior of a copyBufferToTexture
call and in compatibility rules with render pipelines
(which specify a format, see GPUColorTargetState.format
).
Most hardware prefers bgra8unorm
(4 bytes in BGRA order) or is agnostic, while some mobile and
embedded devices (like Android phones) prefer rgba8unorm
(4 bytes in RGBA order).
For high-bit-depth, different systems may also prefer different formats,
like rgba16float
or rgb10a2unorm
.
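A sketch of how an application might pick up the preferred format and reuse it. `configureWithPreferredFormat` is a hypothetical helper; `gpu` stands in for the object that exposes `getPreferredCanvasFormat()` (assumed here to be `navigator.gpu`).

```javascript
// Configures `context` with the system's preferred canvas format and
// returns that format so it can be reused elsewhere.
function configureWithPreferredFormat(gpu, context, device) {
  const format = gpu.getPreferredCanvasFormat();
  context.configure({ device, format });
  // The same value should be used for GPUColorTargetState.format in any
  // render pipeline that draws to this canvas.
  return format;
}
```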
3.10.4. Multiple Displays
Some systems have multiple displays with different capabilities (e.g. HDR vs non-HDR). Browser windows can be moved between these displays.
As today with WebGL, user agents can make their own decisions about how to expose these capabilities, e.g. choosing the capabilities of the initial, primary, or most-capable display.
In the future, an event might be provided that allows applications to detect when a canvas moves
to a display with different properties so they can call getPreferredCanvasFormat()
and configure()
again.
3.10.4.1. Multiple Adapters
Some systems have multiple displays connected to different hardware adapters; for example, laptops with switchable graphics might have the internal display connected to the integrated GPU and the HDMI port connected to the discrete GPU.
This can incur overhead, as rendering on one adapter and displaying on another typically incurs a copy or direct-memory-access (DMA) over a PCI bus.
Currently, WebGPU does not provide a way to detect which adapter is optimal for a given display. In the future, applications may be able to detect this, and receive events when this changes.
3.11. Bitflags
WebGPU uses C-style bitflags in several places.
(Search GPUFlagsConstant
in the spec for instances.)
A typical bitflag definition looks like this:
typedef [EnforceRange] unsigned long GPUColorWriteFlags;
[Exposed=Window]
namespace GPUColorWrite {
    const GPUFlagsConstant RED   = 0x1;
    const GPUFlagsConstant GREEN = 0x2;
    const GPUFlagsConstant BLUE  = 0x4;
    const GPUFlagsConstant ALPHA = 0x8;
    const GPUFlagsConstant ALL   = 0xF;
};
This was chosen because there is no other particularly ergonomic way to describe "enum sets" in JavaScript today.
Bitflags are used in WebGL, which many WebGPU developers will be familiar with. They also match closely with the API shape that would be used by many native-language bindings.
The closest option is sequence<enum type>
, but it doesn’t naturally describe
an unordered set of unique items and doesn’t easily allow things like GPUColorWrite.ALL
above.
Additionally, sequence<enum type>
has significant overhead, so we would have to avoid it in any
APIs that are expected to be "hot paths" (like command encoder methods), causing inconsistency with
parts of the API that do use it.
See also issue #747 which mentions that strongly-typed bitflags in JavaScript would be useful.
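For readers less used to bitflags, the usual set operations map onto bitwise operators. The sketch below copies the GPUColorWrite values from the definition above into a local object so it runs anywhere; `hasFlag` and `withoutFlag` are hypothetical helper names.

```javascript
// Values copied from the GPUColorWrite definition above.
const ColorWrite = Object.freeze({
  RED: 0x1,
  GREEN: 0x2,
  BLUE: 0x4,
  ALPHA: 0x8,
  ALL: 0xF,
});

// Set union: combine flags with bitwise OR.
const rgbOnly = ColorWrite.RED | ColorWrite.GREEN | ColorWrite.BLUE;

// Membership test: all bits of `flag` are present in `mask`.
function hasFlag(mask, flag) {
  return (mask & flag) === flag;
}

// Set difference: clear the bits of `flag` in `mask`.
function withoutFlag(mask, flag) {
  return mask & ~flag;
}
```

For example, `withoutFlag(ColorWrite.ALL, ColorWrite.ALPHA)` yields the same mask as `rgbOnly`, which is the kind of shorthand that a `sequence<enum type>` cannot express as directly.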
4. Security and Privacy (self-review)
This section is the Security and Privacy self-review. You can also see the Malicious use considerations section of the specification.
4.1. What information might this feature expose to Web sites or other parties, and for what purposes is that exposure necessary?
The feature exposes information about the system’s GPUs (or lack thereof).
It allows determining if one of the GPUs in the system supports WebGPU by requesting a GPUAdapter
without software fallback.
This is necessary for sites to be able to fall back to hardware-accelerated WebGL if the system doesn’t support hardware-accelerated WebGPU.
For requested adapters the feature exposes a name, set of optional WebGPU capabilities that the GPUAdapter
supports, as well as a set of numeric limits that the GPUAdapter
supports.
This is necessary because there is a lot of diversity in GPU hardware, and while WebGPU targets the lowest common denominator, it is meant to scale to expose more powerful features when the hardware allows it.
The name can be surfaced to the user, for example to let them choose an adapter, and can be used by sites to apply GPU-specific workarounds (this was critical in the past for WebGL).
Note that the user agent controls which name, optional features, and limits are exposed. It is not possible for sites to differentiate between hardware not supporting a feature and the user agent choosing not to expose it. User agents are expected to bucket the actual capabilities of the GPU and only expose a limited number of such buckets to the site.
4.2. Do features in your specification expose the minimum amount of information necessary to enable their intended uses?
Yes. WebGPU only requires exposing whether hardware-accelerated WebGPU is available, not why it is unavailable or whether the browser chose not to expose it.
For the name, optional features, and limits the information exposed is not specified to be minimal because each site might require a different subset of the limits and optional features. Instead the information exposed is controlled by the user-agent that is expected to only expose a small number of buckets that all expose the same information.
4.3. How do the features in your specification deal with personal information, personally-identifiable information (PII), or information derived from them?
WebGPU doesn’t deal with PII unless the site puts PII inside the API, which means that Javascript got access to the PII before WebGPU could.
4.4. How do the features in your specification deal with sensitive information?
WebGPU doesn’t deal with sensitive information. However some of the information it exposes could be correlated with sensitive information: the presence of powerful optional features or a high speed of GPU computation would allow deducing access to "high-end" GPUs which itself correlates with other information.
4.5. Do the features in your specification introduce new state for an origin that persists across browsing sessions?
The WebGPU specification doesn’t introduce new state. However, implementations are expected to cache the result of compiling shaders and pipelines. This introduces state that could be inspected by measuring how long compilation of a set of shaders and pipelines takes. Note that GPU drivers also have their own caches, so user-agents will have to find ways to disable those caches (otherwise state could be leaked across origins).
4.6. Do the features in your specification expose information about the underlying platform to origins?
Yes.
The specification exposes whether hardware-accelerated WebGPU is available and a user-agent controlled name and set of optional features and limits each GPUAdapter
supports.
Different requests for adapters returning adapters with different capabilities would also indicate the system contains multiple GPUs.
4.7. Does this specification allow an origin to send data to the underlying platform?
WebGPU allows sending data to the system’s GPU. The WebGPU specification prevents ill-formed GPU commands from being sent to the hardware. It is also expected that user-agents will have workarounds for driver bugs that could cause issues even with well-formed GPU commands.
4.8. Do features in this specification allow an origin access to sensors on a user’s device?
No.
4.9. What data do the features in this specification expose to an origin? Please also document what data is identical to data exposed by other features, in the same or different contexts.
WebGPU exposes whether hardware-accelerated WebGPU is available, which is a new piece of data. The adapter’s name, optional features, and limits have a large intersection with WebGL’s RENDERER_STRING, limits and extensions: even limits not in WebGL can mostly be deduced from the other limits exposed by WebGL (by deducing what GPU model the system has).
4.10. Do features in this specification enable new script execution/loading mechanisms?
Yes.
WebGPU allows running arbitrary GPU computations specified with the WebGPU Shading Language (WGSL).
WGSL is compiled into GPUShaderModule
objects that are then used to specify "pipelines" that run computations on the GPU.
4.11. Do features in this specification allow an origin to access other devices?
No. WebGPU allows access to PCI-e and external GPUs plugged into the system but these are just part of the system.
4.12. Do features in this specification allow an origin some measure of control over a user agent’s native UI?
No. However, WebGPU can be used to render to fullscreen or WebXR, which does change the UI. WebGPU can also run GPU computations that take too long and cause a device timeout and a GPU restart (TDR), which can produce a couple of system-wide black frames. Note that this is possible with "just" HTML / CSS, but WebGPU makes it easier to cause a TDR.
4.13. What temporary identifiers do the features in this specification create or expose to the web?
None.
4.14. How does this specification distinguish between behavior in first-party and third-party contexts?
There is no specific behavior difference between first-party and third-party contexts.
However the user-agent can decide to limit the GPUAdapters
returned to third-party contexts: by using fewer buckets, by using a single bucket, or by not exposing WebGPU.
4.15. How do the features in this specification work in the context of a browser’s Private Browsing or Incognito mode?
There is no difference in Incognito mode, but the user-agent can decide to limit the GPUAdapters
returned.
User-agents will need to be careful not to reuse the shader compilation caches when in Incognito mode.
4.16. Does this specification have both "Security Considerations" and "Privacy Considerations" sections?
Yes. They are both under the Malicious use considerations section.
4.17. Do features in your specification enable origins to downgrade default security protections?
No. Except that WebGPU can be used to render to fullscreen or WebXR.
4.18. What should this questionnaire have asked?
Does the specification allow interacting with cross-origin data? With DRM data?
At the moment WebGPU cannot do that, but it is likely that someone will request these features in the future. It might be possible to introduce the concept of "protected queues" that only allow computations to end up on the screen, and not in JavaScript. However, investigations in WebGL show that GPU timings can be used to leak data from such protected queues.