Kernel and Op Implementation and Registration API

Status: Accepted
Author(s): James Ring
Sponsor: Günhan Gülsoy
Updated: 2020-06-02


TensorFlow (TF) currently provides a C++ API for implementing kernels and ops. The Voltron project aims to create a modular, plugin-based TF implementation with stable API and ABI surfaces. Plugins will be able to create and register custom kernel and op implementations.

In order to provide a stable ABI, the Voltron team has chosen to provide C APIs to plugin authors. This document introduces the C API for op and kernel registration. For authors who wish to continue using C++ to interface with TensorFlow, an ABI-stable, header-only C++ API will also be provided.


Presently, there is no ABI-stable API for extending TensorFlow with new kernels and ops. There is no guarantee that a plugin written with one compiler will work with a version of TensorFlow built with another, even on the same operating system and architecture. This makes it difficult to distribute plugins without also distributing the source code and requiring end-users to build the plugin alongside TensorFlow.

An ABI-stable API for extending TensorFlow will simplify the distribution of plugins and allow plugin authors to distribute binary artifacts without necessarily publishing plugin source code.

User Benefit

Plugin authors will be able to publish plugins that are easier for users to install and use. In turn, the TensorFlow community will benefit from an increase in the number and variety of available plugins.

Design Overview

In general, the kernel and op registration C APIs aim to permit the implementation of any kernel or op that is currently possible with the C++ API. Where possible, existing C++ function implementations are reused from within a C wrapper. The purpose of the wrapper is simply to provide ABI stability.

Since plugins will be dynamically loaded (e.g. via dlopen on POSIX), the API avoids relying on static initialization.
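To make the load-time model concrete, here is a minimal C sketch of explicit initialization, assuming a plugin exposes a single entry point that the host calls after loading it. The registry type and every Toy* function below are hypothetical stand-ins, not TensorFlow APIs; in a real loader the entry point would be looked up with dlsym rather than passed in directly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical host-side registry; TensorFlow's real registries differ. */
#define TOY_MAX_OPS 16

typedef struct ToyRegistry {
  const char* op_names[TOY_MAX_OPS];
  size_t count;
} ToyRegistry;

/* Host function exposed to plugins. Returns 0 on success, -1 if full. */
static int ToyRegisterOp(ToyRegistry* registry, const char* op_name) {
  if (registry->count == TOY_MAX_OPS) return -1;
  registry->op_names[registry->count++] = op_name;
  return 0;
}

/* Returns 1 if an op with this name has been registered, else 0. */
static int ToyHasOp(const ToyRegistry* registry, const char* op_name) {
  for (size_t i = 0; i < registry->count; ++i) {
    if (strcmp(registry->op_names[i], op_name) == 0) return 1;
  }
  return 0;
}

/* Type of the single entry point a plugin exports. A real loader would
 * obtain it with dlsym() after dlopen(); here it is passed directly. */
typedef void (*ToyPluginInitFn)(ToyRegistry* registry);

/* Plugin side: every registration happens inside this explicit call,
 * so nothing depends on static initializers running at load time. */
static void ToyPluginInit(ToyRegistry* registry) {
  ToyRegisterOp(registry, "Bitcast");
}

/* Host side: call the plugin's entry point once; returns ops added. */
static size_t ToyLoadPlugin(ToyRegistry* registry, ToyPluginInitFn init) {
  assert(init != NULL);
  size_t before = registry->count;
  init(registry);
  return registry->count - before;
}
```

Because registration only happens when the host calls the entry point, the host decides when it runs and can inspect the result, which is harder to guarantee with static initializers in a dynamically loaded library.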

The intention is that it should be possible to port existing kernels to the new APIs with minimal reimplementation effort. This rules out a from-scratch reimagining of the TensorFlow APIs.

The following diagram describes the components built with the proposed C and C++ APIs.

                +----------------+ <--+
                |                |    |
                | Plugin         |    |
                |                |    |
                +----------------+    |
                |                |    |
                | C++ header API |    |  Plugin
                |                |    |
           +--> +----------------+    |
           |    |                |    |
           |    | C API headers  |    |
           |    |                |    |
           |    +----------------+ <--+
           |    |                |
           |    | C API impl     |
   Core    |    |                |
TensorFlow |    +----------------+
           |    |                |
           |    | Core C++ APIs  |
           |    |                |
           +--> +----------------+

In this example, there are two object files: the plugin's shared object and TensorFlow's own library. The plugin is implemented in terms of the C++ header-only API, which is in turn implemented in terms of the C API headers. The C API implementation is provided by TensorFlow at runtime when it loads the plugin's shared object.

This design covers the changes to the existing C API that are needed to support op and kernel plugins. It also introduces the C++ header-only API, which does not yet exist.

Ops

This section introduces changes to the C API that are required to support ops. An alpha version of this API is already checked in at tensorflow/c/ops.h.

Op Registration

In the C++ API, ops are registered at static initialization time using the REGISTER_OP macro. For example:

  REGISTER_OP("Bitcast")
      .Input("input: T")
      .Output("output: type")
      .Attr("T: {bfloat16, ...}")
      .Attr("type: {bfloat16, ...}")
      .SetShapeFn([](InferenceContext* ctx) { ... })
      .Doc("A bitcast operator");

The equivalent C API will be a series of functions that operate on TF_OpDefinitionBuilder *, a pointer to an opaque struct (i.e. a struct whose content is not made known to the user). The functions include, but are not limited to:

  • TF_OpDefinitionBuilder* TF_NewOpDefinitionBuilder(const char* op_name): constructs and returns a new op registration builder for an op with the given name

  • void TF_OpDefinitionBuilderAddAttr(TF_OpDefinitionBuilder* builder, const char* attr): adds the given attribute to the builder (equivalent to Attr above)

  • void TF_OpDefinitionBuilderAddInput(TF_OpDefinitionBuilder* builder, const char* input): adds the given input to the builder (equivalent to Input above)

Additional functions are provided for setting other properties of the operation (e.g. TF_OpDefinitionBuilderSetIsCommutative).

Registration is then actually performed using the TF_RegisterOpDefinition function. This function populates a TF_Status indicating whether registration was successful and frees the resources associated with the op definition builder.
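The opaque-struct builder pattern can be sketched in self-contained C as follows. Plugins hold only a pointer to an incomplete type, so the library can change the struct's layout without breaking already-compiled plugins, and the registration call consumes the builder, as TF_RegisterOpDefinition is described to do. All Toy* names, the fixed-size arrays and the enum status codes are hypothetical simplifications of TF_OpDefinitionBuilder and TF_Status.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* ---- "Public header": plugins only ever see an incomplete type. ---- */
typedef struct ToyOpBuilder ToyOpBuilder;
typedef enum { TOY_OP_OK = 0, TOY_OP_INVALID_ARGUMENT } ToyOpCode;

ToyOpBuilder* ToyNewOpBuilder(const char* op_name);
void ToyOpBuilderAddInput(ToyOpBuilder* b, const char* input);
void ToyOpBuilderAddAttr(ToyOpBuilder* b, const char* attr);
/* Frees the builder whether or not registration succeeds. */
ToyOpCode ToyRegisterOpDefinition(ToyOpBuilder* b);

/* ---- "Implementation": private, so its layout can change freely. ---- */
#define TOY_MAX_ITEMS 8
struct ToyOpBuilder {
  char name[64];
  const char* inputs[TOY_MAX_ITEMS];  /* caller-owned strings, e.g. literals */
  const char* attrs[TOY_MAX_ITEMS];
  int num_inputs;
  int num_attrs;
};

ToyOpBuilder* ToyNewOpBuilder(const char* op_name) {
  ToyOpBuilder* b = calloc(1, sizeof(*b));  /* zeroed, so name stays terminated */
  if (b != NULL) strncpy(b->name, op_name, sizeof(b->name) - 1);
  return b;
}

void ToyOpBuilderAddInput(ToyOpBuilder* b, const char* input) {
  assert(b != NULL);
  if (b->num_inputs < TOY_MAX_ITEMS) b->inputs[b->num_inputs++] = input;
}

void ToyOpBuilderAddAttr(ToyOpBuilder* b, const char* attr) {
  assert(b != NULL);
  if (b->num_attrs < TOY_MAX_ITEMS) b->attrs[b->num_attrs++] = attr;
}

ToyOpCode ToyRegisterOpDefinition(ToyOpBuilder* b) {
  /* A real registry would validate and store the definition here;
   * this sketch only rejects an empty op name. */
  ToyOpCode code = (b->name[0] == '\0') ? TOY_OP_INVALID_ARGUMENT : TOY_OP_OK;
  free(b);
  return code;
}
```

The same ownership rule applies to the real API: once the builder has been handed to the registration function, the plugin must not touch it again.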

The C equivalent of the C++ REGISTER_OP bitcast example above is shown below:

#include "tensorflow/c/ops.h"
#include "tensorflow/c/tf_status.h"

void InferBitcastShape(TF_ShapeInferenceContext* ctx,  // see the section below on
                       TF_Status* status);             // shape inference

void InitPlugin(void) {
  TF_OpDefinitionBuilder* b = TF_NewOpDefinitionBuilder("Bitcast");
  TF_OpDefinitionBuilderAddInput(b, "input: T");
  TF_OpDefinitionBuilderAddOutput(b, "output: type");
  TF_OpDefinitionBuilderAddAttr(b, "T: {bfloat16, ...}");
  TF_OpDefinitionBuilderAddAttr(b, "type: {bfloat16, ...}");
  TF_OpDefinitionBuilderSetShapeInferenceFunction(b, &InferBitcastShape);

  TF_Status* status = TF_NewStatus();
  TF_RegisterOpDefinition(b, status);  // also frees the builder
  if (TF_GetCode(status) != TF_OK) { /* handle errors */ }
  TF_DeleteStatus(status);
}

Shape Inference

A significant feature of certain ops is their ability to infer their output shapes. TensorFlow will invoke the registered shape inference function (if one is provided) when it needs to know the op's output shape. The registration function declaration is shown below:

void TF_OpDefinitionBuilderSetShapeInferenceFunction(
  TF_OpDefinitionBuilder* builder,
  void (*shape_inference_func)(TF_ShapeInferenceContext* ctx, TF_Status* status));

A series of functions, most of them prefixed with TF_ShapeInferenceContext, is provided for the following purposes:

  • Examining operator input shapes (TF_ShapeInferenceContextGetInput)

  • Creating and deleting shape and dimension handles (TF_{New,Delete}ShapeHandle, TF_{New,Delete}DimensionHandle)

  • Manipulating shape and dimension handles (TF_ShapeInferenceContextWithRank, TF_ShapeInferenceContextDim)

In general, C analogues to the C++ methods in tensorflow::shape_inference (see tensorflow/core/framework/shape_inference.h) will be provided.
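A shape inference function is simply a callback that reads input shapes from a context and writes either output shapes or an error. The sketch below assumes hypothetical ToyShape, ToyShapeContext and ToyStatus types in place of the opaque TensorFlow handles, and shows a function for a hypothetical op that accepts only rank-2 inputs (loosely mirroring a TF_ShapeInferenceContextWithRank check) and forwards the input shape to its output.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for TF_ShapeInferenceContext and TF_Status. */
#define TOY_MAX_RANK 8
typedef struct ToyShape {
  int rank;
  int64_t dims[TOY_MAX_RANK];  /* -1 marks an unknown dimension */
} ToyShape;

typedef struct ToyShapeContext {
  const ToyShape* inputs;
  int num_inputs;
  ToyShape output;             /* filled in by the shape function */
} ToyShapeContext;

typedef struct ToyStatus {
  int ok;
  char message[128];
} ToyStatus;

/* Same shape as the callback accepted by the registration function. */
typedef void (*ToyShapeFn)(ToyShapeContext* ctx, ToyStatus* status);

/* Checks that input 0 is a matrix, then forwards its shape (including
 * unknown dimensions) unchanged to the output. */
static void InferMatrixUnaryShape(ToyShapeContext* ctx, ToyStatus* status) {
  assert(ctx->num_inputs >= 1);
  const ToyShape* in = &ctx->inputs[0];
  if (in->rank != 2) {
    status->ok = 0;
    strncpy(status->message, "input must be rank 2",
            sizeof(status->message) - 1);
    return;
  }
  ctx->output = *in;
  status->ok = 1;
}
```

Unknown dimensions pass through untouched here; a real shape function would typically merge or refine them using the dimension-handle helpers listed above.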

Kernels

This section introduces changes to the C API that are required to support kernels. An alpha version of this API is already checked in at tensorflow/c/kernels.h.

Kernel Registration

Kernel registration with the C++ API is accomplished with the REGISTER_KERNEL_BUILDER macro. This macro expands to code that relies on static initialization to register the provided kernel with the global kernel registry. See below for an example of registering a kernel with the C++ API:

#include "tensorflow/core/framework/op_kernel.h"

namespace tensorflow {

class BitcastOp : public OpKernel {
 public:
  explicit BitcastOp(OpKernelConstruction* context) : OpKernel(context) { … }
  void Compute(OpKernelContext* context) override { … }
};

REGISTER_KERNEL_BUILDER(Name("Bitcast").Device(DEVICE_CPU), BitcastOp);

}  // namespace tensorflow

The equivalent C API provides a series of functions that operate on TF_KernelBuilder, an opaque struct obtained with the TF_NewKernelBuilder call. The kernel builder is registered with TensorFlow using the TF_RegisterKernelBuilder function. See below for an example of registering the bitcast kernel using the C API:

#include <stdlib.h>

#include "tensorflow/c/kernels.h"
#include "tensorflow/c/tf_status.h"

typedef struct bitcast_kernel { … } bitcast_kernel;

// Bitcast_Create, Bitcast_Compute and Bitcast_Delete actually implement the
// kernel. See the section below for discussion on kernel implementation.
static void* Bitcast_Create(TF_OpKernelConstruction* context) {
  bitcast_kernel* k = (bitcast_kernel*) calloc(1, sizeof(bitcast_kernel));
  /* initialize the fields of k as needed */
  return (void*) k;
}

// Returns void to match the compute_func parameter of TF_NewKernelBuilder.
static void Bitcast_Compute(void* k, TF_OpKernelContext* context) {
  bitcast_kernel* kernel = (bitcast_kernel*) k;  // this is the pointer returned by
                                                 // Bitcast_Create
  /* compute the result */
  TF_SetOutput(context, ...);
}

static void Bitcast_Delete(void* k) { free(k); }

void InitPlugin(void) {
  // In C the device is named by a string; DEVICE_CPU is a C++ constant.
  TF_KernelBuilder* builder = TF_NewKernelBuilder(/*op_name*/"Bitcast", "CPU",
      &Bitcast_Create, &Bitcast_Compute, &Bitcast_Delete);
  TF_Status* status = TF_NewStatus();
  TF_RegisterKernelBuilder(/*kernel_name*/"Bitcast", builder, status);
  if (TF_GetCode(status) != TF_OK) { /* handle errors */ }
  TF_DeleteStatus(status);
}

The registration function prototypes are provided below. Kernel authors must provide a compute function. Creation and deletion functions are optional; however, if the creation function allocates memory, a deletion function that frees it must also be provided, or the memory will leak.

TF_KernelBuilder* TF_NewKernelBuilder(
  const char* op_name, const char* device_name,
  void* (*create_func)(TF_OpKernelConstruction*),
  void (*compute_func)(void*, TF_OpKernelContext*),
  void (*delete_func)(void*));

void TF_RegisterKernelBuilder(const char* name, TF_KernelBuilder* builder,
                              TF_Status* status);
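The contract implied by these prototypes (one optional create call, any number of compute calls that receive the pointer create returned, and one optional delete call) can be sketched as a toy host loop. The Toy* context types, ToyKernelFns and ToyRunKernel are hypothetical; TensorFlow's actual scheduling of kernel construction and computation is more involved.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for TF_OpKernelConstruction / TF_OpKernelContext. */
typedef struct ToyConstruction { int scale; } ToyConstruction;
typedef struct ToyContext { int input; int output; } ToyContext;

/* Same shape as the create/compute/delete triple in TF_NewKernelBuilder. */
typedef struct ToyKernelFns {
  void* (*create_func)(ToyConstruction*);
  void (*compute_func)(void*, ToyContext*);
  void (*delete_func)(void*);
} ToyKernelFns;

/* Host side: one create, n computes, one delete.
 * create_func and delete_func may be NULL; compute_func may not. */
static void ToyRunKernel(const ToyKernelFns* fns, ToyConstruction* cons,
                         ToyContext* ctxs, int n) {
  assert(fns->compute_func != NULL);
  void* state = fns->create_func ? fns->create_func(cons) : NULL;
  for (int i = 0; i < n; ++i) fns->compute_func(state, &ctxs[i]);
  if (fns->delete_func) fns->delete_func(state);
}

/* Plugin side, stateful kernel: its state lives on the heap, so it must
 * also supply a delete function or the state would leak. */
typedef struct ScaleKernel { int scale; } ScaleKernel;

static void* Scale_Create(ToyConstruction* cons) {
  ScaleKernel* k = calloc(1, sizeof(*k));
  if (k != NULL) k->scale = cons->scale;
  return k;
}

static void Scale_Compute(void* state, ToyContext* ctx) {
  const ScaleKernel* k = state;  /* pointer returned by Scale_Create */
  if (k == NULL) return;         /* creation failed; leave output as-is */
  ctx->output = ctx->input * k->scale;
}

static void Scale_Delete(void* state) { free(state); }

/* Plugin side, stateless kernel: no create/delete needed, state is NULL. */
static void Negate_Compute(void* state, ToyContext* ctx) {
  (void)state;
  ctx->output = -ctx->input;
}
```

A stateless kernel such as Negate_Compute can be registered with NULL create and delete functions, which is the case the optionality in the prototypes is meant to cover.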

Kernel Implementation

The main classes for C++ kernel implementations are OpKernelConstruction (provided by TensorFlow to the kernel constructor) and OpKernelContext (provided to the kernel's Compute method). The analogues in the C API are TF_OpKernelConstruction and TF_OpKernelContext. The aim of the C API is to provide functions for working with these structs that match the C++ API as closely as possible.

Inputs and Outputs

Kernels must be able to retrieve their inputs and provide outputs. In the C++ API, the input and set_output families of tensorflow::OpKernelContext methods provide this functionality. The equivalent C calls will be TF_GetInput and TF_SetOutput. These functions operate on TF_Tensor, which is already part of the existing TensorFlow C API.
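Index-based access to inputs and outputs, with errors reported instead of out-of-range memory accesses, might look like the following sketch. ToyTensor, ToyKernelContext, ToyGetInput and ToySetOutput are hypothetical simplifications; the real TF_GetInput and TF_SetOutput work with TF_Tensor and report errors through a TF_Status.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: a flat float tensor and a kernel context that
 * exposes inputs and outputs by index. */
typedef struct ToyTensor {
  const float* data;
  size_t size;
} ToyTensor;

typedef struct ToyKernelContext {
  const ToyTensor* inputs;
  int num_inputs;
  ToyTensor* outputs;
  int num_outputs;
} ToyKernelContext;

typedef enum { TOY_IO_OK, TOY_IO_OUT_OF_RANGE } ToyIoCode;

/* Copies input i into *out, or reports an out-of-range index. */
static ToyIoCode ToyGetInput(const ToyKernelContext* ctx, int i, ToyTensor* out) {
  assert(ctx != NULL && out != NULL);
  if (i < 0 || i >= ctx->num_inputs) return TOY_IO_OUT_OF_RANGE;
  *out = ctx->inputs[i];
  return TOY_IO_OK;
}

/* Stores t as output i, or reports an out-of-range index. */
static ToyIoCode ToySetOutput(ToyKernelContext* ctx, int i, ToyTensor t) {
  assert(ctx != NULL);
  if (i < 0 || i >= ctx->num_outputs) return TOY_IO_OUT_OF_RANGE;
  ctx->outputs[i] = t;
  return TOY_IO_OK;
}

/* Identity-style compute body: forward input 0 to output 0 (sharing the
 * buffer), propagating any error instead of reading past the arrays. */
static ToyIoCode IdentityCompute(ToyKernelContext* ctx) {
  ToyTensor in;
  ToyIoCode code = ToyGetInput(ctx, 0, &in);
  if (code != TOY_IO_OK) return code;
  return ToySetOutput(ctx, 0, in);
}
```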

String tensors will be supported in an ABI-stable way. This requires changes to their binary representation, as described in the tstring design document.

C++ Header-Only API

As described above, the main motivation for providing a C API is ABI stability. However, some programmers may find the C API less convenient than the non-ABI-stable C++ API. To address this concern, we plan to provide a header-only C++ API that is implemented in terms of the ABI-stable C API. This API will contain classes such as Tensor, OpKernelContext, and OpKernelConstruction, whose names will be familiar to existing C++ API users. Ideally, this API will be as close as possible to the existing non-ABI-stable TensorFlow C++ API, so that kernels and ops currently implemented in C++ can be ported to the ABI-stable C++ API with as little implementation churn as possible.