
RFC: Tensorflow Core as a product #331

Open · wants to merge 3 commits into master

Conversation

@sun51 commented Nov 20, 2020

Tensorflow Core as a product RFC

Status: Proposed
Authors: Yanhua Sun (yanhuasun@google.com)
Sponsor: Rohan Jain (rohanj@google.com)
Updated: 2020-11-13

Objective

This RFC proposes TensorFlow Core as a standalone product and describes its high-level components and the principles behind it.

@google-cla google-cla bot added the cla: yes label Nov 20, 2020
@ematejska ematejska changed the title Create 20201113-tf-core.md RFC: Tensorflow Core as a product Nov 30, 2020
rfcs/20201113-tf-core.md (outdated review thread)
ExtensionType (also known as CompositeTensor; see the ExtensionType RFC): provides a pair of protocols for defining new types that can be directly supported by TensorFlow APIs. Example ExtensionTypes in core include IndexedSlices.
TypeSpec (go/tf-type-spec): provides a standard way to define and reason about TensorFlow-specific types that are supported by TensorFlow APIs.

Note: FullType (go/tf-fulltype) definitions to be added once the design is ready.
Contributor

need a public link for fulltype
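
For readers without access to the internal links, here is a minimal sketch of the ExtensionType/TypeSpec pairing quoted above, using the public `tf.experimental.ExtensionType` API (available since TF 2.5). The `MaskedTensor` name and its fields are illustrative only, not part of the RFC:

```python
import tensorflow as tf

# Illustrative ExtensionType: TensorFlow auto-generates a matching TypeSpec
# for this class, so instances can flow through tf.function, tf.cond, etc.
class MaskedTensor(tf.experimental.ExtensionType):
    values: tf.Tensor
    mask: tf.Tensor

mt = MaskedTensor(values=[1, 2, 3], mask=[True, False, True])
print(tf.type_spec_from_value(mt))  # the auto-generated TypeSpec for this type
```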



## Objective

Contributor

This needs some more nuance.

When I first read "TF core" I thought it was about separating the C++ library from the Python one; but below it seems like you want to include some Python API as part of core.

So, for example, when it comes to XLA/MLIR, either JITted or AOT (e.g., the functionality in saved_model_cli), do you plan to keep those in core, or do you expect all XLA functionality to be split off into separate libraries? If so, how do you expect to support the jit_compile=True option of tf.function?

Contributor

+1 for separating the C++ library from the Python one. The Python API might still serve a prototyping purpose if we'd like to collect feedback from a broader audience, though, e.g. https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/distribute/v1/all_reduce.py

Member

I'll try and clarify the proposal a bit here. There are a couple of directions this RFC is pushing towards:

  1. Python API modularity - Currently TF core includes Keras, tf.data, distribution strategies etc. without any clear layering and dependency structure. We want to define a small core that all these libraries can be layered on top of. We don't plan to remove any APIs from TF or add too many new APIs.

  2. C++ first API - We want to think of the "Core API" as primarily a C++ one with a thin (probably 1-1 mapping) layer of Python on top. Users can choose to interact with either API. Arguably, one can think of this as an implementation detail, but it's an important implementation detail that we wanted to bring to folks' attention.

Concretely, the first step we're taking now is defining the boundaries of what this small core API surface is going to look like and ensuring that the other TF APIs (tf.data, distribute, Keras etc.) depend on it (and not the other way around). This involves removing circular dependencies, BUILD file refactoring, moving some files around, and creating the right extension points / interfaces in TF core, with no API surface changes expected whatsoever.
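
As a concrete (if rough) reference point for the "thin, roughly 1-1 Python layer" idea, the existing `tf.raw_ops` surface already behaves this way: each symbol forwards directly to a registered C++ op with essentially no Python-side logic. This is only an analogy, not something the RFC commits to:

```python
import tensorflow as tf

# tf.raw_ops is a near 1-to-1 Python-over-C++ surface: AddV2 here dispatches
# straight to the registered C++ kernel, with no extra Python logic.
x = tf.constant([1.0, 2.0])
y = tf.constant([3.0, 4.0])
z = tf.raw_ops.AddV2(x=x, y=y)
print(z)  # tf.Tensor([4. 6.], shape=(2,), dtype=float32)
```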


@rohan100jain +1
If there is a design meeting, kindly let us know. I would like to join the call.

Author

Yes, the design review meeting is scheduled for 10 AM PT on Dec 15, 2020.

Author

Sorry, we need more time to figure things out; the design review is cancelled.


### Keys to get there

Understand users’ needs
Contributor

I think this will require a SIG for the core.

Contributor

Or, if "users" here refers to the high-level API owners (tf-keras/data/distribute etc.), then one could argue that internal communication might be more effective/efficient at the initial stage.

Contributor

Please involve SIG leads at least.


Performance

TF Core will be extremely lightweight and performant. We'll try to do as much computation as possible in C++, with only a thin layer on top; for example, the Python APIs will be one such thin layer. This also involves building a clean C API layer where state-of-the-art runtimes like TFRT and graph compilers like MLIR can plug in and improve performance.
Contributor

What is the Python layer here? A 1-to-1 wrapper? Or a higher-level layer with a reduced surface?

Author

We would like to see layered Python APIs, with Keras, tf.data, distribute, etc. built on core APIs and extension/composite mechanics. The core APIs are a reduced yet complete surface; one high-level API could be composed of a few core APIs.
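
A hedged sketch of what "one high-level API composed of a few core APIs" could look like; `dense_relu` is an illustrative name, not a proposed API:

```python
import tensorflow as tf

# A "higher-level" operation expressed entirely through core primitives
# (matmul, add, maximum) plus tf.function for tracing.
@tf.function
def dense_relu(x, w, b):
    return tf.maximum(tf.linalg.matmul(x, w) + b, 0.0)

x = tf.random.normal([4, 8])
w = tf.random.normal([8, 16])
b = tf.zeros([16])
print(dense_relu(x, w, b).shape)  # (4, 16)
```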


* Is this API serving any use cases?

When we add or remove APIs, the first thing we should ask is 'what is the use case for this API?'. The API should serve users' purposes, not developers'.
Contributor

I think that we cannot make too many claims about users without a regular SIG.

Also, what about custom C++ ops in SIGs?
Can we really analyze composability in Python space before evaluating a core API change/extension?

Author

There are some essential and basic C++ ops. Also, a lot of custom C++ ops can be replaced with Python code; currently, one of the main reasons for these custom C++ ops is performance. We have plans to bring Python code performance closer to custom C++ op performance, so that custom C++ is not required; instead, the same functions can be implemented in the high-level language, which provides easier programmability and more flexibility to modify.
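
A sketch of that direction, assuming the compiler path is `tf.function(jit_compile=True)` (i.e. XLA); the fused op and its shapes are made up purely for illustration:

```python
import tensorflow as tf

# Instead of shipping a custom C++ "fused scale-shift" kernel, the same
# computation is written from core primitives and left to the compiler to fuse.
@tf.function(jit_compile=True)
def fused_scale_shift(x, scale, shift):
    return x * scale + shift

x = tf.random.normal([1024, 1024])
out = fused_scale_shift(x, 0.5, 1.0)
print(out.shape)  # (1024, 1024)
```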

Contributor

Yes, this was my point, and it seems to me to place quite a high responsibility on the definition of the core API set's perimeter, plus on compiler stack performance, to avoid creating a bottleneck.


Quick question: when I add a new hardware device (TensorFlow modular plugin), for ABI stability I need to register ops/kernels with the C API. How would I register them into TF Core? Also, I see https://github.com/tensorflow/community/pull/77: if users need a new Python API (e.g. for a plugin device's new ops/kernels), they need to interact with TF Core (the C++ libraries) through the C API. What do you think?

Contributor

@sun51 In the meantime, can you expand a little bit more on if and how this is going to interact with https://github.com/tensorflow/tensorflow/tree/master/tensorflow/compiler/mlir/tfr?
Because I expect that reducing the core API surface is going to create more pressure on ecosystem repos that have the infrastructure for custom ops.

Are the error messages for the APIs easy to understand? Are the error messages clear enough to give feedback on what is going wrong and how to fix it?

* Is this API performant?

Contributor

On what reference runtime? TFRT?

Contributor

A general suggestion is to avoid over-emphasizing "performant" without going into a detailed analysis of the trade-offs, e.g. why this API design enables a high-performance implementation and its alternatives do not.


* Is this API similar or redundant with existing ones?

Corresponding to the principle 'Primitive/Essential/Orthogonality': if one API can be written by combining a few existing APIs with the mechanics core provides, then that API is not essential. Instead of including that particular API, the primitive APIs it is composed of should be included in the core.
@bhack (Contributor), Dec 1, 2020

I think the "few" term is different in the C++ domain than in the Python domain.
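
A worked instance of that composability test (illustration only, not a proposal to drop any API): a reduction such as "mean" can be rebuilt from more primitive ops, so under this principle only the primitives would need to live in core.

```python
import tensorflow as tf

# "mean" expressed purely through primitives assumed to already be in core.
def mean_from_primitives(x):
    total = tf.reduce_sum(x)
    count = tf.cast(tf.size(x), x.dtype)
    return total / count

print(mean_from_primitives(tf.constant([1.0, 2.0, 3.0])))  # 2.0
```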


### Functional components of TF core
* Control flow: tf.cond, tf.while_loop
* tf.function: trace-compile TensorFlow programs to accelerate them
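
A minimal sketch of these two components working together (the function and values are illustrative):

```python
import tensorflow as tf

# tf.cond builds data-dependent control flow inside a traced tf.function.
@tf.function
def clipped_double(x):
    return tf.cond(x > 10.0, lambda: tf.constant(10.0), lambda: x * 2.0)

print(clipped_double(tf.constant(3.0)))   # 6.0
print(clipped_double(tf.constant(50.0)))  # 10.0
```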
@bhack (Contributor), Dec 1, 2020

Can the quality of the compiler's performance on a function have an impact on the evaluation of whether to extend the core? Or will we be deferred to compiler improvements?

@bhack (Contributor) commented Dec 1, 2020

Generally, I think that this RFC will have some positive impact if the SIG leads are involved in the process and not only in the RFC review.

What I don't understand is whether the scope of this RFC is to have a TF-core-only repository in the near future, maintained by a specific TF-core team.
This is related to the C++/C/Python API relationship. Just to give some examples:

  • Keras is a Python-only SIG, but when it needs C++ ops they are contributed to the monolithic TF repo as needed (e.g. image processing)

  • tf.data, if it becomes a separate SIG, requires C++

  • TF Addons and TF I/O have custom ops when composability or tf.function is not enough to reach the required performance. But we know the portability issues related to custom ops (Custom Op TPU custom-op#53)

Are we heading toward a TF world where many SIGs will need to maintain their own C++/C APIs to offer high-level Python APIs for all the cases not covered by Python composability on TF core, or where the compiler stack (tf.function) is not efficient enough?

Also, are we sure that we can still minimize duplication across the SIG ecosystem if the "centralized" core is going to have a minimal surface?
I strongly believe that we will need a robust ecosystem review process with a smaller core proposal.

@mihaimaruseac (Contributor) commented Dec 1, 2020

There will be a design meeting for this on the 15th of December at 10 am Pacific.

If you want to join, here is the Meet link

Edit: meeting has been postponed, see #331 (comment)


* Is this API easy to use?
Usability - representation
Member

Some formatting (especially line breaks) is broken in this doc when rendered. For example, you need a blank line here to break the line.

* Testing: APIs and functions for users to test TensorFlow programs.
* Basic utilities: essential and primitive utility functions and APIs.
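
A minimal sketch of the Testing component, using the existing `tf.test` utilities (the test case itself is illustrative):

```python
import tensorflow as tf

# tf.test.TestCase extends unittest with TF-aware assertions.
class AddTest(tf.test.TestCase):
    def test_add(self):
        self.assertAllClose(tf.add(1.0, 2.0), 3.0)

if __name__ == "__main__":
    tf.test.main()
```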

### Math/NN APIs
Member

What about RNG ops (both stateless RNGs and tf.random.Generator)?

@sun51 (Author) commented Dec 14, 2020

> There will be a design meeting for this on the 15th of December at 10 am Pacific.
>
> If you want to join, here is the Meet link

Sorry, we need more time to figure things out; the design review is cancelled for now. We will let you know if there is one later.

@sun51 (Author) commented Dec 14, 2020

Thanks everyone for the discussion and comments. We will need more time to figure out the design, so we will put this RFC on hold and have cancelled the design review for now. Will update later.

@bhack (Contributor) commented Dec 14, 2020

> Thanks everyone for the discussion and comments. We will need more time to figure out the design, so we will put this RFC on hold and have cancelled the design review for now. Will update later.

Do you mean #331 (comment) ?

@Craigacp

Would this TF-core have a C API as well as a C++ one, or is that planned to live elsewhere?

@SidneyLann

Hope most things are done in C/C++.

@ematejska (Contributor)

@rohan100jain Are we still pursuing this RFC?

@bhack (Contributor) commented May 3, 2022

It would be really useful if we could revamp and approve this RFC.

@bhack (Contributor) commented May 5, 2022

/cc @learning-to-play

@bhack (Contributor) commented Jul 22, 2022

Do we have an opportunity to re-evaluate/refresh this and all the useful feedback we have collected?

@learning-to-play

@joker-eph Do you think there is an opportunity to incorporate the collected feedback into the current refactoring projects? See @bhack's comment above.
