Utility to help developers debug operators: Tensor Inspector #15490

Zha0q1 · 2019-07-08T23:38:03Z

Created a separate PR for a new tutorial on usage.
#15517

Background

When developing new operators, developers need to deal with Tensors, TBlobs, and NDArrays extensively (for simplicity, we will simply refer them as “tensors”). However, there is no existing tool that helps developers inspect the value of the tensors within an operator, so it’s hard to locate the spot where the computation goes wrong. With that said, we can create a new utility that supports inspecting, dumping and verifying tensor values to alleviate the developer’s need to use hacky prints every time. This new utility will support all the three data types (Tensor, TBlob, NDArray), as well as both CPU and GPU tensors.

Some related GitHub issues: #15198

Methodology

First off, to support all three data types, we will need to first convert them to TBlob, so that we can base our new functionalists on one single class and avoid code repetition. To access the value of a tensor/TBlob object, the easiest way is to add some function, such as to_string(), to the TBlob class. Then, we could use shape_ to dereference dptr and extract the actual data from the tensor in a structured format. However, to preserve the integrity of the TBlob class, we would want to avoid making direct changes to tensor_blob.h. Instead, considering that all the member variables of TBlob are all public, we could create a wrapper around TensorBlob, which we would call TensorInspector, and we could then add our new functions to this new wrapper class. We could put this new file tensor_inspector.h in src/common along with a bunch of other utilities.

class TensorInspector {
  private:
    const TBlob tb_;
  public:
    // our new functions
}

Besides tensor inspectors which are one-to-one with tensors, I also propose to have a singleton class called InspectorManage which is responsible for data sharing and maintaining locking between the individual TensorInspector objects. This way, we could control some global behaviors within each of the objects.

The Value of Creating a New Utility

It’s important to point out that there is an existing utility that helps print out the value of a tensor, tests/cpp/include/test_util.h, (https://github.com/apache/incubator-mxnet/blob/master/tests/cpp/include/test_util.h#L472). However, this new utility under discussion aims to start from the use cases of the users, who in our case are developers, and has more functionalities than the existing utility. Note: this new utility will use CAccessAsCPU from test_util.h for data copying from GPU memory to CPU memory.

Proposed Funtionalities/APIs

Create a TensorInspector Object from Tensor, TBlob, and NDarray Objects

We can create a TensorInspector object by passing in a Tensor, TBlob, or NDArray object.

// Convolution-inl.h, Forward()
Tensor<xpu, 3, DType> weight_3d = in_data[conv::kWeight].get_with_shape<xpu, 3, DType>(
        Shape3(group_, M, K), s);
TensorInspector ti(weight_3d, ctx.run_ctx);

// Similarly, we could do:
TensorInspector ti(/*TBlob obj*/, ctx.run_ctx);
// OR:
TensorInspector ti(/*NDArray obj*/, ctx.run_ctx);

Print Tensor Value (static)

We could use std::string TensorInspector::to_string() or void TensorInspector::print_string() to get a formatted string representation of the entire tensor. We could see the tensor value in a structured way, the data type, as well as the tensor shape.

TensorInspector ti(/*Tensor, Tblob, or Ndarray obj*/, /*RunContext obj*/);
ti.to_string(); // print tensor

Interactively Print Tensor Value (dynamic)

During debugging, situations might occur that ahead of time (at compilation time), we do not know which part of a tensor we need to inspect. Also, sometimes we might want to pause the operator control flow to “zoom into” an erroneous part of a tensor multiple times until we are satisfied to move on. In this regard, we could use void TensorInspector::interactive_print(std::string tag = "") like a break point to dynamically specify the part of the tensor to inspect and whether we want to skip the break point. tag is an optional parameter where we can give the break points names so that we can distinguish them. A “Visit” counter would tell us how many times have we stepped into a break point with this tag name.

    TensorInspector ti(weight_3d);
    ti.interactive_print("My Convolution");

As we can see from the screenshot, we would be in a loop that asks us for command. We would be prompted with the Tensor info, <f Tensor 20x1x5x5>, and we could enter coordinates such as “0,0” to print only a part of the tensor. We could also type “e” for the entire tensor, “b” to break from this loop, and “s” to break from this loop and skip all the future break points.

Check Tensor Value

Sometimes, developers might want to check if the tensor contains unexpected values which could be negative values, Nans, or others. We could have a templated api std::vector<std::vector<int>>TensorInspector::validate_value(const ValueChecker& vc, bool interactive = false, std::string tag = "") where vc is a bool value checker function that users can define, interactive is whether you want to check the values on the go (similar to interactive print, refer to the first screenshot below), and tag is the name you want to give to the call to identify it. The return is a vector of coordinates (represented with vector) at which checker evaluates to true.

To the developer's convenience, we would also pre-define a bunch of value checkers. Refer to the Enum below:

Developers could skip wiring their own checker by using this API: std::vector<std::vector<int>>TensorInspector::validate_value(ValueChecker vc, bool interactive = false, std::string tag = "")

This API can also be used creatively to find coordinates where the values equal to some required value or fall in some range.

Dump Tensor Value

Sometimes, we would want to dump the tensor to a file in binary mode, so that we could use, for example, a python script to further analyze the numbers. We could use void TensorInspector::dump_value(std::string tag) to achieve this. We would want to use .npy format for its compatibility with Numpy and its great load efficiency. A benchmark can be found here: https://github.com/mverleg/array_storage_benchmark.

In this screenshot, we can compare side by side the regular print (left) and the value dump (right). Even without formatting the dumped value, we can see it has a better accuracy.

As we can see in the screenshot, the output files are named "{tag}_{visit}.npy", where tag is the name that we give to the call, and visit is the visit count.

To load the file, we could just do:

import numpy as np
a = np.load('abc_1.npy')
print(a)

Notice: in interactive print, we could also do value dump with command "d":

Usage

To test or use this new utility in your operator, just include the .h file and construct a TensorInspector object, say "ti", as described above. Then, you can use all functionalities listed above by writing, for example, "ti.print_string()" or "ti.interactive_print()". Finally, just run any python script that uses the operator that you modified.

TODOs

Support sparse NDArrays if necessary
Check corner cases e.g. if Tblob's shape is unknown

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
Changes are complete (i.e. I finished coding on this PR)
All changes have test coverage:
Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
Code is well-documented:
For user-facing API changes, API doc string has been updated.
For new C++ functions in header files, their functionalities and arguments are documented.
For new examples, README.md is added to explain the what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
To the my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

Feature1, tests, (and when applicable, API doc)
Feature2, tests, (and when applicable, API doc)

Comments

If this change is a backward incompatible change, why must this change be made.
Interesting edge cases to note here

sandeep-krishnamurthy · 2019-07-09T04:50:43Z

Awesome! thank you for the nice utility. 👍 for clear and detailed PR description.

src/common/tensor_inspector.h

Zha0q1 · 2019-07-09T07:00:04Z

I could probably rewrite build_checker() (https://github.com/apache/incubator-mxnet/pull/15490/files#diff-9e7d5c2420ecc900c4a85a3c35d91bffR446) with macro to suppress warnings when DType is not floating point. But it's not causing any issues because in such cases the control flow wouldn't go by the undefined operation. Will change if have time.

Zha0q1 · 2019-07-11T00:07:09Z

I could probably rewrite build_checker() (https://github.com/apache/incubator-mxnet/pull/15490/files#diff-9e7d5c2420ecc900c4a85a3c35d91bffR446) with macro to suppress warnings when DType is not floating point. But it's not causing any issues because in such cases the control flow wouldn't go by the undefined operation. Will change if have time.

After experimenting for several hours today, I think there is not a good way to breach at compile time to suppress the warnings. Macro apparently did not work. I also tried simulating C++17 style constexpr if with templated structs, which is an idea proposed by https://baptiste-wicht.com/posts/2015/07/simulate-static_if-with-c11c14.html, but that method was compiler specific and did not work either.

Having no warnings when compiling is nice, but given that this is a developer's tool and that behavior-wise the code has no issue, I think I will leave the warnings.

Zha0q1 · 2019-07-11T01:50:55Z

@apeforest @anirudh2290 @access2rohit @larroy @sandeep-krishnamurthy ,
Hi guys, I completed this PR and it's ready for review. The only thing that I do not have yet is support for CSR NDArrays because I want to ask your opinion. I think I probably do not need to generate the whole matrix. Having something like this in my interactive print should be good enough? (this is in the existing test::print)

access2rohit · 2019-07-11T16:36:36Z

Can you split the PR into 2 separate PRs. One for CPU and one for GPU ?

Zha0q1 · 2019-07-11T21:08:46Z

Just realized there were more than one Tensor structs. Added mshadow:: before Tensor so compiler would not get confused. a41720d#diff-9e7d5c2420ecc900c4a85a3c35d91bffR623

rondogency

LGTM

src/common/tensor_inspector.h

Zha0q1 · 2019-07-12T00:45:20Z

Created a separate PR for a new tutorial on usage.
#15517

src/common/tensor_inspector.h

anirudh2290

Thanks! I see this being very useful for operator developers.

src/common/tensor_inspector.h

access2rohit · 2019-07-16T21:05:34Z

LGTM just address my last comment and no need to split this PR into 2 now since you have already addressed most comments

access2rohit · 2019-07-16T22:57:30Z

retrigger the build ... there is some julia error. Probably not because of your code changes. Also add link to the test cases that you covered

Zha0q1 · 2019-07-16T22:58:17Z

retrigger the build ... there is some julia error. Probably not because of your code changes. Also add link to the test cases that you covered

cases covered are described in my tutorial at the end. Link: #15517

Zha0q1 · 2019-07-17T17:36:30Z

@sandeep-krishnamurthy Looks ready!

apeforest · 2019-07-17T18:03:54Z

src/common/tensor_inspector.h

+    if (pos->size() > static_cast<size_t>(dimension)) {
+      return false;
+    }
+    for (unsigned i = 0; i < pos->size(); ++i) {


replace unsigned with size_t

apeforest · 2019-07-17T18:04:34Z

src/common/tensor_inspector.h

+    const int sub_dim = dimension - pos.size();
+    sub_shape->resize(sub_dim);
+    index_t multiple = 1;
+    for (int i = pos.size(), j = 0; i < dimension; ++i, ++j) {


change int to index_t or size_t

apeforest · 2019-07-17T18:05:02Z

src/common/tensor_inspector.h

+    }
+    index_t sum = 0;
+    index_t m = 1;
+    for (int i = pos.size() - 1; i >= 0; --i) {


change int to index_t

apeforest · 2019-07-17T18:05:55Z

Three more changes. Otherwise good to go. Thanks for your contribution.

Zha0q1 · 2019-07-17T18:08:24Z

@sandeep-krishnamurthy Looks ready!

After a few more changes requested by Lin

apeforest

LGTM

larroy · 2019-07-18T01:23:02Z

src/common/tensor_inspector.h

+struct InspectorManager {
+  static InspectorManager* get() {
+    static std::mutex mtx;
+    static std::unique_ptr<InspectorManager> im = nullptr;


Could you add an utility function encapsulating this idiom? this is being used in several places, and would be good to use this.

Zha0q1 added 5 commits July 8, 2019 16:36

initial. Supports Tensor, TBlob, NDArray

30b2271

add GPU tensor support

571c65a

move run context reference to constructors

0ae0bae

sanity fix

5123a14

sanity fix

969cf73

kshitij12345 reviewed Jul 9, 2019

View reviewed changes

src/common/tensor_inspector.h Outdated Show resolved Hide resolved

Zha0q1 changed the title ~~[WIP] Utility to help developers debug operators: Tensor inspector~~ [WIP] Utility to help developers debug operators: Tensor Inspector Jul 9, 2019

Zha0q1 added 2 commits July 8, 2019 22:57

fix checker bug & add new checker type

eafcadc

add more checker types

275fb27

Zha0q1 added 6 commits July 9, 2019 11:05

fix gpu tensor constructor call

9c06300

add value dumping funtionality

b013c2f

sanity fix

b8c01ac

sanity fix

ec8b44f

add dumping support to interactive print

eef4884

sanity fix

108d74d

Zha0q1 changed the title ~~[WIP] Utility to help developers debug operators: Tensor Inspector~~ Utility to help developers debug operators: Tensor Inspector Jul 11, 2019

Zha0q1 added 2 commits July 11, 2019 10:59

Re-Trigger build

95c7c44

add namespace before Tensor

a41720d

rondogency approved these changes Jul 11, 2019

View reviewed changes

src/common/tensor_inspector.h Outdated Show resolved Hide resolved

Zha0q1 added 3 commits July 11, 2019 14:48

add more checker types

e2e16f4

bug fix

36c8d9f

fix comments

e3356d5

Zha0q1 mentioned this pull request Jul 12, 2019

Tensor Inspector Tutorial #15517

Merged

7 tasks

fix warnings

15c46e6