
Contents

  • Highlights
      • TensorBoard (currently experimental)
  • Breaking Changes
  • New Features
      • Operators
      • NN
      • Tensors / dtypes
      • Optim
      • Distributions
      • Samplers
      • DistributedDataParallel
      • TorchScript and Tracer
      • Experimental Features
  • Improvements
  • Bug Fixes
      • Serious
      • Other
  • Deprecations
  • Performance
      • Highlights
      • Other
  • Documentation
  • ONNX

Documentation

  • Add magma for CUDA 10.1 to Windows docs (19914).
  • Improve clarity of JIT documentation and document torch.jit.Attribute (19929).
  • Improve build-from-source instructions (20088).
  • Add ninja to build instructions (20079).
  • Update explanation of module attributes in JIT type refinement docs (20912).
  • Update libtorch build docs (21150).
  • Update web links in the contribution_guide and governance documentation (21243).
  • Improve documentation for publishing hub models (21307).
  • Clarify performance implications of deterministic mode (21337).
  • Update the configured copyright notice (21372).
  • Suggest a faster linker in the contributing guide (21334).
  • Update cuda pinned memory note to include tensor.to (20977).
  • Improve output of doxygen build (20362).
  • Add CUDA C++11 and profiling notes to the contribution guide (21386).
  • Update documentation of entry point in hub (21568).
  • Fix a typo in reference to hubconf.py filename (21631).
  • Update code comments for MAGMA functions (22618).
  • nn.CTCLoss: Fix rendering of docs (19662).
  • nn.CTCLoss: Change Inputs to Shape to unify the format, and add the type of Output in Shape (20422).
  • nn.MultiheadAttention: Add documentation for add_bias_kv, add_zero_attn, and attn_mask (20071).
  • nn.TripletMarginLoss: Clarify an example (20145).
  • nn.init.calculate_gain: update example (20131).
  • nn.Softmax: Fix docs to specify the dimension, preventing a warning in 1.1.0 (20310).
  • nn.MultiheadAttention: Fix documentation for attention mask shape (20850).
  • nn.functional.conv{1,2,3}d: Remove padding_mode (20891).
  • nn.functional.upsample and nn.functional.interpolate: Fix align corner docs (20961).
  • nn.functional.gelu: Fix formatting (21265).
  • nn.functional / nn.init: Break up the NN module docs so they load faster (21291).
  • nn.functional.gumbel_softmax: Fix links to the Gumbel-Softmax arXiv papers (21376).
  • nn.functional.one_hot: Fix incorrect signature in docs (22929).
  • nn.modules.RNN: Fix subscripts (20949).
  • nn.modules.batchnorm.SyncBatchNorm: Update an example (20991).
  • nn.module.Activation: Improve repr of inplace (20127).
  • nn.init.normal_ / nn.init.kaiming_normal_: Fix a LaTeX formula error (21000).
  • nn.TransformerEncoder / nn.TransformerDecoder: Edit docs for the transformer modules (21746).
  • torch.optim.lr_scheduler.CosineAnnealingLR: fix a typo (20110).
  • torch.eig: Fix formatting for note (19743).
  • torch.utils.tensorboard.add_video: clarify the data type (19959).
  • torch.geometric_: Update to reflect correct tensor behavior (20091).
  • torch.Tensor: Add a warning about memory usage (20801).
  • torch.multiprocessing: Explain refcounting of CUDA tensors (19904).
  • torch.optim.lr_scheduler.CyclicLR: Clarify base_momentum and max_momentum (20880).
  • torch.functional.tensordot: Fix a typo (21510).
  • torch.triangular_solve: Fix incorrect use of TeX (21649).
  • torch.distributions.categorical.Categorical: Update "log probabilities" to "log-odds" (21707).
  • torch.load / torch.save: Improve formatting (21747).
  • torch.autograd.grad_mode: Document that no_grad is thread-local (21755); see the sketch after this list.
  • torch.diagflat, torch.bincount, torch.allclose: Update incorrect argument names and types (21846).
  • torch.arange: Fix incorrect docs (21992).
  • torch.bool: Document the Boolean tensor type (21601).
  • torch.utils.data.IterableDataset: Update IterableDataset doc to be consistent with current behavior (22230).
  • torch.utils.data.DataLoader: Document RNG state consumption (22540).
  • torch.irfft: Improve irfft docs (22995).
  • torch.sign: Add the mathematical definition (22894).
  • torch.as_strided: Add documentation (22842).
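
As a companion to the no_grad thread-locality note above, here is a minimal sketch (using only torch and the standard threading module; the tensor shapes and prints are illustrative only) of how grad mode applies per thread:

```python
import threading

import torch

x = torch.ones(2, requires_grad=True)

def worker():
    # Grad mode is thread-local: this freshly spawned thread still has
    # gradients enabled, even though the main thread is inside no_grad().
    y = x * 2
    print("worker requires_grad:", y.requires_grad)  # True

with torch.no_grad():
    z = x * 2
    print("main requires_grad:", z.requires_grad)  # False
    t = threading.Thread(target=worker)
    t.start()
    t.join()
```

A thread spawned inside a no_grad() block therefore has to enter its own no_grad() context if it also wants gradients disabled.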

Performance

  • torch.bmm: Improve performance on CPU by applying TensorAccessor (20266).
  • torch.matmul: Optimize the case where A.ndim <= 2 and B.ndim >= 3 (20448).
  • torch.randperm: Parallelize initialization in randperm on CPU (21529).
  • torch.get_num_interop_threads: Add get/set_num_interop_threads into torch.h (20659); see the sketch after this list.
  • torch.copy_: Refactor CUDA copy kernel and improve performance (20685).
  • torch.inverse: Move workspace query and allocation outside loop to improve performance (20904).
  • torch.cdist: Improve torch.cdist performance (20605).
  • torch.lerp: Vectorize the lerp operator with TensorIterator (22038).
  • torch.topk: Optimize CPU performance using parallel and partial sort (22865).
  • torch.sinh / torch.cosh: Move legacy TH functions to TensorIterator + Vec256 (21115).
  • nn.Softmax: Add persistent CUDA kernels that speed up SoftMax (20827).
  • torch.coalesce: Use _sparse_coo_tensor_unsafe in coalesce for speedup (21214).
  • torch.normal: Move normal, normal_means, normal_stddevs, and normal_means_stddevs to ATen (21287).
  • torch.distributions.cauchy: Move THCTensor_(cauchy) to ATen (21289).
  • torch.bernoulli: Speedup bernoulli_scalar_cuda_kernel with grid-stride loop (21300).
  • torch.eye: Parallelize eye() on CPU (21077).
  • nn.Upsample: Increase throughput of bilinear upsampling (19306).
  • nn.Upsample: Faster bilinear2d kernel (21879).
  • nn.functional.layer_norm: Optimize layer_norm forward (20345).
  • nn.EmbeddingBag: Optimize CUDA kernel (22016).
  • at::parallel_for: Port TH library to ATen/Parallel instead of omp parallel for (19105).
  • at::parallel_for: Port THNN to ATen/Parallel (20032).
  • at::launch: Add a benchmark (21581).
  • Remove explicit checks for parallelism from TH (20002).
  • Intra-op parallelism microbenchmarks (19997).
  • Port ATen/native to ATen/Parallel (20043).
  • Move inter-op parallelization settings into ATen/Parallel (20050).
  • Don't split 256-bit AVX2 load/store intrinsics (20609).
  • Add a native ATen/Parallel backend (20087).
  • Improve performance of advanced indexing backward (20557).
  • Correctly honor OMP/MKL NUM_THREADS environment variables (21189).
  • Allow more flexibility in callback profiling (21394).
  • Improve jit unpickling performance by reserving correct capacity in memoization table (21542).
  • Native TBB parallel backend (20480).
  • Always enable P2P access for GPU copies (21872).
  • Improve performance of CUDA upsample kernel (21694).
  • Limit the number of threads used by TBB (22045).
  • Provide an option to use a single thread pool (22047).
  • Add benchmarking options (22051).
  • Add a PyTorch ThroughputBenchmark (20766).
  • Use a pool of workers for each device in autograd (21911).
  • Use const refs in TensorIterator to avoid copy construction (22465).
  • Performance improvements for depthwise convolutions in FP16 on Volta and Turing GPUs (22302).
  • Optimize RNN on CPU (22512).
  • Use mkldnn inner product for nn.Linear() to improve BERT performance (21851).
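
To make the intra-op / inter-op threading items above concrete, here is a minimal sketch of the Python-side controls; the thread count of 4 is an arbitrary example, and set_num_interop_threads must be called before any inter-op parallel work has started:

```python
import torch

# Intra-op parallelism: threads used inside a single operator.
print("intra-op threads:", torch.get_num_threads())

# Inter-op parallelism: threads used to run independent operators concurrently
# (e.g. forked TorchScript tasks). Set this before any inter-op work runs.
torch.set_num_interop_threads(4)
print("inter-op threads:", torch.get_num_interop_threads())

# Reports the parallel backend in use (OpenMP / TBB / native) and its settings.
print(torch.__config__.parallel_info())
```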

Other excluded changes, refactoring, etc.:

  • Fix init_thread calls in thread pool initialization (20848).
  • Split ATen/Parallel into interface and backend (20057).
  • nn.functional.layer_norm: Add autograd for layer_norm on CPU (20883); see the sketch after this list.
  • Future interface for ATen/Parallel (21764).
  • Use lazy initialization in autograd record_function to avoid static (22317).
  • Fix a race between landing diffs (22291).
  • Resend "Split ATen/Parallel into interface and backend" (20825).
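
For the layer_norm autograd item above, a minimal sketch of a forward and backward pass through nn.functional.layer_norm on CPU (the shapes are arbitrary):

```python
import torch
import torch.nn.functional as F

x = torch.randn(8, 16, requires_grad=True)
weight = torch.ones(16, requires_grad=True)
bias = torch.zeros(16, requires_grad=True)

# Normalize over the last dimension; runs on CPU with autograd support.
y = F.layer_norm(x, normalized_shape=(16,), weight=weight, bias=bias)
y.sum().backward()
print(x.grad.shape, weight.grad.shape, bias.grad.shape)
```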

Build-related:

  • Restore TBB module (20454).
  • Allow a non-OpenMP-based build (19749); see the sketch after this list.
  • Fix TBB build for older versions of cmake (23038).
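
As a companion to the non-OpenMP build item above, a minimal sketch for checking at runtime which threading support a given build was compiled with (assuming the torch.backends query functions available in recent releases):

```python
import torch
import torch.backends.mkl
import torch.backends.openmp

# True only if this build of PyTorch was compiled with OpenMP support.
print("OpenMP available:", torch.backends.openmp.is_available())

# True only if this build was compiled against MKL.
print("MKL available:", torch.backends.mkl.is_available())
```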