
Contents

  • Highlights
      • TensorBoard (currently experimental)
  • Breaking Changes
  • New Features
      • Operators
      • NN
      • Tensors / dtypes
      • Optim
      • Distributions
      • Samplers
      • DistributedDataParallel
      • TorchScript and Tracer
      • Experimental Features
  • Improvements
  • Bug Fixes
      • Serious
      • Other
  • Deprecations
  • Performance
      • Highlights
      • Other
  • Documentation
  • ONNX

Documentation

  • Add magma for CUDA 10.1 to Windows docs (19914).
  • Improve clarity of JIT documentation and document torch.jit.Attribute (19929).
  • Improve build-from-source instructions (20088).
  • Add ninja to build instructions (20079).
  • Update explanation of module attributes in JIT type refinement docs (20912).
  • Update libtorch build docs (21150).
  • Update web links in the contribution_guide and governance documentation (21243).
  • Improve documentation for publishing hub models (21307).
  • Clarify performance implications of deterministic mode (21337).
  • Update the configured copyright notice (21372).
  • Suggest a faster linker in the contributing guide (21334).
  • Update cuda pinned memory note to include tensor.to (20977).
  • Improve output of doxygen build (20362).
  • Add CUDA C++11 and profiling notes to the contribution guide (21386).
  • Update documentation of entry point in hub (21568).
  • Fix a typo in reference to hubconf.py filename (21631).
  • Update code comments for MAGMA functions (22618).
  • nn.CTCLoss: Fix rendering of docs (19662).
  • nn.CTCLoss: Change Inputs to Shape to unify the format, and add the type of Output in Shape (20422).
  • nn.MultiheadAttention: Add documentation for add_bias_kv, add_zero_attn, and attn_mask (20071).
  • nn.TripletMarginLoss: Clarify an example (20145).
  • nn.init.calculate_gain: update example (20131).
  • nn.Softmax: Fix docs to specify the dimension, preventing a warning in 1.1.0 (20310).
  • nn.MultiheadAttention: Fix documentation for attention mask shape (20850).
  • nn.functional.conv{1,2,3}d: Remove padding_mode (20891).
  • nn.functional.upsample and nn.functional.interpolate: Fix align corner docs (20961).
  • nn.functional.gelu: Fix formatting (21265).
  • nn.functional / nn.init: Break up the NN module docs so they load faster (21291).
  • nn.functional.gumbel_softmax: Fix links to the Gumbel-Softmax arXiv papers (21376).
  • nn.functional.one_hot: Fix incorrect signature in docs (22929).
  • nn.modules.RNN: Fix subscripts (20949).
  • nn.modules.batchnorm.SyncBatchNorm: Update an example (20991).
  • nn.module.Activation: Improve repr of inplace (20127).
  • nn.init.normal_ / nn.init.kaiming_normal_: Fix a LaTeX formula error (21000).
  • nn.TransformerEncoder / nn.TransformerDecoder: Edit docs for the transformer modules (21746).
  • torch.optim.lr_scheduler.CosineAnnealingLR: fix a typo (20110).
  • torch.eig: Fix formatting for note (19743).
  • torch.utils.tensorboard.add_video: clarify the data type (19959).
  • torch.geometric_: Update to reflect correct tensor behavior (20091).
  • torch.Tensor: Add a warning about memory usage (20801).
  • torch.multiprocessing: Explain refcounting of CUDA tensors (19904).
  • torch.optim.lr_scheduler.CyclicLR: Clarify base_momentum and max_momentum (20880).
  • torch.functional.tensordot: Fix a typo (21510).
  • torch.triangular_solve: Fix incorrect use of TeX (21649).
  • torch.distributions.categorical.Categorical: Update "log probabilities" to "log-odds" (21707).
  • torch.load / torch.save: Improve formatting (21747).
  • torch.autograd.grad_mode: Document that no_grad is thread-local (21755); see the sketch after this list.
  • torch.diagflat, torch.bincount, torch.allclose: Update incorrect argument names and types (21846).
  • torch.arange: Fix incorrect docs (21992).
  • torch.bool: Document the Boolean tensor type (21601).
  • torch.utils.data.IterableDataset: Update IterableDataset doc to be consistent with current behavior (22230).
  • torch.utils.data.DataLoader: Document RNG state consumption (22540).
  • torch.irfft: Improve irfft docs (22995).
  • torch.sign: Add the mathematical definition (22894).
  • torch.as_strided: Add documentation (22842).
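
As a companion to the no_grad thread-locality note above, here is a minimal sketch (using only torch and the standard threading module; the tensor shapes and prints are illustrative only) of how grad mode applies per thread:

```python
import threading

import torch

x = torch.ones(2, requires_grad=True)

def worker():
    # Grad mode is thread-local: this freshly spawned thread still has
    # gradients enabled, even though the main thread is inside no_grad().
    y = x * 2
    print("worker requires_grad:", y.requires_grad)  # True

with torch.no_grad():
    z = x * 2
    print("main requires_grad:", z.requires_grad)  # False
    t = threading.Thread(target=worker)
    t.start()
    t.join()
```

A thread spawned inside a no_grad() block therefore has to enter its own no_grad() context if it also wants gradients disabled.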

Performance

  • torch.bmm: Improve performance on CPU by applying TensorAccessor (20266).
  • torch.matmul: Optimize the case where A.ndim <= 2 and B.ndim >= 3 (20448).
  • torch.randperm: Parallelize initialization in randperm on CPU (21529).
  • torch.get_num_interop_threads: Add get/set_num_interop_threads into torch.h (20659); see the sketch after this list.
  • torch.copy_: Refactor CUDA copy kernel and improve performance (20685).
  • torch.inverse: Move workspace query and allocation outside loop to improve performance (20904).
  • torch.cdist: Improve torch.cdist performance (20605).
  • torch.lerp: Vectorize the lerp operator with TensorIterator (22038).
  • torch.topk: Optimize CPU performance using parallel and partial sort (22865).
  • torch.sinh / torch.cosh: Move legacy TH functions to TensorIterator + Vec256 (21115).
  • nn.Softmax: Add persistent CUDA kernels that speed up SoftMax (20827).
  • torch.coalesce: Use _sparse_coo_tensor_unsafe in coalesce for speedup (21214).
  • torch.normal: Move normal, normal_means, normal_stddevs, and normal_means_stddevs to ATen (21287).
  • torch.distributions.cauchy: Move THCTensor_(cauchy) to ATen (21289).
  • torch.bernoulli: Speedup bernoulli_scalar_cuda_kernel with grid-stride loop (21300).
  • torch.eye: Parallelize eye() on CPU (21077).
  • nn.Upsample: Increase throughput of bilinear upsampling (19306).
  • nn.Upsample: Faster bilinear2d kernel (21879).
  • nn.functional.layer_norm: Optimize layer_norm forward (20345).
  • nn.EmbeddingBag: Optimize CUDA kernel (22016).
  • at::parallel_for: Port TH library to ATen/Parallel instead of omp parallel for (19105).
  • at::parallel_for: Port THNN to ATen/Parallel (20032).
  • at::launch: Add a benchmark (21581).
  • Remove explicit checks for parallelism from TH (20002).
  • Intra-op parallelism microbenchmarks (19997).
  • Port ATen/native to ATen/Parallel (20043).
  • Move inter-op parallelization settings into ATen/Parallel (20050).
  • Don't split 256-bit AVX2 load/store intrinsics (20609).
  • Add a native ATen/Parallel backend (20087).
  • Improve performance of advanced indexing backward (20557).
  • Correctly honor OMP/MKL NUM_THREADS environment variables (21189).
  • Allow more flexibility in callback profiling (21394).
  • Improve jit unpickling performance by reserving correct capacity in memoization table (21542).
  • Native TBB parallel backend (20480).
  • Always enable P2P access for GPU copies (21872).
  • Improve performance of CUDA upsample kernel (21694).
  • Limit the number of threads used by TBB (22045).
  • Provide an option to use a single thread pool (22047).
  • Add benchmarking options (22051).
  • Add a PyTorch ThroughputBenchmark (20766).
  • Use a pool of workers for each device in autograd (21911).
  • Use const refs in TensorIterator to avoid copy construction (22465).
  • Performance improvements for depthwise convolutions in FP16 on Volta and Turing GPUs (22302).
  • Optimize RNN on CPU (22512).
  • Use mkldnn inner product for nn.Linear() to improve BERT performance (21851).
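
To make the intra-op / inter-op threading items above concrete, here is a minimal sketch of the Python-side controls; the thread count of 4 is an arbitrary example, and set_num_interop_threads must be called before any inter-op parallel work has started:

```python
import torch

# Intra-op parallelism: threads used inside a single operator.
print("intra-op threads:", torch.get_num_threads())

# Inter-op parallelism: threads used to run independent operators concurrently
# (e.g. forked TorchScript tasks). Set this before any inter-op work runs.
torch.set_num_interop_threads(4)
print("inter-op threads:", torch.get_num_interop_threads())

# Reports the parallel backend in use (OpenMP / TBB / native) and its settings.
print(torch.__config__.parallel_info())
```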

Other excluded changes, refactoring, etc.:

  • Fix init_thread calls in thread pool initialization (20848).
  • Split ATen/Parallel into interface and backend (20057).
  • nn.functional.layer_norm: Add autograd for layer_norm on CPU (20883); see the sketch after this list.
  • Future interface for ATen/Parallel (21764).
  • Use lazy initialization in autograd record_function to avoid static (22317).
  • Fix a race between landing diffs (22291).
  • Resend "Split ATen/Parallel into interface and backend" (20825).
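
For the layer_norm autograd item above, a minimal sketch of a forward and backward pass through nn.functional.layer_norm on CPU (the shapes are arbitrary):

```python
import torch
import torch.nn.functional as F

x = torch.randn(8, 16, requires_grad=True)
weight = torch.ones(16, requires_grad=True)
bias = torch.zeros(16, requires_grad=True)

# Normalize over the last dimension; runs on CPU with autograd support.
y = F.layer_norm(x, normalized_shape=(16,), weight=weight, bias=bias)
y.sum().backward()
print(x.grad.shape, weight.grad.shape, bias.grad.shape)
```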

Build-related:

  • Restore TBB module (20454).
  • Allow a non-OpenMP-based build (19749); see the sketch after this list.
  • Fix TBB build for older versions of cmake (23038).
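
As a companion to the non-OpenMP build item above, a minimal sketch for checking at runtime which threading support a given build was compiled with (assuming the torch.backends query functions available in recent releases):

```python
import torch
import torch.backends.mkl
import torch.backends.openmp

# True only if this build of PyTorch was compiled with OpenMP support.
print("OpenMP available:", torch.backends.openmp.is_available())

# True only if this build was compiled against MKL.
print("MKL available:", torch.backends.mkl.is_available())
```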