# 1.2 Release Notes
- (19140)
- (19928)
- (19818)
- (19695)
- (19316)
- (18731)
- (20455)
- ()
- (20820)
- (20745)
- (20413)
- (20598)
- (20689)
- (21032)
- (21033)
- (20665)
- (21237)
- (21184)
- (20934)
- (21175)
- (21191)
- (21274)
- (21421)
- (21435)
- (20293)
- (21215)
- (20458)
- (21516)
- (21538)
- (8824)
- (20170)
- (21610)
- (20575)
- (20573)
- (21943)
- (19228)
- (21933)
- (21771)
- (20558)
- (21889)
- (21924)
- (22285)
- ()
- (22386)
- (22347)
- (21250)
- (21288)
- (22491)
- (22576)
- (21522)
- (21523)
- (21860)
- (22283)
- (22320)
- (22326)
- (22102)
- (21892)
- (22852)
- (22966)
- (22877)
- (23099)
- (22546)
- (19938)
- Correctly handle `process_group` in `convert_sync_batchnorm` (19240).
- (20270)
- (20305)
- (20150)
- (20221)
- (20391)
- (20517)
- (20541)
- (20505)
- (20182)
- (20679)
- (20369)
- (20397)
- (20759)
- (20782)
- (20797)
- (20116)
- (20914)
- (20919)
- (20943)
- (21019)
- (20971)
- (19000)
- (18519)
- (21067)
- (21192)
- (21236)
- (21253)
- (21371)
- (21392)
- (21213)
- (21293)
- (21458)
- (20900)
- (21400)
- (21461)
- (13774)
- (21497)
- (21530)
- (20288)
- (21324)
- (20401)
- (21324)
- (21619)
- (21689)
- (21617)
- (21658)
- (21652)
- (21691)
- (19612)
- (21723)
- (21499)
- (21687)
- (21672)
- (21533)
- (21914)
- (22088)
- (22119)
- (21910)
- (22111)
- (22183)
- (20996)
- (22392)
- (22401)
- (22405)
- (22248)
- (22470)
- (22545)
- (22533)
- (22493)
- (22242)
- (22445)
- (22569)
- (22821)
- (22730)
- (22715)
- (22304)
- (22977)
- (23007)
- (22827)
- (23123)
- (23125)
- (23105)
- (23040)
- (22850)
- (22983)
- (23030)
- (20026)
- (20059)
- (20018)
- (20035)
- (20175)
- (19963)
- (19485)
- (20283)
- (20281)
- (20415)
- (20451)
- (20307)
- (20483)
- (20581)
- (20618)
- (20565)
- (20584)
- (20648)
- (20799)
- (20800)
- (20817)
- (20968)
- (20908)
- (20976)
- (20779)
- (21024)
- (20863)
- (21285)
- (21302)
- (21174)
- (21366)
- (19802)
- (21310)
- (21468)
- (21022)
- (17955)
- (21763)
- (21608)
- (21797)
- (22110)
- (22073)
- (22075)
- (20952)
- (21858)
- (20730)
- (22160)
- (22369)
- (19089)
- (22238)
- (19014)
- (22034)
- (20545)
- (21986)
- (22397)
- (22411)
- (22757)
- (21588)
- (22229)
- (22588)
- (22594)
- (22433)
- Add magma for CUDA 10.1 to Windows docs (19914).
- Improve clarity of JIT documentation and document `torch.jit.Attribute` (19929).
- Improve build-from-source instructions (20088).
- Add `ninja` to build instructions (20079).
- Update explanation of module attributes in JIT type refinement docs (20912).
- Update libtorch build docs (21150).
- Updated web links on contribution_guide and governance documentation (21243).
- Improve documentation for publishing hub models (21307).
- Clarify performance implications of deterministic mode (21337).
- Update a configured copyright notice (21372).
- Suggest a faster linker in the contributing guide (21334).
- Update CUDA pinned memory note to include `tensor.to` (20977).
- Improve output of doxygen build (20362).
- Add CUDA C++11 and profiling notes to the contribution guide (21386).
- Update documentation of entry point in hub (21568).
- Fix a typo in reference to hubconf.py filename (21631).
- Update code comments for MAGMA functions (22618).
- `nn.CTCLoss`: Fix rendering of docs (19662).
- `nn.CTCLoss`: Change `Inputs` to `Shape` to unify the format, and add the type of `Output` in `Shape` (20422).
- `nn.MultiheadAttention`: Add documentation for `add_bias_kv`, `add_zero_attn`, and `attn_mask` (20071).
- `nn.TripletMarginLoss`: Clarify an example (20145).
- `nn.init.calculate_gain`: Update example (20131).
- `nn.Softmax`: Fix to specify dimension to prevent warning in 1.1.0 (20310).
- `nn.MultiheadAttention`: Fix documentation for attention mask shape (20850).
- `nn.functional.conv{1,2,3}d`: Remove `padding_mode` (20891).
- `nn.functional.upsample` and `nn.functional.interpolate`: Fix align-corners docs (20961).
- `nn.functional.gelu`: Fix formatting (21265).
- `nn.functional`/`nn.init`: Break up the NN module in docs so they load faster (21291).
- `nn.functional.gumbel_softmax`: Fix links to Gumbel-Softmax arXiv papers (21376).
- `nn.functional.one_hot`: Fix incorrect signature in docs (22929).
- `nn.modules.RNN`: Fix subscripts (20949).
- `nn.modules.batchnorm.SyncBatchNorm`: Update an example (20991).
- `nn.module.Activation`: Improve repr of `inplace` (20127).
- `nn.normal_`/`nn.kaiming_normal_`: Fix LaTeX formula error (21000).
- `nn.transformer.TransformerEncoder`/`nn.transformer.TransformerDecoder`: Edit docs for nn.transformer (21746).
- `torch.optim.lr_scheduler.CosineAnnealingLR`: Fix a typo (20110).
- `torch.eig`: Fix formatting for note (19743).
- `torch.utils.tensorboard.add_video`: Clarify the data type (19959).
- `torch.geometric_`: Update to reflect correct tensor behavior (20091).
- `torch.Tensor`: Add a warning about memory usage (20801).
- `torch.multiprocessing`: Explain refcounting of CUDA tensors (19904).
- `torch.optim.lr_scheduler.CyclicLR`: Clarify `base_momentum` and `max_momentum` (20880).
- `torch.functional.tensordot`: Fix a typo (21510).
- `torch.triangular_solve`: Fix incorrect use of TeX (21649).
- `torch.distributions.categorical.Categorical`: Update "log probabilities" to "log-odds" (21707).
- `torch.load`/`torch.save`: Improve formatting (21747).
- `torch.autograd.grad_mode`: Document that `no_grad` is thread local (21755).
- `torch.diagflat`, `torch.bincount`, `torch.allclose`: Update incorrect argument names and types (21846).
- `torch.arange`: Fix incorrect docs (21992).
- `torch.bool`: Document the Boolean tensor type (21601).
- `torch.utils.data.IterableDataset`: Update IterableDataset doc to be consistent with current behavior (22230).
- `torch.utils.data.DataLoader`: Document RNG state consumption (22540).
- `torch.irfft`: Improve irfft docs (22995).
- `torch.sign`: Add the mathematical definition (22894).
- `torch.as_strided`: Add documentation (22842); a brief usage sketch follows this list.
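To make the last item concrete, here is a minimal, hedged sketch of `torch.as_strided` usage; the sizes, strides, and values below are illustrative only and are not taken from the release notes.

```python
import torch

# torch.as_strided reinterprets an existing tensor's storage with a new
# size, stride, and storage offset, without copying any data.
x = torch.arange(9.)  # underlying storage: 0., 1., ..., 8.

# View the same storage as a 3x2 tensor, starting at element 1,
# stepping 3 elements between rows and 1 element between columns.
y = torch.as_strided(x, size=(3, 2), stride=(3, 1), storage_offset=1)
print(y)
# tensor([[1., 2.],
#         [4., 5.],
#         [7., 8.]])

# y is a view: writing through it mutates x's storage as well.
y[0, 0] = -1.0
print(x[1])  # tensor(-1.)
```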
## Performance
- `torch.bmm`: Improve performance on CPU by applying TensorAccessor (20266).
- `torch.matmul`: Optimization for the case `A.ndim <= 2 && B.ndim >= 3` (20448).
- `torch.randperm`: Parallelize initialization in randperm on CPU (21529).
- `torch.get_num_interop_threads`: Add get/set_num_interop_threads into torch.h (20659); a brief usage sketch follows this list.
- `torch.copy_`: Refactor CUDA copy kernel and improve performance (20685).
- `torch.inverse`: Move workspace query and allocation outside loop to improve performance (20904).
- `torch.cdist`: Improve torch.cdist performance (20605).
- `torch.lerp`: Vectorize the lerp operator with TensorIterator (22038).
- `torch.topk`: Optimize CPU performance using parallel and partial sort (22865).
- `torch.sinh`/`torch.cosh`: Move legacy TH functions to TensorIterator + Vec256 (21115).
- `nn.Softmax`: Add persistent CUDA kernels that speed up SoftMax (20827).
- `torch.coalesce`: Use `_sparse_coo_tensor_unsafe` in coalesce for speedup (21214).
- `torch.normal`: Move `normal`, `normal_means`, `normal_stddevs`, and `normal_means_stddevs` to ATen (21287).
- `torch.distributions.cauchy`: Move `THCTensor_(cauchy)` to ATen (21289).
- `torch.bernoulli`: Speed up `bernoulli_scalar_cuda_kernel` with grid-stride loop (21300).
- `torch.eye`: Parallelize `eye()` on CPU (21077).
- `nn.Upsample`: Increase throughput of bilinear upsampling (19306).
- `nn.Upsample`: Faster bilinear2d kernel (21879).
- `nn.functional.layer_norm`: Optimize layer_norm forward (20345).
- `nn.EmbeddingBag`: Optimize CUDA kernel (22016).
- `at::parallel_for`: Port TH library to ATen/Parallel instead of `omp parallel for` (19105).
- `at::parallel_for`: Port THNN to ATen/Parallel (20032).
- `at::launch`: Add a benchmark (21581).
- Remove explicit checks for parallelism from TH (20002).
- Intra-op parallelism microbenchmarks (19997).
- Port ATen/native to ATen/Parallel (20043).
- Move inter-op parallelization settings into ATen/Parallel (20050).
- Don't split 256-bit AVX2 load/store intrinsics (20609).
- Add a native ATen/Parallel backend (20087).
- Improve performance of advanced indexing backward (20557).
- Correctly honor OMP/MKL `NUM_THREADS` environment variables (21189).
- Allow more flexibility in callback profiling (21394).
- Improve JIT unpickling performance by reserving correct capacity in the memoization table (21542).
- Native TBB parallel backend (20480).
- Always enable P2P access for GPU copies (21872).
- Improve performance of CUDA upsample kernel (21694).
- Limit the number of threads used by TBB (22045).
- Provide an option to use a single thread pool (22047).
- Add benchmarking options (22051).
- Add a PyTorch ThroughputBenchmark (20766).
- Use a pool of workers for each device in autograd (21911).
- Use const refs in TensorIterator to avoid copy construction (22465).
- Performance improvements for depthwise convolutions in FP16 on Volta and Turing GPUs (22302).
- Optimize RNN on CPU (22512).
- Use `mkldnn` inner product for `nn.Linear()` to improve BERT performance (21851).
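As a hedged illustration of the inter-op threading controls mentioned in this list (`torch.get_num_interop_threads`/`torch.set_num_interop_threads`), the sketch below shows how the Python-level knobs are typically used; the specific thread counts are arbitrary examples, not recommendations from the release notes.

```python
import torch

# Intra-op parallelism: threads used inside a single operator
# (e.g. a large matmul). Inter-op parallelism: threads used to run
# independent operators concurrently (e.g. branches forked by TorchScript).
torch.set_num_threads(8)

# Note: set_num_interop_threads must be called before the inter-op thread
# pool starts (i.e. before any inter-op parallel work), otherwise it raises
# a RuntimeError.
torch.set_num_interop_threads(4)

print(torch.get_num_threads())          # 8
print(torch.get_num_interop_threads())  # 4
```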
Other excluded changes, refactoring, etc.:
- Fix init_thread calls in thread pool initialization (20848).
- Split ATen/Parallel into interface and backend (20057).
- `nn.functional.layer_norm`: Add autograd for layer_norm on CPU (20883); a brief usage sketch appears at the end of this page.
- Future interface for ATen/Parallel (21764).
- Use lazy initialization in autograd record_function to avoid static (22317).
- Fix a race between landing diffs (22291).
- Resend "Split ATen/Parallel into interface and backend" (20825).
build related?
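For context on the `nn.functional.layer_norm` item above, here is a hedged, minimal sketch of calling `F.layer_norm` on CPU with autograd; the tensor shapes are illustrative only.

```python
import torch
import torch.nn.functional as F

# layer_norm normalizes over the trailing dimensions given by
# normalized_shape; autograd support on CPU means gradients flow through it.
x = torch.randn(2, 5, requires_grad=True)
y = F.layer_norm(x, normalized_shape=(5,))

y.sum().backward()        # backward pass runs on CPU
print(x.grad.shape)       # torch.Size([2, 5])
```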