{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":32199982,"defaultBranch":"master","name":"samza","ownerLogin":"apache","currentUserCanPush":false,"isFork":false,"isEmpty":false,"createdAt":"2015-03-14T07:00:05.000Z","ownerAvatar":"https://github.com/avatars/u/47359?v=4","public":true,"private":false,"isOrgOwned":true},"refInfo":{"name":"","listCacheKey":"v0:1705718772.0","currentOid":""},"activityList":{"items":[{"before":"2eb556a5bcfb4aff83f3ba00fc221108d6cba0b2","after":"f9c3241b87ce5d7a568368d3ffec5dea174f7692","ref":"refs/heads/master","pushedAt":"2024-08-02T23:43:12.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"dxichen","name":"Daniel Chen","path":"/dxichen","primaryAvatarUrl":"https://github.com/avatars/u/29577458?s=80&v=4"},"commit":{"message":"Output Current Timestamp at run-class.sh script (#1702)\n\n* print current timestamp\r\n\r\n* Fix typo\r\n\r\n* fix build issue about grolifant okhttp\r\n\r\n---------\r\n\r\nCo-authored-by: Haolan Ye ","shortMessageHtmlLink":"Output Current Timestamp at run-class.sh script (#1702)"}},{"before":"fb9f5cdcb48ab4ab8c3e309395440bc31152cb0b","after":"2eb556a5bcfb4aff83f3ba00fc221108d6cba0b2","ref":"refs/heads/master","pushedAt":"2024-04-05T17:38:42.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"dxichen","name":"Daniel Chen","path":"/dxichen","primaryAvatarUrl":"https://github.com/avatars/u/29577458?s=80&v=4"},"commit":{"message":"Create store directory paths in CSM constructor for disk space monitor (#1697)\n\n* Create store directory paths in CSM constructor to be able to monitor the disk usage of the store directories\r\n\r\n* Fix stylecheck issues\r\n\r\n* Refactor - init all store paths together and do not mutate the storeDirPaths. 
Added test\r\n\r\n* Remove ununsed method\r\n\r\n* Remove ununsed method\r\n\r\n* Stylecheck, Remove ununsed import","shortMessageHtmlLink":"Create store directory paths in CSM constructor for disk space monitor ("}},{"before":"93b982840a6beba8ba8a48c5c7b4645385349b07","after":"fb9f5cdcb48ab4ab8c3e309395440bc31152cb0b","ref":"refs/heads/master","pushedAt":"2024-01-22T21:36:41.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"mynameborat","name":"Bharath Kumarasubramanian","path":"/mynameborat","primaryAvatarUrl":"https://github.com/avatars/u/46942335?s=80&v=4"},"commit":{"message":"SAMZA-2799: Remove worker.opts handling in shell command builder (#1696)","shortMessageHtmlLink":"SAMZA-2799: Remove worker.opts handling in shell command builder (#1696)"}},{"before":"37b2abd5f92b362d9cd93614f4976c6234977cde","after":null,"ref":"refs/heads/dxichen/SAMZA-2784","pushedAt":"2024-01-20T02:46:12.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"dxichen","name":"Daniel Chen","path":"/dxichen","primaryAvatarUrl":"https://github.com/avatars/u/29577458?s=80&v=4"}},{"before":"082899d85c9031c2a75eae9dd01675a78d944634","after":"93b982840a6beba8ba8a48c5c7b4645385349b07","ref":"refs/heads/master","pushedAt":"2024-01-20T02:46:09.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"dxichen","name":"Daniel Chen","path":"/dxichen","primaryAvatarUrl":"https://github.com/avatars/u/29577458?s=80&v=4"},"commit":{"message":"SAMZA-2784: Remove excessive commit logs (#1695)","shortMessageHtmlLink":"SAMZA-2784: Remove excessive commit logs (#1695)"}},{"before":null,"after":"37b2abd5f92b362d9cd93614f4976c6234977cde","ref":"refs/heads/dxichen/SAMZA-2784","pushedAt":"2024-01-19T23:21:09.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"dxichen","name":"Daniel Chen","path":"/dxichen","primaryAvatarUrl":"https://github.com/avatars/u/29577458?s=80&v=4"},"commit":{"message":"SAMZA-2784: Remove excessive commit 
logs","shortMessageHtmlLink":"SAMZA-2784: Remove excessive commit logs"}},{"before":"8ed3572bee37c04481be7c831fa455a8305f3fe8","after":"082899d85c9031c2a75eae9dd01675a78d944634","ref":"refs/heads/master","pushedAt":"2023-12-09T00:40:47.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"prateekm","name":"Prateek Maheshwari","path":"/prateekm","primaryAvatarUrl":"https://github.com/avatars/u/1085859?s=80&v=4"},"commit":{"message":"Add MAX_BACKGROUND_JOBS config for RocksDB (#1694)","shortMessageHtmlLink":"Add MAX_BACKGROUND_JOBS config for RocksDB (#1694)"}},{"before":"e1816f3e7f09c27b3642a3e62a356198feb020f7","after":"8ed3572bee37c04481be7c831fa455a8305f3fe8","ref":"refs/heads/master","pushedAt":"2023-11-22T21:49:33.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"mynameborat","name":"Bharath Kumarasubramanian","path":"/mynameborat","primaryAvatarUrl":"https://github.com/avatars/u/46942335?s=80&v=4"},"commit":{"message":"SAMZA-2798: Populate worker.opts in environment variable only if available (#1693)\n\nDescription\r\nPopulate worker.opts in the environment variable only if available in the configs.\r\n\r\nChanges\r\nCheck if worker.opts is present and then add it to environment variable\r\n\r\nTests\r\nUpdated unit tests","shortMessageHtmlLink":"SAMZA-2798: Populate worker.opts in environment variable only if avai…"}},{"before":null,"after":"4ae386e48d4c679eafb0eb046ec70fc6c16bb97f","ref":"refs/heads/SAMZA-2798","pushedAt":"2023-11-22T20:00:49.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"mynameborat","name":"Bharath Kumarasubramanian","path":"/mynameborat","primaryAvatarUrl":"https://github.com/avatars/u/46942335?s=80&v=4"},"commit":{"message":"SAMZA-2798: Populate worker.opts in environment variable only if available","shortMessageHtmlLink":"SAMZA-2798: Populate worker.opts in environment variable only if 
avai…"}},{"before":"66495b677a728ff75a8674b217672cd51aece640","after":"e1816f3e7f09c27b3642a3e62a356198feb020f7","ref":"refs/heads/master","pushedAt":"2023-11-22T01:21:29.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ajothomas","name":"ajo thomas","path":"/ajothomas","primaryAvatarUrl":"https://github.com/avatars/u/950817?s=80&v=4"},"commit":{"message":"SAMZA-2797: Call flush during stop from CoordinatorStreamWriter (#1692)","shortMessageHtmlLink":"SAMZA-2797: Call flush during stop from CoordinatorStreamWriter (#1692)"}},{"before":"65f31eb6e7da19a39b082635d20f730059aac8cb","after":"66495b677a728ff75a8674b217672cd51aece640","ref":"refs/heads/master","pushedAt":"2023-11-21T19:56:08.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"mynameborat","name":"Bharath Kumarasubramanian","path":"/mynameborat","primaryAvatarUrl":"https://github.com/avatars/u/46942335?s=80&v=4"},"commit":{"message":"SAMZA-2796: Introduce config knob for framework thread sub DAG execution (#1691)\n\nDescription\r\nAs part of SAMZA-2781, we use framework thread pool to execute hand-offs and sub-DAG execution. We want to add a config knob to enable users opt-in to the feature as opposed to enable it by default.\r\n\r\nChanges\r\nIntroduce config knob to use the framework executor\r\n\r\nTests\r\nAdded unit tests\r\n\r\nUsage Instructions\r\nRefer to the configuration documentation. 
To enable framework thread pool for sub-DAG execution and message hand off, set job.operator.framework.executor.enabled to true","shortMessageHtmlLink":"SAMZA-2796: Introduce config knob for framework thread sub DAG execut…"}},{"before":"2e91dfeff642f75192f28be33071fcb9ce443bb4","after":"65f31eb6e7da19a39b082635d20f730059aac8cb","ref":"refs/heads/master","pushedAt":"2023-11-20T22:02:51.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"mynameborat","name":"Bharath Kumarasubramanian","path":"/mynameborat","primaryAvatarUrl":"https://github.com/avatars/u/46942335?s=80&v=4"},"commit":{"message":"SAMZA-2763: Support worker JVM opts for Samza Beam portable mode (#1689)\n\nSummary: Support JVM options for worker process in Samza Beam portable mode\r\nDescription: With portable mode support for Samza Beam, we want to tune and configure the JVM options for worker process. In this PR, we add support by introducing worker.opts configuration and autosizing integration support.\r\n\r\nChanges:\r\n- Added worker.opts configuration\r\n- Add autosizing integration support for Xmx\r\n- Updated configuration table and website\r\n\r\nAPI Changes: None\r\n\r\nUsage Instructions: worker.opts can be used similar to other samza application configuration although it only applies to Samza Beam portable execution mode and is ignored otherwise.\r\n\r\nUpgrade Instructions: None","shortMessageHtmlLink":"SAMZA-2763: Support worker JVM opts for Samza Beam portable mode (#1689)"}},{"before":null,"after":"740fd99d95fb2ca37f0fcb338b5d224f05ced7fe","ref":"refs/heads/SAMZA-2763","pushedAt":"2023-11-20T22:01:10.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"mynameborat","name":"Bharath Kumarasubramanian","path":"/mynameborat","primaryAvatarUrl":"https://github.com/avatars/u/46942335?s=80&v=4"},"commit":{"message":"Address comments","shortMessageHtmlLink":"Address 
comments"}},{"before":"24e530d7ec498f2cbb971e7b4a01a6768fd43198","after":"2e91dfeff642f75192f28be33071fcb9ce443bb4","ref":"refs/heads/master","pushedAt":"2023-11-08T18:27:55.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"mynameborat","name":"Bharath Kumarasubramanian","path":"/mynameborat","primaryAvatarUrl":"https://github.com/avatars/u/46942335?s=80&v=4"},"commit":{"message":"SAMZA-2795: Set thread to daemon thread for operator executor service (#1690)\n\nDescription\r\nAs part of SAMZA-2781, we introduced operator executors to manage operator handoff execution. However, the threads created by the executor service are non-daemon and hence prevent the JVM from shutting down.\r\n\r\nFor context, we don't have a clean way to shutdown the executor due to lack of clean lifecycle management of the factory. Hence shutting down the executor service within TaskInstance is not an option and the fix is to make it daemon threads.\r\n\r\nChanges\r\nMake the threads spawned by the operator executor daemon threads\r\n\r\nTests\r\nNone\r\n\r\nAPI Changes\r\nNone\r\n\r\nUpgrade Instructions\r\nNone\r\n\r\nUsage Instructions\r\nNone","shortMessageHtmlLink":"SAMZA-2795: Set thread to daemon thread for operator executor service ("}},{"before":"c43f423356e8cb0f487b5211be00896c08a5caca","after":"24e530d7ec498f2cbb971e7b4a01a6768fd43198","ref":"refs/heads/master","pushedAt":"2023-09-11T23:48:59.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"dxichen","name":"Daniel Chen","path":"/dxichen","primaryAvatarUrl":"https://github.com/avatars/u/29577458?s=80&v=4"},"commit":{"message":"Unique System.exit in ContainerLaunchUtil 
(#1686)"}},{"before":"65bf4ae058d4d674cc63c3f172f3f348a4513640","after":"c43f423356e8cb0f487b5211be00896c08a5caca","ref":"refs/heads/master","pushedAt":"2023-08-31T01:01:15.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"mynameborat","name":"Bharath Kumarasubramanian","path":"/mynameborat","primaryAvatarUrl":"https://github.com/avatars/u/46942335?s=80&v=4"},"commit":{"message":"SAMZA-2774: Resolve classpath symlinks (#1653)\n\nManifest generation (application-level) is pointing to container-level symlinks instead of application-level files.\r\n\r\nThis will cause issues when multiple containers for an application land on a single host and the container the manifest is pointing to gets cleaned up.\r\n\r\n[yyy@zzz filecache]$ cat /data/appcache/application_1668550466222_0989/filecache/10/xxx-0.0.672.tgz/classpath_workspace/manifest.txt|tail -n 2\r\n /data/appcache/application_1668550466222_0989/container_e114_1668550466222_0989_01_000003/__package/lib/zstd-jni-1.5.2-3.jar\r\n\r\n[yyy@zzz filecache]$\r\nThis patch is to resolve those symlinks to their application level paths.","shortMessageHtmlLink":"SAMZA-2774: Resolve classpath symlinks (#1653)"}},{"before":"aeeaf0b9782d03932da4ee3da9e15ef9e28ed313","after":"65bf4ae058d4d674cc63c3f172f3f348a4513640","ref":"refs/heads/master","pushedAt":"2023-08-29T21:49:50.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"mynameborat","name":"Bharath Kumarasubramanian","path":"/mynameborat","primaryAvatarUrl":"https://github.com/avatars/u/46942335?s=80&v=4"},"commit":{"message":"SAMZA-2791: Introduce callback timeout specific to watermark messages (#1681)\n\nDescription:\r\nCurrently, watermark is implemented as a special message within Samza. However, in terms of processing semantics, it shares similar behavior to normal messages processed by the task. 
i.e., task.callback.timeout.ms, a configuration to tune the time until which runloop waits for a message to be processed applies to both watermark and normal messages.\r\n\r\nHowever, this tie up constrains watermark processing logic to be bounded by the processing messages time bound. For Beam on Samza, we use watermark as a trigger to execute event timers which can take a long time depending on the number of timers accumulated. Especially, when the application is down, the timers accumulated could be too large and users will have to tune this configuration which will also impact fault tolerance behavior in case of failures/delays during processing messages.\r\n\r\nChanges:\r\n- Introduce callback timeout configuration specific to watermark\r\n- Update configuration documentation\r\n- Consolidate overload methods for TaskCallbackManager\r\n- Always use watermark specific timeout even when run loop is in draining mode\r\n\r\nAPI Changes:\r\n- Internal change to constructor\r\n\r\nUpgrade Instructions: None\r\n\r\nUsage Instructions:\r\n- Users can configure the timeout for watermark messages using task.callback.watermark.timeout.ms\r\n- Refer to the configuration documentation for more details and defaults.","shortMessageHtmlLink":"SAMZA-2791: Introduce callback timeout specific to watermark messages ("}},{"before":"fa4008ee9c3f6b9499dfdceca1427b76a665420c","after":"aeeaf0b9782d03932da4ee3da9e15ef9e28ed313","ref":"refs/heads/master","pushedAt":"2023-08-29T15:49:32.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"mynameborat","name":"Bharath Kumarasubramanian","path":"/mynameborat","primaryAvatarUrl":"https://github.com/avatars/u/46942335?s=80&v=4"},"commit":{"message":"SAMZA-2790: Cleanup RunLoop constructor explosion (#1680)\n\nDescription:\r\nRunloop currently takes in lot of parameters and the constructor has grown to the point where it is unmanageable with multiple overloads. 
Introducing new configuration requires lot of updates to existing tests and components even if the parameters have no effect on all of the usages.\r\n\r\nWith this PR, we should be able to decouple different users of RunLoop and enable these components to have their own scoped config. e.g., SideInputManager can now have its own set of runloop parameters without having to tie itself with TaskConfig.\r\n\r\nChanges:\r\nIntroduce RunLoopConfig, a container object to hold all required parameters for runloop from Config.\r\nRemove existing overloads of constructor\r\nSimplify the constructor to take RunLoopConfig and initialize the necessary components and fields\r\nIntroduce SideInputManagerRunLoopConfig, an overload of RunLoopConfig to be used within SideInputManager\r\nModify RunLoopFactory create method signature\r\nClean up ApplicationUtil and moved the method to ApplicationConfig and added unit tests\r\n\r\nAPI Changes:\r\nNo external API change","shortMessageHtmlLink":"SAMZA-2790: Cleanup RunLoop constructor explosion (#1680)"}},{"before":"0e8ac2e6d7ae4ee88538c508a162c5e855a36135","after":"fa4008ee9c3f6b9499dfdceca1427b76a665420c","ref":"refs/heads/master","pushedAt":"2023-08-23T22:45:23.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"dxichen","name":"Daniel Chen","path":"/dxichen","primaryAvatarUrl":"https://github.com/avatars/u/29577458?s=80&v=4"},"commit":{"message":"Ignore file owners comparison on restore when config is set (#1684)\n\n* Ignore file owners comparison on restore when config is set\r\n\r\n* Minor updates\r\n\r\n* Style fix in tests\r\n\r\n---------\r\n\r\nCo-authored-by: Shekhar Sharma ","shortMessageHtmlLink":"Ignore file owners comparison on restore when config is set (#1684)"}},{"before":"1373db238683179123a15b246272cb0955f9b37c","after":"0e8ac2e6d7ae4ee88538c508a162c5e855a36135","ref":"refs/heads/master","pushedAt":"2023-08-18T18:25:51.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"prateekm","name":"Prateek 
Maheshwari","path":"/prateekm","primaryAvatarUrl":"https://github.com/avatars/u/1085859?s=80&v=4"},"commit":{"message":"Close TaskRestoreManager only after all restores are complete (#1682)\n\nClose TaskRestoreManager only after all restores are complete","shortMessageHtmlLink":"Close TaskRestoreManager only after all restores are complete (#1682)"}},{"before":"3ce67acb0494a47c167f938323917c0bbca32b7d","after":"1373db238683179123a15b246272cb0955f9b37c","ref":"refs/heads/master","pushedAt":"2023-08-10T01:42:46.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"dxichen","name":"Daniel Chen","path":"/dxichen","primaryAvatarUrl":"https://github.com/avatars/u/29577458?s=80&v=4"},"commit":{"message":"[SAMZA-2787] GetDeleted API and Recover from DeletedException (#1676)\n\n* GetDeleted API and Recover from DeletedException in commit for BlobStoreBackendFactory\r\n\r\n* Fix style issues, fix breaking test cases\r\n\r\n* Fixed test failure\r\n\r\n* Fix failing integration test - move storeConsumer start after init() in ContainerStorageManager\r\n\r\n* Style check fix\r\n\r\n* Bug fixes, integration test\r\n\r\n* Delete all blob from deleted SnapshotIndex\r\n\r\n* clean up for final review\r\n\r\n* Add integ test - delete snapshotindex and recover\r\n\r\n* Fix failing integration tests related to init()\r\n\r\n* Cleanup - remove unused code and move some code to util methods\r\n\r\n* Unit test, minor refractoring\r\n\r\n* Review comments - 1st round\r\n\r\n* Refractor code\r\nRemove taskcheckpoints mutation\r\n\r\n* Address review comment\r\n\r\n* Address review comment part 2\r\n\r\n* Fix failing CSM test\r\n\r\n* Addressed final review comments - more logs, consistent naming\r\n\r\n* Checkstyle fix for test class\r\n\r\n---------\r\n\r\nCo-authored-by: Shekhar Sharma \r\nCo-authored-by: Shekhar Sharma ","shortMessageHtmlLink":"[SAMZA-2787] GetDeleted API and Recover from DeletedException 
(#1676)"}},{"before":"9357a9101c7000296a3fb1cac576e4988db92169","after":"3ce67acb0494a47c167f938323917c0bbca32b7d","ref":"refs/heads/master","pushedAt":"2023-08-04T21:44:11.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"mynameborat","name":"Bharath Kumarasubramanian","path":"/mynameborat","primaryAvatarUrl":"https://github.com/avatars/u/46942335?s=80&v=4"},"commit":{"message":"SAMZA-2789: Remove cap on intermediate stream partition count for stream mode (#1679)\n\nProblem: Intermediate stream partition count inference logic caps the partition size to 256 resulting in imbalances in work assignments to tasks\r\n\r\nDescription: As part of the intermediate partition size inference logic, we currently employ the following algorithm.\r\n\r\npartitionCount = Math.max(maxPartitionSize(inputStreams), maxPartitionSize(outputStreams))\r\ncap the partitionCount to MAX_INFERRED_PARTITIONS defined in the IntermediateStreamManager which is 256\r\napply the inferred partition count to intermediate streams whose partition count is uninitialized\r\nThe logic above always caps the partition size of intermediate streams to 256 for all auto-created intermediate streams. 
This can prevent the job from scaling up uniformly as the intermediate partition assignment is capped to 256 tasks thereby rendering other tasks imbalanced in case of number tasks > 256.\r\n\r\nChanges:\r\n\r\nApply the cap only for batch mode as 256 limit was introduced for batch mode where number of files (partition) could be large\r\nAdd unit tests for IntermediateStreamManager\r\nMinor java doc fix for DefaultTaskExecutorFactory\r\nTests: Added unit tests for the code changes\r\n\r\nAPI Changes: None\r\n\r\nUpgrade Instructions:\r\n\r\nJobs that are temporarily worked around this constraint by setting job.intermediate.stream.partitions should remove the configuration in order for samza to infer and apply the partition count as described above\r\nJobs that don't use job.intermediate.stream.partitions need no changes.\r\nUsage Instructions: Refer to upgrade instruction.","shortMessageHtmlLink":"SAMZA-2789: Remove cap on intermediate stream partition count for str…"}},{"before":"7dee9977a4cc689d80376c1f53817af90aa5accb","after":"9357a9101c7000296a3fb1cac576e4988db92169","ref":"refs/heads/master","pushedAt":"2023-07-28T18:52:16.000Z","pushType":"pr_merge","commitsCount":4,"pusher":{"login":"sborya","name":"Boris Shkolnik","path":"/sborya","primaryAvatarUrl":"https://github.com/avatars/u/959768?s=80&v=4"},"commit":{"message":"Merge pull request #1678 from Zhangyx39/master\n\nSAMZA-2788: Add new metric to emit 1 when containers are processing","shortMessageHtmlLink":"Merge pull request #1678 from Zhangyx39/master"}},{"before":"f8560047677c7661c4a69b49541464f73148f1b4","after":"7dee9977a4cc689d80376c1f53817af90aa5accb","ref":"refs/heads/master","pushedAt":"2023-07-18T20:13:59.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"shanthoosh","name":null,"path":"/shanthoosh","primaryAvatarUrl":"https://github.com/avatars/u/483745?s=80&v=4"},"commit":{"message":"Add new SamzaApplicationMaster metric to track allocated containers buffered in AM (#1677)\n\n* Add 
new SamzaApplicationMaster metric to track containers allocated by RM and buffered in AM\r\n\r\n* update TestApplicationMasterRestClient\r\n\r\n* Add allocated-containers-in-buffer in metrics doc","shortMessageHtmlLink":"Add new SamzaApplicationMaster metric to track allocated containers b…"}},{"before":"aa5db44e25e87e84d7fc32dc50ba51347290acd8","after":"f8560047677c7661c4a69b49541464f73148f1b4","ref":"refs/heads/master","pushedAt":"2023-06-20T18:09:08.480Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"mynameborat","name":"Bharath Kumarasubramanian","path":"/mynameborat","primaryAvatarUrl":"https://github.com/avatars/u/46942335?s=80&v=4"},"commit":{"message":"Authenticate GitHub Actions builds to ge.apache.org (#1672)\n\nThis change allows GitHub Actions builds to submit build scans to ge.apache.org by authenticating those builds. The access key has been stored as an organizational secret by the ASF Infrastructure team in the Apache GitHub organization. The access key is not available to workflows triggered from forks.\r\n\r\nThis builds on the changes in https://github.com/apache/samza/pull/1665","shortMessageHtmlLink":"Authenticate GitHub Actions builds to ge.apache.org (#1672)"}},{"before":"cc21b6ebebf829b8c7a47256d2717fc9250ba9c7","after":"aa5db44e25e87e84d7fc32dc50ba51347290acd8","ref":"refs/heads/master","pushedAt":"2023-06-17T02:20:58.315Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"mynameborat","name":"Bharath Kumarasubramanian","path":"/mynameborat","primaryAvatarUrl":"https://github.com/avatars/u/46942335?s=80&v=4"},"commit":{"message":"SAMZA-2778: Make AzureBlobOutputStream buffer initialization size configurable. (#1662)\n\nSymptom:\r\nJVM crashes due to OOM exceptions. 
When the system has a large number of AzureBlobWriterObjects, the memory becomes heavily fragmented and can be susceptible to crashing when a new AzureBlobOutputStream is created.\r\n\r\nCause:\r\nThe crashes are caused by inefficient memory management when using the G1GC. (The default garbage collector in Java 11+.) What causes (code paths) this issue and why this is a problem for the G1GC are examined separately.\r\n\r\nWhat causes this issue?\r\nThe underlying issue is caused by the AzureBlobOutputStream. When a new instance is created, it creates a new ByteArrayOutputStream initialized to maxFlushThresholdSize.12 The ByteArrayOutputStream is the buffer used by the parent to accumulate messages between flush intervals. It requires 10MB (current default) of memory to initialize. This allows the buffer to accumulate messages without resize operations, however, it does not prevent resizing. The maxFlushThresholdSize is enforced in the parent during write() - see AzureBlobAvroWriter.\r\n\r\nWhy this is a problem for the G1GC?\r\nThe focus here is on the G1GC and humongous objects (G1 specific).3 The G1 GC introduced a new memory management strategy that divides the heap into regions, -XX:G1HeapRegionSize=n. The GC can operate on regions concurrently and copies live objects between regions during full GC to reclaim/empty regions.4 The default behavior creates ~2048 regions, sized to a factor of 2 between 1MB and 32MB. Any object larger than half of a region size, is considered a humongous object.\r\nHumongous objects are allocated an entire region (or consecutive regions if larger than a single region) and are not copied between regions during full GC. Any space not used by the object within a region is non-addressable for the life of the object.5 A JVM heap size of 31GB, -Xmx31G, will default to 16MB regions. 
Considering the current default size is 10MB, each buffer requires an entire region and prevents the use of 6MB, regardless of the how much data is in the buffer. For a heap smaller than 16GB, each buffer would require multiple regions.\r\nThe 10MB buffer size can exhaust the regions and cause OOMs or create fragmentation that causes OOMs. A fragmentation caused OOM occurs in the following sequence. On new, the JVM attempts to create the object in Eden. If there is insufficient space in Eden a minor GC is performed. If there is insufficient space after minor GC, the object is immediately promoted Old Gen. If there is insufficient space in Old Gen, a full GC is performed. If a full GC cannot allocate memory or region(s) for a humongous object the JVM will exit with OOM.\r\n\r\nChanges:\r\nThe javadocs, where appropriate, have been updated to reflect changes or describe new behaviors. No public APIs were removed, they were marked deprecated and migrated to the new default initialization value. All of the changes are itemized below.\r\n\r\nAzureBlobConfig\r\nAdding two new public fields and one public method. The new configuration is made accessible in the same manner as existing configs (see #configs SEP-26), also consistent the coding-guide. There is a new public config key: SYSTEM_INIT_BUFFER_SIZE - named initBufferSize.bytes. The default value is public field SYSTEM_INIT_BUFFER_SIZE_DEFAULT. The user provided configuration value is accessible with new public method getInitBufferSizeBytes(..). The method returns the configuration value between 32 and getMaxFlushThresholdSize(..) inclusive. 32 is the default initialization size of a ByteArrayOutputStream in the parameterless constructor.\r\n\r\nAzureBlobWriterFactory\r\nThere are two changes to this interface, both to the method, getWriterInstance(..). The existing implementation is marked @Deprecated and a new method with an additional parameter is added. 
The new parameter is an int that is expected to be the _ initBufferSize_.\r\n\r\nAzureBlobAvroWriterFactory\r\nThe modifications here are consistent with the changes to interface AzureBlobWriterFactory. However, the deprecated implementation uses the new field AzureBlobConfig.SYSTEM_INIT_BUFFER_SIZE_DEFAULT when calling the new public API. This will migrate users to the new initialization behavior.\r\n\r\nAzureBlobAvroWriter\r\nThere are two public, one package-private, and two private changes. Both public changes are to constructors. The existing public constructor is marked deprecated and invokes the new public constructor with the new field AzureBlobConfig.SYSTEM_INIT_BUFFER_SIZE_DEFAULT. The new constructor sets the new private field initBufferSize. The package-private constructor is modified with an additional int parameter and the tests were changed accordingly. The remaining private change modifies creation of new AzureBlobOutputStream instances to include the additional private field initBufferSize.\r\n\r\nAzureBlobOutputStream\r\nThe existing public API is marked @Deprecated and the ByteArrayOutputStream is initialized with the new field AzureBlobConfig.SYSTEM_INIT_BUFFER_SIZE_DEFAULT. 
The new public constructor includes the new parameter initBufferSize and initializes the ByteArrayOutputStream to that size.","shortMessageHtmlLink":"SAMZA-2778: Make AzureBlobOutputStream buffer initialization size con…"}},{"before":"d138757f7e163673fe9f80dd39131d4b1d3219f8","after":"cc21b6ebebf829b8c7a47256d2717fc9250ba9c7","ref":"refs/heads/master","pushedAt":"2023-06-15T16:24:41.343Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"mynameborat","name":"Bharath Kumarasubramanian","path":"/mynameborat","primaryAvatarUrl":"https://github.com/avatars/u/46942335?s=80&v=4"},"commit":{"message":"Fix scope of caches for DirDiffUtil.areSameFile (#1671)\n\nSummary\r\nFix scope of owner and group name caches in DirDiffUtil.areSameFile\r\n\r\nDetail\r\nIn #1669 the caches were scoped incorrectly and will be re-created each time areSameFile called. Scoping the group and name caches outside of the lambda to cache as expected.\r\n\r\nTest\r\nAdded unit test TestDirDiffUtilAreSameFile.testAreSameFile_Cache to verify the caches are working as expected.\r\n./gradlew check","shortMessageHtmlLink":"Fix scope of caches for DirDiffUtil.areSameFile (#1671)"}},{"before":"03d70a5a19398b8087ab9e2de5319d027438e56f","after":"d138757f7e163673fe9f80dd39131d4b1d3219f8","ref":"refs/heads/master","pushedAt":"2023-06-14T19:11:58.582Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"prateekm","name":"Prateek Maheshwari","path":"/prateekm","primaryAvatarUrl":"https://github.com/avatars/u/1085859?s=80&v=4"},"commit":{"message":"Re-factor DirDiffUtil.getDirDiff to minimize calls to sun.nio.fs.* (#1669)\n\nRe-factor DirDiffUtil.getDirDiff to minimize calls to sun.nio.fs.*","shortMessageHtmlLink":"Re-factor DirDiffUtil.getDirDiff to minimize calls to sun.nio.fs.* 
(#…"}},{"before":"9111bb744a9c27cd4a568576eaa0eadd7dc0cb88","after":"03d70a5a19398b8087ab9e2de5319d027438e56f","ref":"refs/heads/master","pushedAt":"2023-06-14T14:48:46.354Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"atoomula","name":"Aditya Toomula","path":"/atoomula","primaryAvatarUrl":"https://github.com/avatars/u/24232061?s=80&v=4"},"commit":{"message":"- Fix parameter ReceiverOptions parameters usaged (#1670)\n\n- Change powermock parameters","shortMessageHtmlLink":"- Fix parameter ReceiverOptions parameters usaged (#1670)"}},{"before":"4600dd3dd0497fbffacbea21e5b7b51b2b0209bb","after":"9111bb744a9c27cd4a568576eaa0eadd7dc0cb88","ref":"refs/heads/master","pushedAt":"2023-06-13T23:57:17.687Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"mynameborat","name":"Bharath Kumarasubramanian","path":"/mynameborat","primaryAvatarUrl":"https://github.com/avatars/u/46942335?s=80&v=4"},"commit":{"message":"SAMZA-2781: Use framework thread to execute hand-offs and sub-DAG execution (#1667)\n\nDescription:\r\nCurrently, the operator implementation chains the future using synchronous APIs (thenCompose, thenApply) which results in execution of these method calls on the future completion thread which happens to be the user thread in case of asynchronous operators in application DAG.\r\n\r\nChanges:\r\nUse thread pool inject through job.container.task.executor.factory\r\nExtend the task executor factory to return operator executor \r\nWire in the task executor in TaskContext\r\nDefault implementation which uses #getTaskExecutor if enabled and job.container.thread.pool.size > 1 or fallback to single threaded executor otherwise.\r\n\r\nTesting:\r\nUnit tests for the factory\r\nTest updates for existing OperatorImpl tests.\r\n\r\nAPI Changes:\r\nAdd getOperatorExecutor to TaskExecutorFactory\r\nProvide a default implementation that reuses getTaskExecutor\r\n\r\nUsage Instructions:\r\nRefer to the config documentation to enable operator thread pool and 
about the task executor factory\r\n\r\nUpgrade Instructions: \r\nNone","shortMessageHtmlLink":"SAMZA-2781: Use framework thread to execute hand-offs and sub-DAG exe…"}}],"hasNextPage":true,"hasPreviousPage":false,"activityType":"all","actor":null,"timePeriod":"all","sort":"DESC","perPage":30,"cursor":"djE6ks8AAAAEkJr-UAA","startCursor":null,"endCursor":null}},"title":"Activity · apache/samza"}