feat: OpenAI embeddings with GPU based KNN #2157

vonodiripsa · 2024-01-18T05:40:55Z

Added new OpenAI embeddings Quickstart demo with GPU based KNN using NVIDIA Rapids

Related Issues/PRs

#xxx

What changes are proposed in this pull request?

It is a new docs notebook demonstrating usage of NVIDIA Rapids KNN on GPU.

How is this patch tested?

I have written tests (not required for typo or doc fix) and confirmed the proposed feature/bug-fix/change works.

Does this PR change any dependencies?

No. You can skip this section.
Yes. Make sure the dependencies are resolved correctly. It depends on GPU based compute with Init Script installing NVIDIA Rapids KNN.

Does this PR add a new feature? If so, have you added samples on website?

No. You can skip this section.
Yes.

Added OpenAI embeddings with GPU based KNN using NVIDIA Rapids

github-actions · 2024-01-18T05:41:10Z

Hey @vonodiripsa 👋!
Thank you so much for contributing to our repository 🙌.
Someone from SynapseML Team will be reviewing this pull request soon.

We use semantic commit messages to streamline the release process.
Before your pull request can be merged, you should make sure your first commit and PR title start with a semantic prefix.
This helps us to create release messages and credit you for your hard work!

Examples of commit messages with semantic prefixes:

fix: Fix LightGBM crashes with empty partitions
feat: Make HTTP on Spark back-offs configurable
docs: Update Spark Serving usage
build: Add codecov support
perf: improve LightGBM memory usage
refactor: make python code generation rely on classes
style: Remove nulls from CNTKModel
test: Add test coverage for CNTKModel

To test your commit locally, please follow our guide on building from source.
Check out the developer guide for additional guidance on testing your change.

mhamilton723 · 2024-01-18T17:04:27Z

Please clear the output in this notebook before checking it in so that the diff is minimal

mhamilton723 · 2024-01-18T17:04:34Z

/azp run

azure-pipelines · 2024-01-18T17:04:47Z

Azure Pipelines successfully started running 1 pipeline(s).

mhamilton723 · 2024-01-18T17:07:35Z

We'll also want to get this init script to run on the Databricks clusters we spin up so that it tests properly. Can you add the init script to a file in, say, the tools/init_scripts directory? That way we can just link people to it, and we can upload it during the build. We'll also want to add this to the GPU tests on Databricks; see the nbtest folder for pointers to the GPU Databricks test runner.

codecov-commenter · 2024-01-18T17:28:58Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (fa9ba2e) 84.49% compared to head (85e1094) 84.47%.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #2157      +/-   ##
==========================================
- Coverage   84.49%   84.47%   -0.03%     
==========================================
  Files         325      325              
  Lines       16959    16959              
  Branches     1524     1524              
==========================================
- Hits        14330    14326       -4     
- Misses       2629     2633       +4


Added init script to install Rapids ML using CUDA 11.8

vonodiripsa · 2024-01-18T23:45:41Z

Corrected the semantic prefix and added init script

mhamilton723 · 2024-01-22T19:36:09Z

can you remove output from the notebook please?

Removed outputs

bvonodiripsa · 2024-01-31T23:53:43Z

@microsoft-github-policy-service agree company="Microsoft"

With GPU KNN notebook test code

Added GPU test code to OpenAI with KNN notebook

bvonodiripsa · 2024-02-01T02:34:57Z

/azp run

azure-pipelines · 2024-02-01T02:35:06Z

Azure Pipelines successfully started running 1 pipeline(s).

bvonodiripsa · 2024-02-01T02:35:10Z

/azp run

azure-pipelines · 2024-02-01T02:35:20Z

Azure Pipelines successfully started running 1 pipeline(s).

vonodiripsa · 2024-02-01T02:46:06Z

@microsoft-github-policy-service agree company="NVIDIA"

bvonodiripsa · 2024-02-01T02:50:16Z

/azp run

azure-pipelines · 2024-02-01T02:50:26Z

Azure Pipelines successfully started running 1 pipeline(s).

bvonodiripsa · 2024-02-01T05:37:26Z

/azp run

azure-pipelines · 2024-02-01T05:37:37Z

Azure Pipelines successfully started running 1 pipeline(s).

Fixed style errors

Suggested by Mark

bvonodiripsa · 2024-02-03T00:33:08Z

/azp run

azure-pipelines · 2024-02-03T00:33:18Z

Azure Pipelines successfully started running 1 pipeline(s).

mhamilton723 · 2024-02-03T05:49:56Z

core/src/test/scala/com/microsoft/azure/synapse/ml/nbtest/DatabricksUtilities.scala

-    .filterNot(_.getAbsolutePath.contains("Fine-tune"))
+    .filterNot(_.getAbsolutePath.contains("GPU"))
    .filterNot(_.getAbsolutePath.contains("Explanation Dashboard")) // TODO Remove this exclusion

-  val GPUNotebooks: Seq[File] = ParallelizableNotebooks.filter(_.getAbsolutePath.contains("Fine-tune"))
+  val GPUNotebooks: Seq[File] = ParallelizableNotebooks
+    .filter(file =>
+     file.getAbsolutePath.contains("GPU"))


Please keep "Fine-tune" in the filterNot calls; you had it right last time.

you need to set this back to the || expression you had
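
For readers following the review, here is a minimal sketch of what an `||`-based split could look like. It is not the code that was merged: the object name `NotebookFilterSketch`, the helpers `isGpuNotebook` and `split`, and the file names in the usage note are all hypothetical, and the only details taken from the thread are the two path markers "Fine-tune" and "GPU". The idea is that a single predicate decides whether a notebook belongs to the GPU set, so the exclusion from the regular set and the inclusion in the GPU set cannot drift apart.

```scala
import java.io.File

// Hypothetical helper, not part of DatabricksUtilities.scala.
object NotebookFilterSketch {

  // A notebook counts as a GPU notebook if its path contains either marker.
  // The two markers come from the review thread; everything else is assumed.
  def isGpuNotebook(file: File): Boolean = {
    val path = file.getAbsolutePath
    path.contains("Fine-tune") || path.contains("GPU")
  }

  // Returns (GPU notebooks, all remaining notebooks).
  def split(notebooks: Seq[File]): (Seq[File], Seq[File]) =
    notebooks.partition(isGpuNotebook)
}
```

With made-up paths such as `/notebooks/Fine-tune-example.ipynb`, `/notebooks/GPU-KNN-example.ipynb`, and `/notebooks/Other-example.ipynb`, `split` would place the first two in the GPU group and the third in the remaining group.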

bvonodiripsa · 2024-02-03T07:16:43Z

/azp run

azure-pipelines · 2024-02-03T07:16:53Z

Azure Pipelines successfully started running 1 pipeline(s).

core/src/test/scala/com/microsoft/azure/synapse/ml/nbtest/DatabricksUtilities.scala

…bricksUtilities.scala Added "Fine-tune" again Co-authored-by: Mark Hamilton <mhamilton723@gmail.com>

Added Fine-tune back

bvonodiripsa · 2024-02-03T18:05:47Z

/azp run

azure-pipelines · 2024-02-03T18:05:58Z

Azure Pipelines successfully started running 1 pipeline(s).

Create cluster using init script

bvonodiripsa · 2024-02-05T07:05:58Z

/azp run

azure-pipelines · 2024-02-05T07:06:09Z

Azure Pipelines successfully started running 1 pipeline(s).

Corrected parameters

Added Rapids cluster name

mhamilton723 · 2024-02-08T23:40:47Z

core/src/test/scala/com/microsoft/azure/synapse/ml/nbtest/DatabricksRapidsTests.scala

+
+class DatabricksRapidsTests extends DatabricksTestHelper {
+
+  val clusterId: String = createClusterInPool(GPUClusterName, AdbGpuRuntime, 2, GpuPoolId, RapidsInitScripts)


Suggested change

-  val clusterId: String = createClusterInPool(GPUClusterName, AdbGpuRuntime, 2, GpuPoolId, RapidsInitScripts)
+  val clusterId: String = createClusterInPool(GPUClusterName, AdbGpuRuntime, 1, GpuPoolId, RapidsInitScripts)

Reduced number of nodes to 1

bvonodiripsa · 2024-02-09T04:13:25Z

/azp run

azure-pipelines · 2024-02-09T04:13:36Z

Azure Pipelines successfully started running 1 pipeline(s).

Fixed imports

bvonodiripsa · 2024-02-09T04:39:12Z

/azp run

azure-pipelines · 2024-02-09T04:39:23Z

Azure Pipelines successfully started running 1 pipeline(s).

OpenAI embeddings with GPU based KNN

7e4fad9

Added OpenAI embeddings with GPU based KNN using NVIDIA Rapids

vonodiripsa requested a review from mhamilton723 as a code owner January 18, 2024 05:40

Added databricks rapids ml init script

69840b0

Added init script to install Rapids ML using CUDA 11.8

vonodiripsa changed the title ~~OpenAI embeddings with GPU based KNN~~ doc: OpenAI embeddings with GPU based KNN Jan 18, 2024

vonodiripsa changed the title ~~doc: OpenAI embeddings with GPU based KNN~~ docs: OpenAI embeddings with GPU based KNN Jan 18, 2024

vonodiripsa changed the title ~~docs: OpenAI embeddings with GPU based KNN~~ feat: OpenAI embeddings with GPU based KNN Jan 18, 2024

No outputs

d02183f

Removed outputs

vonodiripsa added 2 commits January 31, 2024 18:12

Added testing code

c2c22e7

With GPU KNN notebook test code

Added GPU test code

f492496

Added GPU test code to OpenAI with KNN notebook

Merge branch 'master' into master

4d50dfb

bvonodiripsa and others added 2 commits January 31, 2024 21:32

Fixed extra bracket

bbab925

Removed extra bracket

d125f88

vonodiripsa added 2 commits February 2, 2024 16:25

Removed invalid esc char

4f05e6a

Fixed style errors

Removed Fine-tune

6f34552

Suggested by Mark

mhamilton723 reviewed Feb 3, 2024

removed init script text

6c745d7

mhamilton723 reviewed Feb 3, 2024

vonodiripsa and others added 4 commits February 3, 2024 09:22

Update core/src/test/scala/com/microsoft/azure/synapse/ml/nbtest/Data…

0f287f3

…bricksUtilities.scala Added "Fine-tune" again Co-authored-by: Mark Hamilton <mhamilton723@gmail.com>

Update DatabricksUtilities.scala

fe2ad09

Added Fine-tune back

Reverse style changes

d30baaf

Corrected style

1009eea

Added GPUInitScript to createClusterInPool

c971334

Create cluster using init script

vonodiripsa added 3 commits February 8, 2024 15:30

Corrected to have a separate Rapids test

79c9557

Update DatabricksRapidsTests.scala

7a3886c

Corrected parameters

Update DatabricksUtilities.scala

0f02798

Added Rapids cluster name

mhamilton723 reviewed Feb 8, 2024

Update DatabricksRapidsTests.scala

ee1c5e3

Reduced number of nodes to 1

Update DatabricksRapidsTests.scala

85e1094

Fixed imports

mhamilton723 approved these changes Feb 9, 2024

mhamilton723 merged commit 2836cf3 into microsoft:master Feb 9, 2024
66 of 68 checks passed
