Initial Extraction From Dolomite Engine #1
Conversation
@fabianlim I would also suggest dropping
@aldopareja this is more or less ok, but missing the notices. what do we want to put in the header of every file? like this?
We should also add the same publishing CI that we use elsewhere in instructlab so that it's easy to get stuff published.
This one: instructlab/training#31
Sorry, not that one, this one: instructlab/training#42
Never mind, I created a PR for publishing here, ignore the above comments: #2
wow!!! 4k lines of code already :)
Yikes!! All of my comments were ignored 🤣
@mayank31398 I thought you only gave these 2 comments. Was there anything else?
same as above
Do we need this checkpointing logic? Or should this live in the instructlab training repo?
We can move this code to GPTDolomiteConfig, I think.
Since this repo is just GPTDolomite, maybe we don't need the CommonConfig class.
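For illustration, a minimal sketch of that suggestion, folding the base-class fields directly into GPTDolomiteConfig; the field names and defaults below are placeholders, not the actual CommonConfig attributes:

```python
# Hedged sketch: merge CommonConfig into GPTDolomiteConfig so only one config
# class remains. The fields and defaults here are illustrative placeholders.
from transformers import PretrainedConfig


class GPTDolomiteConfig(PretrainedConfig):
    model_type = "gpt_dolomite"

    def __init__(
        self,
        vocab_size: int = 50257,
        n_embd: int = 768,
        n_layer: int = 12,
        n_head: int = 12,
        position_embedding_type: str = "learned_absolute",
        normalization_function: str = "rmsnorm",
        **kwargs,
    ) -> None:
        self.vocab_size = vocab_size
        self.n_embd = n_embd
        self.n_layer = n_layer
        self.n_head = n_head
        self.position_embedding_type = position_embedding_type
        self.normalization_function = normalization_function
        super().__init__(**kwargs)
```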
Let's drop this since we are removing all the other normalization implementations.
do we need this as a file?
same as above
```python
class YaRNScaledRoPE(RoPE):
    def __init__(
        self,
        head_dim: int,
        max_position_embeddings: int = 2048,
        base: int = 10000,
        scale: float = 1,
        original_max_position_embeddings: int = 2048,
        extrapolation_factor: float = 1,
        attn_factor: float = 1,
        beta_fast: int = 32,
        beta_slow: int = 1,
    ) -> None:
        torch.nn.Module.__init__(self)

        self.head_dim = head_dim
        self.max_position_embeddings = max_position_embeddings
        self.base = base
        self.scale = scale
        self.original_max_position_embeddings = original_max_position_embeddings
        self.extrapolation_factor = extrapolation_factor
        self.attn_factor = attn_factor
        self.beta_fast = beta_fast
        self.beta_slow = beta_slow

        # Get n-d magnitude scaling corrected for interpolation
        self.mscale = _yarn_get_mscale(self.scale) * self.attn_factor

        self.reset_parameters()

    def reset_parameters(self) -> None:
        pos_freqs = self.base ** (
            torch.arange(0, self.head_dim, 2).float() / self.head_dim
        )
        inv_freq_extrapolation = 1.0 / pos_freqs
        inv_freq_interpolation = 1.0 / (self.scale * pos_freqs)

        low, high = _yarn_find_correction_range(
            self.beta_fast,
            self.beta_slow,
            self.head_dim,
            self.base,
            self.original_max_position_embeddings,
        )
        inv_freq_mask = (
            (1 - _yarn_linear_ramp_mask(low, high, self.head_dim // 2).float())
            * self.extrapolation_factor
        )  # Get n-d rotational scaling corrected for extrapolation
        inv_freq = (
            inv_freq_interpolation * (1 - inv_freq_mask)
            + inv_freq_extrapolation * inv_freq_mask
        )
        self.register_buffer("inv_freq", inv_freq, persistent=False)

        # pylint: disable=no-value-for-parameter
        self._set_cos_sin_cache(
            self.max_position_embeddings, dtype=torch.get_default_dtype()
        )
```
Let's drop YaRN; I haven't tested this, so I am not sure the logic is correct.
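For context on what would be dropped, here is a small standalone sketch of the frequency blending that `reset_parameters()` performs; the values and the constant mask are illustrative only (the real mask comes from the beta_fast/beta_slow ramp):

```python
import torch

# Illustrative values only; not taken from any shipped config.
head_dim, base, scale = 8, 10000, 4.0

pos_freqs = base ** (torch.arange(0, head_dim, 2).float() / head_dim)
inv_freq_extrapolation = 1.0 / pos_freqs            # plain RoPE frequencies
inv_freq_interpolation = 1.0 / (scale * pos_freqs)  # position-interpolated frequencies

# Placeholder mask; YaRN builds this from the beta_fast/beta_slow linear ramp.
inv_freq_mask = torch.full((head_dim // 2,), 0.5)
inv_freq = (
    inv_freq_interpolation * (1 - inv_freq_mask)
    + inv_freq_extrapolation * inv_freq_mask
)
print(inv_freq)  # blended per-dimension inverse frequencies
```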
```python
# Inverse dim formula to find dim based on number of rotations
def _yarn_find_correction_dim(
    num_rotations: int, dim: int, base: int = 10000, max_position_embeddings: int = 2048
) -> float:
    return (dim * math.log(max_position_embeddings / (num_rotations * 2 * math.pi))) / (
        2 * math.log(base)
    )


# Find dim range bounds based on rotations
def _yarn_find_correction_range(
    low_rot: int,
    high_rot: int,
    dim: int,
    base: int = 10000,
    max_position_embeddings: int = 2048,
) -> int:
    low = math.floor(
        _yarn_find_correction_dim(low_rot, dim, base, max_position_embeddings)
    )
    high = math.ceil(
        _yarn_find_correction_dim(high_rot, dim, base, max_position_embeddings)
    )
    return max(low, 0), min(high, dim - 1)  # Clamp values just in case


def _yarn_linear_ramp_mask(min: float, max: float, dim: int) -> torch.Tensor:
    if min == max:
        max += 0.001  # Prevent singularity

    linear_func = (torch.arange(dim, dtype=torch.float32) - min) / (max - min)
    ramp_func = torch.clamp(linear_func, 0, 1)
    return ramp_func


def _yarn_get_mscale(scale: float = 1) -> float:
    if scale <= 1:
        return 1.0
    return 0.1 * math.log(scale) + 1.0
```
same as above comment about dropping yarn
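If it helps while deciding, here is a quick numeric check of the correction-range formulas above, using illustrative values (head_dim=128, base=10000, original context 2048, beta_fast=32, beta_slow=1):

```python
import math


def correction_dim(num_rotations, dim=128, base=10000, max_pos=2048):
    # Same inverse-dim formula as _yarn_find_correction_dim above.
    return (dim * math.log(max_pos / (num_rotations * 2 * math.pi))) / (2 * math.log(base))


low = max(math.floor(correction_dim(32)), 0)       # beta_fast bound
high = min(math.ceil(correction_dim(1)), 128 - 1)  # beta_slow bound
print(low, high)  # roughly 16 and 41 for these values
```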
move CommonConfig logic to this class maybe?
This might also not be needed, since it is only used for gradient checkpointing / FSDP wrapping.
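For reference, roughly what that wrapping looks like when it is done outside the model instead; `GPTDolomiteBlock` is a hypothetical stand-in for this repo's transformer block class, `model` is the instantiated model, and an initialized torch.distributed process group is assumed:

```python
import functools

from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy

# GPTDolomiteBlock is a placeholder name; substitute the actual block class.
wrap_policy = functools.partial(
    transformer_auto_wrap_policy,
    transformer_layer_cls={GPTDolomiteBlock},
)

# Assumes torch.distributed has already been initialized and `model` exists.
model = FSDP(model, auto_wrap_policy=wrap_policy)
```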
* addressed missed out comments in #1, except checkpointing
* ruff + lint
* removed gradient checkpointing
* moved config file and commented on rope scaling

Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
This is the initial extraction from the dolomite engine repo.

Extracted models:
- hf_models/models/gpt_dolomite
- hf_models/models/moe_dolomite

Conversion from HF supported:
- hf_models/model_conversion/bigcode
- hf_models/model_conversion/llama
- hf_models/model_conversion/mixtral
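A hedged sketch of how the extracted model could then be exposed through the HF Auto classes; the import path and the `GPTDolomiteForCausalLM` name follow the dolomite engine naming and are assumptions, not confirmed for this repo:

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Hypothetical import path; adjust to however this package is actually laid out.
from gpt_dolomite import GPTDolomiteConfig, GPTDolomiteForCausalLM

AutoConfig.register("gpt_dolomite", GPTDolomiteConfig)
AutoModelForCausalLM.register(GPTDolomiteConfig, GPTDolomiteForCausalLM)

# After registration, configs/checkpoints with model_type "gpt_dolomite"
# resolve through the usual Auto classes.
model = AutoModelForCausalLM.from_config(GPTDolomiteConfig())
```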
TODO:
- modeling_utils/normalization/rmsnorm/torchtitan.py
- modeling_utils/normalization/rmsnorm/apex.py
- modeling_utils/normalization/layernorm/apex.py
- modeling_utils/normalization/layernorm/apex_persistent.py
- modeling_utils/embedding/ParameterizedEmbedding
- modeling_utils/linear/ParameterizedLinear