Commit

Merge branch 'main' into feature/damian/adapt_large_models
dbogunowicz committed May 9, 2023
2 parents 89fc184 + 4019f52 commit 3370562
Showing 17 changed files with 99 additions and 212 deletions.
2 changes: 1 addition & 1 deletion CODE_OF_CONDUCT.md
@@ -71,7 +71,7 @@ a project may be further defined and clarified by project maintainers.
## Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported by contacting the project team at <community@neuralmagic.com>. All
reported by contacting the project team using the [Neural Magic Contact Us Form](https://neuralmagic.com/contact). All
complaints will be reviewed and investigated and will result in a response that
is deemed necessary and appropriate to the circumstances. The project team is
obligated to maintain confidentiality with regard to the reporter of an incident.
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -77,7 +77,7 @@ For documentation edits, include:

## Question or Problem

Sign up or log in to our [**Deep Sparse Community Slack**](https://join.slack.com/t/discuss-neuralmagic/shared_invite/zt-q1a1cnvo-YBoICSIw3L1dmQpjBeDurQ). We are growing the community member by member and happy to see you there. Don’t forget to search through existing discussions to avoid duplication! Thanks!
Sign up or log in to our [**Neural Magic Community Slack**](https://join.slack.com/t/discuss-neuralmagic/shared_invite/zt-q1a1cnvo-YBoICSIw3L1dmQpjBeDurQ). We are growing the community member by member and happy to see you there. Don’t forget to search through existing discussions to avoid duplication! Thanks!

## Developing SparseML

4 changes: 2 additions & 2 deletions DEVELOPING.md
@@ -16,10 +16,10 @@ limitations under the License.

# Developing SparseML

SparseML is developed and tested using Python 3.7-3.9.
SparseML is developed and tested using Python 3.7-3.10.
To develop SparseML, you will also need the development dependencies and to follow the styling guidelines.

Here's some details to get started.
Here are some details to get started.

## Basic Commands

14 changes: 5 additions & 9 deletions docker/Dockerfile
@@ -95,23 +95,19 @@ ARG MODE
RUN \
if [ -n "$BRANCH" ] ; then \
echo Installing from BRANCH && \
$VENV/bin/pip install --no-cache-dir "./sparseml[onnxruntime,torchvision,ultralytics]"; \
$VENV/bin/pip install --no-cache-dir "./sparseml[onnxruntime,torchvision,transformers,yolov5,ultralytics]"; \
elif [ "$MODE" = "nightly" ] ; then \
if [ -z $VERSION] ; then \
$VENV/bin/pip install --no-cache-dir "sparseml-nightly[onnxruntime,torchvision,ultralytics]"; \
$VENV/bin/pip install --no-cache-dir "sparseml-nightly[onnxruntime,torchvision,transformers,yolov5,ultralytics]"; \
else \
$VENV/bin/pip install --no-cache-dir "sparseml-nightly[onnxruntime,torchvision,ultralytics]==$VERSION"; \
$VENV/bin/pip install --no-cache-dir "sparseml-nightly[onnxruntime,torchvision,transformers,yolov5,ultralytics]==$VERSION"; \
fi; \
elif [ -z $VERSION] ; then \
$VENV/bin/pip install --no-cache-dir "sparseml[onnxruntime,torchvision,ultralytics]"; \
$VENV/bin/pip install --no-cache-dir "sparseml[onnxruntime,torchvision,transformers,yolov5,ultralytics]"; \
else \
$VENV/bin/pip install --no-cache-dir "sparseml[onnxruntime,torchvision,ultralytics]==$VERSION"; \
$VENV/bin/pip install --no-cache-dir "sparseml[onnxruntime,torchvision,transformers,yolov5,ultralytics]==$VERSION"; \
fi;

RUN sparseml.transformers.question_answering --help \
&& sparseml.yolov5.train --help \
&& sparseml.ultralytics.train --help


FROM cuda_builder AS container_branch_dev
ARG VENV
2 changes: 1 addition & 1 deletion docs/source/installation.md
@@ -16,7 +16,7 @@ limitations under the License.

# Installation

This repository is tested on Python 3.7-3.9, and Linux/Debian systems.
This repository is tested on Python 3.7-3.10, and Linux/Debian systems.
It is recommended to install in a [virtual environment](https://docs.python.org/3/library/venv.html) to keep your system in order.
Currently supported ML Frameworks are the following: `torch>=1.1.0,<=1.8.0`, `tensorflow>=1.8.0,<=2.0.0`, `tensorflow.keras >= 2.2.0`.

2 changes: 1 addition & 1 deletion research/mfac/README.md
@@ -102,4 +102,4 @@ and [gradual](https://github.com/neuralmagic/sparseml/blob/main/research/mfac/tu
pruning with M-FAC.

## Need Help?
For Neural Magic Support, sign up or log in to our [**Deep Sparse Community Slack**](https://join.slack.com/t/discuss-neuralmagic/shared_invite/zt-q1a1cnvo-YBoICSIw3L1dmQpjBeDurQ). Bugs, feature requests, or additional questions can also be posted to our [GitHub Issue Queue.](https://github.com/neuralmagic/sparseml/issues)
For Neural Magic Support, sign up or log in to our [**Neural Magic Community Slack**](https://join.slack.com/t/discuss-neuralmagic/shared_invite/zt-q1a1cnvo-YBoICSIw3L1dmQpjBeDurQ). Bugs, feature requests, or additional questions can also be posted to our [GitHub Issue Queue.](https://github.com/neuralmagic/sparseml/issues)
4 changes: 2 additions & 2 deletions research/mfac/tutorials/gradual_pruning_with_mfac.md
@@ -47,7 +47,7 @@ should be used.

## Need Help?

For Neural Magic Support, sign up or log in to our [**Deep Sparse Community Slack**](https://join.slack.com/t/discuss-neuralmagic/shared_invite/zt-q1a1cnvo-YBoICSIw3L1dmQpjBeDurQ). Bugs, feature requests, or additional questions can also be posted to our [GitHub Issue Queue.](https://github.com/neuralmagic/sparseml/issues)
For Neural Magic Support, sign up or log in to our [**Neural Magic Community Slack**](https://join.slack.com/t/discuss-neuralmagic/shared_invite/zt-q1a1cnvo-YBoICSIw3L1dmQpjBeDurQ). Bugs, feature requests, or additional questions can also be posted to our [GitHub Issue Queue.](https://github.com/neuralmagic/sparseml/issues)

## Setting Up

@@ -145,4 +145,4 @@ In this tutorial you applied both M-FAC and magnitude pruning with SparseML and
their results. More information about M-FAC pruning and other tutorials can be found
[here](https://github.com/neuralmagic/sparseml/blob/main/research/mfac).

For Neural Magic Support, sign up or log in to our [**Deep Sparse Community Slack**](https://join.slack.com/t/discuss-neuralmagic/shared_invite/zt-q1a1cnvo-YBoICSIw3L1dmQpjBeDurQ). Bugs, feature requests, or additional questions can also be posted to our [GitHub Issue Queue.](https://github.com/neuralmagic/sparseml/issues)
For Neural Magic Support, sign up or log in to our [**Neural Magic Community Slack**](https://join.slack.com/t/discuss-neuralmagic/shared_invite/zt-q1a1cnvo-YBoICSIw3L1dmQpjBeDurQ). Bugs, feature requests, or additional questions can also be posted to our [GitHub Issue Queue.](https://github.com/neuralmagic/sparseml/issues)
12 changes: 12 additions & 0 deletions setup.py
@@ -71,6 +71,16 @@
"torchvision>=0.3.0,<=0.14",
]
_pytorch_vision_deps = _pytorch_deps + ["torchvision>=0.3.0,<=0.14"]
_transformers_deps = _pytorch_deps + [
f"{'nm-transformers' if is_release else 'nm-transformers-nightly'}"
f"~={version_nm_deps}",
"datasets<=1.18.4",
"scikit-learn",
"seqeval",
]
_yolov5_deps = _pytorch_vision_deps + [
f"{'nm-yolov5' if is_release else 'nm-yolov5-nightly'}~={version_nm_deps}"
]
_tensorflow_v1_deps = ["tensorflow<2.0.0", "tensorboard<2.0.0", "tf2onnx>=1.0.0,<1.6"]
_tensorflow_v1_gpu_deps = [
"tensorflow-gpu<2.0.0",
@@ -132,10 +142,12 @@ def _setup_extras() -> Dict:
"torch": _pytorch_deps,
"torch_all": _pytorch_all_deps,
"torchvision": _pytorch_vision_deps,
"transformers": _transformers_deps,
"tf_v1": _tensorflow_v1_deps,
"tf_v1_gpu": _tensorflow_v1_gpu_deps,
"tf_keras": _keras_deps,
"ultralytics": _ultralytics_deps,
"yolov5": _yolov5_deps,
}
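The new `transformers` and `yolov5` extras pick either a release or a nightly fork package and pin it with the PEP 440 compatible-release operator (`~=X.Y.Z` accepts any `X.Y.*` release at or above `X.Y.Z`). The sketch below rebuilds those requirement lists outside `setup.py`. `PYTORCH_DEPS`, the `1.5.0` version string, and the helper names are hypothetical placeholders, since `_pytorch_deps` and `version_nm_deps` are defined outside the lines shown.

```python
from typing import Dict, List

# Hypothetical placeholders: the real _pytorch_deps and version_nm_deps are
# defined elsewhere in setup.py and are not part of this diff.
PYTORCH_DEPS: List[str] = ["torch"]
VERSION_NM_DEPS = "1.5.0"


def nm_requirement(base_name: str, is_release: bool, version: str) -> str:
    """Hypothetical helper: pick the release or `-nightly` package name and
    pin it with the PEP 440 compatible-release operator `~=`."""
    name = base_name if is_release else f"{base_name}-nightly"
    return f"{name}~={version}"


def build_extras(
    is_release: bool, version: str = VERSION_NM_DEPS
) -> Dict[str, List[str]]:
    """Assemble a subset of the extras mapping shown in the diff."""
    pytorch_vision_deps = PYTORCH_DEPS + ["torchvision>=0.3.0,<=0.14"]
    transformers_deps = PYTORCH_DEPS + [
        nm_requirement("nm-transformers", is_release, version),
        "datasets<=1.18.4",
        "scikit-learn",
        "seqeval",
    ]
    yolov5_deps = pytorch_vision_deps + [
        nm_requirement("nm-yolov5", is_release, version)
    ]
    return {
        "torch": PYTORCH_DEPS,
        "torchvision": pytorch_vision_deps,
        "transformers": transformers_deps,
        "yolov5": yolov5_deps,
    }


if __name__ == "__main__":
    # Print the extras a nightly (non-release) build would declare.
    for extra, deps in build_extras(is_release=False).items():
        print(extra, deps)
```

Routing both extras through one helper keeps release and nightly builds differing only in the `is_release` flag, which is the same choice the inline conditional expressions in `setup.py` make.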


@@ -158,7 +158,7 @@ To learn more, refer to the [appropriate documentation in the DeepSparse reposit
## Support
For Neural Magic Support, sign up or log in to our [Deep Sparse Community Slack](https://join.slack.com/t/discuss-neuralmagic/shared_invite/zt-q1a1cnvo-YBoICSIw3L1dmQpjBeDurQ). Bugs, feature requests, or additional questions can also be posted to our [GitHub Issue Queue](https://github.com/neuralmagic/sparseml/issues).
For Neural Magic Support, sign up or log in to our [Neural Magic Community Slack](https://join.slack.com/t/discuss-neuralmagic/shared_invite/zt-q1a1cnvo-YBoICSIw3L1dmQpjBeDurQ). Bugs, feature requests, or additional questions can also be posted to our [GitHub Issue Queue](https://github.com/neuralmagic/sparseml/issues).
[torch]: https://pytorch.org/
24 changes: 24 additions & 0 deletions src/sparseml/pytorch/torchvision/train.py
@@ -267,6 +267,8 @@ def load_data(traindir, valdir, args):
traindir,
presets.ClassificationPresetTrain(
crop_size=train_crop_size,
mean=args.rgb_mean,
std=args.rgb_std,
interpolation=interpolation,
auto_augment_policy=auto_augment_policy,
random_erase_prob=random_erase_prob,
@@ -289,6 +291,8 @@ def load_data(traindir, valdir, args):
else:
preprocessing = presets.ClassificationPresetEval(
crop_size=val_crop_size,
mean=args.rgb_mean,
std=args.rgb_std,
resize_size=val_resize_size,
interpolation=interpolation,
)
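Both classification presets now receive `mean=args.rgb_mean` and `std=args.rgb_std`. Assuming these presets follow torchvision's reference `presets.py`, where the values end up in `transforms.Normalize`, each channel of an image already scaled to [0, 1] becomes `(x - mean[c]) / std[c]`. A plain-Python sketch of that per-pixel arithmetic, using the ImageNet defaults of the new `--rgb-mean`/`--rgb-std` options; `normalize_pixel` is a hypothetical helper, not part of SparseML:

```python
from typing import Sequence, Tuple

# Defaults copied from the new click options (standard ImageNet statistics).
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(
    rgb: Sequence[float],
    mean: Sequence[float] = IMAGENET_MEAN,
    std: Sequence[float] = IMAGENET_STD,
) -> Tuple[float, ...]:
    """Hypothetical helper: normalize one RGB pixel, already scaled to
    [0, 1], channel by channel as (x - mean[c]) / std[c]."""
    if not (len(rgb) == len(mean) == len(std) == 3):
        raise ValueError("expected exactly three values per argument (R, G, B)")
    return tuple((x - m) / s for x, m, s in zip(rgb, mean, std))


if __name__ == "__main__":
    # A pixel equal to the mean maps to zero in every channel.
    print(normalize_pixel(IMAGENET_MEAN))
```

Because both click options use `nargs=3` with `type=float`, each flag takes three space-separated values, e.g. `--rgb-mean 0.5 0.5 0.5 --rgb-std 0.5 0.5 0.5` to map [0, 1] inputs onto [-1, 1].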
@@ -1212,6 +1216,26 @@ def new_func(*args, **kwargs):
"Note: Will be read from the checkpoint if not specified"
),
)
@click.option(
"--rgb-mean",
nargs=3,
default=(0.485, 0.456, 0.406),
type=float,
help=(
"RGB mean values used to shift input RGB values; "
"Note: Will use ImageNet values if not specified."
),
)
@click.option(
"--rgb-std",
default=(0.229, 0.224, 0.225),
nargs=3,
type=float,
help=(
"RGB standard-deviation values used to normalize input RGB values; "
"Note: Will use ImageNet values if not specified."
),
)
@click.pass_context
def cli(ctx, **kwargs):
"""
97 changes: 3 additions & 94 deletions src/sparseml/transformers/__init__.py
@@ -20,111 +20,20 @@

import logging as _logging

import pkg_resources
from sparseml.analytics import sparseml_analytics as _analytics


_analytics.send_event("python__transformers__init")

_EXPECTED_VERSION = "4.23.1"


_LOGGER = _logging.getLogger(__name__)
_NM_TRANSFORMERS_TAR_TEMPLATE = (
"https://github.com/neuralmagic/transformers/releases/download/"
"{version}/transformers-4.23.1-py3-none-any.whl"
)
_NM_TRANSFORMERS_NIGHTLY = _NM_TRANSFORMERS_TAR_TEMPLATE.format(version="nightly")


def _install_transformers_and_deps():

import subprocess as _subprocess
import sys as _sys

import sparseml as _sparseml

nm_transformers_release = (
"nightly" if not _sparseml.is_release else f"v{_sparseml.version_major_minor}"
)
transformers_requirement = _NM_TRANSFORMERS_TAR_TEMPLATE.format(
version=nm_transformers_release
)
try:
_subprocess.check_call(
[
_sys.executable,
"-m",
"pip",
"install",
transformers_requirement,
"datasets<=1.18.4",
"scikit-learn",
"seqeval",
]
)

import transformers as _transformers

_LOGGER.info("sparseml-transformers and dependencies successfully installed")
except Exception:
raise ValueError(
"Unable to install and import sparseml-transformers dependencies check "
"that transformers is installed, if not, install via "
f"`pip install {_NM_TRANSFORMERS_NIGHTLY}`"
)


def _check_transformers_install():
transformers_version = next(
(
pkg.version
for pkg in pkg_resources.working_set
if pkg.project_name.lower() == "transformers"
),
None,
)

# Either no transformers install is found or wrong version installed
if transformers_version != _EXPECTED_VERSION:
import os

if os.getenv("NM_NO_AUTOINSTALL_TRANSFORMERS", False):
_LOGGER.warning(
"Unable to import, skipping auto installation "
"due to NM_NO_AUTOINSTALL_TRANSFORMERS"
)
# skip any further checks
return
else:
_LOGGER.warning(
f"sparseml-transformers v{_EXPECTED_VERSION} installation not "
f"detected. Installing sparseml-transformers v{_EXPECTED_VERSION} "
"dependencies if transformers is already installed in the "
"environment, it will be overwritten. Set environment variable "
"NM_NO_AUTOINSTALL_TRANSFORMERS to disable"
)
_install_transformers_and_deps()

else:
import transformers as _transformers

# Edge case where user has expected version of transformers installed, but
# not the nm integrated one
if not _transformers.NM_INTEGRATED:
_install_transformers_and_deps()
raise RuntimeError(
"Installed transformers package has been overwritten with "
"sparseml-transformers. Stopping process as this is likely to cause "
"import issues. Please re-run command"
)

# re check import after potential install
try:
import transformers as _transformers
# check for NM integration in transformers version
import transformers as _transformers

assert _transformers.NM_INTEGRATED
except Exception:
if not _transformers.NM_INTEGRATED:
_LOGGER.warning(
"the neuralmagic fork of transformers may not be installed. it can be "
"installed via "
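This hunk drops the runtime `pip install` fallback (`_install_transformers_and_deps` and `_check_transformers_install`); judging from the retained lines, the module now imports `transformers` and only logs a warning when the `NM_INTEGRATED` marker is falsy. A minimal sketch of that check-and-warn pattern as a standalone function. `is_nm_integrated` is hypothetical, and unlike the direct attribute access in the diff it uses `getattr` with a default, so an upstream package without the marker does not raise `AttributeError`:

```python
import importlib
import logging
from types import ModuleType

_LOGGER = logging.getLogger(__name__)


def is_nm_integrated(module_name: str = "transformers") -> bool:
    """Hypothetical helper: import `module_name` and report whether it carries
    a truthy NM_INTEGRATED attribute, warning instead of installing anything."""
    module: ModuleType = importlib.import_module(module_name)
    # getattr with a default avoids AttributeError on packages that never
    # define the marker (e.g. the upstream transformers distribution).
    integrated = bool(getattr(module, "NM_INTEGRATED", False))
    if not integrated:
        _LOGGER.warning(
            "the neuralmagic fork of %s may not be installed", module_name
        )
    return integrated
```

A package that is missing altogether still surfaces as `ModuleNotFoundError` from the import, rather than triggering an install.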
2 changes: 1 addition & 1 deletion src/sparseml/transformers/export.py
@@ -178,7 +178,7 @@ def load_task_dataset(

data_training_args = DataTrainingArguments(**data_args)
return get_tokenized_token_classification_dataset(
data_args=data_training_args, tokenizer=tokenizer, model=model
data_args=data_training_args, tokenizer=tokenizer, model=model or config
)

if (
21 changes: 12 additions & 9 deletions src/sparseml/transformers/text_classification.py
@@ -497,6 +497,7 @@ def compute_metrics(p: EvalPrediction):
label_list=label_list,
model=model,
num_labels=num_labels,
config=config,
)
id_to_label = {id_: label for label, id_ in label_to_id.items()}

@@ -687,7 +688,7 @@ def _get_label_info(data_args, raw_datasets):
def _get_tokenized_and_preprocessed_raw_datasets(
config,
data_args: DataTrainingArguments,
model: Module,
model: Optional[Module],
raw_datasets,
tokenizer: transformers.PreTrainedTokenizerBase,
teacher_tokenizer=None,
@@ -705,6 +706,7 @@ def _get_tokenized_and_preprocessed_raw_datasets(
) = _get_label_info(data_args, raw_datasets)

train_dataset = predict_dataset = eval_dataset = None
config = model.config if model else config
if not main_process_func:
main_process_func = lambda desc: nullcontext(desc) # noqa: E731

@@ -753,15 +755,15 @@ def _get_tokenized_and_preprocessed_raw_datasets(
# Some models have set the order of the labels to use, so let's make sure
# we do use it
label_to_id = _get_label_to_id(
data_args, is_regression, label_list, model, num_labels
data_args, is_regression, label_list, model, num_labels, config=config
)

if label_to_id is not None:
model.config.label2id = label_to_id
model.config.id2label = {id: label for label, id in config.label2id.items()}
config.label2id = label_to_id
config.id2label = {id: label for label, id in config.label2id.items()}
elif data_args.task_name is not None and not is_regression:
model.config.label2id = {l: i for i, l in enumerate(label_list)}
model.config.id2label = {id: label for label, id in config.label2id.items()}
config.label2id = {l: i for i, l in enumerate(label_list)}
config.id2label = {id: label for label, id in config.label2id.items()}

max_seq_length = data_args.max_seq_length
if max_seq_length > tokenizer.model_max_length:
@@ -841,15 +843,16 @@ def preprocess_function(examples):
return tokenized_datasets, raw_datasets


def _get_label_to_id(data_args, is_regression, label_list, model, num_labels):
def _get_label_to_id(data_args, is_regression, label_list, model, num_labels, config):
label_to_id = None
config = model.config if model else config
if (
model.config.label2id != PretrainedConfig(num_labels=num_labels).label2id
config.label2id != PretrainedConfig(num_labels=num_labels).label2id
and data_args.task_name is not None
and not is_regression
):
# Some have all caps in their config, some don't.
label_name_to_id = {k.lower(): v for k, v in model.config.label2id.items()}
label_name_to_id = {k.lower(): v for k, v in config.label2id.items()}
if list(sorted(label_name_to_id.keys())) == list(sorted(label_list)):
label_to_id = {
i: int(label_name_to_id[label_list[i]]) for i in range(num_labels)
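These changes let the text-classification preprocessing run with only a config when no model object is passed (the `export.py` call appears to forward `model or config` for the same reason): `config = model.config if model else config` selects the target, and the label maps are written to it instead of to `model.config`. A minimal sketch of that pattern for the `task_name` branch, with hypothetical `SimpleConfig`/`SimpleModel` dataclasses standing in for `transformers.PretrainedConfig` and a model, and a hypothetical `assign_label_maps` helper:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SimpleConfig:
    """Hypothetical stand-in for transformers.PretrainedConfig."""

    label2id: Dict[str, int] = field(default_factory=dict)
    id2label: Dict[int, str] = field(default_factory=dict)


@dataclass
class SimpleModel:
    """Hypothetical stand-in for a model object that carries its own config."""

    config: SimpleConfig


def assign_label_maps(
    model: Optional[SimpleModel], config: SimpleConfig, label_list: List[str]
) -> SimpleConfig:
    """Hypothetical helper mirroring the task_name branch of the diff."""
    # Prefer the model's config when a model is given, else the standalone one.
    config = model.config if model else config
    config.label2id = {label: i for i, label in enumerate(label_list)}
    # Derive id2label from label2id so the two maps cannot drift apart.
    config.id2label = {i: label for label, i in config.label2id.items()}
    return config


if __name__ == "__main__":
    # No model available (e.g. a config-only export): the config is updated.
    print(assign_label_maps(None, SimpleConfig(), ["negative", "positive"]))
```

`if model` relies on truthiness; `model is not None` is the stricter test, since any object that defines `__len__` (an empty `torch.nn.Sequential`, for instance) evaluates as falsy.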