From 27fb8f0387d95774745227a363cb6320c7b2d56f Mon Sep 17 00:00:00 2001
From: welisheva22
Date: Wed, 14 Aug 2024 01:42:52 -0400
Subject: [PATCH] =?UTF-8?q?Update=20data=5Fclassification=5Fpolicy.rst=20-?=
 =?UTF-8?q?--=20copy=20edits=20(grammar,=20consis=E2=80=A6=20(#1139)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Update data_classification_policy.rst --- copy edits (grammar, consistency, clarity)

Signed-off-by: welisheva22
Co-authored-by: Yoav Katz <68273864+yoavkatz@users.noreply.github.com>
---
 docs/docs/data_classification_policy.rst | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/docs/docs/data_classification_policy.rst b/docs/docs/data_classification_policy.rst
index 1742593ba..9b163b235 100644
--- a/docs/docs/data_classification_policy.rst
+++ b/docs/docs/data_classification_policy.rst
@@ -12,11 +12,11 @@ The section discusses how to properly handle sensitive data in Unitxt in order t
 proprietary/confidential/personal data to unauthorized services or 3rd parties.
 For example, sending sensitive data for inference by an external API in LLM as Judge metric.

-The problem is exacerbated since the person who owns the data and uses the metric in their card,
-may not know what 3rd services are used by internally by the metric.
+The problem is exacerbated since the person who owns the data and uses the metric in their card
+may not know what 3rd party services are used internally by the metric.

-To address this Unitxt allows the data owner to specify the data classification of their data, and require that
-any metric (or other component) that processes the data, must be explicitly allowed to process data with this classification.
+To address this, Unitxt allows the data owner to specify the data classification of their data, and similarly it requires that
+any metric (or other component) that processes the data must be explicitly allowed to process data with this classification.

 Data classification policy
@@ -28,14 +28,14 @@ You can define your own data classification identifiers.
 Each component that processes data in Unitxt ( operators, metrics, inference engines, etc.) also has
 a parameter called `data_classification_policy`. This parameter determines which kinds of data
-it can process. The parameter is also a list of string identifiers, which are names of allowed data classification.
+it can process. The parameter is also a list of string identifiers, each of which is a name of allowed data classification.

 Before processing the data, the component verifies that the `data_classification_policy` of the data meets its `data_classification_policy`.
 If the policies for a component include the classification of the data, then the data may be further processed. Otherwise, an error will be raised.

-For example, a LLM as judge that calls an external api, may set `data_classification_policy` to `['public']`.
+For example, an LLM as judge that calls an external api may set `data_classification_policy` to `['public']`.
 If data marked [`confidential`] is passed to the metric, it will not process the data and fail.

-If the data has multiple `data_classification_policy`s then the component must be allowed to handle all of them.
+If the data has multiple values under `data_classification_policy` then the component must be allowed to handle all of them.
 If the `data_classification_policy` is not set, the component can handle all data.

 It is possible to override the `data_classification_policy` of a component with an environment variable. See below.
@@ -45,7 +45,7 @@ Adding `data_classification_policy` for data

 Data classification information is added to streams of data by the use of Unitxt loaders.
 Existing loaders have default data classification policies. For example, LoadHF sets the policy to `['public']` for datasets
-downloaded from the Huggingface and `['proprietary']` for datasets loaded from local files. You can override this by setting
+downloaded from the HuggingFace and `['proprietary']` for datasets loaded from local files. You can override this by setting
 the `data_classification_policy` parameter of the loader.

 The data classification value is added as an additional field to all instances within a stream.
@@ -105,8 +105,8 @@ Example:

 1. **Overriding default policy during environment variable **:

-You can override the data classification of artifacts that was saved in the catalog, by setting the the `UNITXT_DATA_CLASSIFICATION_POLICY` env variable accordingly.
-It should be of string representation of type `Dict[str, List[str]]`, where a key is a name of a given artifact, and a corresponding value of allowed data classification. For example:
+You can override the data classification of artifacts that was saved in the catalog by setting the `UNITXT_DATA_CLASSIFICATION_POLICY` env variable accordingly.
+It should be a string representation of type `Dict[str, List[str]]`, where a key is a name of a given artifact, and a corresponding value is the allowed data classification. For example:

 .. code-block:: bash