API for customizing `OutputField` names #361

vruusmann · 2022-11-04T06:33:35Z

Inspired by this (again!):
#359

There should be a Python-accessible API for instructing the PMML conversion engine to override default field names with user-specified field names.

The Alias decorator typically creates a copy of the field. As a result, the PMML model schema will contain both the old "badly named" field declaration, plus the new "well named" field declaration. This is confusing for model end users.

I'm thinking about some PMMLPipeline-level attribute, which could be set using a convenience method:

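from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline

pipeline = PMMLPipeline([...])
pipeline.fit(X, y)

# THIS! (proposed method, not implemented yet)
mapping = {
  "probability(0)" : "proba_no",
  "probability(1)" : "proba_yes"
}
pipeline.rename_pmml_fields(mapping)

sklearn2pmml(pipeline, "pipeline.pmml")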

The exported PMML file would contain only "proba_no" and "proba_yes" fields.
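To make the intended end result concrete, here is a minimal sketch that applies such a mapping to already-exported PMML text using only the Python standard library. It is not the proposed `rename_pmml_fields` method and not part of sklearn2pmml: the function name `rename_output_fields`, the sample document and the PMML 4.4 namespace are assumptions made for illustration. It also assumes a single `Output` element, and it does not update other elements that may reference the renamed fields.

```python
import xml.etree.ElementTree as ET

# Assumption: the document uses the PMML 4.4 namespace.
PMML_NS = "http://www.dmg.org/PMML-4_4"

def rename_output_fields(pmml_text, mapping):
    """Return PMML text whose OutputField names are replaced according to mapping."""
    ET.register_namespace("", PMML_NS)
    root = ET.fromstring(pmml_text)
    output_fields = list(root.iter("{%s}OutputField" % PMML_NS))
    old_names = [field.get("name") for field in output_fields]
    # Refuse mappings that mention fields the document does not declare.
    missing = set(mapping) - set(old_names)
    if missing:
        raise KeyError("Unknown output fields: %s" % sorted(missing))
    # Refuse mappings that would produce two fields with the same name.
    new_names = [mapping.get(name, name) for name in old_names]
    if len(set(new_names)) != len(new_names):
        raise ValueError("Renaming would create duplicate output field names")
    for field, new_name in zip(output_fields, new_names):
        field.set("name", new_name)
    return ET.tostring(root, encoding="unicode")

# Hand-written sample document, for illustration only.
sample = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <RegressionModel functionName="classification">
    <Output>
      <OutputField name="probability(0)" optype="continuous" dataType="double" feature="probability" value="0"/>
      <OutputField name="probability(1)" optype="continuous" dataType="double" feature="probability" value="1"/>
    </Output>
  </RegressionModel>
</PMML>"""

renamed = rename_output_fields(sample, {
    "probability(0)": "proba_no",
    "probability(1)": "proba_yes",
})
print(renamed)
```

The duplicate check matters because two output fields that end up with the same name would make the model schema ambiguous.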

DTKx · 2022-12-04T14:56:48Z

Hello!
I have a question related to attribute naming: how do I currently pass the names of my attributes when exporting my model to my PMML file? Is there an example link, with an example of usage, that you could pass me?

More specifically, I am trying to load the model in Java using this library, and I cannot find a way to get the names.

        <dependency>
            <groupId>org.jpmml</groupId>
            <artifactId>pmml-evaluator-metro</artifactId>
            <version>1.6.4</version>
        </dependency>

Any example link you could share would be helpful.
Many thanks in advance

vruusmann · 2022-12-04T15:59:57Z

> I am trying to load the model in Java using this library and I cannot find a way to get the names.

See https://github.com/jpmml/jpmml-evaluator#querying-the-data-schema-of-models

> how do I pass the names of my attributes when exporting my model to my pmml file?

The values of PMMLPipeline.active_fields show up as org.jpmml.evaluator.Evaluator#getActiveFields().
The values of PMMLPipeline.target_fields (i.e. primary results) show up as org.jpmml.evaluator.Evaluator#getTargetFields().
The results of estimator object methods (i.e. secondary outputs, such as Scikit-Learn's predict_proba, apply, etc.) show up as org.jpmml.evaluator.Evaluator#getOutputFields().

Renaming active and target fields is straightforward - just initialize your PMMLPipeline object correctly.

Renaming output fields will be addressed by this issue. Right now they are generated using fairly reasonable patterns (e.g. all probability outputs are named probability(<target category value>)). However, some people do not like my field naming conventions, and want to use their own custom names instead.

There's also a distinct category of fields called "derived fields", which correspond to Scikit-Learn transformer objects. They are named after Scikit-Learn class names by default (e.g. a StandardScaler object will give rise to one or more "standardScaler" derived fields). These defaults can be overridden by setting the pmml_name_ attribute.
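The field categories described above can also be inspected from the Python side by reading the exported PMML document directly. The following is a rough sketch using only the standard library; the function name `summarize_fields`, the sample document (including its field names) and the PMML 4.4 namespace are assumptions for illustration, and the grouping only approximates what the Java evaluator reports.

```python
import xml.etree.ElementTree as ET

# Assumption: the document uses the PMML 4.4 namespace.
NS = {"pmml": "http://www.dmg.org/PMML-4_4"}

def summarize_fields(pmml_text):
    """Group field names by category: active, target, output and derived."""
    root = ET.fromstring(pmml_text)
    summary = {"active": [], "target": [], "output": [], "derived": []}
    for mining_field in root.iterfind(".//pmml:MiningSchema/pmml:MiningField", NS):
        # MiningField@usageType defaults to "active" when absent.
        usage_type = mining_field.get("usageType", "active")
        if usage_type in ("target", "predicted"):
            summary["target"].append(mining_field.get("name"))
        elif usage_type == "active":
            summary["active"].append(mining_field.get("name"))
    for output_field in root.iterfind(".//pmml:Output/pmml:OutputField", NS):
        summary["output"].append(output_field.get("name"))
    for derived_field in root.iterfind(".//pmml:DerivedField", NS):
        summary["derived"].append(derived_field.get("name"))
    return summary

# Hand-written sample document with illustrative field names.
sample = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <TransformationDictionary>
    <DerivedField name="standardScaler(x1)" optype="continuous" dataType="double">
      <FieldRef field="x1"/>
    </DerivedField>
  </TransformationDictionary>
  <RegressionModel functionName="classification">
    <MiningSchema>
      <MiningField name="y" usageType="target"/>
      <MiningField name="x1"/>
    </MiningSchema>
    <Output>
      <OutputField name="probability(0)" optype="continuous" dataType="double" feature="probability" value="0"/>
      <OutputField name="probability(1)" optype="continuous" dataType="double" feature="probability" value="1"/>
    </Output>
  </RegressionModel>
</PMML>"""

fields = summarize_fields(sample)
print(fields)
```

Running this against a real export before and after a renaming attempt is a quick way to check which names a model end user will actually see.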

vruusmann · 2022-12-04T16:03:55Z

> Any example link you could share would be helpful.

If you have any field naming questions - backed by concrete Scikit-Learn/Python code snippets - please ask them here. Will do my best to answer them, and maybe it will give me some new ideas for designing a better fix for this issue.

Once this issue is resolved, I hope to write a quick overview in the form of a small technical article at https://openscoring.io/blog/

vruusmann · 2024-03-04T05:25:04Z

Currently doable using the Model Customization API:

from sklearn2pmml.pipeline import PMMLPipeline
from sklearn2pmml.util.pmml import make_element

pipeline = PMMLPipeline([...])
pipeline.fit(X, y)

# Define a "skeletal" PMML element, which defines changeable attributes.
# Here, only the OutputField@name attribute will be changed, all other attributes will remain as-is
updated_output_field = make_element("OutputField", name = "p(0)")

# Point the update action towards the existing OutputField element that is named "probability(0)"
pipeline.customize(
  command = "update",
  xpath_expr = "//:OutputField[@name='probability(0)']",
  pmml_element = updated_output_field.tostring().decode("utf-8")
)
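When several output fields need renaming, the XPath expressions for the calls above can be generated from a mapping such as the one proposed at the top of this issue. The helper below is a hypothetical sketch, not part of sklearn2pmml: the names `xpath_string_literal` and `build_rename_commands` are made up for illustration, and the expression format simply copies the one used in the snippet above.

```python
def xpath_string_literal(value):
    """Quote a value as an XPath 1.0 string literal."""
    if "'" not in value:
        return "'%s'" % value
    if '"' not in value:
        return '"%s"' % value
    # XPath 1.0 has no escape syntax; mixed quotes would need concat().
    raise ValueError("Cannot quote a value that contains both quote characters")

def build_rename_commands(mapping):
    """Translate an {old name: new name} mapping into one update command per field."""
    return [
        {
            "xpath_expr": "//:OutputField[@name=%s]" % xpath_string_literal(old_name),
            "new_name": new_name,
        }
        for old_name, new_name in mapping.items()
    ]

commands = build_rename_commands({
    "probability(0)": "proba_no",
    "probability(1)": "proba_yes",
})
for command in commands:
    print(command["xpath_expr"], "->", command["new_name"])

# Each command would then be fed to the customization call shown above, e.g.:
#   element = make_element("OutputField", name = command["new_name"])
#   pipeline.customize(command = "update", xpath_expr = command["xpath_expr"],
#       pmml_element = element.tostring().decode("utf-8"))
```

Keeping the quoting logic in one place avoids malformed expressions when a field name itself contains a quote character.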

vruusmann mentioned this issue Nov 9, 2022

Ability to suppress the (default-) Output element jpmml/jpmml-sklearn#180

Open

vruusmann changed the title ~~API for renaming fields in place~~ API for customizing OutputField names Dec 28, 2022

vruusmann mentioned this issue Apr 24, 2023

Disambiguating the output fields of multi-output models (example: The value for field "probability(0)" has already been defined) jpmml/jpmml-sklearn#184

Closed
