Comparing predictions with modified feature values in a pmml pipeline #427
This is a tricky one. The difficulty lies in the "make two predictions with the same model, first for the original dataset and then for the (slightly modified) dataset" part, not in the post-processing part. This task is especially difficult to solve using Scikit-Learn APIs. Is there a way to simplify the pipeline somehow? What is the intended type of the estimator class? The example above uses … I'm asking because if I needed to solve this problem for the … If the model type is something other than …
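Something along those lines:

```python
import numpy
from sklearn.base import BaseEstimator, RegressorMixin, clone


class MyComparingPredictor(RegressorMixin, BaseEstimator):

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y):
        # Fit a private copy, leaving the user-supplied estimator untouched
        self.estimator_ = clone(self.estimator)
        self.estimator_.fit(X, y)
        return self

    def predict(self, X):
        # First prediction, for the original data
        y = self.estimator_.predict(X)
        # Very simplified approach: modify a copy, so the caller's DataFrame is left unchanged
        X_thu = X.copy()
        # Level spelled as in the "day" encoder mapping shown later in this thread
        X_thu["day"] = "Thur"
        # Second prediction, for the modified data
        y_thu = self.estimator_.predict(X_thu)
        # Very simplified approach
        return numpy.maximum(y, y_thu)
```

The idea of introducing a … I don't see any easy way of achieving the same using Scikit-Learn's built-in estimator and meta-estimator classes.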
I cannot see any alternative to this idea. I just want to implement the dynamic comparison of predictions between two inputs based on the day_of_week value, and make changes if the result doesn't satisfy my condition. Then I want to wrap everything up inside the pipeline so that I can save it as an XML file for future inference as a black-box model. I want to implement it using an XGBoost regressor right now, but if it's possible to implement it using linear functions such as a linear regressor, that would be fine too.
I'm not challenging your idea here. I'm just saying that, depending on the mining function/type of your model, there are different paths available. For example, when using …
How would you implement the "prediction with two inputs" part using Scikit-Learn core APIs? Feel free to ignore the PMML conversion part for now (because that's easy).
There doesn't seem to be any interest in this issue anymore, so I'm closing it with some final comments.

First, temporal features (such as "day", "time", etc.) that are represented as strings are not suitable input for Scikit-Learn's … For example, the fitted OE transformer translates to the following PMML markup:

```xml
<DerivedField name="encoder(day)" optype="categorical" dataType="double">
  <MapValues outputColumn="data:output">
    <FieldColumnPair field="day" column="data:input"/>
    <InlineTable>
      <row>
        <data:input>Fri</data:input>
        <data:output>0.0</data:output>
      </row>
      <row>
        <data:input>Sat</data:input>
        <data:output>1.0</data:output>
      </row>
      <row>
        <data:input>Sun</data:input>
        <data:output>2.0</data:output>
      </row>
      <row>
        <data:input>Thur</data:input>
        <data:output>3.0</data:output>
      </row>
    </InlineTable>
  </MapValues>
</DerivedField>
```

Note the effective ordering of category levels: "Fri" < "Sat" < "Sun" < "Thur". This means that the "Thur" level is one notch higher than the "Sun" level. It can be seen from the …

```xml
<RegressionTable intercept="0.9125054862129849">
  <NumericPredictor name="total_bill" coefficient="0.09409416367475298"/>
  <NumericPredictor name="continuous(encoder(sex))" coefficient="-0.027662269634805225"/>
  <NumericPredictor name="continuous(encoder(smoker))" coefficient="-0.08640116920553004"/>
  <NumericPredictor name="continuous(encoder(day))" coefficient="-0.005437194272218469"/>
  <NumericPredictor name="continuous(encoder(time))" coefficient="0.0023380980487601316"/>
  <NumericPredictor name="continuous(encoder(size))" coefficient="0.18066261819092744"/>
</RegressionTable>
```

Putting the above two observations together, it can be seen that the prediction for the "Thur" case (…

The condition …
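The arithmetic behind the two observations above can be reproduced in plain Python. This is a sketch under stated assumptions: it evaluates the `RegressionTable` excerpt by hand as a plain linear sum (intercept plus coefficient times input, with no output normalization, which the excerpt does not show), and the input values in `base_row` are made up for illustration. The names `linear_prediction`, `base_row` and `day_code` are hypothetical; this is not how a PMML engine is invoked.

```python
import xml.etree.ElementTree as ET

# RegressionTable copied verbatim from the PMML excerpt in this thread
REGRESSION_TABLE = """
<RegressionTable intercept="0.9125054862129849">
  <NumericPredictor name="total_bill" coefficient="0.09409416367475298"/>
  <NumericPredictor name="continuous(encoder(sex))" coefficient="-0.027662269634805225"/>
  <NumericPredictor name="continuous(encoder(smoker))" coefficient="-0.08640116920553004"/>
  <NumericPredictor name="continuous(encoder(day))" coefficient="-0.005437194272218469"/>
  <NumericPredictor name="continuous(encoder(time))" coefficient="0.0023380980487601316"/>
  <NumericPredictor name="continuous(encoder(size))" coefficient="0.18066261819092744"/>
</RegressionTable>
"""

table = ET.fromstring(REGRESSION_TABLE.strip())
intercept = float(table.get("intercept"))
coefficients = {
    predictor.get("name"): float(predictor.get("coefficient"))
    for predictor in table.findall("NumericPredictor")
}

# Ordinal codes follow the alphabetical order of the levels,
# which reproduces the MapValues table: Fri=0, Sat=1, Sun=2, Thur=3
day_levels = sorted(["Thur", "Fri", "Sat", "Sun"])
day_code = {level: float(index) for index, level in enumerate(day_levels)}


def linear_prediction(row):
    """Assumed scoring rule: intercept + sum of coefficient * (encoded) input value."""
    return intercept + sum(coefficients[name] * value for name, value in row.items())


# Hypothetical, already-encoded input row; only the day code differs between p and q
base_row = {
    "total_bill": 20.0,
    "continuous(encoder(sex))": 1.0,
    "continuous(encoder(smoker))": 0.0,
    "continuous(encoder(time))": 1.0,
    "continuous(encoder(size))": 1.0,
}
p = linear_prediction({**base_row, "continuous(encoder(day))": day_code["Sun"]})
q = linear_prediction({**base_row, "continuous(encoder(day))": day_code["Thur"]})

# The difference reduces to the day coefficient times (3.0 - 2.0)
print(q - p)
print(p < q)  # False
```

Because the "Thur" code is exactly one higher than the "Sun" code and the day coefficient is negative, q comes out about 0.0054 lower than p for this linear model regardless of the other inputs, so a "p < q" branch would never fire.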
Is there any way to make two predictions? First, the model predicts with the original day value; let's say we get prediction p. Now suppose I want to replace 'Sun' with 'Thu', keeping all other features the same, and predict again in the PMML pipeline to get prediction q.
Now we want to compare p and q: if p < q, then set p = q + 1.
Is there any way to achieve this within the pipeline, so that we can save the pipeline to a file for later inference?
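Once both predictions are available, the requested post-processing rule itself is a one-liner. A minimal NumPy sketch follows; the function name `apply_day_rule` is hypothetical, and in a meta-estimator along the lines discussed in this thread it would take the place of the `numpy.maximum(y, y_thu)` line. Whether such a custom step converts to PMML is not addressed here.

```python
import numpy as np


def apply_day_rule(p, q):
    """Hypothetical post-processing step: where p < q, replace p with q + 1; otherwise keep p.

    p: predictions for the original rows
    q: predictions for the same rows with the day feature replaced
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Element-wise choice between (q + 1) and the original p
    return np.where(p < q, q + 1.0, p)


p = np.array([2.5, 3.0, 4.0])
q = np.array([3.0, 3.0, 1.0])
print(apply_day_rule(p, q))  # [4. 3. 4.]
```

Note that ties (p equal to q) keep the original p, matching the strict "p < q" condition in the question.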