Support for TfidfVectorizer.norm attribute #98

Open
mathlf2015 opened this issue Jul 6, 2018 · 3 comments

Comments

@mathlf2015

mathlf2015 commented Jul 6, 2018

I was recently looking for a way to transfer machine learning models between Python and Java, and I want to use TfidfVectorizer. The pipeline fits successfully, but it cannot be saved as PMML. My environment and code are as follows.
Anaconda Python 3.6
Linux

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn2pmml.decoration import ContinuousDomain
from sklearn2pmml.pipeline import PMMLPipeline
from sklearn2pmml import sklearn2pmml
from sklearn_pandas import DataFrameMapper

testdata = pd.DataFrame({
    'pet': ['cat aaa', 'dog  ddd', 'dog  ccc', 'fish eee fff', 'cat ccc aaa ddd', 'dog ddd fff', 'cat ccc', 'fish fff'],
    'age': [4., 6, 3, 3, 2, 3, 5, 4],
    'salary': [90, 24, 44, 27, 32, 59, 36, 27],
})

mapper = DataFrameMapper([
    ('pet', TfidfVectorizer()),
])
vod_pipeline = PMMLPipeline([
    ("mapper", mapper),
    ("classifier", LogisticRegression()),
])

testdata['label'] = [1, 1, 1, 1, 1, 0, 0, 0]
vod_pipeline.fit(testdata, testdata['label'])
print(vod_pipeline.score(testdata, testdata['label']))

sklearn2pmml(vod_pipeline, '11.pmml', with_repr=True, debug=True)

The debug output is as follows:

0.75
python: 3.6.1
sklearn: 0.19.1
sklearn.externals.joblib: 0.11
pandas: 0.23.0
sklearn_pandas: 1.6.0
sklearn2pmml: 0.36.1
java: 1.8.0_131
Executing command:
java -cp /root/anaconda3/lib/python3.6/site-packages/sklearn2pmml/resources/guava-25.1-jre.jar:/root/anaconda3/lib/python3.6/site-packages/sklearn2pmml/resources/jaxb-api-2.3.0.jar:/root/anaconda3/lib/python3.6/site-packages/sklearn2pmml/resources/javax.activation-api-1.2.0.jar:/root/anaconda3/lib/python3.6/site-packages/sklearn2pmml/resources/jaxb-runtime-2.3.0.1.jar:/root/anaconda3/lib/python3.6/site-packages/sklearn2pmml/resources/jcommander-1.72.jar:/root/anaconda3/lib/python3.6/site-packages/sklearn2pmml/resources/jpmml-xgboost-1.3.1.jar:/root/anaconda3/lib/python3.6/site-packages/sklearn2pmml/resources/slf4j-api-1.7.25.jar:/root/anaconda3/lib/python3.6/site-packages/sklearn2pmml/resources/istack-commons-runtime-3.0.5.jar:/root/anaconda3/lib/python3.6/site-packages/sklearn2pmml/resources/jaxb-core-2.3.0.1.jar:/root/anaconda3/lib/python3.6/site-packages/sklearn2pmml/resources/jpmml-sklearn-1.5.4.jar:/root/anaconda3/lib/python3.6/site-packages/sklearn2pmml/resources/jpmml-lightgbm-1.2.1.jar:/root/anaconda3/lib/python3.6/site-packages/sklearn2pmml/resources/pmml-model-1.4.2.jar:/root/anaconda3/lib/python3.6/site-packages/sklearn2pmml/resources/pyrolite-4.20.jar:/root/anaconda3/lib/python3.6/site-packages/sklearn2pmml/resources/slf4j-jdk14-1.7.25.jar:/root/anaconda3/lib/python3.6/site-packages/sklearn2pmml/resources/serpent-1.23.jar:/root/anaconda3/lib/python3.6/site-packages/sklearn2pmml/resources/pmml-model-metro-1.4.2.jar:/root/anaconda3/lib/python3.6/site-packages/sklearn2pmml/resources/pmml-agent-1.4.2.jar:/root/anaconda3/lib/python3.6/site-packages/sklearn2pmml/resources/jpmml-converter-1.3.2.jar org.jpmml.sklearn.Main --pkl-pipeline-input /tmp/pipeline-vbufsjpg.pkl.z --pmml-output 11.pmml
Standard output is empty
Standard error:
Jul 06, 2018 12:52:57 AM org.jpmml.sklearn.Main run
INFO: Parsing PKL..
Jul 06, 2018 12:52:57 AM org.jpmml.sklearn.Main run
INFO: Parsed PKL in 30 ms.
Jul 06, 2018 12:52:57 AM org.jpmml.sklearn.Main run
INFO: Converting..
Jul 06, 2018 12:52:57 AM org.jpmml.sklearn.Main run
SEVERE: Failed to convert
java.lang.IllegalArgumentException: l2
	at sklearn.feature_extraction.text.TfidfVectorizer.encodeFeatures(TfidfVectorizer.java:73)
	at sklearn_pandas.DataFrameMapper.initializeFeatures(DataFrameMapper.java:75)
	at sklearn.Initializer.encodeFeatures(Initializer.java:41)
	at sklearn.pipeline.Pipeline.encodeFeatures(Pipeline.java:81)
	at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:192)
	at org.jpmml.sklearn.Main.run(Main.java:145)
	at org.jpmml.sklearn.Main.main(Main.java:94)

Exception in thread "main" java.lang.IllegalArgumentException: l2
	at sklearn.feature_extraction.text.TfidfVectorizer.encodeFeatures(TfidfVectorizer.java:73)
	at sklearn_pandas.DataFrameMapper.initializeFeatures(DataFrameMapper.java:75)
	at sklearn.Initializer.encodeFeatures(Initializer.java:41)
	at sklearn.pipeline.Pipeline.encodeFeatures(Pipeline.java:81)
	at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:192)
	at org.jpmml.sklearn.Main.run(Main.java:145)
	at org.jpmml.sklearn.Main.main(Main.java:94)

Preserved joblib dump file(s): /tmp/pipeline-vbufsjpg.pkl.z
Traceback (most recent call last):
  File "test4.py", line 31, in <module>
    sklearn2pmml(vod_pipeline, '11.pmml', with_repr=True,debug=True)
  File "/root/anaconda3/lib/python3.6/site-packages/sklearn2pmml/__init__.py", line 237, in sklearn2pmml
    raise RuntimeError("The JPMML-SkLearn conversion application has failed. The Java executable should have printed more information about the failure into its standard output and/or standard error streams")
RuntimeError: The JPMML-SkLearn conversion application has failed. The Java executable should have printed more information about the failure into its standard output and/or standard error streams
@mathlf2015
Author

I found the same problem described here, but I couldn't work out how to solve it:
https://stackoverflow.com/questions/44560823/generate-pmml-for-text-classification-pipeline-in-python

@vruusmann
Member

Exception in thread "main" java.lang.IllegalArgumentException: l2

The TfidfVectorizer.norm attribute is not supported.

You have it set to "l2", but you need to set it to None.
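
For context, norm="l2" asks scikit-learn to rescale each document's TF-IDF vector to unit Euclidean length, while norm=None leaves the raw weights untouched. The following is only a minimal pure-Python sketch of that rescaling step; the helper name l2_normalize is hypothetical and this is not the scikit-learn implementation.

```python
import math


def l2_normalize(row):
    """Scale a vector of TF-IDF weights to unit Euclidean length (hypothetical helper)."""
    length = math.sqrt(sum(w * w for w in row))
    if length == 0.0:
        return list(row)  # an all-zero row is left unchanged
    return [w / length for w in row]


raw = [3.0, 4.0]          # made-up weights for one document
print(l2_normalize(raw))  # prints [0.6, 0.8]
```

With norm=None the converter receives the unscaled weights (the `raw` values in this sketch), so the exported model simply omits this per-document step.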

vruusmann changed the title from "sklearn2pmml save machine learning model error" to "Support for TfidfVectorizer.norm attribute" on Jul 6, 2018
@mathlf2015
Author

mathlf2015 commented Jul 6, 2018

Thank you very much, and best regards. I couldn't have solved this problem without your help.
The model is now saved successfully.
The code changed as follows:

from sklearn2pmml.feature_extraction.text import Splitter

# before the change (fails to convert: norm defaults to "l2")
mapper = DataFrameMapper([
    ('pet', TfidfVectorizer()),
])

# after the change (converts successfully)
mapper = DataFrameMapper([
    ('pet', TfidfVectorizer(norm=None, analyzer="word", tokenizer=Splitter())),
])
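
Because the failure above only surfaces inside the Java converter, a small pre-flight check on the Python side may give a clearer message. This is a hedged sketch, not part of sklearn2pmml: check_tfidf_norm and FakeVectorizer are hypothetical names, and the check assumes the behaviour reported in this thread (any norm other than None is rejected).

```python
def check_tfidf_norm(vectorizer):
    """Raise early if a vectorizer-like object has normalization switched on."""
    norm = getattr(vectorizer, "norm", None)
    if norm is not None:
        raise ValueError(
            "TfidfVectorizer.norm=%r is not supported by the converter "
            "in this thread; construct the vectorizer with norm=None" % (norm,)
        )


class FakeVectorizer:
    """Stand-in with only a `norm` attribute, so the sketch runs without scikit-learn."""

    def __init__(self, norm="l2"):
        self.norm = norm


check_tfidf_norm(FakeVectorizer(norm=None))  # passes silently
try:
    check_tfidf_norm(FakeVectorizer())       # default "l2", as in the failing pipeline
except ValueError as exc:
    print(exc)
```

In a real script the same function could be called on the TfidfVectorizer instance before passing the pipeline to sklearn2pmml, since it only reads the `norm` attribute.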
