Data set requires multiple #7

diziet · 2019-03-26T20:00:14Z

I tried a simple dataset to play around with this, and I am running into this error:

ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.

I believe this is because scikit-learn needs every value of the variable you're trying to predict to appear at least twice in the data. Might want to add that to the docs~
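A quick way to see whether a target column would trip this check is to count how often each value occurs. The sketch below uses only the standard library; the helper name `singleton_classes` and the sample values are made up for illustration and are not taken from square.txt.

```python
from collections import Counter


def singleton_classes(targets, min_members=2):
    """Return the target values that occur fewer than min_members times."""
    counts = Counter(targets)
    return sorted(value for value, n in counts.items() if n < min_members)


# Hypothetical target column: squares of 1..5, so every value is unique
# and each one would be treated as its own class.
squares = [x * x for x in range(1, 6)]
print(singleton_classes(squares))

# A target where every class appears at least twice yields an empty list.
print(singleton_classes([0, 1, 0, 1, 1]))
```

If the first call returns anything, a stratified split on that column treated as class labels can be expected to fail in the same way as the log below.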

Input:
square.txt

Full log if needed:

>$ automl_gs square.csv square
/usr/local/lib/python3.7/site-packages/automl_gs/utils_automl.py:270: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  metrics = yaml.load(f)
Solving a classification problem, maximizing accuracy using tensorflow.

Modeling with field specifications:
real: numeric
fake1: numeric
fake2: numeric
fake3: numeric
fake4: numeric
text: categorical
bool: categorical
/usr/local/lib/python3.7/site-packages/automl_gs/utils_automl.py:126: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  hps = yaml.load(f)
  0%|          | 0/100 [00:00<?, ?trial/s
/usr/local/lib/python3.7/site-packages/automl_gs/utils_automl.py:199: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  metrics = yaml.load(f)[problem_type]
Traceback (most recent call last):
  File "model.py", line 47, in <module>
    model_train(df, encoders, args, model)
  File "../automodel/automl_train/pipeline.py", line 408, in model_train
    for train_indices, val_indices in split.split(np.zeros(y.shape[0]), y):
  File "/usr/local/lib/python3.7/site-packages/sklearn/model_selection/_split.py", line 1315, in split
    for train, test in self._iter_indices(X, y, groups):
  File "/usr/local/lib/python3.7/site-packages/sklearn/model_selection/_split.py", line 1695, in _iter_indices
    raise ValueError("The least populated class in y has only 1"
ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.
  | 0/20 [00:00<?, ?epoch/s]
Traceback (most recent call last):
  File "/usr/local/bin/automl_gs", line 10, in <module>
    sys.exit(cmd())
  File "/usr/local/lib/python3.7/site-packages/automl_gs/automl_gs.py", line 175, in cmd
    tpu_address=args.tpu_address)
  File "/usr/local/lib/python3.7/site-packages/automl_gs/automl_gs.py", line 87, in automl_grid_search
    "metadata", "results.csv"))
  File "/usr/local/lib/python3.7/site-packages/pandas/io/parsers.py", line 702, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/usr/local/lib/python3.7/site-packages/pandas/io/parsers.py", line 429, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/usr/local/lib/python3.7/site-packages/pandas/io/parsers.py", line 895, in __init__
    self._make_engine(self.engine)
  File "/usr/local/lib/python3.7/site-packages/pandas/io/parsers.py", line 1122, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/usr/local/lib/python3.7/site-packages/pandas/io/parsers.py", line 1853, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 387, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 705, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: [Errno 2] File b'automl_train/metadata/results.csv' does not exist: b'automl_train/metadata/results.csv'

minimaxir · 2019-03-26T20:15:35Z

From the provided dataset, the issue here is that it's trying to do a classification problem instead of a regression problem. I don't believe that error is otherwise wrong.

I removed problem_type as a parameter since I thought the heuristic was fine for that; I was apparently wrong. I may need to refine it.

In the meantime, if square is a float (has a decimal), it should work.
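A minimal sketch of that workaround, assuming the target is stored as whole numbers in the CSV: rewrite the column so every value carries a decimal point. The function name `floatify_column` and the sample rows are hypothetical, and how automl_gs actually infers the problem type is not shown in this thread.

```python
import csv
import io


def floatify_column(csv_text, column):
    """Rewrite one column of a CSV so every value is written as a float."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # float("4") -> 4.0, and repr() keeps the trailing ".0"
        row[column] = repr(float(row[column]))
        writer.writerow(row)
    return out.getvalue()


# Hypothetical two-column sample; the real file has more fields.
sample = "real,square\n1,1\n2,4\n3,9\n"
print(floatify_column(sample, "square"))
```

Saving the result back to a file and passing that file to automl_gs is left out here, since it depends on where the data lives.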

avinash-mishra · 2019-04-08T01:17:23Z

I have not gone through the code yet, but I think this could also be a stratification problem. I have seen this kind of error when using stratify inside train_test_split.
I'll check and update.
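For reference, the same kind of failure can be reproduced outside automl_gs with scikit-learn's `train_test_split` and its `stratify` argument. This is only a sketch with made-up labels; the helper name `try_stratified_split` is hypothetical, and it does not show what the generated pipeline.py does internally.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def try_stratified_split(y):
    """Attempt a stratified split; return the error message, or None on success."""
    X = np.zeros((len(y), 1))  # feature values do not matter for stratification
    try:
        train_test_split(X, y, test_size=0.5, stratify=y, random_state=0)
    except ValueError as err:
        return str(err)
    return None


# Every class has two members, so the split goes through.
print(try_stratified_split([0, 0, 1, 1]))

# Class 2 has a single member, so scikit-learn refuses to stratify.
print(try_stratified_split([0, 0, 1, 1, 2]))
```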

germanjoey · 2019-04-24T00:49:53Z

Allowing problem_type as a parameter would be very appreciated.

minimaxir added the bug label Mar 26, 2019

kevinsimper mentioned this issue Apr 4, 2019

Survived is put as string minimaxir/automl-gs-examples#1

Open
