Randall O'Reilly edited this page Aug 8, 2021 · 2 revisions

The split package provides the key functions for creating and manipulating Splits, a list of IdxView index views into a Table.

GroupBy

GroupBy creates splits by grouping together all rows that have the same value(s) in a given column or set of columns. Here's an example from the dataproc code:

	byMethod := split.GroupBy(PlanetsAll, []string{"method"}) // one split per unique "method" value
	split.Agg(byMethod, "orbital_period", agg.AggMedian)       // median "orbital_period" within each split
	GpMethodOrbit = byMethod.AggsToTable(etable.AddAggName) // etable.AddAggName or etable.ColNameOnly for naming cols

The first line creates one split per distinct value of the "method" column (PlanetsAll is an IdxView of the full data set). Agg then computes the median of the "orbital_period" column within each group. The last line writes the aggregated results out to a new table that can then be viewed or processed further.

Agg

See Agg

Permuted

Permuted randomly shuffles (permutes) the rows and then divides them into splits according to given proportions of the total number of rows. For example, here's code that creates a random train / test split of a set of patterns:

	all := etable.NewIdxView(ss.Pats) // index view over all rows of ss.Pats
	splits, _ := split.Permuted(all, []float64{.8, .2}, []string{"Train", "Test"}) // shuffle, then split 80% / 20%
	ss.TrainEnv.Table = splits.Splits[0] // IdxView of "Train", 80% of rows
	ss.TestEnv.Table = splits.Splits[1]  // IdxView of "Test", 20% of rows