API Specification

The modules and subpackages listed here are the basis of a stable API of ggml, intended for end users.

Core

Ignite cache API.

class ggml.core.Cache(proxy, cache_filter=None, preprocessor=None)

Bases: ggml.common.Proxy

Wrapper of an Apache Ignite cache. The constructor is internal: users are expected to create or get a cache through an Ignite object rather than call it directly.

__init__(proxy, cache_filter=None, preprocessor=None)

Constructs a wrapper of an Apache Ignite cache. This is an internal method; users are expected to use the Ignite methods to create or get a cache.

Parameters:
  • proxy – Py4J proxy that represents Apache Ignite Cache,
  • cache_filter – Py4J proxy that represents filter,
  • preprocessor – Py4J proxy that represents preprocessor.
filter(cache_filter)

Filters this cache using specified filter.

Parameters:cache_filter – Filter to be used to filter cache.
get(key)

Returns value (float array) by the specified key.

Parameters:key – Key to be taken from cache.
head(n=5)

Returns the first n elements of the cache represented as a pandas dataset.

Parameters:n – Number of rows to be returned.
put(key, value)

Puts value (float array) by the specified key.

Parameters:
  • key – Key to be put into cache,
  • value – Value to be put into cache.
transform(preprocessor)

Transforms this cache using the specified preprocessor.

Parameters:preprocessor – Preprocessor to be used to transform cache.
class ggml.core.Ignite(cfg=None)

Bases: object

__init__(cfg=None)

Constructs a new instance of Ignite that is required to work with Cache, IGFS storage and distributed inference.

Parameters:cfg – Path to Apache Ignite configuration file.
create_cache(name, excl_neighbors=False, parts=10)

Creates a new Apache Ignite Cache using the specified name and configuration. This module is built on the assumption that an Ignite Cache contains integer keys and double[] values.

Parameters:
  • name – Name of the Apache Ignite cache,
  • excl_neighbors – (optional, False by default) exclude neighbours,
  • parts – (optional, 10 by default) number of partitions.
get_cache(name)

Returns an existing Apache Ignite Cache by name. This module is built on the assumption that an Ignite Cache contains integer keys and double[] values.

Parameters:name – Name of the Apache Ignite cache.
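
The cache methods above can be combined into a small loading helper. This is a minimal sketch, not part of ggml: `store_rows` is a hypothetical name, the configuration path in the comment is a placeholder, and whether `put` accepts a plain Python list or needs a NumPy array is not stated in this reference, so treat the value conversion as an assumption.

```python
def store_rows(ignite, name, rows, parts=10):
    """Create a cache and store each row under its integer index.

    `ignite` is expected to behave like ggml.core.Ignite; `rows` is a
    sequence of numeric sequences (the documented double[] value type).
    """
    cache = ignite.create_cache(name, parts=parts)
    for key, row in enumerate(rows):
        # Integer keys and float-array values, as the module assumes.
        cache.put(key, [float(v) for v in row])
    return cache

# Typical use against a running cluster (needs ggml, not executed here):
#   from ggml.core import Ignite
#   ignite = Ignite("ignite-config.xml")   # hypothetical config path
#   cache = store_rows(ignite, "demo", [[1.0, 2.0], [3.0, 4.0]])
#   print(cache.get(0))
#   print(cache.head(2))
```

The helper only relies on `create_cache`, `put`, `get` and `head` as documented above, so it can be exercised with any object exposing the same methods.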

Regression

Regression trainers.

class ggml.regression.DecisionTreeRegressionTrainer(env_builder=<ggml.common.LearningEnvironmentBuilder object>, max_deep=5, min_impurity_decrease=0.0, compressor=None, use_index=True)

Bases: ggml.regression.RegressionTrainer

DecisionTree regression trainer.

__init__(env_builder=<ggml.common.LearningEnvironmentBuilder object>, max_deep=5, min_impurity_decrease=0.0, compressor=None, use_index=True)

Constructs a new instance of DecisionTree regression trainer.

env_builder : Environment builder. max_deep : Max deep. min_impurity_decrease : Min impurity decrease. compressor : Compressor. use_index : Use index.

class ggml.regression.KNNRegressionTrainer(env_builder=<ggml.common.LearningEnvironmentBuilder object>)

Bases: ggml.regression.RegressionTrainer

KNN regression trainer.

__init__(env_builder=<ggml.common.LearningEnvironmentBuilder object>)

Constructs a new instance of KNN regression trainer.

env_builder : Environment builder.

class ggml.regression.LinearRegressionTrainer(env_builder=<ggml.common.LearningEnvironmentBuilder object>)

Bases: ggml.regression.RegressionTrainer

Linear regression trainer.

__init__(env_builder=<ggml.common.LearningEnvironmentBuilder object>)

Constructs a new instance of linear regression trainer.

env_builder : Environment builder.

class ggml.regression.MLPArchitecture(input_size)

Bases: ggml.common.Proxy

MLP architecture.

__init__(input_size)

Constructs a new instance of MLP architecture.

input_size : Input size.

with_layer(neurons, has_bias=True, activator='sigmoid')

Add layer.

neurons : Number of neurons. has_bias : Has bias or not (default value is True). activator : Activation function (‘sigmoid’, ‘relu’ or ‘linear’, default value is ‘sigmoid’)
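
A multi-layer architecture is built by calling with_layer once per layer. The sketch below is illustrative only: `add_layers` and `ALLOWED_ACTIVATORS` are hypothetical names, and because this reference does not say what with_layer returns, the helper continues with the returned object when there is one and with the original object otherwise.

```python
ALLOWED_ACTIVATORS = {"sigmoid", "relu", "linear"}

def add_layers(arch, layer_specs):
    """Add layers given as (neurons, has_bias, activator) tuples."""
    for neurons, has_bias, activator in layer_specs:
        if activator not in ALLOWED_ACTIVATORS:
            raise ValueError("unsupported activator: %r" % (activator,))
        result = arch.with_layer(neurons, has_bias=has_bias, activator=activator)
        # The return value of with_layer is not documented here: if it
        # returns an object, continue with it; otherwise keep the original.
        if result is not None:
            arch = result
    return arch

# Typical use (needs ggml, not executed here):
#   from ggml.regression import MLPArchitecture
#   arch = add_layers(MLPArchitecture(input_size=4),
#                     [(8, True, "relu"), (1, True, "linear")])
```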

class ggml.regression.MLPRegressionTrainer(arch, env_builder=<ggml.common.LearningEnvironmentBuilder object>, loss='mse', learning_rate=0.1, max_iter=1000, batch_size=100, loc_iter=10, seed=None)

Bases: ggml.regression.RegressionTrainer

MLP regression trainer.

__init__(arch, env_builder=<ggml.common.LearningEnvironmentBuilder object>, loss='mse', learning_rate=0.1, max_iter=1000, batch_size=100, loc_iter=10, seed=None)

Constructs a new instance of MLP regression trainer.

arch : Architecture. env_builder : Environment builder. loss : Loss function (‘mse’, ‘log’, ‘l2’, ‘l1’ or ‘hinge’, default value is ‘mse’). learning_rate : Learning rate. max_iter : Max number of iterations. batch_size : Batch size. loc_iter : Number of local iterations. seed : Seed.

class ggml.regression.RandomForestRegressionTrainer(features, env_builder=<ggml.common.LearningEnvironmentBuilder object>, trees=1, sub_sample_size=1.0, max_depth=5, min_impurity_delta=0.0, seed=None)

Bases: ggml.regression.RegressionTrainer

RandomForest regression trainer.

__init__(features, env_builder=<ggml.common.LearningEnvironmentBuilder object>, trees=1, sub_sample_size=1.0, max_depth=5, min_impurity_delta=0.0, seed=None)

Constructs a new instance of RandomForest regression trainer.

features : Number of features. env_builder : Environment builder. trees : Number of trees. sub_sample_size : Sub sample size. max_depth : Max depth. min_impurity_delta : Min impurity delta. seed : Seed.

class ggml.regression.RegressionTrainer(proxy, multiple_labels=False, accepts_matrix=False)

Bases: ggml.common.SupervisedTrainer, ggml.common.Proxy

Regression.

__init__(proxy, multiple_labels=False, accepts_matrix=False)

Constructs a new instance of regression trainer.

fit(X, y=None)

Trains model based on data.

X : x. y : y.

fit_on_cache(cache)

Trains model based on data.

cache : Apache Ignite cache.
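
Every trainer offers two entry points: fit for local data and fit_on_cache for data already stored in an Ignite cache. A minimal dispatch sketch, assuming only those two documented methods (`train` is a hypothetical helper, and the type of the returned model is not specified in this reference):

```python
def train(trainer, X=None, y=None, cache=None):
    """Train on an Ignite cache if one is given, otherwise on local data."""
    if cache is not None:
        return trainer.fit_on_cache(cache)
    if X is None:
        raise ValueError("pass either X (and optionally y) or cache")
    return trainer.fit(X, y)

# Typical use (needs ggml, not executed here):
#   from ggml.regression import LinearRegressionTrainer
#   model = train(LinearRegressionTrainer(), X=features, y=labels)
#   model = train(LinearRegressionTrainer(), cache=ignite_cache)
```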

Classification

Classification trainers.

class ggml.classification.ANNClassificationTrainer(env_builder=<ggml.common.LearningEnvironmentBuilder object>, k=2, max_iter=10, eps=0.0001, distance='euclidean')

Bases: ggml.classification.ClassificationTrainer

ANN classification trainer.

__init__(env_builder=<ggml.common.LearningEnvironmentBuilder object>, k=2, max_iter=10, eps=0.0001, distance='euclidean')

Constructs a new instance of ANN classification trainer.

env_builder : Environment builder. k : Number of clusters. max_iter : Max number of iterations. eps : Epsilon, delta of convergence. distance : Distance measure (‘euclidean’, ‘hamming’, ‘manhattan’).

class ggml.classification.ClassificationTrainer(proxy)

Bases: ggml.common.SupervisedTrainer, ggml.common.Proxy

Classification trainer.

__init__(proxy)

Constructs a new instance of classification trainer.

fit(X, y=None)

Trains model based on data.

X : x. y : y.

fit_on_cache(cache)

Trains model based on data.

cache : Apache Ignite cache.

class ggml.classification.DecisionTreeClassificationTrainer(env_builder=<ggml.common.LearningEnvironmentBuilder object>, max_deep=5, min_impurity_decrease=0.0, compressor=None, use_index=True)

Bases: ggml.classification.ClassificationTrainer

DecisionTree classification trainer.

__init__(env_builder=<ggml.common.LearningEnvironmentBuilder object>, max_deep=5, min_impurity_decrease=0.0, compressor=None, use_index=True)

Constructs a new instance of DecisionTree classification trainer.

env_builder : Environment builder. max_deep : Max deep. min_impurity_decrease : Min impurity decrease. compressor : Compressor. use_index : Use index.

class ggml.classification.KNNClassificationTrainer(env_builder=<ggml.common.LearningEnvironmentBuilder object>)

Bases: ggml.classification.ClassificationTrainer

KNN classification trainer.

__init__(env_builder=<ggml.common.LearningEnvironmentBuilder object>)

Constructs a new instance of KNN classification trainer.

env_builder : Environment builder.

class ggml.classification.LogRegClassificationTrainer(env_builder=<ggml.common.LearningEnvironmentBuilder object>, max_iter=100, batch_size=100, max_loc_iter=100, seed=1234)

Bases: ggml.classification.ClassificationTrainer

LogisticRegression classification trainer.

__init__(env_builder=<ggml.common.LearningEnvironmentBuilder object>, max_iter=100, batch_size=100, max_loc_iter=100, seed=1234)

Constructs a new instance of LogisticRegression classification trainer.

env_builder : Environment builder. max_iter : Max number of iterations. batch_size : Batch size. max_loc_iter : Max number of local iterations. seed : Seed.

class ggml.classification.MLPClassificationTrainer(arch, env_builder=<ggml.common.LearningEnvironmentBuilder object>, loss='mse', learning_rate=0.1, max_iter=1000, batch_size=100, loc_iter=10, seed=None)

Bases: ggml.classification.ClassificationTrainer

MLP classification trainer.

__init__(arch, env_builder=<ggml.common.LearningEnvironmentBuilder object>, loss='mse', learning_rate=0.1, max_iter=1000, batch_size=100, loc_iter=10, seed=None)

Constructs a new instance of MLP classification trainer.

arch : Architecture. env_builder : Environment builder. loss : Loss function (‘mse’, ‘log’, ‘l2’, ‘l1’ or ‘hinge’, default value is ‘mse’). learning_rate : Learning rate. max_iter : Max number of iterations. batch_size : Batch size. loc_iter : Number of local iterations. seed : Seed.

class ggml.classification.RandomForestClassificationTrainer(features, env_builder=<ggml.common.LearningEnvironmentBuilder object>, trees=1, sub_sample_size=1.0, max_depth=5, min_impurity_delta=0.0, seed=None)

Bases: ggml.classification.ClassificationTrainer

RandomForest classification trainer.

__init__(features, env_builder=<ggml.common.LearningEnvironmentBuilder object>, trees=1, sub_sample_size=1.0, max_depth=5, min_impurity_delta=0.0, seed=None)

Constructs a new instance of RandomForest classification trainer.

features : Number of features. env_builder : Environment builder. trees : Number of trees. sub_sample_size : Sub sample size. max_depth : Max depth. min_impurity_delta : Min impurity delta. seed : Seed.

class ggml.classification.SVMClassificationTrainer(env_builder=<ggml.common.LearningEnvironmentBuilder object>, l=0.4, max_iter=200, max_local_iter=100, seed=1234)

Bases: ggml.classification.ClassificationTrainer

SVM classification trainer.

__init__(env_builder=<ggml.common.LearningEnvironmentBuilder object>, l=0.4, max_iter=200, max_local_iter=100, seed=1234)

Constructs a new instance of SVM classification trainer.

env_builder : Environment builder. l : Lambda. max_iter : Max number of iterations. max_local_iter : Max number of local iterations. seed : Seed.

Clustering

Clustering trainers.

class ggml.clustering.ClusteringTrainer(proxy)

Bases: ggml.common.UnsupervisedTrainer, ggml.common.Proxy

Clustering trainer.

__init__(proxy)

Constructs a new instance of ClusteringTrainer.

fit(X)

Trains model based on data.

X : x.

fit_on_cache(cache)

Trains model based on data.

cache : Apache Ignite cache.

class ggml.clustering.GMMClusteringTrainer(env_builder=<ggml.common.LearningEnvironmentBuilder object>, eps=0.001, count_of_components=2, max_iter=10, max_count_of_init_tries=3, max_count_of_clusters=2, max_likelihood_divirgence=5.0, min_elements_for_new_cluster=300, min_cluster_probability=0.05)

Bases: ggml.clustering.ClusteringTrainer

GMM clustering trainer.

__init__(env_builder=<ggml.common.LearningEnvironmentBuilder object>, eps=0.001, count_of_components=2, max_iter=10, max_count_of_init_tries=3, max_count_of_clusters=2, max_likelihood_divirgence=5.0, min_elements_for_new_cluster=300, min_cluster_probability=0.05)

Constructs a new instance of GMM clustering trainer.

env_builder : Environment builder. eps : Epsilon. count_of_components : Count of components. max_iter : Max number of iterations. max_count_of_init_tries : Max count of init tries. max_count_of_clusters : Max count of clusters. max_likelihood_divirgence : Max likelihood divergence. min_elements_for_new_cluster : Min elements for new cluster. min_cluster_probability : Min cluster probability.

class ggml.clustering.KMeansClusteringTrainer(env_builder=<ggml.common.LearningEnvironmentBuilder object>, amount_of_clusters=2, max_iter=10, eps=0.0001, distance='euclidean')

Bases: ggml.clustering.ClusteringTrainer

KMeans clustering trainer.

__init__(env_builder=<ggml.common.LearningEnvironmentBuilder object>, amount_of_clusters=2, max_iter=10, eps=0.0001, distance='euclidean')

Constructs a new instance of KMeans clustering trainer.

env_builder : Environment builder. amount_of_clusters : Amount of clusters. max_iter : Max number of iterations. eps : Epsilon. distance : Distance measure (‘euclidean’, ‘hamming’, ‘manhattan’).
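
The three distance names accepted by the distance parameter correspond to well-known measures. The functions below are textbook definitions in plain Python, given only to show what each name means; they are not ggml's implementation, and `DISTANCES` is a hypothetical lookup table.

```python
import math

def euclidean(a, b):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def hamming(a, b):
    """Number of positions at which the two vectors differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

# Lookup by the same strings the trainers accept.
DISTANCES = {"euclidean": euclidean, "manhattan": manhattan, "hamming": hamming}
```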

Preprocessing

Preprocessors.

class ggml.preprocessing.BinarizationTrainer(threshold=0.0)

Bases: ggml.preprocessing.PreprocessingTrainer

Binarization trainer.

__init__(threshold=0.0)

Constructs a new instance of binarization trainer.

threshold : Threshold (Default value is 0).
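
Binarization maps each feature to 0 or 1 by comparing it with the threshold. A plain-Python sketch of that rule follows; it assumes a strict comparison (values equal to the threshold map to 0), which this reference does not confirm, and `binarize` is a hypothetical name rather than a ggml function.

```python
def binarize(row, threshold=0.0):
    """Return 1.0 for features above the threshold and 0.0 otherwise."""
    # Assumption: the comparison is strict, so a value equal to the
    # threshold becomes 0.0.
    return [1.0 if value > threshold else 0.0 for value in row]
```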

class ggml.preprocessing.EncoderTrainer(encoded_features=[], encoder_indexing_strategy='frequency_desc', encoder_type='one_hot')

Bases: ggml.preprocessing.PreprocessingTrainer

Encoder trainer.

__init__(encoded_features=[], encoder_indexing_strategy='frequency_desc', encoder_type='one_hot')

Constructs a new instance of encoder trainer.

encoded_features : Encoded features (Default value is []). encoder_indexing_strategy : Encoder indexing strategy (‘frequency_desc’, ‘frequency_asc’, default value is ‘frequency_desc’). encoder_type : Encoder type (‘one_hot’, ‘string’, default value is ‘one_hot’).

fit(X)

Trains model based on data.

X : x.

fit_on_cache(cache)

Trains model based on data.

cache : Apache Ignite cache.
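
To make the default encoder options concrete, the sketch below shows one common reading of ‘frequency_desc’ indexing combined with ‘one_hot’ encoding: categories are numbered from most to least frequent, and each value becomes a vector with a single 1. It is an illustration in plain Python, not ggml's implementation; the function names and the tie-breaking rule are assumptions.

```python
from collections import Counter

def frequency_desc_index(values):
    """Map each category to an index, most frequent category first."""
    counts = Counter(values)
    # Assumption: ties are broken by the category's string form so the
    # result is deterministic.
    ordered = sorted(counts, key=lambda v: (-counts[v], str(v)))
    return {value: i for i, value in enumerate(ordered)}

def one_hot(value, index):
    """Encode a category as a vector with 1.0 at its index."""
    vector = [0.0] * len(index)
    vector[index[value]] = 1.0
    return vector
```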

class ggml.preprocessing.ImputerTrainer(imputing_strategy='mean')

Bases: ggml.preprocessing.PreprocessingTrainer

Imputer trainer.

__init__(imputing_strategy='mean')

Constructs a new instance of imputer trainer.

imputing_strategy : Imputing strategy (‘mean’, ‘most_frequent’, default value is ‘mean’).
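
The two imputing strategies can be illustrated on a single feature column. This is a plain-Python sketch under the assumption that missing values are represented as NaN, which this reference does not state; `impute_column` is a hypothetical name, not a ggml function.

```python
import math
from collections import Counter

def impute_column(column, strategy="mean"):
    """Replace NaN entries with the column mean or its most frequent value."""
    present = [v for v in column if not math.isnan(v)]
    if not present:
        raise ValueError("column has no observed values")
    if strategy == "mean":
        fill = sum(present) / len(present)
    elif strategy == "most_frequent":
        fill = Counter(present).most_common(1)[0][0]
    else:
        raise ValueError("unknown strategy: %r" % (strategy,))
    return [fill if math.isnan(v) else v for v in column]
```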

class ggml.preprocessing.MaxAbsScalerTrainer

Bases: ggml.preprocessing.PreprocessingTrainer

Max absolute scaler trainer.

__init__()

Constructs a new instance of max absolute scaler trainer.

class ggml.preprocessing.MinMaxScalerTrainer

Bases: ggml.preprocessing.PreprocessingTrainer

Min-max scaler trainer.

__init__()

Constructs a new instance of min-max scaler trainer.
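
Min-max scaling conventionally maps each feature to the range [0, 1] using the minimum and maximum seen during fitting. The sketch below shows that standard formula for one column; it is not taken from ggml's source, `min_max_scale` is a hypothetical name, and the handling of a constant column is an assumption.

```python
def min_max_scale(column):
    """Scale values to [0, 1] with (v - min) / (max - min)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Assumption: a constant column is mapped to zeros to avoid
        # dividing by zero.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]
```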

class ggml.preprocessing.NormalizationTrainer(p=2)

Bases: ggml.preprocessing.PreprocessingTrainer

Normalization trainer.

__init__(p=2)

Constructs a new instance of normalization trainer.

p : Degree p of the L^p space norm used for normalization.
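
With p=2 each row is divided by its Euclidean length; other values of p use the corresponding p-norm. A plain-Python sketch of that computation, assuming row-wise normalization (`normalize` is a hypothetical name, not a ggml function):

```python
def normalize(row, p=2):
    """Divide a row by its p-norm, (sum |v|^p)^(1/p)."""
    norm = sum(abs(v) ** p for v in row) ** (1.0 / p)
    if norm == 0:
        # An all-zero row has no direction; return it unchanged.
        return list(row)
    return [v / norm for v in row]
```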

class ggml.preprocessing.PreprocessingModel(proxy)

Bases: ggml.common.Proxy

Preprocessing model.

__init__(proxy)

Constructs a new instance of preprocessing model.

transform(X)
class ggml.preprocessing.PreprocessingTrainer(proxy)

Bases: ggml.common.UnsupervisedTrainer

Preprocessing trainer.

__init__(proxy)

Constructs a new instance of PreprocessingTrainer.

proxy : Java proxy.

fit(X)

Trains model based on data.

X : x.

fit_on_cache(cache)

Trains model based on data.

cache : Apache Ignite cache.

class ggml.preprocessing.StandardScalerTrainer

Bases: ggml.preprocessing.PreprocessingTrainer

Standard scaler trainer.

__init__()

Constructs a new instance of standard scaler trainer.

Model Selection

ggml.model_selection.cross_val_score(trainer, cache, cv=5, scoring='accuracy')

Performs cross-validation for the given trainer, cache and scoring metric.

trainer : Trainer. cache : Cache. cv : Number of folds. scoring : Metric to be scored.
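
The cv parameter sets how many folds the data is divided into: each fold serves once as the test set while the remaining folds are used for training. The sketch below shows one simple way to form such folds over sample indices. How ggml assigns samples to folds is not described in this reference, so the modulo rule and the name `k_fold_indices` are assumptions.

```python
def k_fold_indices(n_samples, cv=5):
    """Yield (train_indices, test_indices) for each of the cv folds."""
    for fold in range(cv):
        # Assumption: sample i belongs to fold i % cv.
        test = [i for i in range(n_samples) if i % cv == fold]
        train = [i for i in range(n_samples) if i % cv != fold]
        yield train, test
```

Every sample appears in exactly one test fold, so the cv scores together cover the whole dataset.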

ggml.model_selection.train_test_split(cache, test_size=0.25, train_size=0.75, random_state=None)

Splits the given cache into two parts, test and train, with the given sizes.

cache : Ignite cache. test_size : Test size. train_size : Train size. random_state : Random state.
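
The size parameters are fractions of the data, and random_state makes the split repeatable. The sketch below applies the same idea to a list of keys in plain Python. It is not ggml's implementation: `split_keys` is a hypothetical name, and the order of the returned parts (train first, then test) is an assumption that this reference does not confirm for train_test_split.

```python
import random

def split_keys(keys, test_size=0.25, train_size=0.75, random_state=None):
    """Shuffle keys with a seed and cut them into train and test parts."""
    if test_size + train_size > 1.0 + 1e-9:
        raise ValueError("test_size + train_size must not exceed 1")
    shuffled = list(keys)
    random.Random(random_state).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_size))
    n_train = int(round(len(shuffled) * train_size))
    # Assumption: return order is (train, test).
    return shuffled[n_test:n_test + n_train], shuffled[:n_test]
```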

Inference

Ignite inference functionality.

class ggml.inference.DistributedModel(ignite, reader, parser, instances=1, max_per_node=1)

Bases: ggml.common.Model

__init__(ignite, reader, parser, instances=1, max_per_node=1)

Constructs a new instance of distributed model.

ignite : Ignite instance. reader : Model reader. parser : Model parser. instances : Number of worker instances. max_per_node : Max number of workers per node.

class ggml.inference.IgniteDistributedModel(ignite, mdl, instances=1, max_per_node=1)

Bases: ggml.inference.DistributedModel

Ignite distributed model.

ignite : Ignite instance. mdl : Model. instances : Number of instances. max_per_node : Max number of instance per node.

__init__(ignite, mdl, instances=1, max_per_node=1)

Constructs a new instance of Ignite distributed model.

ignite : Ignite instance. mdl : Model. instances : Number of worker instances. max_per_node : Max number of worker instances per Ignite node.

class ggml.inference.XGBoostDistributedModel(ignite, mdl, instances=1, max_per_node=1)

Bases: ggml.inference.DistributedModel

__init__(ignite, mdl, instances=1, max_per_node=1)

Constructs a new instance of distributed model.

ignite : Ignite instance. mdl : Model. instances : Number of worker instances. max_per_node : Max number of workers per node.

predict(X)

Predicts a result.

X : Features.
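
For larger inputs it can be convenient to send features to a distributed model in batches. The helper below is a sketch that relies only on the documented predict(X) method; `predict_in_batches` is a hypothetical name, and whether predict accepts a batch of rows or a single row is not specified in this reference, so the batch shape is an assumption.

```python
def predict_in_batches(model, rows, batch_size=100):
    """Call model.predict once per batch and collect the results."""
    results = []
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        # Assumption: predict accepts a list of feature rows.
        results.append(model.predict(batch))
    return results

# Typical use (needs ggml and a running cluster, not executed here):
#   from ggml.inference import IgniteDistributedModel
#   model = IgniteDistributedModel(ignite, mdl, instances=2, max_per_node=1)
#   outputs = predict_in_batches(model, feature_rows, batch_size=50)
```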