API Specification
The modules and subpackages listed here are the basis of a stable API of ggml, intended for end users.
Core
Ignite cache API.
-
class
ggml.core.
Cache
(proxy, cache_filter=None, preprocessor=None)¶ Bases:
ggml.common.Proxy
Internal constructor that creates a wrapper around an Apache Ignite cache. Users are expected to create a cache through an Ignite object rather than by calling this constructor.
-
__init__
(proxy, cache_filter=None, preprocessor=None)¶ Constructs a wrapper around an Apache Ignite cache. This is an internal method; users are expected to create or get a cache through Ignite methods.
Parameters: - proxy – Py4J proxy that represents Apache Ignite Cache,
- cache_filter – Py4J proxy that represents filter,
- preprocessor – Py4J proxy that represents preprocessor.
-
filter
(cache_filter)¶ Filters this cache using the specified filter.
Parameters: cache_filter – Filter to be applied to the cache.
-
get
(key)¶ Returns the value (float array) stored under the specified key.
Parameters: key – Key to look up in the cache.
-
head
(n=5)¶ Returns top N elements represented as a pandas dataset.
Parameters: n – Number of rows to be returned.
-
put
(key, value)¶ Puts the value (float array) into the cache under the specified key.
Parameters: - key – Key to be put into cache,
- value – Value to be put into cache.
-
transform
(preprocessor)¶ Transforms this cache using the specified preprocessor.
Parameters: preprocessor – Preprocessor to be used to transform cache.
-
-
class
ggml.core.
Ignite
(cfg=None)¶ Bases:
object
-
__init__
(cfg=None)¶ Constructs a new instance of Ignite that is required to work with Cache, IGFS storage and distributed inference.
Parameters: cfg – Path to Apache Ignite configuration file.
-
create_cache
(name, excl_neighbors=False, parts=10)¶ Creates a new Apache Ignite cache with the specified name and configuration. This module assumes that an Ignite cache contains integer keys and double[] values.
Parameters: - name – Name of the Apache Ignite cache,
- excl_neighbors – (optional, False by default) exclude neighbours,
- parts – (optional, 10 by default) number of partitions.
-
get_cache
(name)¶ Returns an existing Apache Ignite cache by name. This module assumes that an Ignite cache contains integer keys and double[] values.
Parameters: name – Name of the Apache Ignite cache.
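The integer-key, float-array layout described above can be sketched as follows. This is a minimal illustration, not the library's implementation: `load_rows` is a hypothetical helper that only calls the documented `create_cache` and `put` methods, and `InMemoryIgnite` and `InMemoryCache` are hypothetical stand-ins so that the snippet runs without a cluster. Whether the real `put` accepts a plain Python list or requires another array type is not stated here, so treat the value type as an assumption.

```python
def load_rows(ignite, name, rows, parts=10):
    """Create a cache and store one float array per integer key."""
    cache = ignite.create_cache(name, parts=parts)
    for key, values in enumerate(rows):
        # Keys are integers and values are float arrays, as the module assumes.
        cache.put(key, [float(v) for v in values])
    return cache


class InMemoryCache:
    """Hypothetical stand-in for ggml.core.Cache, backed by a dict."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data[key]


class InMemoryIgnite:
    """Hypothetical stand-in for ggml.core.Ignite."""

    def __init__(self):
        self._caches = {}

    def create_cache(self, name, excl_neighbors=False, parts=10):
        self._caches[name] = InMemoryCache()
        return self._caches[name]

    def get_cache(self, name):
        return self._caches[name]


ignite = InMemoryIgnite()
cache = load_rows(ignite, "samples", [[1, 2.5], [3, 4.5]])
first_row = ignite.get_cache("samples").get(0)
```

With the real library, the stand-in would presumably be replaced by an `Ignite` instance constructed from a configuration file path.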
-
Regression
Regression trainers.
-
class
ggml.regression.
DecisionTreeRegressionTrainer
(env_builder=<ggml.common.LearningEnvironmentBuilder object>, max_deep=5, min_impurity_decrease=0.0, compressor=None, use_index=True)¶ Bases:
ggml.regression.RegressionTrainer
DecisionTree regression trainer.
-
__init__
(env_builder=<ggml.common.LearningEnvironmentBuilder object>, max_deep=5, min_impurity_decrease=0.0, compressor=None, use_index=True)¶ Constructs a new instance of DecisionTree regression trainer.
env_builder : Environment builder. max_deep : Max depth. min_impurity_decrease : Min impurity decrease. compressor : Compressor. use_index : Use index.
-
-
class
ggml.regression.
KNNRegressionTrainer
(env_builder=<ggml.common.LearningEnvironmentBuilder object>)¶ Bases:
ggml.regression.RegressionTrainer
KNN regression trainer.
-
__init__
(env_builder=<ggml.common.LearningEnvironmentBuilder object>)¶ Constructs a new instance of KNN regression trainer.
env_builder : Environment builder.
-
-
class
ggml.regression.
LinearRegressionTrainer
(env_builder=<ggml.common.LearningEnvironmentBuilder object>)¶ Bases:
ggml.regression.RegressionTrainer
Linear regression trainer.
-
__init__
(env_builder=<ggml.common.LearningEnvironmentBuilder object>)¶ Constructs a new instance of linear regression trainer.
env_builder : Environment builder.
-
-
class
ggml.regression.
MLPArchitecture
(input_size)¶ Bases:
ggml.common.Proxy
MLP architecture.
-
__init__
(input_size)¶ Constructs a new instance of MLP architecture.
input_size : Input size.
-
with_layer
(neurons, has_bias=True, activator='sigmoid')¶ Add layer.
neurons : Number of neurons. has_bias : Has bias or not (default value is True). activator : Activation function (‘sigmoid’, ‘relu’ or ‘linear’, default value is ‘sigmoid’)
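To make the `with_layer` parameters concrete, here is a plain-Python sketch of what a single fully connected layer usually computes. It uses the textbook definitions of the three activator names and is not taken from the library; `dense_layer` and `ACTIVATORS` are hypothetical names, and the weights are supplied by hand rather than learned.

```python
import math

# Textbook definitions of the three activator names.
ACTIVATORS = {
    "sigmoid": lambda z: 1.0 / (1.0 + math.exp(-z)),
    "relu": lambda z: max(0.0, z),
    "linear": lambda z: z,
}


def dense_layer(inputs, weights, biases=None, activator="sigmoid"):
    """Apply one fully connected layer.

    weights holds one row of input weights per neuron; biases is None
    when the layer has no bias (has_bias=False).
    """
    act = ACTIVATORS[activator]
    outputs = []
    for i, row in enumerate(weights):
        z = sum(w * x for w, x in zip(row, inputs))
        if biases is not None:
            z += biases[i]
        outputs.append(act(z))
    return outputs


# Two neurons with bias and ReLU, then one neuron without bias and sigmoid.
relu_out = dense_layer([1.0, -2.0], [[0.5, 0.5], [1.0, 0.0]],
                       biases=[0.0, 1.0], activator="relu")
sigmoid_out = dense_layer([0.0, 0.0], [[1.0, 1.0]], activator="sigmoid")
```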
-
-
class
ggml.regression.
MLPRegressionTrainer
(arch, env_builder=<ggml.common.LearningEnvironmentBuilder object>, loss='mse', learning_rate=0.1, max_iter=1000, batch_size=100, loc_iter=10, seed=None)¶ Bases:
ggml.regression.RegressionTrainer
MLP regression trainer.
-
__init__
(arch, env_builder=<ggml.common.LearningEnvironmentBuilder object>, loss='mse', learning_rate=0.1, max_iter=1000, batch_size=100, loc_iter=10, seed=None)¶ Constructs a new instance of MLP regression trainer.
env_builder : Environment builder. arch : Architecture. loss : Loss function (‘mse’, ‘log’, ‘l2’, ‘l1’ or ‘hinge’, default value is ‘mse’). learning_rate : Learning rate. max_iter : Max number of iterations. batch_size : Batch size. loc_iter : Number of local iterations. seed : Seed.
-
-
class
ggml.regression.
RandomForestRegressionTrainer
(features, env_builder=<ggml.common.LearningEnvironmentBuilder object>, trees=1, sub_sample_size=1.0, max_depth=5, min_impurity_delta=0.0, seed=None)¶ Bases:
ggml.regression.RegressionTrainer
RandomForest regression trainer.
-
__init__
(features, env_builder=<ggml.common.LearningEnvironmentBuilder object>, trees=1, sub_sample_size=1.0, max_depth=5, min_impurity_delta=0.0, seed=None)¶ Constructs a new instance of RandomForest regression trainer.
features : Number of features. env_builder : Environment builder. trees : Number of trees. sub_sample_size : Sub sample size. max_depth : Max depth. min_impurity_delta : Min impurity delta. seed : Seed.
-
-
class
ggml.regression.
RegressionTrainer
(proxy, multiple_labels=False, accepts_matrix=False)¶ Bases:
ggml.common.SupervisedTrainer, ggml.common.Proxy
Regression.
-
__init__
(proxy, multiple_labels=False, accepts_matrix=False)¶ Constructs a new instance of regression trainer.
-
fit
(X, y=None)¶ Trains model based on data.
X : Features. y : Labels.
-
fit_on_cache
(cache)¶ Trains model based on data.
cache : Apache Ignite cache.
-
Classification
Classification trainers.
-
class
ggml.classification.
ANNClassificationTrainer
(env_builder=<ggml.common.LearningEnvironmentBuilder object>, k=2, max_iter=10, eps=0.0001, distance='euclidean')¶ Bases:
ggml.classification.ClassificationTrainer
ANN classification trainer.
-
__init__
(env_builder=<ggml.common.LearningEnvironmentBuilder object>, k=2, max_iter=10, eps=0.0001, distance='euclidean')¶ Constructs a new instance of ANN classification trainer.
env_builder : Environment builder. k : Number of clusters. max_iter : Max number of iterations. eps : Epsilon, delta of convergence. distance : Distance measure (‘euclidean’, ‘hamming’, ‘manhattan’).
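The three distance names can be read as the standard metrics sketched below. These are the usual textbook definitions written in plain Python, not the library's implementation; in particular, treating `hamming` as a count of differing coordinates is an assumption.

```python
import math


def euclidean(a, b):
    # Square root of the sum of squared coordinate differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    # Sum of absolute coordinate differences.
    return sum(abs(x - y) for x, y in zip(a, b))


def hamming(a, b):
    # Counts the coordinates in which the two vectors differ.
    return sum(1 for x, y in zip(a, b) if x != y)


DISTANCES = {"euclidean": euclidean, "manhattan": manhattan, "hamming": hamming}

a, b = [0.0, 3.0, 1.0], [4.0, 0.0, 1.0]
results = {name: fn(a, b) for name, fn in DISTANCES.items()}
```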
-
-
class
ggml.classification.
ClassificationTrainer
(proxy)¶ Bases:
ggml.common.SupervisedTrainer, ggml.common.Proxy
Classification trainer.
-
__init__
(proxy)¶ Constructs a new instance of classification trainer.
-
fit
(X, y=None)¶ Trains model based on data.
X : Features. y : Labels.
-
fit_on_cache
(cache)¶ Trains model based on data.
cache : Apache Ignite cache.
-
-
class
ggml.classification.
DecisionTreeClassificationTrainer
(env_builder=<ggml.common.LearningEnvironmentBuilder object>, max_deep=5, min_impurity_decrease=0.0, compressor=None, use_index=True)¶ Bases:
ggml.classification.ClassificationTrainer
DecisionTree classification trainer.
-
__init__
(env_builder=<ggml.common.LearningEnvironmentBuilder object>, max_deep=5, min_impurity_decrease=0.0, compressor=None, use_index=True)¶ Constructs a new instance of DecisionTree classification trainer.
env_builder : Environment builder. max_deep : Max depth. min_impurity_decrease : Min impurity decrease. compressor : Compressor. use_index : Use index.
-
-
class
ggml.classification.
KNNClassificationTrainer
(env_builder=<ggml.common.LearningEnvironmentBuilder object>)¶ Bases:
ggml.classification.ClassificationTrainer
KNN classification trainer.
-
__init__
(env_builder=<ggml.common.LearningEnvironmentBuilder object>)¶ Constructs a new instance of KNN classification trainer.
env_builder : Environment builder.
-
-
class
ggml.classification.
LogRegClassificationTrainer
(env_builder=<ggml.common.LearningEnvironmentBuilder object>, max_iter=100, batch_size=100, max_loc_iter=100, seed=1234)¶ Bases:
ggml.classification.ClassificationTrainer
LogisticRegression classification trainer.
-
__init__
(env_builder=<ggml.common.LearningEnvironmentBuilder object>, max_iter=100, batch_size=100, max_loc_iter=100, seed=1234)¶ Constructs a new instance of LogisticRegression classification trainer.
env_builder : Environment builder. max_iter : Max number of iterations. batch_size : Batch size. max_loc_iter : Max number of local iterations. seed : Seed.
-
-
class
ggml.classification.
MLPClassificationTrainer
(arch, env_builder=<ggml.common.LearningEnvironmentBuilder object>, loss='mse', learning_rate=0.1, max_iter=1000, batch_size=100, loc_iter=10, seed=None)¶ Bases:
ggml.classification.ClassificationTrainer
MLP classification trainer.
-
__init__
(arch, env_builder=<ggml.common.LearningEnvironmentBuilder object>, loss='mse', learning_rate=0.1, max_iter=1000, batch_size=100, loc_iter=10, seed=None)¶ Constructs a new instance of MLP classification trainer.
env_builder : Environment builder. arch : Architecture. loss : Loss function (‘mse’, ‘log’, ‘l2’, ‘l1’ or ‘hinge’, default value is ‘mse’). learning_rate : Learning rate. max_iter : Max number of iterations. batch_size : Batch size. loc_iter : Number of local iterations. seed : Seed.
-
-
class
ggml.classification.
RandomForestClassificationTrainer
(features, env_builder=<ggml.common.LearningEnvironmentBuilder object>, trees=1, sub_sample_size=1.0, max_depth=5, min_impurity_delta=0.0, seed=None)¶ Bases:
ggml.classification.ClassificationTrainer
RandomForest classification trainer.
-
__init__
(features, env_builder=<ggml.common.LearningEnvironmentBuilder object>, trees=1, sub_sample_size=1.0, max_depth=5, min_impurity_delta=0.0, seed=None)¶ Constructs a new instance of RandomForest classification trainer.
features : Number of features. env_builder : Environment builder. trees : Number of trees. sub_sample_size : Sub sample size. max_depth : Max depth. min_impurity_delta : Min impurity delta. seed : Seed.
-
-
class
ggml.classification.
SVMClassificationTrainer
(env_builder=<ggml.common.LearningEnvironmentBuilder object>, l=0.4, max_iter=200, max_local_iter=100, seed=1234)¶ Bases:
ggml.classification.ClassificationTrainer
SVM classification trainer.
-
__init__
(env_builder=<ggml.common.LearningEnvironmentBuilder object>, l=0.4, max_iter=200, max_local_iter=100, seed=1234)¶ Constructs a new instance of SVM classification trainer.
env_builder : Environment builder. l : Lambda. max_iter : Max number of iterations. max_local_iter : Max number of local iterations. seed : Seed.
-
Clustering
Clustering trainers.
-
class
ggml.clustering.
ClusteringTrainer
(proxy)¶ Bases:
ggml.common.UnsupervisedTrainer, ggml.common.Proxy
Clustering trainer.
-
__init__
(proxy)¶ Constructs a new instance of ClusteringTrainer.
-
fit
(X)¶ Trains model based on data.
X : Features.
-
fit_on_cache
(cache)¶ Trains model based on data.
cache : Apache Ignite cache.
-
-
class
ggml.clustering.
GMMClusteringTrainer
(env_builder=<ggml.common.LearningEnvironmentBuilder object>, eps=0.001, count_of_components=2, max_iter=10, max_count_of_init_tries=3, max_count_of_clusters=2, max_likelihood_divirgence=5.0, min_elements_for_new_cluster=300, min_cluster_probability=0.05)¶ Bases:
ggml.clustering.ClusteringTrainer
GMM clustering trainer.
-
__init__
(env_builder=<ggml.common.LearningEnvironmentBuilder object>, eps=0.001, count_of_components=2, max_iter=10, max_count_of_init_tries=3, max_count_of_clusters=2, max_likelihood_divirgence=5.0, min_elements_for_new_cluster=300, min_cluster_probability=0.05)¶ Constructs a new instance of GMM clustering trainer.
env_builder : Environment builder. eps : Epsilon. count_of_components : Count of components. max_iter : Max number of iterations. max_count_of_init_tries : Max count of init tries. max_count_of_clusters : Max count of clusters. max_likelihood_divirgence : Max likelihood divergence. min_elements_for_new_cluster : Min elements for new cluster. min_cluster_probability : Min cluster probability.
-
-
class
ggml.clustering.
KMeansClusteringTrainer
(env_builder=<ggml.common.LearningEnvironmentBuilder object>, amount_of_clusters=2, max_iter=10, eps=0.0001, distance='euclidean')¶ Bases:
ggml.clustering.ClusteringTrainer
KMeans clustering trainer.
-
__init__
(env_builder=<ggml.common.LearningEnvironmentBuilder object>, amount_of_clusters=2, max_iter=10, eps=0.0001, distance='euclidean')¶ Constructs a new instance of KMeans clustering trainer.
env_builder : Environment builder. amount_of_clusters : Amount of clusters. max_iter : Max number of iterations. eps : Epsilon. distance : Distance measure (‘euclidean’, ‘hamming’, ‘manhattan’).
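The roles of `amount_of_clusters`, `max_iter` and `eps` can be illustrated with a small plain-Python version of Lloyd's algorithm. This is a sketch of the standard technique, not the library's distributed implementation. Seeding the centroids with the first k points and stopping once no centroid moves by `eps` or more are both assumptions made for the example.

```python
import math


def kmeans(points, amount_of_clusters=2, max_iter=10, eps=1e-4):
    """Lloyd's algorithm; the first k points seed the centroids (an assumption)."""
    centroids = [list(p) for p in points[:amount_of_clusters]]
    for _ in range(max_iter):
        # Assignment step: group each point with its nearest centroid.
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            groups[nearest].append(p)
        # Update step: move each centroid to the mean of its group.
        updated = [
            [sum(col) / len(g) for col in zip(*g)] if g else c
            for g, c in zip(groups, centroids)
        ]
        shift = max(math.dist(c, u) for c, u in zip(centroids, updated))
        centroids = updated
        if shift < eps:  # converged: no centroid moved by eps or more
            break
    return centroids


points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
centroids = sorted(kmeans(points))
```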
-
Preprocessing
Preprocessors.
-
class
ggml.preprocessing.
BinarizationTrainer
(threshold=0.0)¶ Bases:
ggml.preprocessing.PreprocessingTrainer
Binarization trainer.
-
__init__
(threshold=0.0)¶ Constructs a new instance of binarization trainer.
threshold : Threshold (Default value is 0).
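As a rough illustration of what binarization with a threshold means, the sketch below maps each value to 1.0 or 0.0. The helper name `binarize` is hypothetical, and treating values equal to the threshold as 0.0 is an assumption; the library's behavior at the boundary is not documented here.

```python
def binarize(row, threshold=0.0):
    """Map values strictly above the threshold to 1.0 and the rest to 0.0."""
    return [1.0 if value > threshold else 0.0 for value in row]


binary = binarize([-1.5, 0.0, 0.2, 3.0])
```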
-
-
class
ggml.preprocessing.
EncoderTrainer
(encoded_features=[], encoder_indexing_strategy='frequency_desc', encoder_type='one_hot')¶ Bases:
ggml.preprocessing.PreprocessingTrainer
Encoder trainer.
-
__init__
(encoded_features=[], encoder_indexing_strategy='frequency_desc', encoder_type='one_hot')¶ Constructs a new instance of encoder trainer.
encoded_features : Encoded features (Default value is []). encoder_indexing_strategy : Encoder indexing strategy (‘frequency_desc’, ‘frequency_asc’, default value is ‘frequency_desc’). encoder_type : Encoder type (‘one_hot’, ‘string’, default value is ‘one_hot’).
-
fit
(X)¶ Trains model based on data.
X : Features.
-
fit_on_cache
(cache)¶ Trains model based on data.
cache : Apache Ignite cache.
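One plausible reading of the indexing strategies and the ‘one_hot’ encoder type is sketched below in plain Python: categories are ranked by frequency, and each value becomes a vector with a single 1.0 at its rank. The helper names `fit_index` and `one_hot` are hypothetical, and the alphabetical tie-breaking is an assumption made only to keep the example deterministic.

```python
from collections import Counter


def fit_index(values, strategy="frequency_desc"):
    """Assign an integer index to each category, ordered by frequency."""
    counts = Counter(values)
    descending = strategy == "frequency_desc"
    # Ties are broken alphabetically here; that choice is an assumption.
    ordered = sorted(
        counts,
        key=lambda c: (-counts[c] if descending else counts[c], c),
    )
    return {category: i for i, category in enumerate(ordered)}


def one_hot(value, index):
    """Return a vector with 1.0 at the category's index and 0.0 elsewhere."""
    vector = [0.0] * len(index)
    vector[index[value]] = 1.0
    return vector


colors = ["red", "blue", "red", "green", "red", "blue"]
index = fit_index(colors)
encoded = one_hot("blue", index)
```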
-
-
class
ggml.preprocessing.
ImputerTrainer
(imputing_strategy='mean')¶ Bases:
ggml.preprocessing.PreprocessingTrainer
Imputer trainer.
-
__init__
(imputing_strategy='mean')¶ Constructs a new instance of imputer trainer.
imputing_strategy : Imputing strategy (‘mean’, ‘most_frequent’, default value is ‘mean’).
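The two imputing strategies can be sketched for a single feature column as follows. This is an illustration under stated assumptions, not the library's code: missing values are represented as NaN, and `fit_imputer` and `impute` are hypothetical helper names.

```python
import math
from collections import Counter


def fit_imputer(column, imputing_strategy="mean"):
    """Compute the fill value for one feature column, ignoring NaN entries."""
    present = [v for v in column if not math.isnan(v)]
    if imputing_strategy == "mean":
        return sum(present) / len(present)
    if imputing_strategy == "most_frequent":
        return Counter(present).most_common(1)[0][0]
    raise ValueError("unknown imputing strategy: " + imputing_strategy)


def impute(column, fill):
    """Replace every NaN entry with the fitted fill value."""
    return [fill if math.isnan(v) else v for v in column]


column = [1.0, float("nan"), 3.0, 3.0]
by_mean = impute(column, fit_imputer(column, "mean"))
by_mode = impute(column, fit_imputer(column, "most_frequent"))
```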
-
-
class
ggml.preprocessing.
MaxAbsScalerTrainer
¶ Bases:
ggml.preprocessing.PreprocessingTrainer
Max absolute scaler trainer.
-
__init__
()¶ Constructs a new instance of max absolute scaler trainer.
-
-
class
ggml.preprocessing.
MinMaxScalerTrainer
¶ Bases:
ggml.preprocessing.PreprocessingTrainer
Min-max scaler trainer.
-
__init__
()¶ Constructs a new instance of min-max scaler trainer.
-
-
class
ggml.preprocessing.
NormalizationTrainer
(p=2)¶ Bases:
ggml.preprocessing.PreprocessingTrainer
Normalization trainer.
-
__init__
(p=2)¶ Constructs a new instance of normalization trainer.
p : Degree of the Lp space (the p in the Lp norm).
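A minimal sketch of row-wise normalization by an Lp norm is shown below, assuming the usual definition in which each row is divided by the p-th root of the sum of its absolute values raised to p. The helper `normalize` is hypothetical, and leaving an all-zero row unchanged is a choice made for this example.

```python
def normalize(row, p=2):
    """Divide a row by its Lp norm; an all-zero row is returned unchanged."""
    norm = sum(abs(v) ** p for v in row) ** (1.0 / p)
    return [v / norm for v in row] if norm else list(row)


l2 = normalize([3.0, 4.0])        # Euclidean norm of the row is 5
l1 = normalize([1.0, -3.0], p=1)  # sum of absolute values is 4
```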
-
-
class
ggml.preprocessing.
PreprocessingModel
(proxy)¶ Bases:
ggml.common.Proxy
Preprocessing model.
-
__init__
(proxy)¶ Constructs a new instance of preprocessing model.
-
transform
(X)¶
-
-
class
ggml.preprocessing.
PreprocessingTrainer
(proxy)¶ Bases:
ggml.common.UnsupervisedTrainer
Preprocessing trainer.
-
__init__
(proxy)¶ Constructs a new instance of PreprocessingTrainer.
proxy : Java proxy.
-
fit
(X)¶ Trains model based on data.
X : Features.
-
fit_on_cache
(cache)¶ Trains model based on data.
cache : Apache Ignite cache.
-
-
class
ggml.preprocessing.
StandardScalerTrainer
¶ Bases:
ggml.preprocessing.PreprocessingTrainer
Standard scaler trainer.
-
__init__
()¶ Constructs a new instance of standard scaler trainer.
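The scaler trainers in this module (max absolute, min-max and standard) correspond to three common column-wise rescalings. The sketch below shows the usual formulas for one feature column in plain Python. It is not the library's implementation: the function names are hypothetical, the use of the population standard deviation is an assumption, and constant columns (which would divide by zero) are not handled.

```python
import statistics


def max_abs_scale(column):
    """Divide by the largest absolute value, mapping the column into [-1, 1]."""
    peak = max(abs(v) for v in column)
    return [v / peak for v in column]


def min_max_scale(column):
    """Shift and scale the column into [0, 1]."""
    low, high = min(column), max(column)
    return [(v - low) / (high - low) for v in column]


def standard_scale(column):
    """Subtract the mean and divide by the standard deviation."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)  # population standard deviation (assumption)
    return [(v - mean) / std for v in column]


column = [-2.0, 0.0, 2.0, 4.0]
scaled = {
    "max_abs": max_abs_scale(column),
    "min_max": min_max_scale(column),
    "standard": standard_scale(column),
}
```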
-
Model Selection
-
ggml.model_selection.
cross_val_score
(trainer, cache, cv=5, scoring='accuracy')¶ Performs cross-validation for the given trainer, cache and scoring metric.
trainer : Trainer. cache : Cache. cv : Number of folds. scoring : Metric to be scored.
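The idea behind `cv` folds can be sketched in plain Python as below. This illustrates generic k-fold cross-validation over in-memory lists, not how the library partitions a cache. The names `k_fold_indices` and `cross_val` are hypothetical, as are the `fit` and `score` callables they accept.

```python
def k_fold_indices(n_samples, cv=5):
    """Yield (train, test) index lists for cv contiguous folds."""
    base, extra = divmod(n_samples, cv)
    start = 0
    for fold in range(cv):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def cross_val(fit, score, X, y, cv=5):
    """Fit on each training split and score on the held-out fold."""
    scores = []
    for train, test in k_fold_indices(len(X), cv):
        model = fit([X[i] for i in train], [y[i] for i in train])
        scores.append(score(model, [X[i] for i in test], [y[i] for i in test]))
    return scores


folds = list(k_fold_indices(7, cv=3))
```

Each sample lands in exactly one test fold, so every score is computed on data the model did not see during fitting.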
-
ggml.model_selection.
train_test_split
(cache, test_size=0.25, train_size=0.75, random_state=None)¶ Splits the given cache into two parts, test and train, with the given sizes.
cache : Ignite cache. test_size : Test size. train_size : Train size. random_state : Random state.
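A seeded split of cache keys might look like the following sketch. It operates on a plain sequence of integer keys rather than on a cache, and `split_keys` is a hypothetical helper; how the library actually assigns entries to the two parts is not documented here.

```python
import random


def split_keys(keys, test_size=0.25, random_state=None):
    """Shuffle keys reproducibly and cut off a test_size fraction for testing."""
    shuffled = sorted(keys)
    # A dedicated Random instance keeps the split repeatable for a given seed.
    random.Random(random_state).shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]


train_keys, test_keys = split_keys(range(8), test_size=0.25, random_state=42)
```

Passing the same `random_state` again reproduces the same split, which is the usual reason for exposing a seed.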
Inference
Ignite inference functionality.
-
class
ggml.inference.
DistributedModel
(ignite, reader, parser, instances=1, max_per_node=1)¶ Bases:
ggml.common.Model
-
__init__
(ignite, reader, parser, instances=1, max_per_node=1)¶ Constructs a new instance of distributed model.
ignite : Ignite instance. reader : Model reader. parser : Model parser. instances : Number of worker instances. max_per_node : Max number of workers per node.
-
-
class
ggml.inference.
IgniteDistributedModel
(ignite, mdl, instances=1, max_per_node=1)¶ Bases:
ggml.inference.DistributedModel
Ignite distributed model.
ignite : Ignite instance. mdl : Model. instances : Number of instances. max_per_node : Max number of instances per node.
-
__init__
(ignite, mdl, instances=1, max_per_node=1)¶ Constructs a new instance of Ignite distributed model.
ignite : Ignite instance. mdl : Model. instances : Number of worker instances. max_per_node : Max number of worker instances per Ignite node.
-
-
class
ggml.inference.
XGBoostDistributedModel
(ignite, mdl, instances=1, max_per_node=1)¶ Bases:
ggml.inference.DistributedModel
-
__init__
(ignite, mdl, instances=1, max_per_node=1)¶ Constructs a new instance of XGBoost distributed model.
ignite : Ignite instance. mdl : Model. instances : Number of worker instances. max_per_node : Max number of workers per node.
-
predict
(X)¶ Predicts a result.
X : Features.
-