ztlearn package¶
Subpackages¶
- ztlearn.datasets package
- ztlearn.dl package
  - Subpackages
  - Module contents
- ztlearn.ml package
- ztlearn.toolkit package
- ztlearn.utils package
Submodules¶
ztlearn.activations module¶
class ztlearn.activations.ActivationFunction(name, activation_dict={})[source]
Bases: object
- backward(input_signal)[source]
- forward(input_signal)[source]
- name
class ztlearn.activations.ELU(activation_dict)[source]
Bases: object
Exponential Linear Units (ELUs)
ELUs are exponential functions whose negative values push mean unit activations closer to zero, much like batch normalization but at a lower computational cost.
References
- [1] Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)
- [Djork-Arné Clevert et al., 2016] https://arxiv.org/abs/1511.07289
- [PDF] https://arxiv.org/pdf/1511.07289.pdf
Parameters: alpha (float32) – controls the value to which an ELU saturates for negative net inputs
- activation(input_signal)[source] – ELU activation applied to the provided input.
Parameters: input_signal (numpy.array) – the input numpy array Returns: the output of the ELU function applied to the input Return type: numpy.array
- activation_name
- derivative(input_signal)[source] – ELU derivative applied to the provided input.
Parameters: input_signal (numpy.array) – the input numpy array Returns: the output of the ELU derivative applied to the input Return type: numpy.array
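As a quick illustration of the formula behind this class, the ELU and its derivative can be written directly in NumPy. This is a minimal sketch of the standard ELU equations (with an assumed default of alpha = 1.0 from the paper), not ztlearn's own implementation:

```python
import numpy as np

def elu(x, alpha=1.0):
    # f(x) = x if x > 0, alpha * (exp(x) - 1) otherwise
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def elu_derivative(x, alpha=1.0):
    # f'(x) = 1 if x > 0, alpha * exp(x) otherwise
    return np.where(x > 0, 1.0, alpha * np.exp(x))

print(elu(np.array([-2.0, 0.0, 2.0])))  # approx [-0.8647, 0.0, 2.0]
```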
class ztlearn.activations.ElliotSigmoid(activation_dict)[source]
Bases: object
Elliot Sigmoid Activation Function
The Elliot Sigmoid squashes each element of the input from the interval [-inf, inf] into the interval [-1, 1] with an 'S-shaped' function. The function is fast to compute on simple hardware as it requires no exponential or trigonometric operations.
References
- [1] A Better Activation Function for Artificial Neural Networks
- [David L. Elliott et al., 1993] https://goo.gl/qqBdne
- [PDF] https://goo.gl/fPLPcr
- activation(input_signal)[source] – ElliotSigmoid activation applied to the provided input.
Parameters: input_signal (numpy.array) – the input numpy array Returns: the output of the ElliotSigmoid function applied to the input Return type: numpy.array
- activation_name
- derivative(input_signal)[source] – ElliotSigmoid derivative applied to the provided input.
Parameters: input_signal (numpy.array) – the input numpy array Returns: the output of the ElliotSigmoid derivative applied to the input Return type: numpy.array
class ztlearn.activations.LeakyReLU(activation_dict)[source]
Bases: object
LeakyReLU Activation Function
Leaky ReLUs allow a small non-zero gradient to propagate through the network when the unit is not active, avoiding bottlenecks that can prevent learning in the neural network.
References
- [1] Rectifier Nonlinearities Improve Neural Network Acoustic Models
- [Andrew L. Maas et al., 2013] https://goo.gl/k9fhEZ
- [PDF] https://goo.gl/v48yXT
- [2] Empirical Evaluation of Rectified Activations in Convolutional Network
- [Bing Xu et al., 2015] https://arxiv.org/abs/1505.00853
- [PDF] https://arxiv.org/pdf/1505.00853.pdf
Parameters: alpha (float32) – provides a small non-zero gradient (e.g. 0.01) when the unit is not active
- activation(input_signal)[source] – LeakyReLU activation applied to the provided input.
Parameters: input_signal (numpy.array) – the input numpy array Returns: the output of the LeakyReLU function applied to the input Return type: numpy.array
- activation_name
- derivative(input_signal)[source] – LeakyReLU derivative applied to the provided input.
Parameters: input_signal (numpy.array) – the input numpy array Returns: the output of the LeakyReLU derivative applied to the input Return type: numpy.array
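A minimal NumPy sketch of the leaky rectifier and its derivative; the alpha = 0.01 default mirrors the example value in the Parameters entry and is an assumption, not ztlearn's documented default:

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # pass positive inputs through, scale negative inputs by alpha
    return np.where(x > 0, x, alpha * x)

def leaky_relu_derivative(x, alpha=0.01):
    # gradient is 1 for positive inputs and alpha elsewhere
    return np.where(x > 0, 1.0, alpha)

print(leaky_relu(np.array([-3.0, 0.5])))  # [-0.03  0.5 ]
```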
class ztlearn.activations.Linear(activation_dict)[source]
Bases: object
Linear Activation Function
The Linear activation applies the identity operation to the data, so the output is equal to the input. The function always returns the same value that was used as its argument.
References
- [1] Identity Function
- [Wikipedia Article] https://en.wikipedia.org/wiki/Identity_function
- activation(input_signal)[source] – Linear activation applied to the provided input.
Parameters: input_signal (numpy.array) – the input numpy array Returns: the output of the Linear function applied to the input Return type: numpy.array
- activation_name
- derivative(input_signal)[source] – Linear derivative applied to the provided input.
Parameters: input_signal (numpy.array) – the input numpy array Returns: the output of the Linear derivative applied to the input Return type: numpy.array
class ztlearn.activations.ReLU(activation_dict)[source]
Bases: object
Rectified Linear Units (ReLUs)
Rectifying neurons are a better model of biological neurons, yielding equal or better performance than hyperbolic tangent networks in spite of the hard non-linearity and non-differentiability at zero. They create sparse representations with true zeros, which seem remarkably suitable for naturally sparse data.
References
- [1] Deep Sparse Rectifier Neural Networks
- [Xavier Glorot et al., 2011] http://proceedings.mlr.press/v15/glorot11a.html
- [PDF] http://proceedings.mlr.press/v15/glorot11a/glorot11a.pdf
- [2] Delving Deep into Rectifiers
- [Kaiming He et al., 2015] https://arxiv.org/abs/1502.01852
- [PDF] https://arxiv.org/pdf/1502.01852.pdf
- activation(input_signal)[source] – ReLU activation applied to the provided input.
Parameters: input_signal (numpy.array) – the input numpy array Returns: the output of the ReLU function applied to the input Return type: numpy.array
- activation_name
- derivative(input_signal)[source] – ReLU derivative applied to the provided input.
Parameters: input_signal (numpy.array) – the input numpy array Returns: the output of the ReLU derivative applied to the input Return type: numpy.array
class ztlearn.activations.SELU(activation_dict)[source]
Bases: object
Scaled Exponential Linear Units (SELUs)
SELUs are activations that induce self-normalizing properties and are used in Self-Normalizing Neural Networks (SNNs). SNNs enable high-level abstract representations that tend to converge automatically towards zero mean and unit variance.
References
- [1] Self-Normalizing Neural Networks (SELUs)
- [Klambauer, G. et al., 2017] https://arxiv.org/abs/1706.02515
- [PDF] https://arxiv.org/pdf/1706.02515.pdf
Parameters:
- ALPHA (float32) – 1.6732632423543772848170429916717
- _LAMBDA (float32) – 1.0507009873554804934193349852946
- ALPHA = 1.6732632423543772
- activation(input_signal)[source] – SELU activation applied to the provided input.
Parameters: input_signal (numpy.array) – the input numpy array Returns: the output of the SELU function applied to the input Return type: numpy.array
- activation_name
- derivative(input_signal)[source] – SELU derivative applied to the provided input.
Parameters: input_signal (numpy.array) – the input numpy array Returns: the output of the SELU derivative applied to the input Return type: numpy.array
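Using the ALPHA and _LAMBDA constants listed above, the SELU is a scaled ELU. The following sketch shows the standard formula for illustration only, not the class's actual code:

```python
import numpy as np

ALPHA = 1.6732632423543772
LAMBDA = 1.0507009873554805

def selu(x):
    # f(x) = lambda * x for x > 0, lambda * alpha * (exp(x) - 1) otherwise
    return LAMBDA * np.where(x > 0, x, ALPHA * (np.exp(x) - 1.0))

def selu_derivative(x):
    # f'(x) = lambda for x > 0, lambda * alpha * exp(x) otherwise
    return LAMBDA * np.where(x > 0, 1.0, ALPHA * np.exp(x))

print(selu(np.array([-1.0, 1.0])))  # approx [-1.1113, 1.0507]
```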
class ztlearn.activations.Sigmoid(activation_dict)[source]
Bases: object
Sigmoid Activation Function
The Sigmoid function is often used as the output activation for binary classification problems, as it outputs values in the range (0, 1). Sigmoid functions are real-valued and differentiable, producing an 'S-shaped' curve with a single inflection point.
References
- [1] The influence of the sigmoid function parameters on the speed of backpropagation learning
- [PDF] https://goo.gl/MavJjj
- activation(input_signal)[source] – Sigmoid activation applied to the provided input.
Parameters: input_signal (numpy.array) – the input numpy array Returns: the output of the Sigmoid function applied to the input Return type: numpy.array
- activation_name
- derivative(input_signal)[source] – Sigmoid derivative applied to the provided input.
Parameters: input_signal (numpy.array) – the input numpy array Returns: the output of the Sigmoid derivative applied to the input Return type: numpy.array
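For reference, the logistic sigmoid and its derivative written in NumPy; a generic sketch rather than the class's actual code:

```python
import numpy as np

def sigmoid(x):
    # f(x) = 1 / (1 + exp(-x)), squashes inputs into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    # f'(x) = f(x) * (1 - f(x))
    s = sigmoid(x)
    return s * (1.0 - s)

print(sigmoid(np.array([0.0, 2.0])))  # approx [0.5, 0.8808]
```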
class ztlearn.activations.SoftPlus(activation_dict)[source]
Bases: object
SoftPlus Activation Function
The SoftPlus function is a smooth approximation to the rectified linear unit (ReLU). It is smooth and differentiable near zero and produces outputs in the range (0, +inf).
References
- [1] Incorporating Second-Order Functional Knowledge for Better Option Pricing
- [Charles Dugas et al., 2001] https://goo.gl/z3jeYc
- [PDF] https://goo.gl/z3jeYc
- activation(input_signal)[source] – SoftPlus activation applied to the provided input.
Parameters: input_signal (numpy.array) – the input numpy array Returns: the output of the SoftPlus function applied to the input Return type: numpy.array
- activation_name
- derivative(input_signal)[source] – SoftPlus derivative applied to the provided input.
Parameters: input_signal (numpy.array) – the input numpy array Returns: the output of the SoftPlus derivative applied to the input Return type: numpy.array
class ztlearn.activations.Softmax(activation_dict)[source]
Bases: object
Softmax Activation Function
The Softmax activation function is a generalization of the logistic function that squashes the output of each unit to a real value in the range [0, 1], while also dividing each output so that the outputs sum to 1.
References
- [1] Softmax Regression
- [UFLDL Tutorial] https://goo.gl/1qgqdg
- [2] Deep Learning using Linear Support Vector Machines
- [Yichuan Tang, 2015] https://arxiv.org/abs/1306.0239
- [PDF] https://arxiv.org/pdf/1306.0239.pdf
- [3] Probabilistic Interpretation of Feedforward Network Outputs
- [Mario Costa, 1989] [PDF] https://goo.gl/ZhBY4r
- activation(input_signal)[source] – Softmax activation applied to the provided input.
Parameters: input_signal (numpy.array) – the input numpy array Returns: the output of the Softmax function applied to the input Return type: numpy.array
- activation_name
- derivative(input_signal)[source] – Softmax derivative applied to the provided input.
Parameters: input_signal (numpy.array) – the input numpy array Returns: the output of the Softmax derivative applied to the input Return type: numpy.array
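A numerically stable softmax sketch in NumPy; subtracting the row maximum before exponentiating is a common stabilization trick shown here for illustration, not a statement about ztlearn's internals:

```python
import numpy as np

def softmax(x, axis=-1):
    # subtract the max for numerical stability, then normalize the exponentials
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=axis, keepdims=True)

probs = softmax(np.array([[1.0, 2.0, 3.0]]))
print(probs, probs.sum())  # approx [[0.0900 0.2447 0.6652]] 1.0
```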
class ztlearn.activations.TanH(activation_dict)[source]
Bases: object
Tangent Hyperbolic (TanH)
The hyperbolic tangent function is a rescaled version of the sigmoid that produces outputs in the range [-1, +1]. As an activation function it gives an output for every input value, making it a continuous function.
References
- [1] Hyperbolic Functions
- [Mathematics Education Centre] https://goo.gl/4Dkkrd
- [PDF] https://goo.gl/xPSnif
- activation(input_signal)[source] – TanH activation applied to the provided input.
Parameters: input_signal (numpy.array) – the input numpy array Returns: the output of the TanH function applied to the input Return type: numpy.array
- activation_name
- derivative(input_signal)[source] – TanH derivative applied to the provided input.
Parameters: input_signal (numpy.array) – the input numpy array Returns: the output of the TanH derivative applied to the input Return type: numpy.array
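The rescaling relationship with the sigmoid, tanh(x) = 2 * sigmoid(2x) - 1, can be checked numerically; the snippet is an illustrative check, not library code:

```python
import numpy as np

def tanh_activation(x):
    return np.tanh(x)

def tanh_derivative(x):
    # d/dx tanh(x) = 1 - tanh(x)^2
    return 1.0 - np.tanh(x) ** 2

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
x = np.linspace(-3, 3, 7)
print(np.allclose(tanh_activation(x), 2.0 * sigmoid(2.0 * x) - 1.0))  # True
```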
ztlearn.decayers module¶
class ztlearn.decayers.Decay(lr, decay, epoch, min_lr, max_lr)[source]
Bases: object
- clip_lr
class ztlearn.decayers.DecayFunction(lr=0.001, name='inverse', decay=1e-06, epoch=1, min_lr=0.0, max_lr=inf, step_size=10.0)[source]
Bases: object
- decompose
- name
class ztlearn.decayers.ExponetialDecay(lr, decay, epoch, min_lr, max_lr, step_size)[source]
Bases: ztlearn.decayers.Decay
- decay_name
- decompose
class ztlearn.decayers.InverseTimeDecay(lr, decay, epoch, min_lr, max_lr, step_size)[source]
Bases: ztlearn.decayers.Decay
- decay_name
- decompose
class ztlearn.decayers.NaturalExponentialDecay(lr, decay, epoch, min_lr, max_lr, step_size)[source]
Bases: ztlearn.decayers.Decay
- decay_name
- decompose
class ztlearn.decayers.StepDecay(lr, decay, epoch, min_lr, max_lr, step_size)[source]
Bases: ztlearn.decayers.Decay
Decays the learning rate after every step_size steps.
- decay_name
- decompose
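As a worked illustration of step decay, the learning rate can be dropped by a constant factor after every step_size epochs and then clipped to [min_lr, max_lr]. The formula lr * decay^floor(epoch / step_size) used below is a common convention and an assumption here, not necessarily the exact expression used by ztlearn.decayers:

```python
import numpy as np

def step_decay(lr, decay, epoch, step_size, min_lr=0.0, max_lr=np.inf):
    # drop the rate by a factor of `decay` once every `step_size` epochs,
    # then clip it to the [min_lr, max_lr] range
    new_lr = lr * np.power(decay, np.floor(epoch / step_size))
    return float(np.clip(new_lr, min_lr, max_lr))

for epoch in (0, 10, 20, 30):
    print(epoch, step_decay(lr=0.1, decay=0.5, epoch=epoch, step_size=10))
# 0 0.1, 10 0.05, 20 0.025, 30 0.0125
```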
ztlearn.initializers module¶
class ztlearn.initializers.GlorotNormal[source]
Bases: ztlearn.initializers.WeightInitializer
Glorot Normal (GlorotNormal)
GlorotNormal, more famously known as Xavier initialization, attempts to maintain the same variance of the weight gradients across all layers. Glorot normal is an implementation based on the Gaussian distribution.
References
- [1] Understanding the difficulty of training deep feedforward neural networks
- [Xavier Glorot, 2010] http://proceedings.mlr.press/v9/glorot10a.html
- [PDF] http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf
- [2] Initialization Of Deep Feedforward Networks
- [DeepGrid Article - Jefkine Kafunah] https://goo.gl/E2XrGe
- init_name
- weights(shape, random_seed)[source]
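The usual Glorot/Xavier normal recipe samples from a Gaussian with standard deviation sqrt(2 / (fan_in + fan_out)). The sketch below follows that textbook formula; the fan_in/fan_out convention (shape[0], shape[1]) is an assumption and may differ from ztlearn's weights() implementation:

```python
import numpy as np

def glorot_normal(shape, random_seed=None):
    # standard Xavier/Glorot normal: std = sqrt(2 / (fan_in + fan_out))
    rng = np.random.default_rng(random_seed)
    fan_in, fan_out = shape[0], shape[1]
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(loc=0.0, scale=std, size=shape)

w = glorot_normal((256, 128), random_seed=42)
print(w.std())  # close to sqrt(2 / 384) ≈ 0.072
```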
class ztlearn.initializers.GlorotUniform[source]
Bases: ztlearn.initializers.WeightInitializer
Glorot Uniform (GlorotUniform)
GlorotUniform, more famously known as Xavier initialization, attempts to maintain the same variance of the weight gradients across all layers. Glorot uniform is an implementation based on the Uniform distribution.
References
- [1] Understanding the difficulty of training deep feedforward neural networks
- [Xavier Glorot, 2010] http://proceedings.mlr.press/v9/glorot10a.html
- [PDF] http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf
- [2] Initialization Of Deep Feedforward Networks
- [DeepGrid Article - Jefkine Kafunah] https://goo.gl/E2XrGe
- init_name
- weights(shape, random_seed)[source]
class ztlearn.initializers.HeNormal[source]
Bases: ztlearn.initializers.WeightInitializer
He Normal (HeNormal)
HeNormal is a robust initialization method that specifically accounts for rectifier nonlinearities. He normal is an implementation based on the Gaussian distribution.
References
- [1] Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
- [Kaiming He, 2015] https://arxiv.org/abs/1502.01852
- [PDF] https://arxiv.org/pdf/1502.01852.pdf
- [2] Initialization Of Deep Networks Case of Rectifiers
- [DeepGrid Article - Jefkine Kafunah] https://goo.gl/TBNw5t
- init_name
- weights(shape, random_seed)[source]
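He initialization typically scales the Gaussian by sqrt(2 / fan_in) to compensate for the halving effect of rectifiers. A sketch under that standard assumption (fan_in taken as shape[0]), not a description of the library's exact code:

```python
import numpy as np

def he_normal(shape, random_seed=None):
    # He et al. (2015): std = sqrt(2 / fan_in), suited to ReLU-like activations
    rng = np.random.default_rng(random_seed)
    fan_in = shape[0]
    return rng.normal(loc=0.0, scale=np.sqrt(2.0 / fan_in), size=shape)

w = he_normal((512, 256), random_seed=0)
print(w.std())  # close to sqrt(2 / 512) ≈ 0.0625
```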
class ztlearn.initializers.HeUniform[source]
Bases: ztlearn.initializers.WeightInitializer
He Uniform (HeUniform)
HeUniform is a robust initialization method that specifically accounts for rectifier nonlinearities. He uniform is an implementation based on the Uniform distribution.
References
- [1] Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
- [Kaiming He, 2015] https://arxiv.org/abs/1502.01852
- [PDF] https://arxiv.org/pdf/1502.01852.pdf
- [2] Initialization Of Deep Networks Case of Rectifiers
- [DeepGrid Article - Jefkine Kafunah] https://goo.gl/TBNw5t
- init_name
- weights(shape, random_seed)[source]
class ztlearn.initializers.Identity[source]
Bases: ztlearn.initializers.WeightInitializer
Identity (Identity)
Identity is an implementation of weight initialization that returns an identity matrix of size shape.
- init_name
- weights(shape, random_seed)[source]
class ztlearn.initializers.InitializeWeights(name)[source]
Bases: object
- initialize_weights(shape, random_seed=None)[source]
- name
class ztlearn.initializers.LeCunNormal[source]
Bases: ztlearn.initializers.WeightInitializer
LeCun Normal (LeCunNormal)
Weights should be randomly chosen, but in such a way that the sigmoid is primarily activated in its linear region. LeCun normal is an implementation based on the Gaussian distribution.
References
- [1] Efficient Backprop
- [LeCun, 1998] [PDF] http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf
- init_name
- weights(shape, random_seed)[source]
class ztlearn.initializers.LeCunUniform[source]
Bases: ztlearn.initializers.WeightInitializer
LeCun Uniform (LeCunUniform)
Weights should be randomly chosen, but in such a way that the sigmoid is primarily activated in its linear region. LeCun uniform is an implementation based on the Uniform distribution.
References
- [1] Efficient Backprop
- [LeCun, 1998] [PDF] http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf
- init_name
- weights(shape, random_seed)[source]
class ztlearn.initializers.One[source]
Bases: ztlearn.initializers.WeightInitializer
One (One)
One is an implementation of weight initialization that returns all ones.
- init_name
- weights(shape, random_seed)[source]
class ztlearn.initializers.RandomNormal[source]
Bases: ztlearn.initializers.WeightInitializer
Random Normal (RandomNormal)
Random normal is an implementation of weight initialization based on the Gaussian distribution.
- init_name
- weights(shape, random_seed)[source]
class ztlearn.initializers.RandomUniform[source]
Bases: ztlearn.initializers.WeightInitializer
Random Uniform (RandomUniform)
Random uniform is an implementation of weight initialization based on the Uniform distribution.
- init_name
- weights(shape, random_seed)[source]
class ztlearn.initializers.Zero[source]
Bases: ztlearn.initializers.WeightInitializer
Zero (Zero)
Zero is an implementation of weight initialization that returns all zeros.
- init_name
- weights(shape, random_seed)[source]
ztlearn.objectives module¶
class ztlearn.objectives.BinaryCrossEntropy[source]
Bases: ztlearn.objectives.Objective
Binary Cross Entropy
Binary cross entropy measures the performance of a classification model whose output is a probability value between 0 and 1. 'Binary' is meant for discrete classification tasks in which the classes are independent and not mutually exclusive. Targets here are scalar values of either 0 or 1.
References
- [1] Cross Entropy
- [Wikipedia Article] https://en.wikipedia.org/wiki/Cross_entropy
- accuracy(predictions, targets, threshold=0.5)[source] – Calculates the BinaryCrossEntropy Accuracy Score given predictions and targets.
Parameters:
  - predictions (numpy.array) – the predictions numpy array
  - targets (numpy.array) – the targets numpy array
  - threshold (numpy.float32) – the threshold value
Returns: the output of the BinaryCrossEntropy Accuracy Score
Return type: numpy.float32
- derivative(predictions, targets, np_type)[source] – Applies the BinaryCrossEntropy Derivative to the predictions and targets provided.
Parameters:
  - predictions (numpy.array) – the predictions numpy array
  - targets (numpy.array) – the targets numpy array
Returns: the output of the BinaryCrossEntropy Derivative for the given predictions and targets
Return type: numpy.array
- loss(predictions, targets, np_type)[source] – Applies the BinaryCrossEntropy Loss to the predictions and targets provided.
Parameters:
  - predictions (numpy.array) – the predictions numpy array
  - targets (numpy.array) – the targets numpy array
Returns: the output of the BinaryCrossEntropy Loss for the given predictions and targets
Return type: numpy.array
- objective_name
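A compact NumPy sketch of the binary cross entropy loss and a thresholded accuracy score, matching the quantities described above. Predictions are clipped to avoid log(0); this is illustrative, not ztlearn's implementation:

```python
import numpy as np

def binary_cross_entropy(predictions, targets, eps=1e-12):
    # clip to avoid log(0), then average the per-sample cross entropy
    p = np.clip(predictions, eps, 1.0 - eps)
    return -np.mean(targets * np.log(p) + (1.0 - targets) * np.log(1.0 - p))

def binary_accuracy(predictions, targets, threshold=0.5):
    # compare thresholded predictions against 0/1 targets
    return np.mean((predictions >= threshold).astype(np.float32) == targets)

preds   = np.array([0.9, 0.2, 0.7, 0.4])
targets = np.array([1.0, 0.0, 1.0, 1.0])
print(binary_cross_entropy(preds, targets), binary_accuracy(preds, targets))  # approx 0.4004 0.75
```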
class ztlearn.objectives.CategoricalCrossEntropy[source]
Bases: ztlearn.objectives.Objective
Categorical Cross Entropy
Categorical cross entropy measures the performance of a classification model whose output is a probability value between 0 and 1. 'Categorical' is meant for discrete classification tasks in which the classes are mutually exclusive.
References
- [1] Cross Entropy
- [Wikipedia Article] https://en.wikipedia.org/wiki/Cross_entropy
- accuracy(predictions, targets)[source] – Calculates the CategoricalCrossEntropy Accuracy Score given predictions and targets.
Parameters:
  - predictions (numpy.array) – the predictions numpy array
  - targets (numpy.array) – the targets numpy array
Returns: the output of the CategoricalCrossEntropy Accuracy Score
Return type: numpy.float32
- derivative(predictions, targets, np_type)[source] – Applies the CategoricalCrossEntropy Derivative to the predictions and targets provided.
Parameters:
  - predictions (numpy.array) – the predictions numpy array
  - targets (numpy.array) – the targets numpy array
Returns: the output of the CategoricalCrossEntropy Derivative for the given predictions and targets
Return type: numpy.array
- loss(predictions, targets, np_type)[source] – Applies the CategoricalCrossEntropy Loss to the predictions and targets provided.
Parameters:
  - predictions (numpy.array) – the predictions numpy array
  - targets (numpy.array) – the targets numpy array
Returns: the output of the CategoricalCrossEntropy Loss for the given predictions and targets
Return type: numpy.array
- objective_name
class ztlearn.objectives.HellingerDistance[source]
Bases: object
Hellinger Distance
The Hellinger distance is used to quantify the similarity between two probability distributions.
References
- [1] Hellinger Distance
- [Wikipedia Article] https://en.wikipedia.org/wiki/Hellinger_distance
- SQRT_2 = 1.4142135623730951
- accuracy(predictions, targets, threshold=0.5)[source]
- derivative(predictions, targets, np_type)[source] – Applies the HellingerDistance Derivative to the predictions and targets provided.
Parameters:
  - predictions (numpy.array) – the predictions numpy array
  - targets (numpy.array) – the targets numpy array
Returns: the output of the HellingerDistance Derivative for the given predictions and targets
Return type: numpy.array
- loss(predictions, targets, np_type)[source] – Applies the HellingerDistance Loss to the predictions and targets provided.
Parameters:
  - predictions (numpy.array) – the predictions numpy array
  - targets (numpy.array) – the targets numpy array
Returns: the output of the HellingerDistance Loss for the given predictions and targets
Return type: numpy.array
- objective_name
- sqrt_difference(predictions, targets)[source]
class ztlearn.objectives.HingeLoss[source]
Bases: object
Hinge Loss
Hinge Loss, also known as SVM Loss, is used for "maximum-margin" classification, most notably with support vector machines (SVMs).
References
- [1] Hinge loss
- [Wikipedia Article] https://en.wikipedia.org/wiki/Hinge_loss
- accuracy(predictions, targets, threshold=0.5)[source] – Calculates the Hinge-Loss Accuracy Score given predictions and targets.
Parameters:
  - predictions (numpy.array) – the predictions numpy array
  - targets (numpy.array) – the targets numpy array
Returns: the output of the Hinge-Loss Accuracy Score
Return type: numpy.float32
- derivative(predictions, targets, np_type)[source] – Applies the Hinge-Loss Derivative to the predictions and targets provided.
Parameters:
  - predictions (numpy.array) – the predictions numpy array
  - targets (numpy.array) – the targets numpy array
Returns: the output of the Hinge-Loss Derivative for the given predictions and targets
Return type: numpy.array
- loss(predictions, targets, np_type)[source] – Applies the Hinge-Loss to the predictions and targets provided.
Parameters:
  - predictions (numpy.array) – the predictions numpy array
  - targets (numpy.array) – the targets numpy array
Returns: the output of the Hinge-Loss for the given predictions and targets
Return type: numpy.array
- objective_name
class ztlearn.objectives.HuberLoss[source]
Bases: ztlearn.objectives.Objective
Huber Loss
Huber Loss is a loss function used in robust regression, where it is found to be less sensitive to outliers in the data than the squared error loss.
References
- [1] Huber Loss
- [Wikipedia Article] https://en.wikipedia.org/wiki/Huber_loss
- [2] Huber loss
- [Wikivisually Article] https://wikivisually.com/wiki/Huber_loss
- accuracy(predictions, targets)[source] – Calculates the HuberLoss Accuracy Score given predictions and targets.
Parameters:
  - predictions (numpy.array) – the predictions numpy array
  - targets (numpy.array) – the targets numpy array
Returns: the output of the HuberLoss Accuracy Score
Return type: numpy.float32
- derivative(predictions, targets, np_type, delta=1.0)[source] – Applies the HuberLoss Derivative to the predictions and targets provided.
Parameters:
  - predictions (numpy.array) – the predictions numpy array
  - targets (numpy.array) – the targets numpy array
Returns: the output of the HuberLoss Derivative for the given predictions and targets
Return type: numpy.array
- loss(predictions, targets, np_type, delta=1.0)[source] – Applies the HuberLoss Loss to the predictions and targets provided.
Parameters:
  - predictions (numpy.array) – the predictions numpy array
  - targets (numpy.array) – the targets numpy array
Returns: the output of the HuberLoss Loss for the given predictions and targets
Return type: numpy.array
- objective_name
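The Huber loss applies a quadratic penalty to small residuals and a linear penalty once the absolute residual exceeds delta. The sketch below uses the textbook piecewise formula with the documented default delta = 1.0, for illustration only:

```python
import numpy as np

def huber_loss(predictions, targets, delta=1.0):
    # quadratic for |error| <= delta, linear beyond that
    error = predictions - targets
    squared = 0.5 * error ** 2
    linear = delta * (np.abs(error) - 0.5 * delta)
    return np.where(np.abs(error) <= delta, squared, linear)

def huber_derivative(predictions, targets, delta=1.0):
    # gradient w.r.t. predictions: error inside the delta band, +/- delta outside
    error = predictions - targets
    return np.where(np.abs(error) <= delta, error, delta * np.sign(error))

print(huber_loss(np.array([0.5, 3.0]), np.array([0.0, 0.0])))  # [0.125 2.5  ]
```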
class ztlearn.objectives.KLDivergence[source]
Bases: ztlearn.objectives.Objective
KL Divergence
The Kullback–Leibler divergence (also called relative entropy) is a measure of the divergence between two probability distributions.
- accuracy(predictions, targets)[source] – Calculates the KLDivergence Accuracy Score given predictions and targets.
Parameters:
  - predictions (numpy.array) – the predictions numpy array
  - targets (numpy.array) – the targets numpy array
Returns: the output of the KLDivergence Accuracy Score
Return type: numpy.float32
- derivative(predictions, targets, np_type)[source] – Applies the KLDivergence Derivative to the predictions and targets provided.
Parameters:
  - predictions (numpy.array) – the predictions numpy array
  - targets (numpy.array) – the targets numpy array
Returns: the output of the KLDivergence Derivative for the given predictions and targets
Return type: numpy.array
- loss(predictions, targets, np_type)[source] – Applies the KLDivergence Loss to the predictions and targets provided.
Parameters:
  - predictions (numpy.array) – the predictions numpy array
  - targets (numpy.array) – the targets numpy array
Returns: the output of the KLDivergence Loss for the given predictions and targets
Return type: numpy.array
- objective_name
class ztlearn.objectives.MeanSquaredError[source]
Bases: object
Mean Squared Error (MSE)
MSE measures the average squared difference between the predictions and the targets. The closer the predictions are to the targets, the more efficient the estimator.
References
- [1] Mean Squared Error
- [Wikipedia Article] https://en.wikipedia.org/wiki/Mean_squared_error
- accuracy(predictions, targets, threshold=0.5)[source]
- derivative(predictions, targets, np_type)[source] – Applies the MeanSquaredError Derivative to the predictions and targets provided.
Parameters:
  - predictions (numpy.array) – the predictions numpy array
  - targets (numpy.array) – the targets numpy array
Returns: the output of the MeanSquaredError Derivative for the given predictions and targets
Return type: numpy.array
- loss(predictions, targets, np_type)[source] – Applies the MeanSquaredError Loss to the predictions and targets provided.
Parameters:
  - predictions (numpy.array) – the predictions numpy array
  - targets (numpy.array) – the targets numpy array
Returns: the output of the MeanSquaredError Loss for the given predictions and targets
Return type: numpy.array
- objective_name
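MSE and its gradient with respect to the predictions can be written in a few lines of NumPy. The 2/N factor below is the textbook gradient of the mean; the sketch is generic rather than ztlearn's exact code:

```python
import numpy as np

def mse_loss(predictions, targets):
    # average of the squared residuals
    return np.mean((predictions - targets) ** 2)

def mse_derivative(predictions, targets):
    # d/d_predictions of mean((p - t)^2) = 2 * (p - t) / N
    return 2.0 * (predictions - targets) / predictions.shape[0]

preds   = np.array([2.5, 0.0, 2.0])
targets = np.array([3.0, -0.5, 2.0])
print(mse_loss(preds, targets))  # (0.25 + 0.25 + 0.0) / 3 ≈ 0.1667
```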
ztlearn.optimizers module¶
class ztlearn.optimizers.AdaGrad(**kwargs)[source]
Bases: ztlearn.optimizers.Optimizer
Adaptive Gradient Algorithm (AdaGrad)
AdaGrad is an optimization method that allows different step sizes for different features. It increases the influence of rare but informative features.
References
- [1] An overview of gradient descent optimization algorithms
- [Sebastian Ruder, 2016] https://arxiv.org/abs/1609.04747
- [PDF] https://arxiv.org/pdf/1609.04747.pdf
- [2] Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
- [John Duchi et al., 2011] http://jmlr.org/papers/v12/duchi11a.html
- [PDF] http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf
Parameters: kwargs – Arbitrary keyword arguments.
- optimization_name
- update(weights, grads, epoch_num, batch_num, batch_size)[source]
class ztlearn.optimizers.Adadelta(**kwargs)[source]
Bases: ztlearn.optimizers.Optimizer
An Adaptive Learning Rate Method (Adadelta)
Adadelta is an extension of AdaGrad that seeks to avoid an aggressively and monotonically decreasing learning rate. This is achieved via a dynamic learning rate, i.e. a different learning rate is computed for each training sample.
References
- [1] An overview of gradient descent optimization algorithms
- [Sebastian Ruder, 2016] https://arxiv.org/abs/1609.04747
- [PDF] https://arxiv.org/pdf/1609.04747.pdf
- [2] ADADELTA: An Adaptive Learning Rate Method
- [Matthew D. Zeiler, 2012] https://arxiv.org/abs/1212.5701
- [PDF] https://arxiv.org/pdf/1212.5701.pdf
Parameters: kwargs – Arbitrary keyword arguments.
- optimization_name
- update(weights, grads, epoch_num, batch_num, batch_size)[source]
class ztlearn.optimizers.Adam(**kwargs)[source]
Bases: ztlearn.optimizers.Optimizer
Adaptive Moment Estimation (Adam)
Adam computes adaptive learning rates for each parameter by storing an exponentially decaying average of past squared gradients. Adam also keeps an exponentially decaying average of past gradients.
References
- [1] An overview of gradient descent optimization algorithms
- [Sebastian Ruder, 2016] https://arxiv.org/abs/1609.04747
- [PDF] https://arxiv.org/pdf/1609.04747.pdf
- [2] Adam: A Method for Stochastic Optimization
- [Diederik P. Kingma et al., 2014] https://arxiv.org/abs/1412.6980
- [PDF] https://arxiv.org/pdf/1412.6980.pdf
Parameters: kwargs – Arbitrary keyword arguments.
- optimization_name
- update(weights, grads, epoch_num, batch_num, batch_size)[source]
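The two decaying averages described above are Adam's first and second moment estimates. The sketch below follows the update rule from the Adam paper with its usual defaults (beta1 = 0.9, beta2 = 0.999, eps = 1e-8); the function name, signature, and state handling are illustrative assumptions and do not mirror the update() method's API:

```python
import numpy as np

def adam_update(weights, grads, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # state carries the running moment estimates and the step counter
    state['t'] += 1
    state['m'] = beta1 * state['m'] + (1 - beta1) * grads        # 1st moment
    state['v'] = beta2 * state['v'] + (1 - beta2) * grads ** 2   # 2nd moment
    m_hat = state['m'] / (1 - beta1 ** state['t'])               # bias correction
    v_hat = state['v'] / (1 - beta2 ** state['t'])
    return weights - lr * m_hat / (np.sqrt(v_hat) + eps)

w = np.array([0.5, -0.3])
state = {'m': np.zeros_like(w), 'v': np.zeros_like(w), 't': 0}
w = adam_update(w, grads=np.array([0.1, -0.2]), state=state)
print(w)  # each weight moves by roughly lr against the sign of its gradient
```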
class ztlearn.optimizers.Adamax(**kwargs)[source]
Bases: ztlearn.optimizers.Optimizer
Adamax
AdaMax is a variant of Adam based on the infinity norm. The Adam update rule for individual weights scales their gradients inversely proportional to a (scaled) L2 norm of their individual current and past gradients. AdaMax generalizes this L2 norm based update rule to an Lp norm based update rule. Such variants are numerically unstable for large p, but a simple and stable algorithm emerges in the special case where p tends to infinity.
References
- [1] An overview of gradient descent optimization algorithms
- [Sebastian Ruder, 2016] https://arxiv.org/abs/1609.04747
- [PDF] https://arxiv.org/pdf/1609.04747.pdf
- [2] Adam: A Method for Stochastic Optimization
- [Diederik P. Kingma et al., 2014] https://arxiv.org/abs/1412.6980
- [PDF] https://arxiv.org/pdf/1412.6980.pdf
Parameters: kwargs – Arbitrary keyword arguments.
- optimization_name
- update(weights, grads, epoch_num, batch_num, batch_size)[source]
class ztlearn.optimizers.GD[source]
Bases: object
Gradient Descent (GD)
GD optimizes the parameters theta of an objective function J(theta) using all of the training samples in the dataset for each update. The update is performed in the direction opposite to the gradient of the objective function, d/d_theta J(theta), with respect to the parameters theta. The learning rate eta determines the size of the steps taken towards the minimum.
References
- [1] An overview of gradient descent optimization algorithms
- [Sebastian Ruder, 2016] https://arxiv.org/abs/1609.04747
- [PDF] https://arxiv.org/pdf/1609.04747.pdf
class ztlearn.optimizers.NesterovAcceleratedGradient(**kwargs)[source]
Bases: ztlearn.optimizers.Optimizer
Nesterov Accelerated Gradient (NAG)
NAG is an improvement on SGDMomentum in which the previous parameter values are smoothed and a gradient descent step is taken from this smoothed value. This enables a more intelligent way of arriving at the minimum.
References
- [1] An overview of gradient descent optimization algorithms
- [Sebastian Ruder, 2016] https://arxiv.org/abs/1609.04747
- [PDF] https://arxiv.org/pdf/1609.04747.pdf
- [2] A method for unconstrained convex minimization problem with the rate of convergence
- [Nesterov, Y., 1983] [PDF] https://goo.gl/X8313t
- [3] Nesterov's Accelerated Gradient and Momentum as approximations to Regularised Update Descent
- [Aleksandar Botev, 2016] https://arxiv.org/abs/1607.01981
- [PDF] https://arxiv.org/pdf/1607.01981.pdf
Parameters: kwargs – Arbitrary keyword arguments.
- optimization_name
- update(weights, grads, epoch_num, batch_num, batch_size)[source]
class ztlearn.optimizers.OptimizationFunction(optimizer_kwargs)[source]
Bases: object
- name
- update(weights, grads, epoch_num, batch_num, batch_size)[source]
class ztlearn.optimizers.Optimizer(**kwargs)[source]
Bases: object
- get_learning_rate(current_epoch)[source]
class ztlearn.optimizers.RMSprop(**kwargs)[source]
Bases: ztlearn.optimizers.Optimizer
Root Mean Squared Propagation (RMSprop)
RMSprop utilizes the magnitude of recent gradients to normalize the gradients. A moving average of the squared gradients is kept, and the current gradient is divided by the root of this average. Recommended parameter settings are rho = 0.9 and eta (learning rate) = 0.001.
References
- [1] An overview of gradient descent optimization algorithms
- [Sebastian Ruder, 2016] https://arxiv.org/abs/1609.04747
- [PDF] https://arxiv.org/pdf/1609.04747.pdf
- [2] Lecture 6.5 - rmsprop, COURSERA: Neural Networks for Machine Learning
- [Tieleman, T. and Hinton, G., 2012] [PDF] https://goo.gl/Dhkvpk
Parameters: kwargs – Arbitrary keyword arguments.
- optimization_name
- update(weights, grads, epoch_num, batch_num, batch_size)[source]
class ztlearn.optimizers.SGD(**kwargs)[source]
Bases: ztlearn.optimizers.Optimizer
Stochastic Gradient Descent (SGD)
SGD optimizes the parameters theta of an objective function J(theta) by performing an update for each training sample, inputs(i) and targets(i), in the dataset. The update is performed in the direction opposite to the gradient of the objective function, d/d_theta J(theta), with respect to the parameters theta. The learning rate eta determines the size of the steps taken towards the minimum.
References
- [1] An overview of gradient descent optimization algorithms
- [Sebastian Ruder, 2016] https://arxiv.org/abs/1609.04747
- [PDF] https://arxiv.org/pdf/1609.04747.pdf
- [2] Large-Scale Machine Learning with Stochastic Gradient Descent
- [Leon Bottou, 2011] [PDF] http://leon.bottou.org/publications/pdf/compstat-2010.pdf
Parameters: kwargs – Arbitrary keyword arguments.
- optimization_name
- update(weights, grads, epoch_num, batch_num, batch_size)[source]
class ztlearn.optimizers.SGDMomentum(**kwargs)[source]
Bases: ztlearn.optimizers.Optimizer
Stochastic Gradient Descent with Momentum (SGDMomentum)
The objective function regularly forms regions on the contour map in which the surface curves much more steeply in one dimension than in another (ravines). Standard SGD tends to oscillate across the narrow ravine, since the negative gradient points down one of the steep sides rather than along the ravine towards the optimum. Momentum helps push the objective more quickly along the shallow ravine towards the global minimum.
References
- [1] An overview of gradient descent optimization algorithms
- [Sebastian Ruder, 2016] https://arxiv.org/abs/1609.04747
- [PDF] https://arxiv.org/pdf/1609.04747.pdf
- [2] On the Momentum Term in Gradient Descent Learning Algorithms
- [Ning Qian, 1999] https://goo.gl/7fhr14
- [PDF] https://goo.gl/91HtDt
- [3] Two problems with backpropagation and other steepest-descent learning procedures for networks
- [Sutton, R. S., 1986] [PDF] https://goo.gl/M3VFM1
Parameters: kwargs – Arbitrary keyword arguments.
- optimization_name
- update(weights, grads, epoch_num, batch_num, batch_size)[source]
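Classical momentum keeps a velocity vector that accumulates a decaying sum of past gradients and steps along it. A minimal sketch with a typical momentum coefficient of 0.9 (the coefficient and function signature are assumptions, not the update() API):

```python
import numpy as np

def sgd_momentum_update(weights, grads, velocity, lr=0.01, momentum=0.9):
    # accumulate a decaying sum of past gradients, then step along it
    velocity = momentum * velocity - lr * grads
    return weights + velocity, velocity

w = np.array([1.0, -1.0])
v = np.zeros_like(w)
for _ in range(3):
    grads = 2.0 * w            # gradient of the toy objective sum(w**2)
    w, v = sgd_momentum_update(w, grads, v)
print(w)  # weights shrink towards the minimum at the origin
```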
ztlearn.optimizers.register_opt(**kwargs)[source]
ztlearn.regularizers module¶
class ztlearn.regularizers.ElasticNetRegularization(_lambda, l1_ratio)[source]
Bases: object
Elastic Net Regularization (ElasticNetRegularization)
ElasticNetRegularization adds both the absolute value of the magnitude and the squared magnitude of the coefficients as penalty terms to the loss function.
References
- [1] Regularization (mathematics)
- [Wikipedia Article] https://en.wikipedia.org/wiki/Regularization_(mathematics)
Parameters:
  - _lambda (float32) – controls the weight of the penalty term
  - l1_ratio (float32) – controls the l1 penalty as a ratio of the total penalty added to the loss function
- derivative(weights)[source]
- regulate(weights)[source]
- regulation_name
class ztlearn.regularizers.L1Regularization(_lambda, **kwargs)[source]
Bases: object
Lasso Regression (L1Regularization)
L1Regularization adds the sum of the absolute values of the parameter magnitudes as a penalty term to the loss function.
References
- [1] Regularization (mathematics)
- [Wikipedia Article] https://en.wikipedia.org/wiki/Regularization_(mathematics)
- [2] Regression shrinkage and selection via the lasso
- [R Tibshirani, 1996] https://goo.gl/Yh9bBU
- [PDF] https://goo.gl/mQP5mA
- [3] Feature selection, L1 vs. L2 regularization, and rotational invariance
- [Andrew Y. Ng] [PDF] https://goo.gl/rbwNCt
Parameters: _lambda (float32) – controls the weight of the penalty term
- derivative(weights)[source]
- regulate(weights)[source]
- regulation_name
class ztlearn.regularizers.L2Regularization(_lambda, **kwargs)[source]
Bases: object
Ridge Regression (L2Regularization)
L2Regularization adds the sum of the squared parameter magnitudes as a penalty term to the loss function.
References
- [1] Regularization (mathematics)
- [Wikipedia Article] https://en.wikipedia.org/wiki/Regularization_(mathematics)
- [2] Regression shrinkage and selection via the lasso
- [R Tibshirani, 1996] https://goo.gl/Yh9bBU
- [PDF] https://goo.gl/mQP5mA
- [3] Feature selection, L1 vs. L2 regularization, and rotational invariance
- [Andrew Y. Ng] [PDF] https://goo.gl/rbwNCt
Parameters: _lambda (float32) – controls the weight of the penalty term
- derivative(weights)[source]
- regulate(weights)[source]
- regulation_name
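The penalty terms described in this module and their gradients are simple elementwise expressions. The sketch below shows generic L1, L2, and elastic-net penalties consistent with the descriptions above; the 0.5 factor on the L2 term is one common convention, and none of these functions mirror ztlearn's regulate()/derivative() methods:

```python
import numpy as np

def l1_penalty(weights, _lambda):
    # lasso: lambda * sum(|w|); (sub)gradient is lambda * sign(w)
    return _lambda * np.sum(np.abs(weights)), _lambda * np.sign(weights)

def l2_penalty(weights, _lambda):
    # ridge: 0.5 * lambda * sum(w^2); gradient is lambda * w
    return 0.5 * _lambda * np.sum(weights ** 2), _lambda * weights

def elastic_net_penalty(weights, _lambda, l1_ratio=0.5):
    # blend of the two penalties, weighted by l1_ratio
    l1, g1 = l1_penalty(weights, _lambda)
    l2, g2 = l2_penalty(weights, _lambda)
    return l1_ratio * l1 + (1 - l1_ratio) * l2, l1_ratio * g1 + (1 - l1_ratio) * g2

w = np.array([0.5, -1.5, 0.0])
print(elastic_net_penalty(w, _lambda=0.1, l1_ratio=0.7))
```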