# ztlearn package

## ztlearn.activations module

class ztlearn.activations.ActivationFunction(name, activation_dict={})[source]

Bases: object

backward(input_signal)[source]
forward(input_signal)[source]
name
class ztlearn.activations.ELU(activation_dict)[source]

Bases: object

Exponential Linear Units (ELUs)

ELUs take negative values for negative inputs, which lets them push mean unit activations closer to zero, much like batch normalization but with lower computational complexity.

References

[1] Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)
Parameters: alpha (float32) – controls the value to which an ELU saturates for negative net inputs
activation(input_signal)[source]

ELU activation applied to input provided

Parameters: input_signal (numpy.array) – the input numpy array
Returns: the output of the ELU function applied to the input
Return type: numpy.array
activation_name
derivative(input_signal)[source]

ELU derivative applied to input provided

Parameters: input_signal (numpy.array) – the input numpy array
Returns: the output of the ELU derivative applied to the input
Return type: numpy.array
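
As a rough illustration of the ELU definition, the sketch below evaluates the function and its derivative on scalars in plain Python. It does not call ztlearn; the names `elu` and `elu_derivative` and the default `alpha=1.0` are assumptions made for this example.

```python
import math

def elu(x, alpha=1.0):
    """ELU: identity for positive inputs, alpha * (exp(x) - 1) otherwise."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

def elu_derivative(x, alpha=1.0):
    """Slope of ELU: 1 for positive inputs, alpha * exp(x) otherwise."""
    return 1.0 if x > 0 else alpha * math.exp(x)

# For large negative inputs the output saturates towards -alpha.
values = [elu(x) for x in (-10.0, -1.0, 0.0, 2.0)]
```
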
class ztlearn.activations.ElliotSigmoid(activation_dict)[source]

Bases: object

Elliot Sigmoid Activation Function

Elliot Sigmoid squashes each element of the input from the interval (-inf, inf) to the interval (-1, 1) with an ‘S-shaped’ function. The function is fast to calculate on simple computing hardware as it does not require any exponential or trigonometric functions.

References

[1] A better Activation Function for Artificial Neural Networks
activation(input_signal)[source]

ElliotSigmoid activation applied to input provided

Parameters: input_signal (numpy.array) – the input numpy array
Returns: the output of the ElliotSigmoid function applied to the input
Return type: numpy.array
activation_name
derivative(input_signal)[source]

ElliotSigmoid derivative applied to input provided

Parameters: input_signal (numpy.array) – the input numpy array
Returns: the output of the ElliotSigmoid derivative applied to the input
Return type: numpy.array
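
The sketch below shows why this function is cheap: it needs only an absolute value, an addition and a division. It assumes the common form x / (1 + |x|), which matches the output range described here; the function names are hypothetical and the code does not call ztlearn.

```python
def elliot_sigmoid(x):
    """Elliot sigmoid, assumed form: x / (1 + |x|)."""
    return x / (1.0 + abs(x))

def elliot_sigmoid_derivative(x):
    """Derivative of x / (1 + |x|): 1 / (1 + |x|)^2."""
    return 1.0 / (1.0 + abs(x)) ** 2
```
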
class ztlearn.activations.LeakyReLU(activation_dict)[source]

Bases: object

LeakyReLU Activation Functions

Leaky ReLUs allow a small non-zero gradient to propagate through the network when the unit is not active, hence avoiding bottlenecks that can prevent learning in the neural network.

References

[1] Rectifier Nonlinearities Improve Neural Network Acoustic Models
[2] Empirical Evaluation of Rectified Activations in Convolutional Network
Parameters: alpha (float32) – provides for a small non-zero gradient (e.g. 0.01) when the unit is not active.
activation(input_signal)[source]

LeakyReLU activation applied to input provided

Parameters: input_signal (numpy.array) – the input numpy array
Returns: the output of the LeakyReLU function applied to the input
Return type: numpy.array
activation_name
derivative(input_signal)[source]

LeakyReLU derivative applied to input provided

Parameters: input_signal (numpy.array) – the input numpy array
Returns: the output of the LeakyReLU derivative applied to the input
Return type: numpy.array
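
A minimal scalar sketch of the leaky rectifier follows, using the example slope of 0.01 mentioned for `alpha`. The function names are assumptions for illustration, and the treatment of the input 0 as "active" is a choice made here, not something the reference above specifies.

```python
def leaky_relu(x, alpha=0.01):
    """Identity for active units, a small slope alpha otherwise."""
    return x if x >= 0 else alpha * x

def leaky_relu_derivative(x, alpha=0.01):
    """Gradient is 1 for active units and alpha otherwise, never exactly zero."""
    return 1.0 if x >= 0 else alpha
```

Setting `alpha=0.0` in this sketch reduces it to the plain rectifier, where inactive units pass no gradient at all.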
class ztlearn.activations.Linear(activation_dict)[source]

Bases: object

Linear Activation Function

Linear Activation applies the identity operation to the data: the function always returns the same value that was used as its argument.

References

[1] Identity Function
activation(input_signal)[source]

Linear activation applied to input provided

Parameters: input_signal (numpy.array) – the input numpy array
Returns: the output of the Linear function applied to the input
Return type: numpy.array
activation_name
derivative(input_signal)[source]

Linear derivative applied to input provided

Parameters: input_signal (numpy.array) – the input numpy array
Returns: the output of the Linear derivative applied to the input
Return type: numpy.array
class ztlearn.activations.ReLU(activation_dict)[source]

Bases: object

Rectified Linear Units (ReLUs)

Rectifying neurons are an even better model of biological neurons and yield equal or better performance than hyperbolic tangent networks in spite of the hard non-linearity and non-differentiability at zero. They create sparse representations with true zeros, which seem remarkably suitable for naturally sparse data.

References

[1] Deep Sparse Rectifier Neural Networks
[2] Delving Deep into Rectifiers
activation(input_signal)[source]

ReLU activation applied to input provided

Parameters: input_signal (numpy.array) – the input numpy array
Returns: the output of the ReLU function applied to the input
Return type: numpy.array
activation_name
derivative(input_signal)[source]

ReLU derivative applied to input provided

Parameters: input_signal (numpy.array) – the input numpy array
Returns: the output of the ReLU derivative applied to the input
Return type: numpy.array
class ztlearn.activations.SELU(activation_dict)[source]

Bases: object

Scaled Exponential Linear Units (SELUs)

SELUs are activations which induce self-normalizing properties and are used in Self-Normalizing Neural Networks (SNNs). SNNs enable high-level abstract representations that tend to automatically converge towards zero mean and unit variance.

References

[1] Self-Normalizing Neural Networks (SELUs)
Parameters: ALPHA (float32) – 1.6732632423543772848170429916717; _LAMBDA (float32) – 1.0507009873554804934193349852946
ALPHA = 1.6732632423543772
activation(input_signal)[source]

SELU activation applied to input provided

Parameters: input_signal (numpy.array) – the input numpy array
Returns: the output of the SELU function applied to the input
Return type: numpy.array
activation_name
derivative(input_signal)[source]

SELU derivative applied to input provided

Parameters: input_signal (numpy.array) – the input numpy array
Returns: the output of the SELU derivative applied to the input
Return type: numpy.array
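
To make the role of the two constants concrete, here is a hedged scalar sketch of SELU as a scaled ELU. The constants are the ones listed above, rounded to double precision; the names `selu`, `selu_derivative` and `LAMBDA` are assumptions for this example, and the code does not call ztlearn.

```python
import math

ALPHA = 1.6732632423543772   # saturation constant from the parameters above
LAMBDA = 1.0507009873554805  # scale constant (_LAMBDA above)

def selu(x):
    """SELU: LAMBDA * x for positive inputs, LAMBDA * ALPHA * (exp(x) - 1) otherwise."""
    return LAMBDA * (x if x > 0 else ALPHA * (math.exp(x) - 1.0))

def selu_derivative(x):
    """Slope of SELU on each branch."""
    return LAMBDA * (1.0 if x > 0 else ALPHA * math.exp(x))
```
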
class ztlearn.activations.Sigmoid(activation_dict)[source]

Bases: object

Sigmoid Activation Function

A Sigmoid function is often used as the output activation function for binary classification problems as it outputs values in the range (0, 1). Sigmoid functions are real-valued, differentiable and monotonically increasing, producing an ‘S-shaped’ curve with a single inflection point.

References

[1] The influence of the sigmoid function parameters on the speed of backpropagation learning
activation(input_signal)[source]

Sigmoid activation applied to input provided

Parameters: input_signal (numpy.array) – the input numpy array
Returns: the output of the Sigmoid function applied to the input
Return type: numpy.array
activation_name
derivative(input_signal)[source]

Sigmoid derivative applied to input provided

Parameters: input_signal (numpy.array) – the input numpy array
Returns: the output of the Sigmoid derivative applied to the input
Return type: numpy.array
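
The following scalar sketch shows the logistic sigmoid and the usual identity for its derivative, s(x) * (1 - s(x)). The branch on the sign of `x` is a numerical-stability choice made for this example and is not claimed to be how ztlearn computes it; the function names are hypothetical.

```python
import math

def sigmoid(x):
    """Logistic sigmoid 1 / (1 + exp(-x)), written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def sigmoid_derivative(x):
    """Derivative expressed through the function value: s * (1 - s)."""
    s = sigmoid(x)
    return s * (1.0 - s)
```
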
class ztlearn.activations.SoftPlus(activation_dict)[source]

Bases: object

SoftPlus Activation Function

A Softplus function is a smooth approximation to the rectified linear unit (ReLU). It is smooth and differentiable everywhere, including near 0, and produces outputs in the range (0, +inf).

References

[1] Incorporating Second-Order Functional Knowledge for Better Option Pricing
activation(input_signal)[source]

SoftPlus activation applied to input provided

Parameters: input_signal (numpy.array) – the input numpy array
Returns: the output of the SoftPlus function applied to the input
Return type: numpy.array
activation_name
derivative(input_signal)[source]

SoftPlus derivative applied to input provided

Parameters: input_signal (numpy.array) – the input numpy array
Returns: the output of the SoftPlus derivative applied to the input
Return type: numpy.array
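
A small sketch of softplus, log(1 + exp(x)), and its derivative (the logistic sigmoid) is given below. The rearranged formulas are stability choices made for this illustration, and the function names are assumptions rather than ztlearn identifiers.

```python
import math

def softplus(x):
    """log(1 + exp(x)), rewritten as max(x, 0) + log1p(exp(-|x|)) to avoid overflow."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def softplus_derivative(x):
    """The derivative of softplus is the sigmoid, here computed via tanh."""
    return 0.5 * (1.0 + math.tanh(0.5 * x))
```

For large positive inputs the value approaches `x` itself, which is the sense in which softplus approximates the rectifier.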
class ztlearn.activations.Softmax(activation_dict)[source]

Bases: object

Softmax Activation Function

The Softmax Activation Function is a generalization of the logistic function that squashes the output of each unit to a real value in the range [0, 1] and also normalizes the outputs so that they sum to 1.

References

[1] Softmax Regression
[2] Deep Learning using Linear Support Vector Machines
[3] Probabilistic Interpretation of Feedforward Network Outputs
activation(input_signal)[source]

Softmax activation applied to input provided

Parameters: input_signal (numpy.array) – the input numpy array
Returns: the output of the Softmax function applied to the input
Return type: numpy.array
activation_name
derivative(input_signal)[source]

Softmax derivative applied to input provided

Parameters: input_signal (numpy.array) – the input numpy array
Returns: the output of the Softmax derivative applied to the input
Return type: numpy.array
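
The normalization described for Softmax can be sketched for a single list of scores as follows. Subtracting the maximum before exponentiating is a standard stability trick assumed here; it does not change the result. The function name is hypothetical and the code does not call ztlearn.

```python
import math

def softmax(scores):
    """Exponentiate shifted scores and divide by their sum so the outputs add up to 1."""
    shift = max(scores)  # shifting by a constant leaves the softmax unchanged
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])
```
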
class ztlearn.activations.TanH(activation_dict)[source]

Bases: object

Tangent Hyperbolic (TanH)

The Tangent Hyperbolic function is a rescaled version of the sigmoid function that produces outputs in the range (-1, +1). As an activation function it gives an output for every input value, hence making it a continuous function.

References

[1] Hyperbolic Functions
activation(input_signal)[source]

TanH activation applied to input provided

Parameters: input_signal (numpy.array) – the input numpy array
Returns: the output of the TanH function applied to the input
Return type: numpy.array
activation_name
derivative(input_signal)[source]

TanH derivative applied to input provided

Parameters: input_signal (numpy.array) – the input numpy array
Returns: the output of the TanH derivative applied to the input
Return type: numpy.array
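
The "rescaled sigmoid" relationship can be checked numerically: tanh(x) = 2 * sigmoid(2x) - 1. The sketch below uses hypothetical helper names and is only meant for moderate input values.

```python
import math

def sigmoid(x):
    """Plain logistic sigmoid (adequate for moderate |x|)."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh_via_sigmoid(x):
    """tanh written as a shifted and scaled sigmoid."""
    return 2.0 * sigmoid(2.0 * x) - 1.0

def tanh_derivative(x):
    """Derivative of tanh: 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2
```
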

## ztlearn.decayers module

class ztlearn.decayers.Decay(lr, decay, epoch, min_lr, max_lr)[source]

Bases: object

clip_lr
class ztlearn.decayers.DecayFunction(lr=0.001, name='inverse', decay=1e-06, epoch=1, min_lr=0.0, max_lr=inf, step_size=10.0)[source]

Bases: object

decompose
name
class ztlearn.decayers.ExponetialDecay(lr, decay, epoch, min_lr, max_lr, step_size)[source]

Bases: ztlearn.decayers.Decay

decay_name
decompose
class ztlearn.decayers.InverseTimeDecay(lr, decay, epoch, min_lr, max_lr, step_size)[source]

Bases: ztlearn.decayers.Decay

decay_name
decompose
class ztlearn.decayers.NaturalExponentialDecay(lr, decay, epoch, min_lr, max_lr, step_size)[source]

Bases: ztlearn.decayers.Decay

decay_name
decompose
class ztlearn.decayers.StepDecay(lr, decay, epoch, min_lr, max_lr, step_size)[source]

Bases: ztlearn.decayers.Decay

Decays the learning rate after every step_size steps.

decay_name
decompose
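
The decayer classes above are listed without formulas. As a hedged sketch, the functions below show the textbook schedules that the class names suggest, with the result clipped to [min_lr, max_lr]. Whether ztlearn uses exactly these expressions is an assumption; all function names here are hypothetical.

```python
import math

def clip_lr(lr, min_lr=0.0, max_lr=float("inf")):
    """Keep the decayed learning rate inside [min_lr, max_lr]."""
    return min(max(lr, min_lr), max_lr)

def inverse_time_decay(lr, decay, epoch):
    """lr / (1 + decay * epoch)."""
    return lr / (1.0 + decay * epoch)

def step_decay(lr, decay, epoch, step_size):
    """Multiply lr by decay once for every completed block of step_size epochs."""
    return lr * decay ** (epoch // step_size)

def exponential_decay(lr, decay, epoch):
    """lr * decay^epoch."""
    return lr * decay ** epoch

def natural_exponential_decay(lr, decay, epoch):
    """lr * exp(-decay * epoch)."""
    return lr * math.exp(-decay * epoch)
```
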

## ztlearn.initializers module

class ztlearn.initializers.GlorotNormal[source]

Glorot Normal (GlorotNormal)

GlorotNormal, more famously known as the Xavier initialization, is based on the effort to maintain the same variance of the gradients of the weights across all layers. Glorot normal is an implementation based on the Gaussian distribution.

References

[1] Understanding the difficulty of training deep feedforward neural networks
[2] Initialization Of Deep Feedfoward Networks
init_name
weights(shape, random_seed)[source]
class ztlearn.initializers.GlorotUniform[source]

Glorot Uniform (GlorotUniform)

GlorotUniform, more famously known as the Xavier initialization, is based on the effort to maintain the same variance of the gradients of the weights across all layers. Glorot uniform is an implementation based on the Uniform distribution.

References

[1] Understanding the difficulty of training deep feedforward neural networks
[2] Initialization Of Deep Feedfoward Networks
init_name
weights(shape, random_seed)[source]
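
The two Glorot variants can be sketched with the standard library as follows. The scaling factors are the usual ones from the Glorot and Bengio paper (variance 2 / (fan_in + fan_out)); treating a 2-D `shape` as `(fan_in, fan_out)` and the function names themselves are assumptions for this example, not ztlearn's implementation.

```python
import math
import random

def glorot_uniform(shape, random_seed=None):
    """Sample from U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out))."""
    fan_in, fan_out = shape  # assumption: shape is (fan_in, fan_out)
    rng = random.Random(random_seed)
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]

def glorot_normal(shape, random_seed=None):
    """Sample from N(0, std^2) with std = sqrt(2 / (fan_in + fan_out))."""
    fan_in, fan_out = shape
    rng = random.Random(random_seed)
    std = math.sqrt(2.0 / (fan_in + fan_out))
    return [[rng.gauss(0.0, std) for _ in range(fan_out)]
            for _ in range(fan_in)]
```
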
class ztlearn.initializers.HeNormal[source]

He Normal (HeNormal)

HeNormal is a robust initialization method that particularly considers the rectifier nonlinearities. He normal is an implementation based on the Gaussian distribution.

References

[1] Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
[2] Initialization Of Deep Networks Case of Rectifiers
init_name
weights(shape, random_seed)[source]
class ztlearn.initializers.HeUniform[source]

He Uniform (HeUniform)

HeUniform is a robust initialization method that particularly considers the rectifier nonlinearities. He uniform is an implementation based on the Uniform distribution.

References

[1] Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
[2] Initialization Of Deep Networks Case of Rectifiers
init_name
weights(shape, random_seed)[source]
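
For comparison with the Glorot scheme, the sketch below uses the He scaling, which depends on fan_in only and targets a weight variance of 2 / fan_in. As before, the `(fan_in, fan_out)` reading of `shape` and the function names are assumptions made for illustration.

```python
import math
import random

def he_normal(shape, random_seed=None):
    """Sample from N(0, std^2) with std = sqrt(2 / fan_in)."""
    fan_in, fan_out = shape  # assumption: shape is (fan_in, fan_out)
    rng = random.Random(random_seed)
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_out)]
            for _ in range(fan_in)]

def he_uniform(shape, random_seed=None):
    """Sample from U(-limit, limit) with limit = sqrt(6 / fan_in), same variance as above."""
    fan_in, fan_out = shape
    rng = random.Random(random_seed)
    limit = math.sqrt(6.0 / fan_in)
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]
```
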
class ztlearn.initializers.Identity[source]

Identity (Identity)

Identity is an implementation of weight initialization that returns an identity matrix with the given shape.

init_name
weights(shape, random_seed)[source]
class ztlearn.initializers.InitializeWeights(name)[source]

Bases: object

initialize_weights(shape, random_seed=None)[source]
name
class ztlearn.initializers.LeCunNormal[source]

LeCun Normal (LeCunNormal)

Weights should be randomly chosen but in such a way that the sigmoid is primarily activated in its linear region. LeCun normal is an implementation based on the Gaussian distribution.

References

[1] Efficient Backprop
init_name
weights(shape, random_seed)[source]
class ztlearn.initializers.LeCunUniform[source]

LeCun Uniform (LeCunUniform)

Weights should be randomly chosen but in such a way that the sigmoid is primarily activated in its linear region. LeCun uniform is an implementation based on Uniform distribution

References

[1] Efficient Backprop
init_name
weights(shape, random_seed)[source]
class ztlearn.initializers.One[source]

One (One)

One is an implementation of weight initialization that returns all ones

init_name
weights(shape, random_seed)[source]
class ztlearn.initializers.RandomNormal[source]

Random Normal (RandomNormal)

Random normal, an implementation of weight initialization based on the Gaussian distribution

init_name
weights(shape, random_seed)[source]
class ztlearn.initializers.RandomUniform[source]

Random Uniform (RandomUniform)

Random uniform, an implementation of weight initialization based on Uniform distribution

init_name
weights(shape, random_seed)[source]
class ztlearn.initializers.WeightInitializer[source]

Bases: object

compute_fans(shape)[source]
class ztlearn.initializers.Zero[source]

Zero (Zero)

Zero is an implementation of weight initialization that returns all zeros

init_name
weights(shape, random_seed)[source]

## ztlearn.objectives module

class ztlearn.objectives.BinaryCrossEntropy[source]

Binary Cross Entropy

Binary Cross Entropy measures the performance of a classification model whose output is a probability value between 0 and 1. ‘Binary’ is meant for discrete classification tasks in which the classes are independent and not mutually exclusive. Each target is a scalar that is either 0 or 1.

References

[1] Cross Entropy
accuracy(predictions, targets, threshold=0.5)[source]

Calculates the BinaryCrossEntropy Accuracy Score given prediction and targets

Parameters: predictions (numpy.array) – the predictions numpy array; targets (numpy.array) – the targets numpy array; threshold (numpy.float32) – the threshold value
Returns: the output of BinaryCrossEntropy Accuracy Score
Return type: numpy.float32
derivative(predictions, targets, np_type)[source]

Applies the BinaryCrossEntropy Derivative to prediction and targets provided

Parameters: predictions (numpy.array) – the predictions numpy array; targets (numpy.array) – the targets numpy array
Returns: the output of BinaryCrossEntropy Derivative to prediction and targets
Return type: numpy.array
loss(predictions, targets, np_type)[source]

Applies the BinaryCrossEntropy Loss to prediction and targets provided

Parameters: predictions (numpy.array) – the predictions numpy array; targets (numpy.array) – the targets numpy array
Returns: the output of BinaryCrossEntropy Loss to prediction and targets
Return type: numpy.array
objective_name
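
A minimal sketch of binary cross entropy and a thresholded accuracy score follows. It averages the per-sample losses and clips predictions away from 0 and 1 before taking logarithms; both choices, and all function names, are assumptions for this illustration rather than a description of ztlearn's implementation.

```python
import math

def clip(p, epsilon=1e-15):
    """Keep a probability inside [epsilon, 1 - epsilon] so log() stays finite."""
    return min(max(p, epsilon), 1.0 - epsilon)

def bce_loss(predictions, targets):
    """Mean of -(t * log(p) + (1 - t) * log(1 - p)) over all samples."""
    total = 0.0
    for p, t in zip(predictions, targets):
        p = clip(p)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(predictions)

def bce_accuracy(predictions, targets, threshold=0.5):
    """Fraction of predictions that land on the correct side of the threshold."""
    hits = sum(1 for p, t in zip(predictions, targets) if (p >= threshold) == (t == 1))
    return hits / len(targets)
```
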
class ztlearn.objectives.CategoricalCrossEntropy[source]

Categorical Cross Entropy

Categorical Cross Entropy measures the performance of a classification model whose output is a probability value between 0 and 1. ‘Categorical’ is meant for discrete classification tasks in which the classes are mutually exclusive.

References

[1] Cross Entropy
accuracy(predictions, targets)[source]

Calculates the CategoricalCrossEntropy Accuracy Score given prediction and targets

Parameters: predictions (numpy.array) – the predictions numpy array; targets (numpy.array) – the targets numpy array
Returns: the output of CategoricalCrossEntropy Accuracy Score
Return type: numpy.float32
derivative(predictions, targets, np_type)[source]

Applies the CategoricalCrossEntropy Derivative to prediction and targets provided

Parameters: predictions (numpy.array) – the predictions numpy array; targets (numpy.array) – the targets numpy array
Returns: the output of CategoricalCrossEntropy Derivative to prediction and targets
Return type: numpy.array
loss(predictions, targets, np_type)[source]

Applies the CategoricalCrossEntropy Loss to prediction and targets provided

Parameters: predictions (numpy.array) – the predictions numpy array; targets (numpy.array) – the targets numpy array
Returns: the output of CategoricalCrossEntropy Loss to prediction and targets
Return type: numpy.array
objective_name
class ztlearn.objectives.HellingerDistance[source]

Bases: object

Hellinger Distance

Hellinger Distance is used to quantify the similarity between two probability distributions.

References

[1] Hellinger Distance
SQRT_2 = 1.4142135623730951
accuracy(predictions, targets, threshold=0.5)[source]
derivative(predictions, targets, np_type)[source]

Applies the HellingerDistance Derivative to prediction and targets provided

Parameters: predictions (numpy.array) – the predictions numpy array; targets (numpy.array) – the targets numpy array
Returns: the output of HellingerDistance Derivative to prediction and targets
Return type: numpy.array
loss(predictions, targets, np_type)[source]

Applies the HellingerDistance Loss to prediction and targets provided

Parameters: predictions (numpy.array) – the predictions numpy array; targets (numpy.array) – the targets numpy array
Returns: the output of HellingerDistance Loss to prediction and targets
Return type: numpy.array
objective_name
sqrt_difference(predictions, targets)[source]
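
For two discrete distributions the Hellinger distance is the Euclidean norm of the difference of their element-wise square roots, divided by sqrt(2). The sketch below implements that textbook definition; the function name is hypothetical and the exact reduction used by ztlearn is not assumed.

```python
import math

SQRT_2 = math.sqrt(2.0)

def hellinger_distance(p, q):
    """(1 / sqrt(2)) * sqrt(sum_i (sqrt(p_i) - sqrt(q_i))^2), a value in [0, 1]."""
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared) / SQRT_2
```

Identical distributions give 0 and distributions with disjoint support give 1.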
class ztlearn.objectives.HingeLoss[source]

Bases: object

Hinge Loss

Hinge Loss, also known as SVM Loss, is used for “maximum-margin” classification, most notably for support vector machines (SVMs).

References

[1] Hinge loss
accuracy(predictions, targets, threshold=0.5)[source]

Calculates the Hinge-Loss Accuracy Score given prediction and targets

Parameters: predictions (numpy.array) – the predictions numpy array; targets (numpy.array) – the targets numpy array
Returns: the output of Hinge-Loss Accuracy Score
Return type: numpy.float32
derivative(predictions, targets, np_type)[source]

Applies the Hinge-Loss Derivative to prediction and targets provided

Parameters: predictions (numpy.array) – the predictions numpy array; targets (numpy.array) – the targets numpy array
Returns: the output of Hinge-Loss Derivative to prediction and targets
Return type: numpy.array
loss(predictions, targets, np_type)[source]

Applies the Hinge-Loss Loss to prediction and targets provided

Parameters: predictions (numpy.array) – the predictions numpy array; targets (numpy.array) – the targets numpy array
Returns: the output of Hinge-Loss Loss to prediction and targets
Return type: numpy.array
objective_name
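
The maximum-margin idea can be sketched as max(0, 1 - t * p) with targets encoded as -1 or +1. That encoding, the averaging over samples and the function names are assumptions for this example; they are not taken from the ztlearn source.

```python
def hinge_loss(predictions, targets):
    """Mean of max(0, 1 - t * p); zero once every sample clears the margin of 1."""
    losses = [max(0.0, 1.0 - t * p) for p, t in zip(predictions, targets)]
    return sum(losses) / len(losses)

def hinge_derivative(predictions, targets):
    """Per-sample subgradient w.r.t. the prediction: -t inside the margin, else 0."""
    return [-t if t * p < 1.0 else 0.0 for p, t in zip(predictions, targets)]
```
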
class ztlearn.objectives.HuberLoss[source]

Huber Loss

Huber Loss is a loss function used in robust regression, where it is found to be less sensitive to outliers in data than the squared error loss.

References

[1] Huber Loss
[2] Huber loss
accuracy(predictions, targets)[source]

Calculates the HuberLoss Accuracy Score given prediction and targets

Parameters: predictions (numpy.array) – the predictions numpy array; targets (numpy.array) – the targets numpy array
Returns: the output of HuberLoss Accuracy Score
Return type: numpy.float32
derivative(predictions, targets, np_type, delta=1.0)[source]

Applies the HuberLoss Derivative to prediction and targets provided

Parameters: predictions (numpy.array) – the predictions numpy array; targets (numpy.array) – the targets numpy array
Returns: the output of HuberLoss Derivative to prediction and targets
Return type: numpy.array
loss(predictions, targets, np_type, delta=1.0)[source]

Applies the HuberLoss Loss to prediction and targets provided

Parameters: predictions (numpy.array) – the predictions numpy array; targets (numpy.array) – the targets numpy array
Returns: the output of HuberLoss Loss to prediction and targets
Return type: numpy.array
objective_name
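
The reduced sensitivity to outliers comes from switching from a quadratic to a linear penalty once the error exceeds `delta`. The scalar sketch below uses the standard Huber definition with the default `delta=1.0` shown in the signatures above; the function names are hypothetical.

```python
def huber_loss(prediction, target, delta=1.0):
    """Quadratic for |error| <= delta, linear beyond it."""
    error = prediction - target
    if abs(error) <= delta:
        return 0.5 * error * error
    return delta * (abs(error) - 0.5 * delta)

def huber_derivative(prediction, target, delta=1.0):
    """Gradient w.r.t. the prediction: the error itself, capped at +/- delta."""
    error = prediction - target
    if abs(error) <= delta:
        return error
    return delta if error > 0 else -delta
```

An error of 3 costs 2.5 here instead of the 4.5 that half the squared error would give, and its gradient is capped at 1.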
class ztlearn.objectives.KLDivergence[source]

KL Divergence

Kullback–Leibler divergence (also called relative entropy) is a measure of divergence between two probability distributions.
accuracy(predictions, targets)[source]

Calculates the KLDivergence Accuracy Score given prediction and targets

Parameters: predictions (numpy.array) – the predictions numpy array; targets (numpy.array) – the targets numpy array
Returns: the output of KLDivergence Accuracy Score
Return type: numpy.float32
derivative(predictions, targets, np_type)[source]

Applies the KLDivergence Derivative to prediction and targets provided

Parameters: predictions (numpy.array) – the predictions numpy array; targets (numpy.array) – the targets numpy array
Returns: the output of KLDivergence Derivative to prediction and targets
Return type: numpy.array
loss(predictions, targets, np_type)[source]

Applies the KLDivergence Loss to prediction and targets provided

Parameters: predictions (numpy.array) – the predictions numpy array; targets (numpy.array) – the targets numpy array
Returns: the output of KLDivergence Loss to prediction and targets
Return type: numpy.array
objective_name
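
As a sketch of the definition, the function below computes D_KL(P || Q) = sum_i p_i * log(p_i / q_i) for two discrete distributions. Which argument plays the role of P (targets or predictions) in ztlearn is not stated above, so the argument names `p` and `q` and the `epsilon` guard are assumptions for this example.

```python
import math

def kl_divergence(p, q, epsilon=1e-15):
    """Relative entropy of p with respect to q; terms with p_i == 0 contribute 0."""
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0.0:
            total += p_i * math.log(p_i / max(q_i, epsilon))
    return total
```

The divergence is zero only when the two distributions match, and it is not symmetric in its arguments.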
class ztlearn.objectives.MeanSquaredError[source]

Bases: object

Mean Squared error (MSE)

MSE measures the average squared difference between the predictions and the targets. The closer the predictions are to the targets the more efficient the estimator.

References

[1] Mean Squared error
accuracy(predictions, targets, threshold=0.5)[source]
derivative(predictions, targets, np_type)[source]

Applies the MeanSquaredError Derivative to prediction and targets provided

Parameters: predictions (numpy.array) – the predictions numpy array; targets (numpy.array) – the targets numpy array
Returns: the output of MeanSquaredError Derivative to prediction and targets
Return type: numpy.array
loss(predictions, targets, np_type)[source]

Applies the MeanSquaredError Loss to prediction and targets provided

Parameters: predictions (numpy.array) – the predictions numpy array; targets (numpy.array) – the targets numpy array
Returns: the output of MeanSquaredError Loss to prediction and targets
Return type: numpy.array
objective_name
class ztlearn.objectives.Objective[source]

Bases: object

add_fuzz_factor(np_array, epsilon=1e-05)[source]
clip(predictions, epsilon=1e-15)[source]
error(predictions, targets)[source]
objective_name
class ztlearn.objectives.ObjectiveFunction(name)[source]

Bases: object

accuracy(predictions, targets)[source]
backward(predictions, targets, np_type=<class 'numpy.float32'>)[source]
forward(predictions, targets, np_type=<class 'numpy.float32'>)[source]
name

## ztlearn.optimizers module

class ztlearn.optimizers.AdaGrad(**kwargs)[source]

AdaGrad is an optimization method that allows different step sizes for different features. It increases the influence of rare but informative features

References

[1] An overview of gradient descent optimization algorithms
Parameters: kwargs – Arbitrary keyword arguments.
optimization_name
update(weights, grads, epoch_num, batch_num, batch_size)[source]
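
The per-feature step sizes can be sketched as follows: each weight accumulates its own sum of squared gradients and is updated with the learning rate divided by the root of that sum. This is the textbook AdaGrad rule written on plain lists; the function name, the `cache` argument and the default values are assumptions, not ztlearn's signature.

```python
import math

def adagrad_update(weights, grads, cache, lr=0.01, epsilon=1e-8):
    """One AdaGrad step; cache holds the running sum of squared gradients per weight."""
    for i, g in enumerate(grads):
        cache[i] += g * g
        weights[i] -= lr * g / (math.sqrt(cache[i]) + epsilon)
    return weights, cache

# A large and a small gradient both move their weight by about lr on the first step.
w, c = adagrad_update([1.0, 1.0], [10.0, 0.1], [0.0, 0.0])
```
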
class ztlearn.optimizers.Adadelta(**kwargs)[source]

Adadelta is an extension of Adagrad that seeks to reduce its aggressive, monotonically decreasing learning rate. This is achieved via a dynamic learning rate, i.e. a different learning rate is computed for each training sample.

References

[1] An overview of gradient descent optimization algorithms
Parameters: kwargs – Arbitrary keyword arguments.
optimization_name
update(weights, grads, epoch_num, batch_num, batch_size)[source]
class ztlearn.optimizers.Adam(**kwargs)[source]

Adam computes adaptive learning rates for each parameter while storing an exponentially decaying average of past squared gradients. Adam also keeps an exponentially decaying average of past gradients.

References

[1] An overview of gradient descent optimization algorithms
[2] Adam: A Method for Stochastic Optimization
Parameters: kwargs – Arbitrary keyword arguments.
optimization_name
update(weights, grads, epoch_num, batch_num, batch_size)[source]
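
The two decaying averages can be sketched for a single scalar weight as below, following the update rule from the Adam paper with bias correction. The function name, argument order and default hyperparameters are assumptions for this illustration and do not describe ztlearn's `update` method.

```python
import math

def adam_update(w, g, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, epsilon=1e-8):
    """One Adam step at time step t (starting from 1) for scalar weight w and gradient g."""
    m = beta1 * m + (1.0 - beta1) * g        # decaying average of past gradients
    v = beta2 * v + (1.0 - beta2) * g * g    # decaying average of past squared gradients
    m_hat = m / (1.0 - beta1 ** t)           # bias correction for the zero initialization
    v_hat = v / (1.0 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + epsilon)
    return w, m, v

# On the first step the update has magnitude of about lr, whatever the gradient scale.
w1, m1, v1 = adam_update(1.0, 4.0, 0.0, 0.0, t=1)
```
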
class ztlearn.optimizers.Adamax(**kwargs)[source]

AdaMax is a variant of Adam based on the infinity norm. The Adam update rule for individual weights is to scale their gradients inversely proportional to a (scaled) L2 norm of their individual current and past gradients. For Adamax we generalize the L2 norm based update rule to an Lp norm based update rule. These variants are numerically unstable for large p, but in the special case where p tends to infinity, a simple and stable algorithm emerges.

References

[1] An overview of gradient descent optimization algorithms
[2] Adam: A Method for Stochastic Optimization
Parameters: kwargs – Arbitrary keyword arguments.
optimization_name
update(weights, grads, epoch_num, batch_num, batch_size)[source]
class ztlearn.optimizers.GD[source]

Bases: object

GD optimizes the parameters theta of an objective function J(theta) by computing each update over all of the training samples in the dataset. The update is performed in the opposite direction of the gradient of the objective function, d/d_theta J(theta), with respect to the parameters theta. The learning rate eta determines the size of the steps we take towards the minimum.

References

[1] An overview of gradient descent optimization algorithms
class ztlearn.optimizers.NesterovAcceleratedGradient(**kwargs)[source]

NAG is an improvement on SGDMomentum in which the previous parameter values are smoothed and a gradient descent step is taken from this smoothed value. This enables a more intelligent way of arriving at the minimum.

References

[1] An overview of gradient descent optimization algorithms
[2] A method for unconstrained convex minimization problem with the rate of convergence
[3] Nesterov’s Accelerated Gradient and Momentum as approximations to Regularised Update Descent
Parameters: kwargs – Arbitrary keyword arguments.
optimization_name
update(weights, grads, epoch_num, batch_num, batch_size)[source]
class ztlearn.optimizers.OptimizationFunction(optimizer_kwargs)[source]

Bases: object

name
update(weights, grads, epoch_num, batch_num, batch_size)[source]
class ztlearn.optimizers.Optimizer(**kwargs)[source]

Bases: object

get_learning_rate(current_epoch)[source]
class ztlearn.optimizers.RMSprop(**kwargs)[source]

Root Mean Squared Propagation (RMSprop)

RMSprop utilizes the magnitude of recent gradients to normalize gradients. A moving average of the squared gradients is kept, and the current gradient is divided by the root of this average (the RMS). The recommended settings are rho = 0.9 and eta (learning rate) = 0.001.

References

[1] An overview of gradient descent optimization algorithms
[2] Lecture 6.5 - rmsprop, COURSERA: Neural Networks for Machine Learning
Parameters: kwargs – Arbitrary keyword arguments.
optimization_name
update(weights, grads, epoch_num, batch_num, batch_size)[source]
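
A scalar sketch of the RMSprop rule follows, using the recommended rho = 0.9 and learning rate 0.001 as defaults. It keeps a moving average of squared gradients and divides the current gradient by its square root; the function name and the `avg_sq` argument are assumptions for this example.

```python
import math

def rmsprop_update(w, g, avg_sq, lr=0.001, rho=0.9, epsilon=1e-8):
    """One RMSprop step; avg_sq is the moving average of squared gradients."""
    avg_sq = rho * avg_sq + (1.0 - rho) * g * g
    w = w - lr * g / (math.sqrt(avg_sq) + epsilon)
    return w, avg_sq

w1, s1 = rmsprop_update(1.0, 2.0, 0.0)
```
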
class ztlearn.optimizers.SGD(**kwargs)[source]

SGD optimizes the parameters theta of an objective function J(theta) by performing an update for each training sample, inputs(i) and targets(i), in the dataset. The update is performed in the opposite direction of the gradient of the objective function, d/d_theta J(theta), with respect to the parameters theta. The learning rate eta determines the size of the steps we take towards the minimum.

References

[1] An overview of gradient descent optimization algorithms
[2] Large-Scale Machine Learning with Stochastic Gradient Descent
Parameters: kwargs – Arbitrary keyword arguments.
optimization_name
update(weights, grads, epoch_num, batch_num, batch_size)[source]
class ztlearn.optimizers.SGDMomentum(**kwargs)[source]

Stochastic Gradient Descent with Momentum (SGDMomentum)

The objective function often has regions (ravines) in which the surface curves much more steeply in one dimension than in another. Standard SGD tends to oscillate across the narrow ravine since the negative gradient points down one of the steep sides rather than along the ravine towards the optimum. Momentum helps to push the objective more quickly along the shallow ravine towards the minimum.

References

[1] An overview of gradient descent optimization algorithms
[2] On the Momentum Term in Gradient Descent Learning Algorithms
[3] Two problems with backpropagation and other steepest-descent learning procedures for networks.
Parameters: kwargs – Arbitrary keyword arguments.
optimization_name
update(weights, grads, epoch_num, batch_num, batch_size)[source]
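
The effect of momentum can be sketched with a velocity term that accumulates a fraction of the previous update. This is the classical momentum rule for a scalar weight; the function name, the `velocity` argument and the default values are assumptions, not ztlearn's signature.

```python
def sgd_momentum_update(weight, grad, velocity, lr=0.01, momentum=0.9):
    """velocity = momentum * velocity + lr * grad; weight = weight - velocity."""
    velocity = momentum * velocity + lr * grad
    return weight - velocity, velocity

# With a constant gradient the step grows from lr towards lr / (1 - momentum).
w, v = 1.0, 0.0
w, v = sgd_momentum_update(w, 1.0, v, lr=0.1)
w, v = sgd_momentum_update(w, 1.0, v, lr=0.1)
```
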
ztlearn.optimizers.register_opt(**kwargs)[source]

## ztlearn.regularizers module

class ztlearn.regularizers.ElasticNetRegularization(_lambda, l1_ratio)[source]

Bases: object

Elastic Net Regularization (ElasticNetRegularization)

ElasticNetRegularization adds both absolute value of magnitude and squared magnitude of coefficient as penalty term to the loss function

References

[1] Regularization (mathematics)
Parameters: _lambda (float32) – controls the weight of the penalty term; l1_ratio (float32) – controls the l1 penalty as a ratio of the total penalty added to the loss function
derivative(weights)[source]
regulate(weights)[source]
regulation_name
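
The way `_lambda` and `l1_ratio` combine the two penalties can be sketched as below. The factor of 0.5 on the squared term, the `sign` helper and the function names are assumptions for this example; ztlearn's `regulate` and `derivative` methods may scale the terms differently.

```python
def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)

def elastic_net_penalty(weights, _lambda=0.5, l1_ratio=0.5):
    """_lambda * (l1_ratio * sum|w| + (1 - l1_ratio) * 0.5 * sum w^2)."""
    l1 = sum(abs(w) for w in weights)
    l2 = 0.5 * sum(w * w for w in weights)
    return _lambda * (l1_ratio * l1 + (1.0 - l1_ratio) * l2)

def elastic_net_derivative(weights, _lambda=0.5, l1_ratio=0.5):
    """Gradient of the penalty with respect to each weight."""
    return [_lambda * (l1_ratio * sign(w) + (1.0 - l1_ratio) * w) for w in weights]
```

In this sketch `l1_ratio=1.0` leaves only the absolute-value (L1) term and `l1_ratio=0.0` leaves only the squared (L2) term.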
class ztlearn.regularizers.L1Regularization(_lambda, **kwargs)[source]

Bases: object

Lasso Regression (L1Regularization)

L1Regularization adds sum of the absolute value magnitudes of parameters as penalty term to the loss function

References

[1] Regularization (mathematics)
[2] Regression shrinkage and selection via the lasso
[3] Feature selection, L1 vs. L2 regularization, and rotational invariance
Parameters: _lambda (float32) – controls the weight of the penalty term
derivative(weights)[source]
regulate(weights)[source]
regulation_name
class ztlearn.regularizers.L2Regularization(_lambda, **kwargs)[source]

Bases: object

Ridge Regression (L2Regularization)

L2Regularization adds the sum of the squared magnitudes of parameters as penalty term to the loss function

References

[1] Regularization (mathematics)
[2] Regression shrinkage and selection via the lasso
[3] Feature selection, L1 vs. L2 regularization, and rotational invariance
Parameters: _lambda (float32) – controls the weight of the penalty term
derivative(weights)[source]
regulate(weights)[source]
regulation_name
class ztlearn.regularizers.RegularizationFunction(name='lasso', _lambda=0.5, l1_ratio=0.5)[source]

Bases: object

derivative(weights)[source]
name
regulate(weights)[source]