ztlearn package

Subpackages

Submodules

ztlearn.activations module

class ztlearn.activations.ActivationFunction(name, activation_dict={})[source]

Bases: object

backward(input_signal)[source]

forward(input_signal)[source]

name

class ztlearn.activations.ELU(activation_dict)[source]

Bases: object

Exponential Linear Units (ELUs)

ELUs take negative values for negative inputs, which allows them to push mean unit activations closer to zero, much like batch normalization but with lower computational complexity.

References

[1] Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)

[Djork-Arné Clevert et al., 2016] https://arxiv.org/abs/1511.07289
[PDF] https://arxiv.org/pdf/1511.07289.pdf

Parameters:	alpha (float32) – controls the value to which an ELU saturates for negative net inputs

activation(input_signal)[source]

ELU activation applied to input provided

Parameters:	input_signal (numpy.array) – the input numpy array
Returns:	the output of the ELU function applied to the input
Return type:	numpy.array

activation_name

derivative(input_signal)[source]

ELU derivative applied to input provided

Parameters:	input_signal (numpy.array) – the input numpy array
Returns:	the output of the ELU derivative applied to the input
Return type:	numpy.array
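
As an illustration of the ELU description above, the following is a minimal NumPy sketch of the standard ELU formula from the referenced paper. The function names `elu` and `elu_derivative` and the default `alpha=1.0` are assumptions made for this example; they are not taken from the ztlearn source.

```python
import numpy as np

def elu(x, alpha=1.0):
    """ELU: x where x > 0, alpha * (exp(x) - 1) elsewhere."""
    x = np.asarray(x, dtype=np.float64)
    # Clamp the exponent's argument at 0 so exp() cannot overflow on large positive x.
    return np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0.0)) - 1.0))

def elu_derivative(x, alpha=1.0):
    """Derivative: 1 where x > 0, alpha * exp(x) elsewhere."""
    x = np.asarray(x, dtype=np.float64)
    return np.where(x > 0, 1.0, alpha * np.exp(np.minimum(x, 0.0)))

x = np.array([-50.0, -1.0, 0.0, 2.0])
y = elu(x, alpha=1.0)            # large negative inputs saturate towards -alpha
dy = elu_derivative(x, alpha=1.0)
```

Here `alpha` sets the value (`-alpha`) that the output approaches for very negative inputs, which matches the parameter description above.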

class ztlearn.activations.ElliotSigmoid(activation_dict)[source]

Bases: object

Elliot Sigmoid Activation Function

Elliot Sigmoid squashes each element of the input from the interval [-inf, inf] to the interval [-1, 1] with an ‘S-shaped’ function. The function is fast to calculate on simple computing hardware, as it does not require any exponential or trigonometric functions.

References

[1] A better Activation Function for Artificial Neural Networks

[David L. Elliott, et al., 1993] https://goo.gl/qqBdne
[PDF] https://goo.gl/fPLPcr

activation(input_signal)[source]

ElliotSigmoid activation applied to input provided

Parameters:	input_signal (numpy.array) – the input numpy array
Returns:	the output of the ElliotSigmoid function applied to the input
Return type:	numpy.array

activation_name

derivative(input_signal)[source]

ElliotSigmoid derivative applied to input provided

Parameters:	input_signal (numpy.array) – the input numpy array
Returns:	the output of the ElliotSigmoid derivative applied to the input
Return type:	numpy.array

class ztlearn.activations.LeakyReLU(activation_dict)[source]

Bases: object

LeakyReLU Activation Functions

Leaky ReLUs allow a small non-zero gradient to propagate through the network when the unit is not active, hence avoiding bottlenecks that can prevent learning in the neural network.

References

[1] Rectifier Nonlinearities Improve Neural Network Acoustic Models

[Andrew L. Maas, et al., 2013] https://goo.gl/k9fhEZ
[PDF] https://goo.gl/v48yXT

[2] Empirical Evaluation of Rectified Activations in Convolutional Network

[Bing Xu, et al., 2015] https://arxiv.org/abs/1505.00853
[PDF] https://arxiv.org/pdf/1505.00853.pdf

Parameters:	alpha (float32) – provides for a small non-zero gradient (e.g. 0.01) when the unit is not active.

activation(input_signal)[source]

LeakyReLU activation applied to input provided

Parameters:	input_signal (numpy.array) – the input numpy array
Returns:	the output of the LeakyReLU function applied to the input
Return type:	numpy.array

activation_name

derivative(input_signal)[source]

LeakyReLU derivative applied to input provided

Parameters:	input_signal (numpy.array) – the input numpy array
Returns:	the output of the LeakyReLU derivative applied to the input
Return type:	numpy.array
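
The role of `alpha` can be sketched in a few lines of NumPy. This is a sketch of the usual leaky ReLU definition, assuming `alpha=0.01` as in the parameter description; the function names are hypothetical and the library's own implementation may differ in detail (for example in how it treats an input of exactly zero).

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    """Pass non-negative inputs through; scale negative inputs by alpha."""
    x = np.asarray(x, dtype=np.float64)
    return np.where(x >= 0, x, alpha * x)

def leaky_relu_derivative(x, alpha=0.01):
    """Gradient is 1 for active units and alpha (not 0) for inactive ones."""
    x = np.asarray(x, dtype=np.float64)
    return np.where(x >= 0, 1.0, alpha)

x = np.array([-10.0, 0.0, 5.0])
y = leaky_relu(x)
dy = leaky_relu_derivative(x)
```

Because the derivative never drops to zero, an inactive unit still receives a small gradient signal during backpropagation.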

class ztlearn.activations.Linear(activation_dict)[source]

Bases: object

Linear Activation Function

Linear Activation applies identity operation on your data such that the output data is proportional to the input data. The function always returns the same value that was used as its argument.

References

[1] Identity Function

[Wikipedia Article] https://en.wikipedia.org/wiki/Identity_function

activation(input_signal)[source]

Linear activation applied to input provided

Parameters:	input_signal (numpy.array) – the input numpy array
Returns:	the output of the Linear function applied to the input
Return type:	numpy.array

activation_name

derivative(input_signal)[source]

Linear derivative applied to input provided

Parameters:	input_signal (numpy.array) – the input numpy array
Returns:	the output of the Linear derivative applied to the input
Return type:	numpy.array

class ztlearn.activations.ReLU(activation_dict)[source]

Bases: object

Rectified Linear Units (ReLUs)

Rectifying neurons are an even better model of biological neurons, yielding equal or better performance than hyperbolic tangent networks in spite of the hard non-linearity and non-differentiability at zero. They create sparse representations with true zeros, which seem remarkably suitable for naturally sparse data.

References

[1] Deep Sparse Rectifier Neural Networks

[Xavier Glorot, et al., 2011] http://proceedings.mlr.press/v15/glorot11a.html
[PDF] http://proceedings.mlr.press/v15/glorot11a/glorot11a.pdf

[2] Delving Deep into Rectifiers

[Kaiming He, et al., 2015] https://arxiv.org/abs/1502.01852
[PDF] https://arxiv.org/pdf/1502.01852.pdf

activation(input_signal)[source]

ReLU activation applied to input provided

Parameters:	input_signal (numpy.array) – the input numpy array
Returns:	the output of the ReLU function applied to the input
Return type:	numpy.array

activation_name

derivative(input_signal)[source]

ReLU derivative applied to input provided

Parameters:	input_signal (numpy.array) – the input numpy array
Returns:	the output of the ReLU derivative applied to the input
Return type:	numpy.array

class ztlearn.activations.SELU(activation_dict)[source]

Bases: object

Scaled Exponential Linear Units (SELUs)

SELUs are activations which induce self-normalizing properties and are used in Self-Normalizing Neural Networks (SNNs). SNNs enable high-level abstract representations that tend to automatically converge towards zero mean and unit variance.

References

[1] Self-Normalizing Neural Networks (SELUs)

[Klambauer, G., et al., 2017] https://arxiv.org/abs/1706.02515
[PDF] https://arxiv.org/pdf/1706.02515.pdf

Parameters:	ALPHA (float32) – 1.6732632423543772848170429916717
	_LAMBDA (float32) – 1.0507009873554804934193349852946

ALPHA = 1.6732632423543772

activation(input_signal)[source]

SELU activation applied to input provided

Parameters:	input_signal (numpy.array) – the input numpy array
Returns:	the output of the SELU function applied to the input
Return type:	numpy.array

activation_name

derivative(input_signal)[source]

SELU derivative applied to input provided

Parameters:	input_signal (numpy.array) – the input numpy array
Returns:	the output of the SELU derivative applied to the input
Return type:	numpy.array
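
The self-normalizing behaviour described above can be checked numerically. The sketch below uses the two constants listed in the parameters, applies the SELU formula from the referenced paper to standard-normal inputs, and measures the mean and variance of the outputs. The names `selu`, `selu_derivative`, and `LAMBDA` are chosen for this example only.

```python
import numpy as np

ALPHA = 1.6732632423543772
LAMBDA = 1.0507009873554805

def selu(x):
    """SELU: LAMBDA * x where x > 0, LAMBDA * ALPHA * (exp(x) - 1) elsewhere."""
    x = np.asarray(x, dtype=np.float64)
    return LAMBDA * np.where(x > 0, x, ALPHA * (np.exp(np.minimum(x, 0.0)) - 1.0))

def selu_derivative(x):
    x = np.asarray(x, dtype=np.float64)
    return LAMBDA * np.where(x > 0, 1.0, ALPHA * np.exp(np.minimum(x, 0.0)))

rng = np.random.default_rng(0)
z = rng.standard_normal(200_000)   # inputs with zero mean and unit variance
out = selu(z)
out_mean, out_var = out.mean(), out.var()
```

With these constants, `out_mean` stays close to 0 and `out_var` close to 1, which is the fixed point the constants were derived for.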

class ztlearn.activations.Sigmoid(activation_dict)[source]

Bases: object

Sigmoid Activation Function

The Sigmoid function is often used as the output activation function for binary classification problems, as it outputs values in the range (0, 1). Sigmoid functions are real-valued, differentiable, and monotonically increasing, producing an ‘S-shaped’ curve with a single inflection point.

References

[1] The influence of the sigmoid function parameters on the speed of backpropagation learning

[PDF] https://goo.gl/MavJjj

activation(input_signal)[source]

Sigmoid activation applied to input provided

Parameters:	input_signal (numpy.array) – the input numpy array
Returns:	the output of the Sigmoid function applied to the input
Return type:	numpy.array

activation_name

derivative(input_signal)[source]

Sigmoid derivative applied to input provided

Parameters:	input_signal (numpy.array) – the input numpy array
Returns:	the output of the Sigmoid derivative applied to the input
Return type:	numpy.array
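
For reference, the logistic sigmoid and its derivative can be written as follows. This is a minimal sketch with hypothetical function names; it does not guard against overflow warnings for very large negative inputs, which a production implementation may want to handle.

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid: 1 / (1 + exp(-x)), with outputs in (0, 1)."""
    x = np.asarray(x, dtype=np.float64)
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    """Derivative expressed through the function value: s * (1 - s)."""
    s = sigmoid(x)
    return s * (1.0 - s)

x = np.array([-4.0, 0.0, 4.0])
y = sigmoid(x)
dy = sigmoid_derivative(x)   # largest at x = 0, where the curve is steepest
```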

class ztlearn.activations.SoftPlus(activation_dict)[source]

Bases: object

SoftPlus Activation Function

The Softplus function is a smooth approximation to the rectified linear unit (ReLU). Unlike ReLU, it is smooth and differentiable at 0, and it produces outputs in the range (0, +inf).

References

[1] Incorporating Second-Order Functional Knowledge for Better Option Pricing

[Charles Dugas, et al., 2001] https://goo.gl/z3jeYc
[PDF] https://goo.gl/z3jeYc

activation(input_signal)[source]

SoftPlus activation applied to input provided

Parameters:	input_signal (numpy.array) – the input numpy array
Returns:	the output of the SoftPlus function applied to the input
Return type:	numpy.array

activation_name

derivative(input_signal)[source]

SoftPlus derivative applied to input provided

Parameters:	input_signal (numpy.array) – the input numpy array
Returns:	the output of the SoftPlus derivative applied to the input
Return type:	numpy.array

class ztlearn.activations.Softmax(activation_dict)[source]

Bases: object

Softmax Activation Function

The Softmax Activation Function is a generalization of the logistic function that squashes the outputs of each unit to real values in the range [0, 1] but it also divides each output such that the total sum of the outputs is equal to 1.

References

[1] Softmax Regression

[UFLDL Tutorial] https://goo.gl/1qgqdg

[2] Deep Learning using Linear Support Vector Machines

[Yichuan Tang, 2015] https://arxiv.org/abs/1306.0239
[PDF] https://arxiv.org/pdf/1306.0239.pdf

[3] Probabilistic Interpretation of Feedforward Network Outputs

[Mario Costa, 1989] [PDF] https://goo.gl/ZhBY4r

activation(input_signal)[source]

Softmax activation applied to input provided

Parameters:	input_signal (numpy.array) – the input numpy array
Returns:	the output of the Softmax function applied to the input
Return type:	numpy.array

activation_name

derivative(input_signal)[source]

Softmax derivative applied to input provided

Parameters:	input_signal (numpy.array) – the input numpy array
Returns:	the output of the Softmax derivative applied to the input
Return type:	numpy.array
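
A common way to compute the softmax described above is to subtract the row-wise maximum before exponentiating, which leaves the result unchanged but avoids overflow. The sketch below assumes a 2-D input with one sample per row; the function name and the `axis` argument are choices made for this example, not the library's signature.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = np.asarray(x, dtype=np.float64)
    shifted = x - np.max(x, axis=axis, keepdims=True)  # largest entry becomes 0
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=axis, keepdims=True)

logits = np.array([[1.0, 2.0, 3.0],
                   [1000.0, 1000.0, 1000.0]])  # would overflow without the shift
probs = softmax(logits)
```

Each row of `probs` sums to 1, and equal logits produce equal probabilities.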

class ztlearn.activations.TanH(activation_dict)[source]

Bases: object

Hyperbolic Tangent (TanH)

The hyperbolic tangent function is a rescaled version of the sigmoid function that produces outputs in the range (-1, +1). As an activation function it gives an output for every input value, making it a continuous function.

References

[1] Hyperbolic Functions

[Mathematics Education Centre] https://goo.gl/4Dkkrd
[PDF] https://goo.gl/xPSnif

activation(input_signal)[source]

TanH activation applied to input provided

Parameters:	input_signal (numpy.array) – the input numpy array
Returns:	the output of the TanH function applied to the input
Return type:	numpy.array

activation_name

derivative(input_signal)[source]

TanH derivative applied to input provided

Parameters:	input_signal (numpy.array) – the input numpy array
Returns:	the output of the TanH derivative applied to the input
Return type:	numpy.array
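
The statement that TanH is a rescaled sigmoid corresponds to the identity tanh(x) = 2 * sigmoid(2x) - 1. A short sketch that checks it against `numpy.tanh` (the helper names are hypothetical):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-np.asarray(x, dtype=np.float64)))

def tanh_from_sigmoid(x):
    """tanh written as a shifted and scaled logistic sigmoid."""
    return 2.0 * sigmoid(2.0 * np.asarray(x, dtype=np.float64)) - 1.0

def tanh_derivative(x):
    """Derivative of tanh: 1 - tanh(x) ** 2."""
    return 1.0 - np.tanh(x) ** 2

x = np.linspace(-3.0, 3.0, 13)
rescaled = tanh_from_sigmoid(x)
reference = np.tanh(x)
```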

ztlearn.decayers module

class ztlearn.decayers.Decay(lr, decay, epoch, min_lr, max_lr)[source]

Bases: object

clip_lr

class ztlearn.decayers.DecayFunction(lr=0.001, name='inverse', decay=1e-06, epoch=1, min_lr=0.0, max_lr=inf, step_size=10.0)[source]

Bases: object

decompose

name

class ztlearn.decayers.ExponetialDecay(lr, decay, epoch, min_lr, max_lr, step_size)[source]

Bases: ztlearn.decayers.Decay

decay_name

decompose

class ztlearn.decayers.InverseTimeDecay(lr, decay, epoch, min_lr, max_lr, step_size)[source]

Bases: ztlearn.decayers.Decay

decay_name

decompose

class ztlearn.decayers.NaturalExponentialDecay(lr, decay, epoch, min_lr, max_lr, step_size)[source]

Bases: ztlearn.decayers.Decay

decay_name

decompose

class ztlearn.decayers.StepDecay(lr, decay, epoch, min_lr, max_lr, step_size)[source]

Bases: ztlearn.decayers.Decay

Decays the learning rate after every step_size steps.

decay_name

decompose
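
This page does not give formulas for the decay schedules, so the sketch below uses common textbook forms purely as an illustration of what the `lr`, `decay`, `epoch`, `step_size`, `min_lr` and `max_lr` arguments could control. The exact expressions used by ztlearn are an assumption here and may differ; the function names are hypothetical.

```python
import math

def clip_lr(lr, min_lr=0.0, max_lr=math.inf):
    """Keep the learning rate inside [min_lr, max_lr]."""
    return max(min_lr, min(lr, max_lr))

def inverse_time_decay(lr, decay, epoch):
    # Assumed form: lr / (1 + decay * epoch)
    return lr / (1.0 + decay * epoch)

def natural_exponential_decay(lr, decay, epoch):
    # Assumed form: lr * exp(-decay * epoch)
    return lr * math.exp(-decay * epoch)

def step_decay(lr, decay, epoch, step_size):
    # Assumed form: multiply by `decay` once per completed block of step_size epochs
    return lr * decay ** (epoch // step_size)

inverse_lr = inverse_time_decay(lr=0.1, decay=0.5, epoch=2)
step_lr = step_decay(lr=0.1, decay=0.5, epoch=25, step_size=10)
clipped_lr = clip_lr(natural_exponential_decay(0.1, 1.0, 100), min_lr=1e-4)
```

In this sketch the step schedule stays constant within each block of `step_size` epochs, while the other two shrink the rate a little at every epoch.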

ztlearn.initializers module

class ztlearn.initializers.GlorotNormal[source]

Bases: ztlearn.initializers.WeightInitializer

Glorot Normal (GlorotNormal)

GlorotNormal, more famously known as Xavier initialization, tries to maintain the same variance of the gradients of the weights across all layers. Glorot normal is an implementation based on the Gaussian distribution.

References

[1] Understanding the difficulty of training deep feedforward neural networks

[Xavier Glorot, 2010] http://proceedings.mlr.press/v9/glorot10a.html
[PDF] http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf

[2] Initialization Of Deep Feedfoward Networks

[DeepGrid Article - Jefkine Kafunah] https://goo.gl/E2XrGe

init_name

weights(shape, random_seed)[source]

class ztlearn.initializers.GlorotUniform[source]

Bases: ztlearn.initializers.WeightInitializer

Glorot Uniform (GlorotUniform)

GlorotUniform, more famously known as Xavier initialization, tries to maintain the same variance of the gradients of the weights across all layers. Glorot uniform is an implementation based on the Uniform distribution.

References

[1] Understanding the difficulty of training deep feedforward neural networks

[Xavier Glorot, 2010] http://proceedings.mlr.press/v9/glorot10a.html
[PDF] http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf

[2] Initialization Of Deep Feedfoward Networks

[DeepGrid Article - Jefkine Kafunah] https://goo.gl/E2XrGe

init_name

weights(shape, random_seed)[source]
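
Both Glorot variants target a weight variance of 2 / (fan_in + fan_out). The following sketch shows how that target translates into a uniform limit and a Gaussian standard deviation. It assumes 2-D weight shapes of the form (fan_in, fan_out) and uses `numpy.random.default_rng` for seeding; the helper bodies are illustrative and are not copied from ztlearn.

```python
import numpy as np

def compute_fans(shape):
    # Assumption: a dense weight matrix shaped (fan_in, fan_out).
    return shape[0], shape[1]

def glorot_uniform(shape, random_seed=None):
    rng = np.random.default_rng(random_seed)
    fan_in, fan_out = compute_fans(shape)
    limit = np.sqrt(6.0 / (fan_in + fan_out))   # variance of U(-l, l) is l**2 / 3
    return rng.uniform(-limit, limit, size=shape)

def glorot_normal(shape, random_seed=None):
    rng = np.random.default_rng(random_seed)
    fan_in, fan_out = compute_fans(shape)
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(0.0, std, size=shape)

W_u = glorot_uniform((300, 200), random_seed=0)
W_n = glorot_normal((300, 200), random_seed=0)
target_variance = 2.0 / (300 + 200)
```

Both `W_u.var()` and `W_n.var()` land close to `target_variance`; only the shape of the distribution differs.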

class ztlearn.initializers.HeNormal[source]

Bases: ztlearn.initializers.WeightInitializer

He Normal (HeNormal)

HeNormal is a robust initialization method that particularly considers the rectifier nonlinearities. He normal is an implementation based on Gaussian distribution

References

[1] Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification

[Kaiming He, 2015] https://arxiv.org/abs/1502.01852
[PDF] https://arxiv.org/pdf/1502.01852.pdf

[2] Initialization Of Deep Networks Case of Rectifiers

[DeepGrid Article - Jefkine Kafunah] https://goo.gl/TBNw5t

init_name

weights(shape, random_seed)[source]

class ztlearn.initializers.HeUniform[source]

Bases: ztlearn.initializers.WeightInitializer

He Uniform (HeUniform)

HeUniform is a robust initialization method that particularly considers the rectifier nonlinearities. He uniform is an implementation based on the Uniform distribution.

References

[1] Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification

[Kaiming He, 2015] https://arxiv.org/abs/1502.01852
[PDF] https://arxiv.org/pdf/1502.01852.pdf

[2] Initialization Of Deep Networks Case of Rectifiers

[DeepGrid Article - Jefkine Kafunah] https://goo.gl/TBNw5t

init_name

weights(shape, random_seed)[source]
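
The reason He initialization suits rectifiers can be seen in a small experiment. Drawing weights with variance 2 / fan_in (standard deviation sqrt(2 / fan_in) for the normal variant, limit sqrt(6 / fan_in) for the uniform variant) keeps the mean squared activation roughly constant through a ReLU layer. The layer sizes and variable names below are arbitrary choices for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
fan_in, fan_out = 512, 256

he_std = np.sqrt(2.0 / fan_in)        # He normal
he_limit = np.sqrt(6.0 / fan_in)      # He uniform, same variance as he_std ** 2

W = rng.normal(0.0, he_std, size=(fan_in, fan_out))
x = rng.normal(0.0, 1.0, size=(1000, fan_in))   # inputs with unit second moment

h = np.maximum(x @ W, 0.0)            # ReLU layer output

second_moment_in = np.mean(x ** 2)
second_moment_out = np.mean(h ** 2)   # stays near the input's second moment
```

The factor of 2 compensates for ReLU zeroing out roughly half of the pre-activations.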

class ztlearn.initializers.Identity[source]

Bases: ztlearn.initializers.WeightInitializer

Identity (Identity)

Identity is an implementation of weight initialization that returns an identity matrix of size shape

init_name

weights(shape, random_seed)[source]

class ztlearn.initializers.InitializeWeights(name)[source]

Bases: object

initialize_weights(shape, random_seed=None)[source]

name

class ztlearn.initializers.LeCunNormal[source]

Bases: ztlearn.initializers.WeightInitializer

LeCun Normal (LeCunNormal)

Weights should be randomly chosen but in such a way that the sigmoid is primarily activated in its linear region. LeCun normal is an implementation based on the Gaussian distribution.

References

[1] Efficient Backprop

[LeCun, 1998][PDF] http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf

init_name

weights(shape, random_seed)[source]

class ztlearn.initializers.LeCunUniform[source]

Bases: ztlearn.initializers.WeightInitializer

LeCun Uniform (LeCunUniform)

Weights should be randomly chosen but in such a way that the sigmoid is primarily activated in its linear region. LeCun uniform is an implementation based on Uniform distribution

References

[1] Efficient Backprop

[LeCun, 1998][PDF] http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf

init_name

weights(shape, random_seed)[source]

class ztlearn.initializers.One[source]

Bases: ztlearn.initializers.WeightInitializer

One (One)

One is an implementation of weight initialization that returns all ones

init_name

weights(shape, random_seed)[source]

class ztlearn.initializers.RandomNormal[source]

Bases: ztlearn.initializers.WeightInitializer

Random Normal (RandomNormal)

Random normal, an implementation of weight initialization based on the Gaussian distribution

init_name

weights(shape, random_seed)[source]

class ztlearn.initializers.RandomUniform[source]

Bases: ztlearn.initializers.WeightInitializer

Random Uniform (RandomUniform)

Random uniform, an implementation of weight initialization based on Uniform distribution

init_name

weights(shape, random_seed)[source]

class ztlearn.initializers.WeightInitializer[source]

Bases: object

compute_fans(shape)[source]

class ztlearn.initializers.Zero[source]

Bases: ztlearn.initializers.WeightInitializer

Zero (Zero)

Zero is an implementation of weight initialization that returns all zeros

init_name

weights(shape, random_seed)[source]

ztlearn.objectives module

class ztlearn.objectives.BinaryCrossEntropy[source]

Bases: ztlearn.objectives.Objective

Binary Cross Entropy

Binary Cross Entropy measures the performance of a classification model whose output is a probability value between 0 and 1. ‘Binary’ is meant for discrete classification tasks in which the classes are independent and not mutually exclusive. Each target is a scalar, either 0 or 1.

References

[1] Cross Entropy

[Wikipedia Article] https://en.wikipedia.org/wiki/Cross_entropy

accuracy(predictions, targets, threshold=0.5)[source]

Calculates the BinaryCrossEntropy Accuracy Score given prediction and targets

Parameters:	predictions (numpy.array) – the predictions numpy array
	targets (numpy.array) – the targets numpy array
	threshold (numpy.float32) – the threshold value
Returns:	the output of BinaryCrossEntropy Accuracy Score
Return type:	numpy.float32

derivative(predictions, targets, np_type)[source]

Applies the BinaryCrossEntropy Derivative to prediction and targets provided

Parameters:	predictions (numpy.array) – the predictions numpy array
	targets (numpy.array) – the targets numpy array
Returns:	the output of BinaryCrossEntropy Derivative to prediction and targets
Return type:	numpy.array

loss(predictions, targets, np_type)[source]

Applies the BinaryCrossEntropy Loss to prediction and targets provided

Parameters:	predictions (numpy.array) – the predictions numpy array
	targets (numpy.array) – the targets numpy array
Returns:	the output of BinaryCrossEntropy Loss to prediction and targets
Return type:	numpy.array

objective_name
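
A minimal element-wise sketch of binary cross entropy, its derivative with respect to the predictions, and a thresholded accuracy follows. The clipping constant mirrors the `epsilon=1e-15` default shown for `Objective.clip`, but whether the library reduces the loss to a mean, and the function names used here, are assumptions.

```python
import numpy as np

def binary_cross_entropy(predictions, targets, epsilon=1e-15):
    p = np.clip(predictions, epsilon, 1.0 - epsilon)   # avoid log(0)
    return -(targets * np.log(p) + (1.0 - targets) * np.log(1.0 - p))

def binary_cross_entropy_derivative(predictions, targets, epsilon=1e-15):
    p = np.clip(predictions, epsilon, 1.0 - epsilon)
    return -(targets / p) + (1.0 - targets) / (1.0 - p)

def binary_accuracy(predictions, targets, threshold=0.5):
    labels = (predictions >= threshold).astype(np.float64)
    return np.mean(labels == targets)

predictions = np.array([0.9, 0.2, 0.6])
targets = np.array([1.0, 0.0, 0.0])

loss = binary_cross_entropy(predictions, targets)
grad = binary_cross_entropy_derivative(predictions, targets)
acc = binary_accuracy(predictions, targets)   # third sample is misclassified
```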

class ztlearn.objectives.CategoricalCrossEntropy[source]

Bases: ztlearn.objectives.Objective

Categorical Cross Entropy

Categorical Cross Entropy measures the performance of a classification model whose output is a probability value between 0 and 1. ‘Categorical’ is meant for discrete classification tasks in which the classes are mutually exclusive.

References

[1] Cross Entropy

[Wikipedia Article] https://en.wikipedia.org/wiki/Cross_entropy

accuracy(predictions, targets)[source]

Calculates the CategoricalCrossEntropy Accuracy Score given prediction and targets

Parameters:	predictions (numpy.array) – the predictions numpy array
	targets (numpy.array) – the targets numpy array
Returns:	the output of CategoricalCrossEntropy Accuracy Score
Return type:	numpy.float32

derivative(predictions, targets, np_type)[source]

Applies the CategoricalCrossEntropy Derivative to prediction and targets provided

Parameters:	predictions (numpy.array) – the predictions numpy array
	targets (numpy.array) – the targets numpy array
Returns:	the output of CategoricalCrossEntropy Derivative to prediction and targets
Return type:	numpy.array

loss(predictions, targets, np_type)[source]

Applies the CategoricalCrossEntropy Loss to prediction and targets provided

Parameters:	predictions (numpy.array) – the predictions numpy array
	targets (numpy.array) – the targets numpy array
Returns:	the output of CategoricalCrossEntropy Loss to prediction and targets
Return type:	numpy.array

objective_name
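
For mutually exclusive classes, the loss is typically computed per sample from one-hot targets, and accuracy compares the arg-max of each row. The sketch below assumes predictions and targets shaped (samples, classes); the function names and the per-sample (unreduced) loss are choices made for this example.

```python
import numpy as np

def categorical_cross_entropy(predictions, targets, epsilon=1e-15):
    p = np.clip(predictions, epsilon, 1.0 - epsilon)
    return -np.sum(targets * np.log(p), axis=1)   # one loss value per sample

def categorical_accuracy(predictions, targets):
    return np.mean(np.argmax(predictions, axis=1) == np.argmax(targets, axis=1))

predictions = np.array([[0.7, 0.2, 0.1],
                        [0.1, 0.3, 0.6],
                        [0.5, 0.4, 0.1]])
targets = np.array([[1.0, 0.0, 0.0],
                    [0.0, 0.0, 1.0],
                    [0.0, 1.0, 0.0]])

losses = categorical_cross_entropy(predictions, targets)
acc = categorical_accuracy(predictions, targets)   # last row predicts class 0, not 1
```

Only the probability assigned to the true class contributes to each sample's loss.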

class ztlearn.objectives.HellingerDistance[source]

Bases: object

Hellinger Distance

Hellinger Distance is used to quantify the similarity between two probability distributions.

References

[1] Hellinger Distance

[Wikipedia Article] https://en.wikipedia.org/wiki/Hellinger_distance

SQRT_2 = 1.4142135623730951

accuracy(predictions, targets, threshold=0.5)[source]

derivative(predictions, targets, np_type)[source]

Applies the HellingerDistance Derivative to prediction and targets provided

Parameters:	predictions (numpy.array) – the predictions numpy array
	targets (numpy.array) – the targets numpy array
Returns:	the output of HellingerDistance Derivative to prediction and targets
Return type:	numpy.array

loss(predictions, targets, np_type)[source]

Applies the HellingerDistance Loss to prediction and targets provided

Parameters:	predictions (numpy.array) – the predictions numpy array
	targets (numpy.array) – the targets numpy array
Returns:	the output of HellingerDistance Loss to prediction and targets
Return type:	numpy.array

objective_name

sqrt_difference(predictions, targets)[source]

class ztlearn.objectives.HingeLoss[source]

Bases: object

Hinge Loss

Hinge Loss, also known as SVM Loss, is used for “maximum-margin” classification, most notably for support vector machines (SVMs).

References

[1] Hinge loss

[Wikipedia Article] https://en.wikipedia.org/wiki/Hinge_loss

accuracy(predictions, targets, threshold=0.5)[source]

Calculates the Hinge-Loss Accuracy Score given prediction and targets

Parameters:	predictions (numpy.array) – the predictions numpy array
	targets (numpy.array) – the targets numpy array
Returns:	the output of Hinge-Loss Accuracy Score
Return type:	numpy.float32

derivative(predictions, targets, np_type)[source]

Applies the Hinge-Loss Derivative to prediction and targets provided

Parameters:	predictions (numpy.array) – the predictions numpy array
	targets (numpy.array) – the targets numpy array
Returns:	the output of Hinge-Loss Derivative to prediction and targets
Return type:	numpy.array

loss(predictions, targets, np_type)[source]

Applies the Hinge-Loss Loss to prediction and targets provided

Parameters:	predictions (numpy.array) – the predictions numpy array
	targets (numpy.array) – the targets numpy array
Returns:	the output of Hinge-Loss Loss to prediction and targets
Return type:	numpy.array

objective_name
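
The maximum-margin idea can be sketched as follows, assuming the conventional formulation with targets encoded as -1 or +1 and raw (unsquashed) scores as predictions. Whether ztlearn expects this encoding is not stated on this page, so treat the encoding and the function names as assumptions.

```python
import numpy as np

def hinge_loss(predictions, targets):
    """max(0, 1 - t * y): zero once a sample is on the correct side of the margin."""
    return np.maximum(0.0, 1.0 - targets * predictions)

def hinge_derivative(predictions, targets):
    """Subgradient w.r.t. the predictions: -t inside the margin, 0 outside."""
    return np.where(targets * predictions < 1.0, -targets, 0.0)

scores = np.array([2.0, 0.5, -1.0])
targets = np.array([1.0, 1.0, 1.0])

loss = hinge_loss(scores, targets)
grad = hinge_derivative(scores, targets)
```

The second sample is classified correctly but still penalized because it lies inside the margin.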

class ztlearn.objectives.HuberLoss[source]

Bases: ztlearn.objectives.Objective

Huber Loss

Huber Loss is a loss function used in robust regression, where it is less sensitive to outliers in the data than the squared error loss.

References

[1] Huber Loss

[Wikipedia Article] https://en.wikipedia.org/wiki/Huber_loss

[2] Huber loss

[Wikivisually Article] https://wikivisually.com/wiki/Huber_loss

accuracy(predictions, targets)[source]

Calculates the HuberLoss Accuracy Score given prediction and targets

Parameters:	predictions (numpy.array) – the predictions numpy array
	targets (numpy.array) – the targets numpy array
Returns:	the output of HuberLoss Accuracy Score
Return type:	numpy.float32

derivative(predictions, targets, np_type, delta=1.0)[source]

Applies the HuberLoss Derivative to prediction and targets provided

Parameters:	predictions (numpy.array) – the predictions numpy array
	targets (numpy.array) – the targets numpy array
Returns:	the output of HuberLoss Derivative to prediction and targets
Return type:	numpy.array

loss(predictions, targets, np_type, delta=1.0)[source]

Applies the HuberLoss Loss to prediction and targets provided

Parameters:	predictions (numpy.array) – the predictions numpy array
	targets (numpy.array) – the targets numpy array
Returns:	the output of HuberLoss Loss to prediction and targets
Return type:	numpy.array

objective_name
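
The reduced sensitivity to outliers comes from switching from a quadratic to a linear penalty once the absolute error exceeds `delta`. A sketch of the standard Huber definition, using the `delta=1.0` default from the signatures above (the function names and the element-wise, unreduced output are assumptions):

```python
import numpy as np

def huber_loss(predictions, targets, delta=1.0):
    error = predictions - targets
    abs_error = np.abs(error)
    quadratic = 0.5 * error ** 2                  # used where |error| <= delta
    linear = delta * (abs_error - 0.5 * delta)    # used where |error| > delta
    return np.where(abs_error <= delta, quadratic, linear)

def huber_derivative(predictions, targets, delta=1.0):
    # The gradient equals the error inside the quadratic zone and is capped at +/- delta outside.
    return np.clip(predictions - targets, -delta, delta)

predictions = np.array([0.5, 3.0, -4.0])
targets = np.zeros(3)

loss = huber_loss(predictions, targets)
grad = huber_derivative(predictions, targets)
```

For comparison, the squared error of the last sample would be 16, whereas its Huber loss is 3.5.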

class ztlearn.objectives.KLDivergence[source]

Bases: ztlearn.objectives.Objective

KL Divergence

Kullback–Leibler divergence (also called relative entropy) is a measure of divergence between two probability distributions.

accuracy(predictions, targets)[source]

Calculates the KLDivergence Accuracy Score given prediction and targets

Parameters:	predictions (numpy.array) – the predictions numpy array
	targets (numpy.array) – the targets numpy array
Returns:	the output of KLDivergence Accuracy Score
Return type:	numpy.float32

derivative(predictions, targets, np_type)[source]

Applies the KLDivergence Derivative to prediction and targets provided

Parameters:	predictions (numpy.array) – the predictions numpy array
	targets (numpy.array) – the targets numpy array
Returns:	the output of KLDivergence Derivative to prediction and targets
Return type:	numpy.array

loss(predictions, targets, np_type)[source]

Applies the KLDivergence Loss to prediction and targets provided

Parameters:	predictions (numpy.array) – the predictions numpy array
	targets (numpy.array) – the targets numpy array
Returns:	the output of KLDivergence Loss to prediction and targets
Return type:	numpy.array

objective_name
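
As a sketch, the divergence of the predicted distribution from the target distribution can be computed as sum(t * log(t / p)). This example assumes that `targets` plays the role of the reference distribution; the direction used by ztlearn is not stated here, and since the measure is asymmetric the choice matters. The function name is hypothetical.

```python
import numpy as np

def kl_divergence(predictions, targets, epsilon=1e-15):
    p = np.clip(predictions, epsilon, 1.0)
    t = np.clip(targets, epsilon, 1.0)
    return np.sum(t * np.log(t / p), axis=-1)

targets = np.array([0.5, 0.5])
predictions = np.array([0.25, 0.75])

forward = kl_divergence(predictions, targets)    # targets as reference
reverse = kl_divergence(targets, predictions)    # roles swapped
identical = kl_divergence(targets, targets)
```

`forward` and `reverse` differ, which is why KL divergence is called a divergence rather than a distance.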

class ztlearn.objectives.MeanSquaredError[source]

Bases: object

Mean Squared error (MSE)

MSE measures the average squared difference between the predictions and the targets. The closer the predictions are to the targets the more efficient the estimator.

References

[1] Mean Squared error

[Wikipedia Article] https://en.wikipedia.org/wiki/Mean_squared_error

accuracy(predictions, targets, threshold=0.5)[source]

derivative(predictions, targets, np_type)[source]

Applies the MeanSquaredError Derivative to prediction and targets provided

Parameters:	predictions (numpy.array) – the predictions numpy array
	targets (numpy.array) – the targets numpy array
Returns:	the output of MeanSquaredError Derivative to prediction and targets
Return type:	numpy.array

loss(predictions, targets, np_type)[source]

Applies the MeanSquaredError Loss to prediction and targets provided

Parameters:	predictions (numpy.array) – the predictions numpy array
	targets (numpy.array) – the targets numpy array
Returns:	the output of MeanSquaredError Loss to prediction and targets
Return type:	numpy.array

objective_name

class ztlearn.objectives.Objective[source]

Bases: object

add_fuzz_factor(np_array, epsilon=1e-05)[source]

clip(predictions, epsilon=1e-15)[source]

error(predictions, targets)[source]

objective_name

class ztlearn.objectives.ObjectiveFunction(name)[source]

Bases: object

accuracy(predictions, targets)[source]

backward(predictions, targets, np_type=<class 'numpy.float32'>)[source]

forward(predictions, targets, np_type=<class 'numpy.float32'>)[source]

name

ztlearn.optimizers module

class ztlearn.optimizers.AdaGrad(**kwargs)[source]

Bases: ztlearn.optimizers.Optimizer

Adaptive Gradient Algorithm (AdaGrad)

AdaGrad is an optimization method that allows different step sizes for different features. It increases the influence of rare but informative features

References

[1] An overview of gradient descent optimization algorithms

[Sebastian Ruder, 2016] https://arxiv.org/abs/1609.04747
[PDF] https://arxiv.org/pdf/1609.04747.pdf

[2] Adaptive Subgradient Methods for Online Learning and Stochastic Optimization

[John Duchi et al., 2011] http://jmlr.org/papers/v12/duchi11a.html
[PDF] http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf

Parameters:	kwargs – Arbitrary keyword arguments.

optimization_name

update(weights, grads, epoch_num, batch_num, batch_size)[source]
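
The per-feature step sizes come from dividing each gradient component by the square root of its accumulated squared gradients. The sketch below follows the AdaGrad rule as summarized in the referenced overview; the function signature, the explicit `cache` argument and the hyperparameter values are assumptions for this example and do not mirror `update(...)` above.

```python
import numpy as np

def adagrad_update(weights, grads, cache, lr=1.0, epsilon=1e-8):
    cache = cache + grads ** 2                        # running sum of squared gradients
    weights = weights - lr * grads / (np.sqrt(cache) + epsilon)
    return weights, cache

w = np.array([5.0, -3.0])
cache = np.zeros_like(w)

# First step: every coordinate moves by about lr, whatever its gradient scale.
w_first, cache = adagrad_update(w, 2.0 * w, cache)

# Keep minimizing f(w) = sum(w ** 2), whose gradient is 2 * w.
w = w_first
for _ in range(500):
    w, cache = adagrad_update(w, 2.0 * w, cache)
```

Because `cache` only grows, the effective step size shrinks over time; this is the behaviour that Adadelta and RMSprop try to soften.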

class ztlearn.optimizers.Adadelta(**kwargs)[source]

Bases: ztlearn.optimizers.Optimizer

An Adaptive Learning Rate Method (Adadelta)

Adadelta is an extension of Adagrad that seeks to avoid setting the learning rate to an aggressively, monotonically decreasing rate. This is achieved via a dynamic learning rate, i.e. a different learning rate is computed for each training sample.

References

[1] An overview of gradient descent optimization algorithms

[Sebastian Ruder, 2016] https://arxiv.org/abs/1609.04747
[PDF] https://arxiv.org/pdf/1609.04747.pdf

[2] ADADELTA: An Adaptive Learning Rate Method

[Matthew D. Zeiler, 2012] https://arxiv.org/abs/1212.5701
[PDF] https://arxiv.org/pdf/1212.5701.pdf

Parameters:	kwargs – Arbitrary keyword arguments.

optimization_name

update(weights, grads, epoch_num, batch_num, batch_size)[source]

class ztlearn.optimizers.Adam(**kwargs)[source]

Bases: ztlearn.optimizers.Optimizer

Adaptive Moment Estimation (Adam)

Adam computes adaptive learning rates for each parameter. It stores an exponentially decaying average of past squared gradients and also keeps an exponentially decaying average of past gradients.

References

[1] An overview of gradient descent optimization algorithms

[Sebastian Ruder, 2016] https://arxiv.org/abs/1609.04747
[PDF] https://arxiv.org/pdf/1609.04747.pdf

[2] Adam: A Method for Stochastic Optimization

[Diederik P. Kingma et al., 2014] https://arxiv.org/abs/1412.6980
[PDF] https://arxiv.org/pdf/1412.6980.pdf

Parameters:	kwargs – Arbitrary keyword arguments.

optimization_name

update(weights, grads, epoch_num, batch_num, batch_size)[source]
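
The two decaying averages can be sketched as below, following the update rule in the referenced Adam paper (including its bias correction). The function signature, the explicit state arguments `m` and `v`, and the learning rate used in the demo are assumptions for illustration; they do not mirror `update(...)` above.

```python
import numpy as np

def adam_update(weights, grads, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, epsilon=1e-8):
    m = beta1 * m + (1.0 - beta1) * grads          # decaying average of gradients
    v = beta2 * v + (1.0 - beta2) * grads ** 2     # decaying average of squared gradients
    m_hat = m / (1.0 - beta1 ** t)                 # bias correction, t starts at 1
    v_hat = v / (1.0 - beta2 ** t)
    weights = weights - lr * m_hat / (np.sqrt(v_hat) + epsilon)
    return weights, m, v

w = np.array([5.0, -3.0])
initial_loss = np.sum(w ** 2)
m, v = np.zeros_like(w), np.zeros_like(w)
first_step = None

for t in range(1, 201):
    grads = 2.0 * w                                # gradient of f(w) = sum(w ** 2)
    w_new, m, v = adam_update(w, grads, m, v, t, lr=0.1)
    if t == 1:
        first_step = w_new - w
    w = w_new

final_loss = np.sum(w ** 2)
```

With bias correction, the very first step has magnitude close to `lr` in every coordinate, independent of the gradient's scale.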

class ztlearn.optimizers.Adamax(**kwargs)[source]

Bases: ztlearn.optimizers.Optimizer

Adamax

AdaMax is a variant of Adam based on the infinity norm. The Adam update rule for individual weights scales their gradients inversely proportional to a (scaled) L2 norm of their individual current and past gradients. Adamax generalizes the L2-norm-based update rule to an Lp-norm-based update rule. Such variants are numerically unstable for large p, but in the special case where p tends to infinity a simple and stable algorithm emerges.

References

[1] An overview of gradient descent optimization algorithms

[Sebastian Ruder, 2016] https://arxiv.org/abs/1609.04747
[PDF] https://arxiv.org/pdf/1609.04747.pdf

[2] Adam: A Method for Stochastic Optimization

[Diederik P. Kingma et al., 2014] https://arxiv.org/abs/1412.6980
[PDF] https://arxiv.org/pdf/1412.6980.pdf

Parameters:	kwargs – Arbitrary keyword arguments.

optimization_name

update(weights, grads, epoch_num, batch_num, batch_size)[source]

class ztlearn.optimizers.GD[source]

Bases: object

Gradient Descent (GD)

GD optimizes the parameters theta of an objective function J(theta) using all of the training samples in the dataset for each update. The update is performed in the opposite direction of the gradient of the objective function, d/d_theta J(theta), with respect to the parameters theta. The learning rate eta determines the size of the steps we take towards the minimum.

References

[1] An overview of gradient descent optimization algorithms

[Sebastian Ruder, 2016] https://arxiv.org/abs/1609.04747
[PDF] https://arxiv.org/pdf/1609.04747.pdf

class ztlearn.optimizers.NesterovAcceleratedGradient(**kwargs)[source]

Bases: ztlearn.optimizers.Optimizer

Nesterov Accelerated Gradient (NAG)

NAG is an improvement on SGDMomentum in which the previous parameter values are smoothed and a gradient descent step is taken from this smoothed value. This enables a more intelligent way of arriving at the minimum.

References

[1] An overview of gradient descent optimization algorithms

[Sebastian Ruder, 2016] https://arxiv.org/abs/1609.04747
[PDF] https://arxiv.org/pdf/1609.04747.pdf

[2] A method for unconstrained convex minimization problem with the rate of convergence

[Nesterov, Y. 1983][PDF] https://goo.gl/X8313t

[3] Nesterov’s Accelerated Gradient and Momentum as approximations to Regularised Update Descent

[Aleksandar Botev, 2016] https://arxiv.org/abs/1607.01981
[PDF] https://arxiv.org/pdf/1607.01981.pdf

Parameters:	kwargs – Arbitrary keyword arguments.

optimization_name

update(weights, grads, epoch_num, batch_num, batch_size)[source]
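
One common way to write NAG, following the referenced overview, is to evaluate the gradient at a look-ahead point (the current weights moved along the accumulated velocity) instead of at the current weights. The sketch below takes a gradient function for that reason; this differs from the `update(weights, grads, ...)` signature above, and the names and hyperparameter values are assumptions.

```python
import numpy as np

def nag_update(weights, velocity, grad_fn, lr=0.1, momentum=0.9):
    lookahead = weights - momentum * velocity           # where momentum is about to take us
    velocity = momentum * velocity + lr * grad_fn(lookahead)
    return weights - velocity, velocity

def grad_fn(w):
    return 2.0 * w                                      # gradient of f(w) = sum(w ** 2)

w = np.array([5.0, -3.0])
velocity = np.zeros_like(w)
for _ in range(200):
    w, velocity = nag_update(w, velocity, grad_fn)
```

Evaluating the gradient at the look-ahead point lets the update correct itself before overshooting, which plain momentum cannot do.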

class ztlearn.optimizers.OptimizationFunction(optimizer_kwargs)[source]

Bases: object

name

update(weights, grads, epoch_num, batch_num, batch_size)[source]

class ztlearn.optimizers.Optimizer(**kwargs)[source]

Bases: object

get_learning_rate(current_epoch)[source]

class ztlearn.optimizers.RMSprop(**kwargs)[source]

Bases: ztlearn.optimizers.Optimizer

Root Mean Squared Propagation (RMSprop)

RMSprop uses the magnitude of recent gradients to normalize the gradients. It keeps a moving average of the squared gradients and divides the current gradient by the square root of that average (the RMS). The recommended settings are rho = 0.9 and eta (learning rate) = 0.001.
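The normalization can be sketched in a few lines of NumPy. This is an illustration of the standard RMSprop rule, not ztlearn's implementation; `rmsprop_step`, `avg_sq` and the small constant `epsilon` (added to avoid division by zero) are assumptions.

```python
import numpy as np

def rmsprop_step(theta, grad, avg_sq, eta=0.001, rho=0.9, epsilon=1e-8):
    """One RMSprop step; avg_sq is the moving average of squared gradients."""
    avg_sq = rho * avg_sq + (1.0 - rho) * grad ** 2
    theta = theta - eta * grad / (np.sqrt(avg_sq) + epsilon)
    return theta, avg_sq

theta = np.array([1.0, -2.0])
grad = np.array([2.0, -4.0])  # gradients of different magnitude
new_theta, avg_sq = rmsprop_step(theta, grad, np.zeros_like(theta))

# The normalization makes both coordinates move by the same amount here,
# even though one gradient is twice as large as the other.
print(theta - new_theta)
```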

References

[1] An overview of gradient descent optimization algorithms

[Sebastian Ruder, 2016] https://arxiv.org/abs/1609.04747
[PDF] https://arxiv.org/pdf/1609.04747.pdf

[2] Lecture 6.5 - rmsprop, COURSERA: Neural Networks for Machine Learning

[Tieleman, T. and Hinton, G. 2012][PDF] https://goo.gl/Dhkvpk

Parameters:	kwargs – Arbitrary keyword arguments.

optimization_name

update(weights, grads, epoch_num, batch_num, batch_size)[source]

class ztlearn.optimizers.SGD(**kwargs)[source]

Bases: ztlearn.optimizers.Optimizer

Stochastic Gradient Descent (SGD)

SGD optimizes the parameters theta of an objective function J(theta) by performing an update for each training sample, inputs(i) and targets(i), in the dataset. The update is performed in the direction opposite to the gradient of the objective function, d/d_theta J(theta), with respect to the parameters theta. The learning rate eta determines the size of the steps taken towards a minimum.
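To illustrate the per-sample update, the following sketch fits a small noiseless linear model with one update per training sample. It is plain NumPy rather than ztlearn code, and the data, `true_theta` and the squared-error objective are hypothetical choices made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
inputs = rng.normal(size=(200, 2))
true_theta = np.array([3.0, -1.0])
targets = inputs @ true_theta  # noiseless targets for the toy problem

theta = np.zeros(2)
eta = 0.01
for epoch in range(20):
    # Visit the samples in a fresh random order each epoch.
    for i in rng.permutation(len(inputs)):
        error = inputs[i] @ theta - targets[i]
        grad = 2.0 * error * inputs[i]  # gradient of (inputs(i) . theta - targets(i)) ** 2
        theta = theta - eta * grad

print(theta)  # approaches true_theta
```

Unlike the full-batch GD update, each step here uses the gradient of a single sample, so individual steps are noisier but much cheaper.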

References

[1] An overview of gradient descent optimization algorithms

[Sebastian Ruder, 2016] https://arxiv.org/abs/1609.04747
[PDF] https://arxiv.org/pdf/1609.04747.pdf

[2] Large-Scale Machine Learning with Stochastic Gradient Descent

[Léon Bottou, 2011][PDF] http://leon.bottou.org/publications/pdf/compstat-2010.pdf

Parameters:	kwargs – Arbitrary keyword arguments.

optimization_name

update(weights, grads, epoch_num, batch_num, batch_size)[source]

class ztlearn.optimizers.SGDMomentum(**kwargs)[source]

Bases: ztlearn.optimizers.Optimizer

Stochastic Gradient Descent with Momentum (SGDMomentum)

The surface of the objective function often contains regions (ravines) that curve much more steeply in one dimension than in another. Standard SGD tends to oscillate across the narrow ravine, since the negative gradient points down one of the steep sides rather than along the ravine towards the optimum. Momentum helps to push the parameters more quickly along the shallow ravine towards the minimum.
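The ravine effect can be reproduced on a toy quadratic whose curvature is 100 times larger along one axis than the other. The sketch below compares plain gradient steps with the classical momentum rule from reference [1]; it is not ztlearn's code, and `momentum_step`, `curvature`, `eta` and `gamma` are hypothetical names and values.

```python
import numpy as np

def momentum_step(theta, velocity, grad, eta=0.015, gamma=0.9):
    """One momentum step: accumulate a velocity, then move along it."""
    velocity = gamma * velocity + eta * grad
    return theta - velocity, velocity

curvature = np.array([100.0, 1.0])  # steep axis, shallow axis

def grad_fn(theta):
    # Gradient of J(theta) = 0.5 * sum(curvature * theta ** 2).
    return curvature * theta

theta_plain = np.array([1.0, 1.0])
theta_mom = np.array([1.0, 1.0])
velocity = np.zeros(2)
for _ in range(100):
    theta_plain = theta_plain - 0.015 * grad_fn(theta_plain)
    theta_mom, velocity = momentum_step(theta_mom, velocity, grad_fn(theta_mom))

print(abs(theta_plain[1]), abs(theta_mom[1]))  # progress along the shallow axis
```

In this run the plain update has covered only part of the distance along the shallow axis after 100 steps, while the momentum update ends much closer to the minimum.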

References

[1] An overview of gradient descent optimization algorithms

[Sebastian Ruder, 2016] https://arxiv.org/abs/1609.04747
[PDF] https://arxiv.org/pdf/1609.04747.pdf

[2] On the Momentum Term in Gradient Descent Learning Algorithms

[Ning Qian, 199] https://goo.gl/7fhr14
[PDF] https://goo.gl/91HtDt

[3] Two problems with backpropagation and other steepest-descent learning procedures for networks.

[Sutton, R. S., 1986][PDF] https://goo.gl/M3VFM1

Parameters:	kwargs – Arbitrary keyword arguments.

optimization_name

update(weights, grads, epoch_num, batch_num, batch_size)[source]

ztlearn.optimizers.register_opt(**kwargs)[source]

ztlearn.regularizers module¶

class ztlearn.regularizers.ElasticNetRegularization(_lambda, l1_ratio)[source]

Bases: object

Elastic Net Regularization (ElasticNetRegularization)

ElasticNetRegularization adds both the absolute values and the squared magnitudes of the coefficients as penalty terms to the loss function.
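A minimal sketch of such a combined penalty and its gradient, assuming the L1 and L2 terms are mixed by l1_ratio and scaled by _lambda. The exact scaling used by `regulate` and `derivative` in ztlearn may differ (for example by a factor of 1/2 on the squared term); `elastic_net_penalty` and `elastic_net_grad` are hypothetical helper names.

```python
import numpy as np

def elastic_net_penalty(weights, _lambda=0.5, l1_ratio=0.5):
    """Mix of L1 and squared L2 penalties (assumed scaling, see lead-in)."""
    l1_term = np.sum(np.abs(weights))
    l2_term = np.sum(weights ** 2)
    return _lambda * (l1_ratio * l1_term + (1.0 - l1_ratio) * l2_term)

def elastic_net_grad(weights, _lambda=0.5, l1_ratio=0.5):
    """Gradient of the penalty above (a subgradient where a weight is 0)."""
    return _lambda * (l1_ratio * np.sign(weights)
                      + 2.0 * (1.0 - l1_ratio) * weights)

weights = np.array([1.0, -2.0])
print(elastic_net_penalty(weights))
print(elastic_net_grad(weights))
```

Setting l1_ratio to 1 leaves only the L1 term and setting it to 0 leaves only the squared term.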

References

[1] Regularization (mathematics)

[Wikipedia Article] https://en.wikipedia.org/wiki/Regularization_(mathematics)

Parameters:	
_lambda (float32) – controls the weight of the penalty term
l1_ratio (float32) – controls the l1 penalty as a ratio of the total penalty added to the loss function

derivative(weights)[source]

regulate(weights)[source]

regulation_name

class ztlearn.regularizers.L1Regularization(_lambda, **kwargs)[source]

Bases: object

Lasso Regression (L1Regularization)

L1Regularization adds the sum of the absolute values of the parameters as a penalty term to the loss function.
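The penalty described above and its gradient can be sketched as follows. This is plain NumPy rather than ztlearn's `regulate` and `derivative` methods, and `l1_penalty` and `l1_grad` are hypothetical helper names.

```python
import numpy as np

def l1_penalty(weights, _lambda=0.5):
    """_lambda times the sum of absolute weight values."""
    return _lambda * np.sum(np.abs(weights))

def l1_grad(weights, _lambda=0.5):
    """Subgradient of the L1 penalty; np.sign returns 0 for zero weights."""
    return _lambda * np.sign(weights)

weights = np.array([1.0, -2.0, 0.0])
print(l1_penalty(weights))
print(l1_grad(weights))
```

The gradient has the same magnitude for every non-zero weight, regardless of its size, which is why this penalty tends to drive small weights all the way to zero.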

References

[1] Regularization (mathematics)

[Wikipedia Article] https://en.wikipedia.org/wiki/Regularization_(mathematics)

[2] Regression shrinkage and selection via the lasso

[R Tibshirani, 1996] https://goo.gl/Yh9bBU
[PDF] https://goo.gl/mQP5mA

[3] Feature selection, L1 vs. L2 regularization, and rotational invariance

[Andrew Y. Ng, ] [PDF] https://goo.gl/rbwNCt

Parameters:	_lambda (float32) – controls the weight of the penalty term

derivative(weights)[source]

regulate(weights)[source]

regulation_name

class ztlearn.regularizers.L2Regularization(_lambda, **kwargs)[source]

Bases: object

Ridge Regression (L2Regularization)

L2Regularization adds the sum of the squared magnitudes of the parameters as a penalty term to the loss function.
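A minimal sketch of a squared-magnitude penalty and its gradient, in plain NumPy. The scaling is an assumption (some implementations include a factor of 1/2 so the gradient is simply _lambda * weights), and `l2_penalty` and `l2_grad` are hypothetical helper names rather than ztlearn's API.

```python
import numpy as np

def l2_penalty(weights, _lambda=0.5):
    """_lambda times the sum of squared weights (assumed scaling)."""
    return _lambda * np.sum(weights ** 2)

def l2_grad(weights, _lambda=0.5):
    """Gradient of the penalty above with respect to the weights."""
    return 2.0 * _lambda * weights

weights = np.array([1.0, -2.0])
print(l2_penalty(weights))
print(l2_grad(weights))
```

Here the gradient grows in proportion to each weight, so large weights are shrunk more strongly than small ones.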

References

[1] Regularization (mathematics)

[Wikipedia Article] https://en.wikipedia.org/wiki/Regularization_(mathematics)

[2] Regression shrinkage and selection via the lasso

[R Tibshirani, 1996] https://goo.gl/Yh9bBU
[PDF] https://goo.gl/mQP5mA

[3] Feature selection, L1 vs. L2 regularization, and rotational invariance

[Andrew Y. Ng, ] [PDF] https://goo.gl/rbwNCt

Parameters:	_lambda (float32) – controls the weight of the penalty term

derivative(weights)[source]

regulate(weights)[source]

regulation_name

class ztlearn.regularizers.RegularizationFunction(name='lasso', _lambda=0.5, l1_ratio=0.5)[source]

Bases: object

derivative(weights)[source]

name

regulate(weights)[source]

Module contents¶