Deep Lattice Networks (DLN)
- By Björn Piepenburg
- Binary, Classification, Deep lattice network, Monotonicity, TensorFlow
Many of the problems we encounter in practice can be reduced to a (multidimensional) regression or a classification. Starting from a set of features, regression attempts to express the dependency between the features and a target variable as a function; one example is how a company's revenue depends on research spending and on employee satisfaction. Classification estimates the probability that an object belongs to a predefined class; a well-known example is the classification of image content (dog, cat, truck, …).
The complexity of a regression or classification problem depends on the number of features and on how strongly the features correlate with the target variable. Another aspect is the interdependence of the features themselves. Simple problems can be solved with classical statistical methods, which have the advantage that their results are easy to interpret. Problems of practical relevance, however, are usually described by several mutually influencing features. A well-known and frequently used method for solving such problems is the artificial neural network, which is very powerful at this task.
Deep lattice networks extend artificial neural networks with monotonicity constraints between individual features and the target variable. One example of why this is needed is a pricing model in which the unit price must decrease with the order quantity. The method, developed in a project initiated and supported by Google, is essentially built on an n-dimensional hypercube with edge length 1, where n is the number of features. Each dimension of the cube thus represents one feature. A (multidimensional) function is placed in the cube that describes the relationship between the features and the target variable. To this end, function values are computed for the corner points of the cube on the basis of training data; these values are the adjustable parameters of the model. Between the corner points, the target function is interpolated linearly. To increase the level of detail of the function, the edges of the cube can be subdivided. A grid (lattice) is laid through the subdivision points, and its intersection points are assigned function values as additional parameters of the model.
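To make the interpolation concrete, here is a minimal NumPy sketch (not the TensorFlow Lattice implementation itself) that evaluates a two-feature lattice by bilinear interpolation within the grid cell containing the query point. The function name `lattice_eval` and the parameter layout are assumptions made for this illustration.

```python
import numpy as np

def lattice_eval(theta, x):
    """Evaluate a two-feature lattice function at a point x in [0, 1]^2.

    theta: (K, K) array of function values at the lattice vertices
           (K - 1 subdivisions per edge of the unit square).
    x:     query point (x1, x2) with 0 <= x1, x2 <= 1.
    """
    K = theta.shape[0]
    # Map to grid coordinates and locate the cell containing x.
    g = np.clip(np.asarray(x, dtype=float) * (K - 1), 0.0, K - 1 - 1e-9)
    i, j = int(g[0]), int(g[1])
    u, v = g[0] - i, g[1] - j  # local coordinates within the cell
    # Bilinear interpolation between the four corners of the cell.
    return ((1 - u) * (1 - v) * theta[i, j]
            + u * (1 - v) * theta[i + 1, j]
            + (1 - u) * v * theta[i, j + 1]
            + u * v * theta[i + 1, j + 1])

# A 2x2 lattice (no interior subdivision): theta holds the four corner values.
theta = np.array([[0.0, 0.5],
                  [0.2, 1.0]])
print(lattice_eval(theta, (0.5, 0.5)))  # 0.425, the average of the four corners
```

With more subdivisions, `theta` simply grows to a finer grid and the same per-cell interpolation applies, which is how the level of detail is increased.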
The following figure shows a lattice with two features (i.e. a square). For the four corner points, values θ[i] were derived from a set of training data as the parameters of the model. The function f(x) to be represented is approximated by linear interpolation between the four corner points. No further support points were inserted between the corner points to refine the grid and increase the level of detail of the approximated function.
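Written out for the square above, this linear (bilinear) interpolation of the four corner parameters is, for x = (x1, x2) with x1, x2 ∈ [0, 1]:

f(x) = (1 − x1)(1 − x2) θ[1] + x1(1 − x2) θ[2] + (1 − x1) x2 θ[3] + x1 x2 θ[4]

Each corner value is weighted by the area of the opposite sub-rectangle, so f reproduces the θ[i] exactly at the corners and is linear along every edge of the square. (The numbering of the corners here is illustrative.)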
As with an artificial neural network, training proceeds by iteratively adjusting the parameters so as to minimize the error between the model output and the observed values of the target variable; this procedure is known as supervised learning. After training, the model can estimate the target variable for (unknown) feature combinations.
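As a sketch of what such a training run might look like with the TensorFlow Lattice library (`tensorflow-lattice`), which implements this method: the toy data, the 2x2 lattice size, and the hyperparameters below are assumptions for this example, not the setup of an actual project.

```python
import numpy as np
import tensorflow as tf
import tensorflow_lattice as tfl  # pip install tensorflow-lattice

# Toy training data (assumed for this example): two features in [0, 1]
# and a target that is noisily increasing in both features.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(1000, 2)).astype("float32")
y = (0.4 * X[:, 0] + 0.6 * X[:, 1]
     + rng.normal(0.0, 0.05, 1000)).astype("float32")

# A single 2x2 lattice (one parameter per corner of the unit square),
# constrained to be monotonically increasing in both features.
model = tf.keras.Sequential([
    tfl.layers.Lattice(
        lattice_sizes=[2, 2],
        monotonicities=["increasing", "increasing"],
    )
])
model.compile(optimizer=tf.keras.optimizers.Adam(0.05), loss="mse")
model.fit(X, y, epochs=50, batch_size=64, verbose=0)

# The learned corner values parameterize the interpolated function;
# predictions work for previously unseen feature combinations.
print(model.get_weights()[0].ravel())  # the four corner parameters
print(model.predict(np.array([[0.3, 0.7]], dtype="float32")))
```

The monotonicity constraints are what distinguish this from an unconstrained regression: the optimizer is only allowed parameter configurations in which the output never decreases along the constrained feature directions.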
In another post on this blog I describe the application of a deep lattice network to estimate acceptance probabilities for transportation orders. This was part of a research project funded by the mFUND, which we implemented successfully.