Deep Lattice Networks (DLN)

Many of the problems we encounter can be traced back to a (multidimensional) regression or a classification task. Starting from a set of features, regression attempts to represent the dependency between the features and a target variable as a function. One example is the dependence of a company's revenue on its research spending and on employee satisfaction. Classification involves estimating the probability that an object belongs to a predefined class. A well-known example is the classification of image content (dog, cat, truck, …).

The complexity of regression and classification problems depends on the number of features and on the degree of correlation between the features and the target variable. Another aspect is the interdependence of the features themselves. Simple problems can be solved using classical statistical methods, which have the advantage that the results are easy to interpret. Problems of practical relevance are generally described by several features that influence each other. A well-known and frequently used method for solving such problems is the artificial neural network.

Deep lattice networks extend the properties of an artificial neural network to include monotonicity conditions between individual features and the target variable. One example of why this is necessary is a pricing model in which the unit price should fall as the order quantity increases. The method, which was initiated and supported by Google, essentially consists of an n-dimensional hypercube with edge length 1, where n is the number of features. Each dimension of the cube therefore represents one feature. A (multidimensional) function is placed in the cube, which describes the relationship between the features and the target variable. For this purpose, function values are calculated from training data for the corner points of the cube; these values are the variable parameters of the model. The target function is interpolated linearly between the corner points. The edges of the cube can be subdivided to increase the level of detail of the function. A grid (lattice) is placed through the division points, and its intersection points are assigned function values as additional parameters of the model.
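To make the monotonicity condition more concrete: under multilinear interpolation, a lattice is monotonic in one feature exactly when the values at neighbouring grid points never move in the wrong direction along that feature's dimension. The following minimal sketch checks this for a small lattice. The function name `is_monotonic`, the dictionary layout and the price values are assumptions made for illustration; they are not taken from the project or from any library.

```python
from itertools import product


def is_monotonic(theta, sizes, dim, increasing=True):
    """Check the vertex values of a lattice along one dimension.

    theta:  dict mapping a vertex index tuple to its function value
    sizes:  number of grid points per dimension, e.g. (2, 2)
    dim:    index of the feature whose monotonicity is checked
    """
    for vertex in product(*(range(s) for s in sizes)):
        if vertex[dim] + 1 >= sizes[dim]:
            continue  # no neighbour further along this dimension
        neighbour = vertex[:dim] + (vertex[dim] + 1,) + vertex[dim + 1:]
        diff = theta[neighbour] - theta[vertex]
        if (increasing and diff < 0) or (not increasing and diff > 0):
            return False
    return True


# Hypothetical 2x2 lattice for a price model:
# dimension 0 = order quantity, dimension 1 = product quality.
unit_price = {(0, 0): 10.0, (1, 0): 7.0, (0, 1): 15.0, (1, 1): 11.0}

price_falls_with_quantity = is_monotonic(unit_price, (2, 2), dim=0, increasing=False)
price_rises_with_quality = is_monotonic(unit_price, (2, 2), dim=1, increasing=True)
print(price_falls_with_quantity, price_rises_with_quality)
```

During training, a check of this kind would typically be replaced by a constraint or projection step that keeps the parameters inside the monotonic region; that step is not shown here.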

The following figure shows a lattice with two features (i.e. a square). For the four corner points, values were derived from a set of training data as parameters of the model. The function to be represented is approximated by linear interpolation between the four corner points. No further support points were added between the corner points to refine the grid and increase the level of detail of the approximated function.
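The interpolation for the two-feature case can be sketched in a few lines. The sketch assumes that both features have already been scaled to the interval [0, 1] and that "linear interpolation between the four corner points" means bilinear interpolation. The corner values are hypothetical stand-ins for trained parameters.

```python
def interpolate_2d(theta, x1, x2):
    """Bilinear interpolation on the unit square.

    theta maps each corner (i, j), with i, j in {0, 1}, to its value.
    x1 and x2 are the two feature values, scaled to [0, 1].
    """
    if not (0.0 <= x1 <= 1.0 and 0.0 <= x2 <= 1.0):
        raise ValueError("feature values must lie in [0, 1]")
    # Each corner is weighted by how close the input lies to it.
    return (
        theta[(0, 0)] * (1 - x1) * (1 - x2)
        + theta[(1, 0)] * x1 * (1 - x2)
        + theta[(0, 1)] * (1 - x1) * x2
        + theta[(1, 1)] * x1 * x2
    )


# Hypothetical corner values standing in for trained parameters.
corner_values = {(0, 0): 0.0, (1, 0): 0.4, (0, 1): 0.5, (1, 1): 1.0}

center = interpolate_2d(corner_values, 0.5, 0.5)
print(center)
```

At a corner the function returns that corner's value; in the center of the square all four corners contribute equally, so the result is their mean.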

As the number of features grows, or as the level of detail of the function increases, the number of parameters to be optimized increases exponentially. Let's take an example: you have 15 features that influence the target value. Each feature is to be described in the lattice by 10 support points (2 corner points and 8 additional subdivision points). This creates a cube with 10^15 = 1 quadrillion variable parameters. To reduce the complexity of the model, the features can be divided among several separate cubes, the results of which are combined after training. The so-called Crystal algorithm can be used to distribute the features across different cubes according to their similarity. Merging can be done via simple averaging or through further lattices, which creates a lattice network.
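The arithmetic behind this example, and the effect of splitting the features, can be checked with a short sketch. The grouping into five cubes of three features each is purely an assumption for illustration; it is not the result of the Crystal algorithm, which is not implemented here.

```python
def lattice_parameter_count(num_features, points_per_feature):
    """Number of grid points, and therefore parameters, of one lattice."""
    return points_per_feature ** num_features


def ensemble_average(outputs):
    """Merge the outputs of several lattices by simple averaging."""
    return sum(outputs) / len(outputs)


# One lattice over all 15 features with 10 support points each.
full = lattice_parameter_count(15, 10)

# Hypothetical split: five lattices with three features each.
groups = [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11], [12, 13, 14]]
ensemble = sum(lattice_parameter_count(len(g), 10) for g in groups)

print(full)      # 10**15 parameters in a single lattice
print(ensemble)  # 5 * 10**3 parameters in the ensemble
```

The reduction comes at a price: a small lattice can only represent interactions between the features assigned to it, which is why the assignment of features to cubes matters.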

Similar to an artificial neural network, training is performed by iteratively adjusting the parameters with the aim of minimizing the error between the model output and the observed values of the target variable. This procedure is known as supervised learning. After training, values for the target variable can be determined for new (previously unseen) feature combinations.
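The iterative adjustment can be illustrated with a minimal sketch for a single lattice with two features. It assumes a squared error, plain stochastic gradient descent and synthetic training data generated from a hypothetical target function; monotonicity constraints are left out for brevity. None of the names or settings below come from the project described here.

```python
import random

CORNERS = [(0, 0), (1, 0), (0, 1), (1, 1)]


def weights(x1, x2):
    """Bilinear interpolation weight of each corner for the input (x1, x2)."""
    return {
        (0, 0): (1 - x1) * (1 - x2),
        (1, 0): x1 * (1 - x2),
        (0, 1): (1 - x1) * x2,
        (1, 1): x1 * x2,
    }


def predict(theta, x1, x2):
    w = weights(x1, x2)
    return sum(theta[c] * w[c] for c in CORNERS)


def train(samples, epochs=200, learning_rate=0.5):
    """Fit the four corner values by stochastic gradient descent."""
    theta = {c: 0.0 for c in CORNERS}
    for _ in range(epochs):
        for x1, x2, y in samples:
            w = weights(x1, x2)
            error = predict(theta, x1, x2) - y
            # Gradient of 0.5 * error**2 with respect to theta[c] is error * w[c].
            for c in CORNERS:
                theta[c] -= learning_rate * error * w[c]
    return theta


# Synthetic data from a hypothetical target: y = 0.3 * x1 + 0.7 * x2.
rng = random.Random(0)
samples = []
for _ in range(200):
    x1, x2 = rng.random(), rng.random()
    samples.append((x1, x2, 0.3 * x1 + 0.7 * x2))

theta = train(samples)
print(theta)
print(predict(theta, 0.25, 0.75))  # prediction for a new feature combination
```

Because the hypothetical target is itself linear in both features, the four corner values can reproduce it exactly, and the learned parameters approach 0, 0.3, 0.7 and 1.0.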

In another post in this blog, I describe the application of a deep lattice network to determine the acceptance probabilities for transportation orders. This work is part of a research project funded by the mFund, which we have successfully completed.

