NAME

Sim::OPT::Interlinear

SYNOPSIS

# as a function from Perl (for example, after having launched "re.pl" from the command line):
interlinear( "/path/to/a-pre-prepared-configfile.pl", "/path/to/a-pre-prepared-sourcefile.csv", "/path/to/the-metamodel-file-to-be-obtained" );
# or as a script, from the command line, from a directory where the file "Interlinear.pm" has been copied:
./Interlinear.pm .
# (note the dot at the end). In that case, Interlinear will look for the source file "sourcefile.csv" in the "$HOME" directory and write the resulting file "sourcefile_meta.csv" to the same directory.
# Note also that, in this case, the opening lines of the script saying "use Sim::OPT" etc. have to be deleted.
# or, again, from the command line, for beginning with a dialogue question:
interlinear interstart

DESCRIPTION

Interlinear is a program for computing the missing values in multivariate dataseries through a strategy that distance-weights the nearest-neighbouring gradients between points in an n-dimensional space. In this strategy, the known gradients are weighted in inverse proportion to the distance of their pivot point from the pivot point of each unknown nearest-neighbouring gradient; then the gradients near each unknown point are used to define that point, again weighting the candidates by distance. The curvatures in the hyperspace arise because the calculations use local samples of the near-neighbouring gradients, which vary from point to point. (This strategy has been adopted in Interlinear since version 0.103. Before that version, the gradients were calculated on a global basis.) Besides the strategy just described (a), Interlinear offers the following metamodelling strategies:

b) pure linear interpolation (one may want to use this on some occasions, for example with factorials);

c) pure nearest neighbour (a strategy of last resort. One may want to use it to unlock a computation which is based on data which are too sparse to proceed, or when nothing else works).
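
The inverse-distance weighting at the core of strategy a) can be sketched as follows. This is only an illustration of the weighting idea, not Interlinear's actual code: the function names "distance" and "weighted_gradient", the data layout, and the choice of Euclidean distance between pivot points are all assumptions made for the example.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Euclidean distance between two points given as array refs of equal length.
sub distance {
    my ( $p, $q ) = @_;
    return sqrt( sum( map { ( $p->[$_] - $q->[$_] )**2 } 0 .. $#$p ) );
}

# Inverse-distance-weighted mean of the known gradients (hypothetical helper).
# $target: pivot point of the unknown gradient (array ref).
# $known:  array ref of hashes { pivot => [...], gradient => number }.
sub weighted_gradient {
    my ( $target, $known ) = @_;
    my ( $num, $den ) = ( 0, 0 );
    for my $g (@$known) {
        my $d = distance( $target, $g->{pivot} );
        return $g->{gradient} if $d == 0;    # same pivot: use it directly
        my $w = 1 / $d;                      # closer pivots weigh more
        $num += $w * $g->{gradient};
        $den += $w;
    }
    return $den ? $num / $den : undef;       # undef when nothing is known
}

# Two known gradients at equal distance from the target pivot.
my @known = (
    { pivot => [ 1.5, 1 ], gradient => 0.2 },
    { pivot => [ 1.5, 3 ], gradient => 0.6 },
);
my $estimate = weighted_gradient( [ 1.5, 2 ], \@known );
printf "%.3f\n", $estimate;
```

Because both known pivots are at the same distance from the target, they receive equal weights and the estimate is their plain average. Moving the target closer to one pivot would pull the estimate towards that pivot's gradient.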

Strategy a), in the adopted setting, works with cases that are adjacent in the design space. In that setting, the strategy cannot work with the gradient between a certain iteration 1 and the corresponding iteration 3; it can only work with the gradient between iterations 1 and 2, or 2 and 3. (This is a design decision, not a necessity: versions of Interlinear prior to 0.103 could work with non-adjacent instances, if wanted.) For this reason, Interlinear does not work well with data evenly distributed in the design space, like those deriving from Latin hypercube sampling or random sampling, and works well with data clustered in small patches, like those deriving from star (coordinate descent) sampling strategies. To work well with Latin hypercube sampling, it is necessary to include a pass of strategy b) before calling strategy a). Strategy a) will then take care of reducing the gradient errors created by the initial pass of strategy b).
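
The adjacency constraint can be illustrated with a small check. This is a sketch under an assumption: "adjacent" is taken here to mean that two instances differ by exactly one level in exactly one parameter. The function name "adjacent_along" is hypothetical and is not part of Interlinear.

```perl
use strict;
use warnings;

# Returns the index of the parameter along which two instances are adjacent,
# or undef if they are not adjacent. Assumption: adjacency means the level
# vectors differ by exactly one step in exactly one parameter.
sub adjacent_along {
    my ( $p, $q ) = @_;
    my $axis;
    for my $i ( 0 .. $#$p ) {
        my $diff = abs( $p->[$i] - $q->[$i] );
        next if $diff == 0;
        # More than one step, or a second differing parameter: not adjacent.
        return undef if $diff > 1 || defined $axis;
        $axis = $i;
    }
    return $axis;    # undef when the two instances are identical
}

my $a12 = adjacent_along( [ 1, 1, 1 ], [ 2, 1, 1 ] );    # levels 1 and 2
my $a13 = adjacent_along( [ 1, 1, 1 ], [ 3, 1, 1 ] );    # levels 1 and 3
print defined $a12 ? "adjacent along parameter " . ( $a12 + 1 ) : "not adjacent", "\n";
print defined $a13 ? "adjacent along parameter " . ( $a13 + 1 ) : "not adjacent", "\n";
```

Under this reading, a gradient can be taken between levels 1 and 2 of a parameter but not between levels 1 and 3, which matches the restriction described above.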

A configuration file should be prepared following the example in the "examples" folder in this distribution. If the configuration file is incomplete or missing, the program adopts its own defaults, exploiting the distance-weighted gradient-based strategy.

The only variable that must be specified in a configuration file is $sourcefile: the Unix path to the source file containing the dataseries. If it is not specified, the program will look for "./sourcefile.csv".

The source file has to be prepared by listing in each column the values (levels) of the parameters (factors, variables), putting in the last column the objective function values, in the rows in which they are present.

The parameter number is given by the position of the column (i.e. column 4 hosts parameter 4).

Below is an example of a multivariate dataseries of 3 parameters assuming 3 levels each. The numbers preceding the objective function value (which is in the last column) are the indices of the multidimensional matrix (tensor). Note that the parameter listings (i.e. the numbers describing levels) cannot be incomplete; only the objective function entries can be. The parameter listings must also be integers from 1 to n.

1,1,1,1.234
1,2,2,1.500
1,3,3
2,1,1,1.534
2,2,2,0.000
2,3,3,0.550
3,1,1
3,2,2,0.670
3,3,3

The program converts this format into the one used by Sim::OPT, shown below, in which the indices of the tensor are expressed more explicitly as "parameter-level" pairs joined by underscores:

1-1_2-1_3-1,1.234
1-1_2-2_3-2,1.500
1-1_2-3_3-3
1-2_2-1_3-1,1.534
1-2_2-2_3-2,0.000
1-2_2-3_3-3,0.550
1-3_2-1_3-1
1-3_2-2_3-2,0.670
1-3_2-3_3-3
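
The conversion between the two formats can be sketched as follows. This is a minimal illustration, not the routine used by Interlinear: the function name "to_keyed" is hypothetical, and the number of parameter columns is passed in explicitly as an assumption, since a row with a missing objective function value has one field fewer.

```perl
use strict;
use warnings;

# Convert one source row (e.g. "2,3,3,0.550" or "3,3,3") into the keyed form
# (e.g. "1-2_2-3_3-3,0.550" or "1-3_2-3_3-3"). $nparams is the number of
# parameter columns; a remaining column, if any, is the objective function value.
sub to_keyed {
    my ( $row, $nparams ) = @_;
    my @fields = split /,/, $row;
    my @levels = @fields[ 0 .. $nparams - 1 ];
    # Each key item is "<parameter number>-<level>"; items are joined by "_".
    my $key = join "_", map { ( $_ + 1 ) . "-" . $levels[$_] } 0 .. $#levels;
    return @fields > $nparams ? "$key,$fields[$nparams]" : $key;
}

print to_keyed( $_, 3 ), "\n" for "2,3,3,0.550", "3,3,3";
```

The objective function value is carried over as text, so its formatting (for example trailing zeros) is left untouched.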

After some computations, Interlinear will output a new dataseries with the missing values filled in. This dataseries can be used by OPT for the optimization of one or more blocks. This can be useful for saving computations in searches involving simulations, especially when the time required by each simulation is long, as may happen with CFD simulations in building design.

The number of computations required for the creation of a metamodel in OPT increases exponentially with the number of instances in the metamodel. To make the increase linear, a limit has to be set on the size of the net of instances taken into account in the computations for gradients and for points. The variables in the configuration file controlling those limits are "$limit_checkgrades" and "$limit_checkpoints". By default they are both set to "", which entails that no limit is assumed.
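
As a hedged illustration, a configuration file might set the two limits as shown below. The values are arbitrary placeholders, and it is an assumption that the limits are given as plain numbers assigned without "my"; the example in the "examples" folder of the distribution remains the reference for the exact form.

```perl
# Hypothetical values, for illustration only.
$limit_checkgrades = 100;    # limit applied to the computations for gradients
$limit_checkpoints = 100;    # limit applied to the computations for points

# Default behaviour (no limit), as stated above:
# $limit_checkgrades = "";
# $limit_checkpoints = "";
```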

EXPORT

interlinear, interstart.

AUTHOR

Gian Luca Brunetti, <gianluca.brunetti@polimi.it>

COPYRIGHT AND LICENSE

Copyright (C) 2018-19 by Gian Luca Brunetti and Politecnico di Milano. This is free software. You can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 3 or newer.