NAME
Sim::OPT::Interlinear
SYNOPSIS
# As a Perl function:
re.pl # open Perl shell
use Sim::OPT::Interlinear; # load Interlinear
Sim::OPT::Interlinear::interlinear( "./sourcefile.csv", "./confinterlinear.pl", "./obtainedmetamodel.csv" );
# or as a script, from the command line:
perl ./Interlinear.pm . ./sourcefile.csv
# (note the dot).
# or, again, from the command line, for beginning with a dialogue question:
interlinear interstart
DESCRIPTION
Interlinear is a program for metamodelling the missing instance values in n-dimensional multivariate dataseries by distance-weighting the nearest-neighbouring gradients between points. The strategy weights the known gradients in inverse proportion to the positional distance between the points they are taken from and the missing nearest-neighbouring points they are going to be used for. It is a zero-order instance-based method. In this scheme, the curvatures in the space are reconstructed by exploiting the fact that the calculation uses a local sample of the near-neighbouring gradients, which varies for each point.
The method has been presented in the following publication: http://doi.org/10.1080/19401493.2019.1707875. In that publication, the method is reported to outperform the Kriging method, the MARS method, and polynomial methods.
This procedure (a1) is active by default when calling Interlinear. Another version of the procedure (a2), in which the derived points are calculated on a global rather than a local basis, allowing faster computations but entailing less accurate results, can be activated by setting @modality = ( "simple" ) in the configuration file. Besides strategies a1) and a2), two alternative metamodelling strategies can be utilized in Interlinear:
b) pure linear interpolation (one may want to use this just in some occasions: for example, on factorials);
c) pure nearest neighbour (a strategy of last resort. One may want to use a pass of it to unlock a computation which is based on data which are too sparse to proceed, or when nothing else works).
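To make strategies b) and c) more concrete, here is a minimal sketch. It is not Interlinear's own code: it assumes that instances are keyed by strings such as "1-2_2-3" (parameter-level pairs), that distance is Euclidean over the level indices, and that interpolation runs along a single parameter between two known neighbours. The helper names (levels_of, distance, nearest_fill, lerp) and the sample values are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: "1-2_2-3" -> (2, 3), the level of each parameter.
sub levels_of {
    my ($key) = @_;
    my @levels;
    for my $pair ( split /_/, $key ) {
        my ( undef, $level ) = split /-/, $pair;
        push @levels, $level;
    }
    return @levels;
}

# Euclidean distance between two instances in level-index space (an assumption).
sub distance {
    my ( $key1, $key2 ) = @_;
    my @p   = levels_of($key1);
    my @q   = levels_of($key2);
    my $sum = 0;
    $sum += ( $p[$_] - $q[$_] )**2 for 0 .. $#p;
    return sqrt($sum);
}

# Strategy c) sketch: copy the value of the closest known instance.
# Ties are broken alphabetically by key, which is an arbitrary choice.
sub nearest_fill {
    my ( $known, $missing_key ) = @_;
    my ($best) = sort {
        distance( $a, $missing_key ) <=> distance( $b, $missing_key )
            or $a cmp $b
    } keys %$known;
    return $known->{$best};
}

# Strategy b) sketch: linear interpolation between two known values
# (x0, y0) and (x1, y1) along one parameter, evaluated at x.
sub lerp {
    my ( $x0, $y0, $x1, $y1, $x ) = @_;
    return $y0 + ( $y1 - $y0 ) * ( $x - $x0 ) / ( $x1 - $x0 );
}

# Made-up sample data: two parameters, three known instances.
my %known = ( "1-1_2-1" => 1.0, "1-3_2-1" => 2.0, "1-3_2-3" => 5.0 );

printf "nearest neighbour for 1-2_2-3: %.2f\n", nearest_fill( \%known, "1-2_2-3" );
printf "linear interpolation for 1-2_2-1: %.2f\n",
    lerp( 1, $known{"1-1_2-1"}, 3, $known{"1-3_2-1"}, 2 );
```

The nearest-neighbour pass always returns something as long as one value is known, which is why it can unlock a computation on very sparse data; the interpolation needs two known values bracketing the missing one along the same parameter.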
Strategies a1) and a2) work preferentially on groups of samples that are adjacent in the design space. For example, they do not like to work with only the gradient between a certain iteration 1 and the corresponding iteration 3; they like to work with the gradient between iterations 1 and 2, or 2 and 3. For that reason, they do not work well with data evenly distributed in the design space, like those deriving from latin hypercube sampling or random sampling, and they work well with data clustered in small patches, like those deriving from star sampling strategies, coordinate descent, block coordinate descent, or overlapping block coordinate descent. To work well with a latin hypercube sampling, it may be necessary to include a pass of strategy b) or c) before calling strategy a). Strategy a) will then take charge of reducing the errors created by that initial pass. As an alternative, in strategy a1), the third element of the variable $minreq_forgrad in the configuration file (which is, by default, "confinterlinear.pl") may be set to more than 1: for example, $minreq_forgrad = [1, 1, 2]. With this setting, not only the nearest-neighbouring samples are taken into account in the calculations, but also the 2nd-nearest, or the nth-nearest.
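The idea of weighting neighbouring gradients by inverse distance can be illustrated with a small sketch. This is not the algorithm as implemented in Interlinear or as described in the cited publication; it is a simplified illustration under stated assumptions: each known gradient is placed at the midpoint of the two points it was taken from, distance is Euclidean over the level indices, the weight is 1/distance, and the hypothetical argument $k selects how many of the nearest gradients are used. All names and numbers are made up for illustration.

```perl
use strict;
use warnings;

# Euclidean distance between two coordinate lists (array refs).
sub euclid {
    my ( $p, $q ) = @_;
    my $sum = 0;
    $sum += ( $p->[$_] - $q->[$_] )**2 for 0 .. $#{$p};
    return sqrt($sum);
}

# Inverse-distance-weighted mean of the $k gradients nearest to $target.
# Each gradient is { at => [coords], slope => number }.
# Assumption: no gradient sits exactly on the target (distance > 0).
sub weighted_slope {
    my ( $gradients, $target, $k ) = @_;
    my @scored;
    for my $g (@$gradients) {
        push @scored, { slope => $g->{slope}, dist => euclid( $g->{at}, $target ) };
    }
    my @near = sort { $a->{dist} <=> $b->{dist} } @scored;
    splice @near, $k if @near > $k;    # keep only the $k nearest gradients
    my ( $num, $den ) = ( 0, 0 );
    for my $g (@near) {
        my $w = 1 / $g->{dist};        # closer gradients weigh more
        $num += $w * $g->{slope};
        $den += $w;
    }
    return $num / $den;
}

# Known gradients along the first parameter, placed at the midpoints of the
# point pairs (1,1)-(2,1) and (1,3)-(2,3) they were taken from.
my @gradients = (
    { at => [ 1.5, 1 ], slope => 0.5 },
    { at => [ 1.5, 3 ], slope => 2.0 },
);

# Missing point (3,1): one step along the first parameter from the
# known point (2,1), whose value is 1.5.
my $known_value = 1.5;
my $estimate    = $known_value + 1 * weighted_slope( \@gradients, [ 3, 1 ], 2 );
printf "estimate at (3,1): %.4f\n", $estimate;
```

Because the set of nearby gradients and their weights change from one missing point to the next, the estimated surface can bend even though each single step is linear. Raising $k in this sketch lets more distant gradients contribute, loosely in the spirit of raising the third element of $minreq_forgrad.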
A configuration file should be prepared following the example in the "examples" folder in this distribution. If the configuration file is incomplete or missing, the program will adopt its own defaults, exploiting the distance-weighted gradient-based strategy a1. The only variable that must be specified in a configuration file is $sourcefile: the Unix path to the source file containing the dataseries. The source file has to be prepared by listing in each column the values (levels) of the parameters (factors, variables) and putting the objective function values in the last column, at the rows in which they are present.
The parameter number is given by the position of the column (i.e. column 4 hosts parameter 4).
Here below is an example of a multivariate dataseries of 3 parameters assuming 3 levels each. The numbers preceding the objective function (which is in the last column) are the indices of the multidimensional matrix (tensor).
1,1,1,1,1.234
1,2,3,2,1.500
1,3,3,3
2,1,3,1,1.534
2,2,3,2,0.000
2,3,3,1,0.550
3,1,3,1
3,2,3,2,0.670
3,3,3,3
The program converts this format into the one preferred by Sim::OPT, which is the following:
1-1_2-1_3-1,9.234
1-1_2-2_3-2,4.500
1-1_2-3_3-3
1-2_2-1_3-1,7.534
1-2_2-2_3-2,0.000
1-2_2-3_3-3,0.550
1-3_2-1_3-1
1-3_2-2_3-2,0.670
1-3_2-3_3-3
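The structure of this format can be read off with a few lines of Perl. The following is a minimal sketch, not the parser used by Sim::OPT or Interlinear; the function name parse_instance and the returned hash layout are hypothetical. It splits a line into its parameter-level pairs and its objective function value, and marks the value as undefined when the entry is missing.

```perl
use strict;
use warnings;

# Parse one line such as "1-2_2-3_3-3,0.550".
# Returns a hash ref with the instance key, a parameter => level map,
# and the objective value (undef when the entry is missing).
sub parse_instance {
    my ($line) = @_;
    chomp $line;
    my ( $key, $value ) = split /,/, $line, 2;
    my %levels;
    for my $pair ( split /_/, $key ) {
        my ( $param, $level ) = split /-/, $pair;
        $levels{$param} = $level;
    }
    $value = undef if defined $value && $value eq '';
    return { key => $key, levels => \%levels, value => $value };
}

my @lines = ( "1-1_2-1_3-1,9.234", "1-1_2-3_3-3", "1-2_2-3_3-3,0.550" );
for my $line (@lines) {
    my $inst   = parse_instance($line);
    my $status = defined $inst->{value} ? $inst->{value} : "missing";
    print "$inst->{key}: $status\n";
}
```

The instances reported as missing are the ones a metamodelling pass would have to fill in.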
(Note that the parameter listings cannot be incomplete if Interlinear is to be invoked without involving Sim::OPT. Only the objective function entries can be incomplete. The following series, for example, is a version of the series above that is incomplete as regards the parameter listings:
1-1_2-1_3-1,9.234
1-1_2-2_3-2,4.500
1-2_2-1_3-1,7.534
1-2_2-2_3-2,0.000
1-2_2-3_3-3,0.550
1-3_2-2_3-2,0.670
How to involve Sim::OPT is dealt with at the end of this document.)
After some computations, Interlinear will output a new dataseries with the missing values filled in. This dataseries can be used by OPT for the optimization of one or more blocks. This can be useful, for example, to save computations in searches involving simulations, especially when the time required by each simulation is long, as may happen with CFD simulations in building design.
The number of computations required for the creation of a metamodel in OPT increases exponentially with the number of instances in the metamodel. To contain this growth, a limit has to be set for the size of the net of instances taken into account in the computations for gradients and for points. The variables in the configuration file controlling those limits are "$nfiltergrads", a limit with adaptive effects (putting a ceiling on the number of original gradients utilized to derive the points), and "$limit_checkdistgrads" and "$limit_checkdistpoints" (putting a limit on the number of derived gradients and points from which the calculations are further propagated at each computation pass). By default they are unspecified. If they are unspecified (i.e. a null value ("") is specified for them), no limit is assumed. "$nfiltergrads" may be set to twice the square root of the number of instances of a problem space. "$limit_checkdistgrads" and "$limit_checkdistpoints" may be set to a fraction of the total number of instances, for example 1/5 or 1/10 of that number, and may be given the same value. An example of a configuration file with more information in the comments is embedded in this source code, where it sets the defaults.
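The rules of thumb above can be turned into numbers with a short script. This is only a sketch: the function name suggest_limits is hypothetical, rounding up to the next integer is an assumption, and the instance count used in the example (5 parameters with 3 levels each, a full factorial) is made up. The printed lines use the variable names given above and can be adapted for the configuration file.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Suggest values for the three limits from the number of instances:
#   nfiltergrads            -> twice the square root of the instance count
#   limit_checkdistgrads/
#   limit_checkdistpoints   -> a fraction (default 1/5) of the instance count
# Rounding up is an assumption of this sketch.
sub suggest_limits {
    my ( $n_instances, $fraction ) = @_;
    $fraction = 1 / 5 unless defined $fraction;
    my $part = ceil( $n_instances * $fraction );
    return (
        nfiltergrads          => ceil( 2 * sqrt($n_instances) ),
        limit_checkdistgrads  => $part,
        limit_checkdistpoints => $part,
    );
}

# Example: 5 parameters with 3 levels each give 3**5 = 243 instances.
my %limits = suggest_limits( 3**5 );
for my $name ( sort keys %limits ) {
    print "\$$name = $limits{$name};\n";
}
```

Passing 1/10 as the second argument gives the tighter of the two fractions mentioned above.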
By utilizing the metamodelling procedure at point (a), Interlinear can also weld two related problem space models together, provided that they share the same parametric structure. This welding is not a mere merge. It is a neighbour-by-neighbour action, much more thorough and, yes, cooler. The procedure has been presented in the following publication: http://doi.org/10.1080/19401493.2020.1712477. The action of the procedure is controlled by the following settings in the configuration file:
1) @weldsprepared = ( "/home/luca/ffexpexps_full/minmissionsprep.csv" ); # The path to the second dataseries.
2) @parswelds = ( [ 1, 4 ] ); # The parameter numbers on which the welding action has to take place.
3) @recedes = ( 1, 4 ); # This signals with respect to which parameters the first dataseries gives way to the second. (Otherwise, the obtained points would be averaged one-to-one with those of the first dataseries. Usually you do not want that.)
To call Interlinear as a Perl function (best strategy):
re.pl # open Perl shell
use Sim::OPT::Interlinear; # load Interlinear
Sim::OPT::Interlinear::interlinear( "./sourcefile.csv", "./confinterlinear.pl", "./obtainedmetamodel.csv" );
"confinterlinear.pl" is the configuration file. If that file is empty, Interlinear will assume the default file names above. "./sourcefile.csv" is the only information which is truly mandatory: the path to the csv dataseries to be completed. If it is not specified, the default file "./sourcefile.csv" will be sought.
To use Interlinear as a script from the command line:
perl ./Interlinear.pm . "./sourcefile.csv" "./confinterlinear.pl"
(Note the dot within the line.) Again, if "./sourcefile.csv" is not specified, the default file "./sourcefile.csv" will be sought.
Or to begin with a dialogue question:
./Interlinear.pm interstart
The minimal operations for utilizing a data series which is incomplete as regards the parameter listings are the following:
1) copy the executable "opt" into the work folder;
2) create a configuration file for Sim::OPT by modifying the "caravantrial.pl" file in the "examples" folder of this distribution (it is sufficient to modify the few values signalled by capital letters in the comments) and place it in the work folder;
4) copy the .csv file into the work folder;
5) launch Sim::OPT in the shell: ./opt ;
6) when asked, specify the name (with relative path) of the Sim::OPT configuration file. For example: ./filename.pl .
EXPORT
interlinear, interstart.
AUTHOR
Gian Luca Brunetti (2018-22) <gianluca.brunetti@polimi.it>
COPYRIGHT AND LICENSE
Copyright (C) 2018-22 by Gian Luca Brunetti and Politecnico di Milano. This is free software. You can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 3 or newer.