Introduction
============

> Metaheuristics are mathematical optimization algorithms solving $\argmin_{x \in X} f(x)$.
>
> Synonyms:
>
> - search heuristics,
> - evolutionary algorithms,
> - stochastic local search.
>
> The general approach is to only look at the solutions, by trial and error, without further information on their structure.
> Hence the problem is often labelled as "black-box".
>
> Link to NP-hardness/curse of dimensionality: easy to evaluate, hard to solve.
> Easy to evaluate = fast, but not as fast as the algorithm itself.
> Hard to solve, but not impossible.


Hard optimization
-----------------

Metaheuristics are algorithms which aim at solving "hard" mathematical optimization problems.

A mathematical optimization problem is defined by a "solution" $x$ and an "objective function" $f:x\mapsto\mathbb{R}$:
$$\argmin_{x \in X} f(x)$$

One can consider using $\argmax$ without loss of generality[^argminmax].
Usually, the set $X$ is defined in intension and constraints on $x$ are managed separately.
For example, using a function $g:x\mapsto \{0,1\}$:
$$\argmin_{x\in[0,1]^n} f(x),\quad\text{s.t.}\quad g(x)=0$$

[^argminmax]: In the metaheuristics literature, $\argmax$ is often assumed for evolutionary algorithms, whereas $\argmin$ is often assumed for local search or simulated annealing.
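To make the black-box setting concrete, here is a minimal sketch of pure trial-and-error search. The `sphere` objective and the uniform sampling over $[0,1]^n$ are illustrative assumptions, not prescribed by the lesson; the point is that only evaluations of $f$ are used, never its structure:

```python
import random

def sphere(x):
    """A hypothetical black-box objective: we only evaluate it, never inspect it."""
    return sum(xi ** 2 for xi in x)

def random_search(f, dim, budget, seed=0):
    """Pure trial and error: sample uniformly in [0,1]^n, keep the best seen."""
    rng = random.Random(seed)
    best_x, best_fx = None, float("inf")
    for _ in range(budget):
        x = [rng.uniform(0, 1) for _ in range(dim)]
        fx = f(x)  # the only information we get about the problem
        if fx < best_fx:
            best_x, best_fx = x, fx
    return best_x, best_fx

best_x, best_fx = random_search(sphere, dim=3, budget=1000)
```

Easy to evaluate, hard to solve: each call to `f` is cheap, but nothing short of sampling tells us where the optimum lies.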
Complete vs. approximation vs. heuristics.

Algorithmics
============

> Those algorithms are randomized and iterative (hence stochastic) and manipulate a sample (synonym: population)
> of solutions (s. individual) to the problem, each one being associated with its quality (s. cost, fitness).
>
> Thus, algorithms have a main loop, and articulate functions that manipulate the sample (called "operators").
>
> Main design problem: the exploitation/exploration compromise (s. intensification/diversification).
> Main design goal: raise the abstraction level.
> Main design tools: learning (s. memory) + heuristics (s. bias).
>
> Forget metaphors and use mathematical descriptions.
>
> Seek a compromise between complexity, performance and explainability.
>
> There is no better "method".
> Difference between model and instance, for both problems and algorithms.
> No Free Lunch Theorem.
> But there is a "better algorithm instance on a given set of problem instances".
>
> The better you understand it, the better the algorithm will be.


Problem modelization
====================

> Way to assess the quality: fitness function.
> Way to model a solution: encoding.
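A sketch of a main loop articulating operators over a sample, with a fitness function and an encoding. The bitstring encoding, `onemax` fitness, truncation selection and bit-flip mutation are illustrative choices, not part of the lesson:

```python
import random

rng = random.Random(42)

def onemax(sol):
    """Fitness function (way to assess quality): number of ones in the bitstring."""
    return sum(sol)

# Operators: interchangeable functions that manipulate the sample (s. population).
def select(sample, k):
    """Truncation selection: keep the k best solutions of the sample."""
    return sorted(sample, key=onemax, reverse=True)[:k]

def vary(sol, rate=0.1):
    """Mutation: flip each bit with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in sol]

def solve(dim=20, pop=10, iters=50):
    """Main loop articulating the operators on a sample of solutions."""
    sample = [[rng.randint(0, 1) for _ in range(dim)] for _ in range(pop)]
    for _ in range(iters):
        parents = select(sample, k=pop // 2)
        children = [vary(rng.choice(parents)) for _ in range(pop - len(parents))]
        sample = parents + children
    return max(sample, key=onemax)

best = solve()
```

Keeping the parents (exploitation) while mutating copies of them (exploration) is one concrete instance of the compromise named above.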
## Main models

> Encoding:
>
> - continuous (s. numeric),
> - discrete metric (integers),
> - combinatorial (graph, permutation).
>
> Fitness:
>
> - mono-objective,
> - multi-modal,
> - multi-objective (cf. Pareto optimality).


## Constraints management

> Main constraints management tools for operators:
>
> - penalization,
> - reparation,
> - generation.


Performance evaluation
======================

What is performance
-------------------

> Main performance axes:
>
> - time,
> - quality,
> - probability.
>
> Additional performance axes:
>
> - robustness (cf. "external" problem structure),
> - stability (cf. "internal" randomisation).
>
> Golden rule: the output of a metaheuristic is a distribution, not a solution.
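The golden rule can be illustrated by repeating runs of any randomized solver. A minimal sketch, assuming a hypothetical `sphere` objective and a simple random search: the solver's output is the resulting distribution of qualities, not any single value.

```python
import random
import statistics

def sphere(x):
    """Hypothetical objective to minimize."""
    return sum(xi ** 2 for xi in x)

def random_search(f, dim, budget, rng):
    """Return the best quality found within the evaluation budget."""
    best = float("inf")
    for _ in range(budget):
        x = [rng.uniform(-1, 1) for _ in range(dim)]
        best = min(best, f(x))
    return best

# Each run gives a different quality: the output is this distribution.
results = [random_search(sphere, dim=2, budget=100, rng=random.Random(s)) for s in range(20)]
median_quality = statistics.median(results)
```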
### Run time and target quality

One may think that the obvious objective of an optimization algorithm is to find the location of the optimum.
While this is true for deterministic and provable optimization, it is more complex in the case of metaheuristics.

When dealing with search heuristics, the quality of the (sub)optimum found is also a performance metric, as one wants to maximize the quality of the best solution found during the search.

The two main performance metrics are thus the run time that is necessary to find a solution of a given quality and, conversely, the quality of the solution which can be found within a given run time.
Of course, those two metrics tend to be contradictory: the more time is given to the search, the better the solution, and conversely, the better the solution one wants, the longer the run should be.

### Measuring time and quality

To measure the run time, a robust measure of time should be available.
However, measuring run time on a modern computer is not necessarily robust, for instance because one cannot easily control the context switching managed by the scheduler, or because the CPU load can produce memory access contentions.
In practical applications, however, the call to the objective function largely dominates any other part of the algorithm.
The number of calls to the objective function is thus a robust proxy measure of the run time.

To measure the quality of solutions, the obvious choice is to rely on absolute values.
However, these may vary a lot across problem instances and be read differently depending on whether one has a minimization or a maximization problem.
It may thus be useful to use the error against a known bound of the problem.
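Counting objective-function calls is easy to instrument. A minimal sketch, assuming a hypothetical wrapper class (not part of any particular library):

```python
class CountingObjective:
    """Wrap an objective function so that calls are counted:
    the call count is a robust proxy measure of run time."""

    def __init__(self, f):
        self.f = f
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        return self.f(x)

counted = CountingObjective(lambda x: sum(xi ** 2 for xi in x))
for point in ([1.0, 2.0], [0.5, 0.5], [0.0, 0.0]):
    counted(point)
```

After the loop, `counted.calls` is 3, regardless of scheduler noise or CPU load.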
### Probability of attainment

For metaheuristics which are based on randomized processes (the vast majority of them), measuring time and quality is not enough to estimate their performance.
Indeed, if one runs the same "algorithm" several times, one will get different results, hence different performances.
That is why it is more useful to consider that the "output" of an "algorithm" is not a single solution, but a random variable, a distribution of several solutions.

If one defines an "algorithm" with fuzzy concepts, like "simulated annealing" or "genetic algorithm", the reason is obvious, because those terms encompass a large variety of possible implementations.
But one should keep in mind that even a given implementation has (a lot of) parameters, and that metaheuristics are usually (very) sensitive to their parameter setting.

In order to have a good mental image of how to assess the performance of a solver, one should realize that we can only *estimate* the performances, considering *at least* run time, quality *and probability* of attaining a fixed target.

### Robustness and stability

Empirical evaluation
--------------------

> Proof-reality gap is huge, thus empirical performance evaluation is the gold standard.
>
> Empirical evaluation = scientific method.
>
> Basic rules of thumb:
>
> - randomized algorithms => repetition of runs,
> - sensitivity to parameters => design of experiments,
> - use statistical tools,
> - design experiments to answer a single question,
> - test one thing at a time.
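The rules of thumb above (repetition of runs, testing one thing at a time, robust estimators) can be sketched as follows. The (1+1) EA on OneMax and the two mutation rates are illustrative assumptions; the experimental design is what matters:

```python
import random
import statistics

def onemax(sol):
    return sum(sol)

def one_plus_one_ea(rate, dim=30, budget=300, seed=0):
    """Illustrative (1+1) EA on OneMax: flip each bit with probability
    `rate`, keep the child if it is at least as good as the parent."""
    rng = random.Random(seed)
    parent = [rng.randint(0, 1) for _ in range(dim)]
    for _ in range(budget):
        child = [1 - b if rng.random() < rate else b for b in parent]
        if onemax(child) >= onemax(parent):
            parent = child
    return onemax(parent)

# Repetition of runs; only the mutation rate differs between the two setups.
runs_low = [one_plus_one_ea(rate=1 / 30, seed=s) for s in range(15)]
runs_high = [one_plus_one_ea(rate=0.5, seed=s) for s in range(15)]

# Robust estimator: compare medians rather than means.
median_low = statistics.median(runs_low)
median_high = statistics.median(runs_high)
```

The experiment answers a single question (which mutation rate is better on this instance?) and changes nothing else between the two setups.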
## Useful statistical tools

> Statistical tests:
>
> - classical null hypothesis: test equality of distributions,
> - beware of the p-value.
>
> How many runs?
>
> - not always "as many as possible",
> - maybe "as many as needed",
> - generally: 15 (min for non-parametric tests) -- 20 (min for parametric-gaussian tests).
>
> Use robust estimators: median instead of mean, Inter Quartile Range instead of standard deviation.

## Expected Empirical Cumulative Distribution Functions

> On Run Time: ERT-ECDF.
>
> $$ERTECDF(\{X_0,\dots,X_i,\dots,X_r\}, \delta, f, t) := \#\{x_t \in X_t \mid f(x_t^*) \geq \delta \}$$
>
> $$\delta \in \left[0, \max_{x \in \mathcal{X}}(f(x))\right]$$
>
> $$X_i := \left\{\left\{ x_0^0, \dots, x_i^j, \dots, x_p^u \mid p\in[1,\infty[ \right\} \mid u \in [0,\infty[ \right\} \in \mathcal{X}$$
>
> with $p$ the sample size, $r$ the number of runs, $u$ the number of iterations, $t$ the number of calls to the objective
> function.
>
> The number of calls to the objective function is a good estimator of time because it dominates all other times.
>
> The dual of the ERT-ECDF can be easily computed for quality (EQT-ECDF).
> 3D ERT/EQT-ECDF may be useful for terminal comparison.

## Other tools

> Convergence curves: do not forget the golden rule and show distributions:
>
> - quantile boxes,
> - violin plots,
> - histograms.


Algorithm Design
================

## Neighborhood

> Convergence definition(s):
>
> - strong,
> - weak.
>
> Neighborhood: subset of solutions attainable after an atomic transformation:
>
> - ergodicity,
> - quasi-ergodicity.
>
> Relationship to metric space in the continuous domain.

## Structure of problem/algorithms

> Structure of problems to exploit:
>
> - locality (basin of attraction),
> - separability,
> - gradient,
> - funnels.
>
> Structures with which to capture those structures:
>
> - implicit,
> - explicit,
> - direct.
>
> Silver rule: choose the algorithmic template that adheres the most to the problem model.
>
> - taking constraints into account,
> - iterate between problem/algorithm models.
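A minimal sketch of the neighborhood notion for a bitstring encoding, where the assumed atomic transformation is a single-bit flip:

```python
def neighborhood(sol):
    """Solutions attainable from `sol` by one atomic transformation:
    here, flipping a single bit of a bitstring encoding."""
    neighbors = []
    for i in range(len(sol)):
        flipped = list(sol)
        flipped[i] = 1 - flipped[i]
        neighbors.append(flipped)
    return neighbors

nbh = neighborhood([0, 1, 0])
```

Chaining such atomic moves can transform any bitstring into any other, which is what makes this neighborhood ergodic.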
## Grammar of algorithms

> Parameter setting < tuning < control.
>
> Portfolio approaches.
> Example: numeric low dimensions => Nelder-Mead Search is sufficient.
>
> Algorithm selection.
>
> Algorithms are templates in which operators are interchangeable.
>
> Most generic way of thinking about algorithms: grammar-based algorithm selection with parameters.
> Example: modular CMA-ES.

Parameter setting tools:

- ParamILS,
- SPO,
- i-race.

Design tools:

- ParadisEO,
- IOH profiler,
- jMetal,
- Jenetics,
- ECJ,
- DEAP,
- HeuristicLab.


## Landscape-aware algorithms

> Fitness landscapes: the structure of a problem as seen by an algorithm.
> Features: tools that measure one aspect of a fitness landscape.
>
> We can observe landscapes and learn which algorithm instance solves them better.
> Examples: SAT, TSP, BB.
>
> Toward automated solver design.
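The "algorithms are templates with interchangeable operators" view from the grammar section can be sketched as follows. The `search` template and the `init`/`variate`/`accept` operator names are hypothetical; swapping a single operator yields a different algorithm instance:

```python
import random

def sphere(x):
    """Hypothetical objective to minimize."""
    return sum(xi ** 2 for xi in x)

def search(f, init, variate, accept, budget, rng):
    """Algorithmic template: `init`, `variate` and `accept` are
    interchangeable operators; swapping one yields another instance."""
    x = init(rng)
    fx = f(x)
    for _ in range(budget):
        y = variate(x, rng)
        fy = f(y)
        if accept(fy, fx):
            x, fx = y, fy
    return x, fx

def init(rng):
    return [rng.uniform(-1, 1) for _ in range(2)]

def variate(x, rng):
    return [xi + rng.gauss(0, 0.1) for xi in x]

# Same template, two acceptance operators, hence two algorithm instances:
hc_x, hc_fx = search(sphere, init, variate, lambda fy, fx: fy < fx, 200, random.Random(1))
rw_x, rw_fx = search(sphere, init, variate, lambda fy, fx: True, 200, random.Random(1))
```

The first instance is a hill climber, the second a random walk; a grammar over such operator slots is what parameter tuning and algorithm selection explore.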