LESSON.md 9.13 KB
Newer Older
Johann Dreo's avatar
Johann Dreo committed
1
Introduction
Johann Dreo's avatar
Johann Dreo committed
2
============
Johann Dreo's avatar
Johann Dreo committed
3

Johann Dreo's avatar
Johann Dreo committed
4
5
6
7
8
9
10
11
12
13
14
15
16
17
> Metaheuristics are mathematical optimization algorithms solving $\argmin_{x \in X} f(x)$.
>
> Synonyms:
>
> - search heuristics,
> - evolutionary algorithms,
> - stochastic local search.
>
> The general approach is to only look at the solutions, by trial and error, without further information on its structure.
> Hence the problem is often labelled as "black-box".
>
> Link to NP-hardness/curse of dimensionality: easy to evaluate, hard to solve.
> Easy to evaluate = fast, but not as fast as the algorithm itself.
> Hard to solve, but not impossible.
Johann Dreo's avatar
Johann Dreo committed
18

Johann Dreo's avatar
Johann Dreo committed
19

Johann Dreo's avatar
Johann Dreo committed
20
21
Hard optimization
-----------------
Johann Dreo's avatar
Johann Dreo committed
22

Johann Dreo's avatar
Johann Dreo committed
23
Metaheuristics are algorithms which aim at solving "hard" mathematical optimization problems.
Johann Dreo's avatar
Johann Dreo committed
24

Johann Dreo's avatar
Johann Dreo committed
25
26
27
A mathematical optimization problem is defined by a "solution" $x$
and an "objective function" $f:x\mapsto\reals$ :
$$\argmin_{x \in X} f(x)$$
Johann Dreo's avatar
Johann Dreo committed
28

Johann Dreo's avatar
Johann Dreo committed
29
30
31
32
One can consider using $\argmax$ without loss of genericity[^argminmax].
Usually, the set $X$ is defined intentionally and constraints on $x$ are managed separately.
For example, using a function $g:x\mapsto \{0,1\}$:
$$\argmin_{x\in[0,1]^n} f(x),\quad\text{s.t.}\quad g(x)=0$$
Johann Dreo's avatar
Johann Dreo committed
33
34
35



Johann Dreo's avatar
Johann Dreo committed
36
[^argminmax]: In the metaheuristics literature, $\argmax$ is often assumed for evolutionary algorithms, whether $\argmin$ is often assumed for local search or simulated annealing.
Johann Dreo's avatar
Johann Dreo committed
37
38


Johann Dreo's avatar
Johann Dreo committed
39
40
41
42

Complete VS approximation VS heuristics.


Johann Dreo's avatar
Johann Dreo committed
43
44
45
46
47
Algorithmics
============

> Those algorithms are randomized and iteratives (hence stochastics) and manipulates a sample (synonym population)
> of solutions (s. individual) to the problem, each one being associated with its quality (s. cost, fitness).
Johann Dreo's avatar
Johann Dreo committed
48
>
Johann Dreo's avatar
Johann Dreo committed
49
> Thus, algorithms have a main loop, and articulate functions that manipulates the sample (called "operators").
Johann Dreo's avatar
Johann Dreo committed
50
>
Johann Dreo's avatar
Johann Dreo committed
51
52
53
> Main design problem: exploitation/exploration compromise (s. intensification/diversification).
> Main design goal: raise the abstraction level.
> Main design tools: learning (s. memory) + heuristics (s. bias).
Johann Dreo's avatar
Johann Dreo committed
54
>
Johann Dreo's avatar
Johann Dreo committed
55
> Forget metaphors and use mathematical descriptions.
Johann Dreo's avatar
Johann Dreo committed
56
>
Johann Dreo's avatar
Johann Dreo committed
57
> Seek a compromise between complexity, performances and explainability.
Johann Dreo's avatar
Johann Dreo committed
58
>
Johann Dreo's avatar
Johann Dreo committed
59
60
61
62
> The is no better "method".
> Difference between model and instance, for problem and algorithm.
> No Free Lunch Theorem.
> But there is a "better algorithm instances on a given problem instances set".
Johann Dreo's avatar
Johann Dreo committed
63
>
Johann Dreo's avatar
Johann Dreo committed
64
> The better you understand it, the better the algorithm will be.
Johann Dreo's avatar
Johann Dreo committed
65
66
67


Problem modelization
Johann Dreo's avatar
Johann Dreo committed
68
====================
Johann Dreo's avatar
Johann Dreo committed
69

Johann Dreo's avatar
Johann Dreo committed
70
71
> Way to assess the quality: fitness function.
> Way to model a solution: encoding.
Johann Dreo's avatar
Johann Dreo committed
72

Johann Dreo's avatar
Johann Dreo committed
73

Johann Dreo's avatar
Johann Dreo committed
74
## Main models
Johann Dreo's avatar
Johann Dreo committed
75

Johann Dreo's avatar
Johann Dreo committed
76
> Encoding:
Johann Dreo's avatar
Johann Dreo committed
77
>
Johann Dreo's avatar
Johann Dreo committed
78
79
80
> - continuous (s. numeric),
> - discrete metric (integers),
> - combinatorial (graph, permutation).
Johann Dreo's avatar
Johann Dreo committed
81
>
Johann Dreo's avatar
Johann Dreo committed
82
> Fitness:
Johann Dreo's avatar
Johann Dreo committed
83
>
Johann Dreo's avatar
Johann Dreo committed
84
85
> - mono-objective,
> - multi-modal,
Johann Dreo's avatar
Johann Dreo committed
86
> - multi-objectives (cf. Pareto optimality).
Johann Dreo's avatar
Johann Dreo committed
87

Johann Dreo's avatar
Johann Dreo committed
88

Johann Dreo's avatar
Johann Dreo committed
89
## Constraints management
Johann Dreo's avatar
Johann Dreo committed
90

Johann Dreo's avatar
Johann Dreo committed
91
92
93
94
> Main constraints management tools for operators:
> - penalization,
> - reparation,
> - generation.
Johann Dreo's avatar
Johann Dreo committed
95
96
97


Performance evaluation
Johann Dreo's avatar
Johann Dreo committed
98
99
======================

Johann Dreo's avatar
Johann Dreo committed
100
101
What is performance
-------------------
Johann Dreo's avatar
Johann Dreo committed
102
103

> Main performances axis:
Johann Dreo's avatar
Johann Dreo committed
104
>
Johann Dreo's avatar
Johann Dreo committed
105
106
107
> - time,
> - quality,
> - probability.
Johann Dreo's avatar
Johann Dreo committed
108
>
Johann Dreo's avatar
Johann Dreo committed
109
> Additional performance axis:
Johann Dreo's avatar
Johann Dreo committed
110
>
Johann Dreo's avatar
Johann Dreo committed
111
112
> - robustness (cf. "external" problem structure),
> - stability (cf. "internal" randomisation).
Johann Dreo's avatar
Johann Dreo committed
113
>
Johann Dreo's avatar
Johann Dreo committed
114
115
116
> Golden rule: the output of a metaheuristic is a distribution, not a solution.


Johann Dreo's avatar
Johann Dreo committed
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
### Run time and target quality

One may think that the obvious objective of an optimization algorithm is to find the location of the optimum.
While this is true for deterministic and provable optimization, it is more complex in the case of metaheuristics.
When dealing with search heuristics, the quality of the (sub)optimum found is also a performance metric,
as one want to maximize the quality of the best solution found during the search.

The two main performance metrics are thus the runtime that is necessary to find a solution of a given quality,
and conversely, the quality of the solution which can be found in a given runtime.
Of course, those two metrics tend to be contradictory: the more time is given to the search, the best the solution,
and conversely, the better the solution one want, the longer the run should be.


### Measuring time and quality

To measure the run time, a robust measure of time should be available.
However, measuring run time on a modern computer is not necessarily robust, for instance because one cannot easily
control the context switching managed by the scheduler, or because the CPU load can produce memory access contentions.
However, in practical application, the call to the objective function largely dominates any other part of the algorithm.
The number of calls to the objective function is thus a robust proxy measure of the run time.

To measure the quality of solutions, the obvious choice is to rely on absolute values.
However, this may vary a lot across problem instances and be read differently if one has a minimization or a
maximization problem.
It may thus be useful to use the error against a known bound of the problem.


### Probability of attainment

For metaheuristics which are based on randomized process (which is the vast majority of them), measuring time and
quality is not enough to estimate their performance.
Indeed, if one run several time the same "algorithm", one will get different results, hence different performances.

That's why it's more useful to consider that the "output" of an "algorithm" is not a single solution, but a random
variable, a distribution of several solutions.
If one define an "algorithm" with fuzzy concepts, like "simulated annealing" or "genetic algorithm", the reason is
obvious, because the terms encompass a large variety of possible implementations.
But one should keep in mind that even a given implementation has (a lot of) parameters, and that metaheuristics are
usually (very) sensitive to their parameter setting.

In order to have a good mental image of how to asses the performance of a solver,
one should relize that we can only *estimates* the performances, considering *at least* run time, quality
*and probability* of attaining a fixed target.


### Robustness and stability


Empirical evaluation
--------------------
Johann Dreo's avatar
Johann Dreo committed
167
168

> Proof-reality gap is huge, thus empirical performance evaluation is gold standard.
Johann Dreo's avatar
Johann Dreo committed
169
>
Johann Dreo's avatar
Johann Dreo committed
170
> Empirical evaluation = scientific method.
Johann Dreo's avatar
Johann Dreo committed
171
>
Johann Dreo's avatar
Johann Dreo committed
172
> Basic rules of thumb:
Johann Dreo's avatar
Johann Dreo committed
173
>
Johann Dreo's avatar
Johann Dreo committed
174
175
176
177
178
179
180
181
182
> - randomized algorithms => repetition of runs,
> - sensitivity to parameters => design of experiments,
> - use statistical tools,
> - design experiments to answer a single question,
> - test one thing at a time.

## Useful statistical tools

> Statistical tests:
Johann Dreo's avatar
Johann Dreo committed
183
>
Johann Dreo's avatar
Johann Dreo committed
184
185
> - classical null hypothesis: test equality of distributions.
> - beware of p-value.
Johann Dreo's avatar
Johann Dreo committed
186
>
Johann Dreo's avatar
Johann Dreo committed
187
> How many runs?
Johann Dreo's avatar
Johann Dreo committed
188
>
Johann Dreo's avatar
Johann Dreo committed
189
190
191
> - not always "as many as possible",
> - maybe "as many as needed",
> - generally: 15 (min for non-parametric tests) -- 20 (min for parametric-gaussian tests).
Johann Dreo's avatar
Johann Dreo committed
192
>
Johann Dreo's avatar
Johann Dreo committed
193
194
195
196
197
198
> Use robust estimators: median instead of mean, Inter Quartile Range instead of standard deviation.


## Expected Empirical Cumulative Distribution Functions

> On Run Time: ERT-ECDF.
Johann Dreo's avatar
Johann Dreo committed
199
>
Johann Dreo's avatar
Johann Dreo committed
200
> $$ERTECDF(\{X_0,\dots,X_i,\dots,X_r\}, \delta, f, t) := \#\{x_t \in X_t | f(x_t^*)>=\delta \}$$
Johann Dreo's avatar
Johann Dreo committed
201
>
Johann Dreo's avatar
Johann Dreo committed
202
> $$\delta \in \left[0, \max_{x \in \mathcal{X}}(f(x))\right]$$
Johann Dreo's avatar
Johann Dreo committed
203
>
Johann Dreo's avatar
Johann Dreo committed
204
> $$X_i := \left\{\left\{ x_0^0, \dots, x_i^j, \dots, x_p^u | p\in[1,\infty[ \right\} | u \in [0,\infty[ \right\} \in \mathcal{X}$$
Johann Dreo's avatar
Johann Dreo committed
205
>
Johann Dreo's avatar
Johann Dreo committed
206
207
> with $p$ the sample size, $r$ the number of runs, $u$ the number of iterations, $t$ the number of calls to the objective
> function.
Johann Dreo's avatar
Johann Dreo committed
208
>
Johann Dreo's avatar
Johann Dreo committed
209
> The number of calls to the objective function is a good estimator of time because it dominates all other times.
Johann Dreo's avatar
Johann Dreo committed
210
>
Johann Dreo's avatar
Johann Dreo committed
211
> The dual of the ERT-ECDF can be easily computed for quality (EQT-ECDF).
Johann Dreo's avatar
Johann Dreo committed
212
>
Johann Dreo's avatar
Johann Dreo committed
213
214
215
216
217
218
> 3D ERT/EQT-ECDF may be useful for terminal comparison.


## Other tools

> Convergence curves: do not forget the golden rule and show distributions:
Johann Dreo's avatar
Johann Dreo committed
219
>
Johann Dreo's avatar
Johann Dreo committed
220
221
222
> - quantile boxes,
> - violin plots,
> - histograms.
Johann Dreo's avatar
Johann Dreo committed
223
224
225


Algorithm Design
Johann Dreo's avatar
Johann Dreo committed
226
227
228
229
230
================

## Neighborhood

> Convergence definition(s):
Johann Dreo's avatar
Johann Dreo committed
231
>
Johann Dreo's avatar
Johann Dreo committed
232
233
> - strong,
> - weak.
Johann Dreo's avatar
Johann Dreo committed
234
>
Johann Dreo's avatar
Johann Dreo committed
235
> Neighborhood: subset of solutions atteinable after an atomic transformation:
Johann Dreo's avatar
Johann Dreo committed
236
>
Johann Dreo's avatar
Johann Dreo committed
237
238
> - ergodicity,
> - quasi-ergodicity.
Johann Dreo's avatar
Johann Dreo committed
239
>
Johann Dreo's avatar
Johann Dreo committed
240
241
242
243
244
245
> Relationship to metric space in the continuous domain.


## Structure of problem/algorithms

> Structure of problems to exploit:
Johann Dreo's avatar
Johann Dreo committed
246
>
Johann Dreo's avatar
Johann Dreo committed
247
248
249
250
> - locality (basin of attraction),
> - separability,
> - gradient,
> - funnels.
Johann Dreo's avatar
Johann Dreo committed
251
>
Johann Dreo's avatar
Johann Dreo committed
252
> Structure with which to capture those structures:
Johann Dreo's avatar
Johann Dreo committed
253
>
Johann Dreo's avatar
Johann Dreo committed
254
255
256
> - implicit,
> - explicit,
> - direct.
Johann Dreo's avatar
Johann Dreo committed
257
>
Johann Dreo's avatar
Johann Dreo committed
258
> Silver rule: choose the algorithmic template that adhere the most to the problem model.
Johann Dreo's avatar
Johann Dreo committed
259
>
Johann Dreo's avatar
Johann Dreo committed
260
261
262
263
264
265
266
> - taking constraints into account,
> - iterate between problem/algorithm models.


## Grammar of algorithms

> Parameter setting < tuning < control.
Johann Dreo's avatar
Johann Dreo committed
267
>
Johann Dreo's avatar
Johann Dreo committed
268
269
> Portfolio approaches.
> Example: numeric low dimensions => Nelder-Mead Search is sufficient.
Johann Dreo's avatar
Johann Dreo committed
270
>
Johann Dreo's avatar
Johann Dreo committed
271
> Algorithm selection.
Johann Dreo's avatar
Johann Dreo committed
272
>
Johann Dreo's avatar
Johann Dreo committed
273
> Algorithms are templates in which operators are interchangeable.
Johann Dreo's avatar
Johann Dreo committed
274
>
Johann Dreo's avatar
Johann Dreo committed
275
276
> Most generic way of thinking about algorithms: grammar-based algorithm selection with parameters.
> Example: modular CMA-ES.
Johann Dreo's avatar
Johann Dreo committed
277
278

Parameter setting tools:
Johann Dreo's avatar
Johann Dreo committed
279

Johann Dreo's avatar
Johann Dreo committed
280
281
282
283
284
- ParamILS,
- SPO,
- i-race.

Design tools:
Johann Dreo's avatar
Johann Dreo committed
285
286

- ParadisEO,
Johann Dreo's avatar
Johann Dreo committed
287
- IOH profiler
Johann Dreo's avatar
Johann Dreo committed
288
289
290
291
292
- jMetal,
- Jenetics,
- ECJ,
- DEAP,
- HeuristicLab.
Johann Dreo's avatar
Johann Dreo committed
293
294


Johann Dreo's avatar
Johann Dreo committed
295
## Landscape-aware algorithms
Johann Dreo's avatar
Johann Dreo committed
296

Johann Dreo's avatar
Johann Dreo committed
297
298
> Fitness landscapes: structure of problems as seen by an algorithm.
> Features: tool that measure one aspect of a fitness landscape.
Johann Dreo's avatar
Johann Dreo committed
299
>
Johann Dreo's avatar
Johann Dreo committed
300
301
> We can observe landscapes, and learn which algorithm instance solves it better.
> Examples: SAT, TSP, BB.
Johann Dreo's avatar
Johann Dreo committed
302
>
Johann Dreo's avatar
Johann Dreo committed
303
> Toward automated solver design.
Johann Dreo's avatar
Johann Dreo committed
304