### Empirical evaluation

$$\argmin_{x\in[0,1]^n} f(x),\quad\text{s.t.}\quad g(x)=0$$

[^argminmax]: In the metaheuristics literature, $\argmax$ is often assumed for evolutionary algorithms, whereas $\argmin$ is often assumed for local search or simulated annealing.

Complete VS approximation VS heuristics.

Algorithmics
============

> Those algorithms are randomized and iterative (hence stochastic) and manipulate a sample (synonym: population)
> of solutions (s. individual) to the problem, each one being associated with its quality (s. cost, fitness).
>
> Thus, algorithms have a main loop, and articulate functions that manipulate the sample (called "operators").
>
> Main design problem: exploitation/exploration compromise (s. intensification/diversification).
> Main design goal: raise the abstraction level.
> Main design tools: learning (s. memory) + heuristics (s. bias).
>
> Forget metaphors and use mathematical descriptions.
>
> Seek a compromise between complexity, performance and explainability.
>
> There is no better "method".
> Difference between model and instance, for problem and algorithm.
> No Free Lunch Theorem.
> But there is a "better algorithm instance on a given set of problem instances".
>
> The better you understand it, the better the algorithm will be.

Problem modelization
====================

## Main models

> Encoding:
>
> - continuous (s. numeric),
> - discrete metric (integers),
> - combinatorial (graph, permutation).
>
> Fitness:
>
> - mono-objective,
> - multi-modal,
> - multi-objective.

Performance evaluation
======================

What is performance
-------------------

> Main performance axes:
>
> - time,
> - quality,
> - probability.
>
> Additional performance axes:
>
> - robustness,
> - stability.
>
> Golden rule: the output of a metaheuristic is a distribution, not a solution.
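The golden rule can be made concrete with a minimal sketch (all names, such as `one_run`, are illustrative): repeating the same stochastic search with different seeds yields a distribution of final qualities, which can then be summarized with robust estimators.

```python
import random
from statistics import quantiles

def one_run(seed, budget=100):
    """One run of a plain random search minimizing the 2-D sphere function."""
    rng = random.Random(seed)
    best = float("inf")
    for _ in range(budget):
        x = [rng.uniform(-1, 1) for _ in range(2)]
        best = min(best, sum(xi**2 for xi in x))
    return best

# The "output" of the stochastic solver is a distribution over repeated runs,
# not a single value: each seed yields a different best quality.
results = [one_run(seed) for seed in range(15)]
q1, q2, q3 = quantiles(results, n=4)  # quartiles of the output distribution
print(f"median={q2:.4f}, IQR={q3 - q1:.4f}")
```

Reporting the median and the inter-quartile range, rather than a single "best" value, is what treating the output as a distribution looks like in practice.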
## Empirical evaluation

### Run time and target quality

One may think that the obvious objective of an optimization algorithm is to find the location of the optimum. While this is true for deterministic and provable optimization, it is more complex in the case of metaheuristics. When dealing with search heuristics, the quality of the (sub)optimum found is also a performance metric, as one wants to maximize the quality of the best solution found during the search.

The two main performance metrics are thus the run time necessary to find a solution of a given quality and, conversely, the quality of the solution that can be found within a given run time. Of course, those two metrics tend to be contradictory: the more time is given to the search, the better the solution; and conversely, the better the solution one wants, the longer the run should be.

### Measuring time and quality

To measure the run time, a robust measure of time should be available. However, measuring run time on a modern computer is not necessarily robust, for instance because one cannot easily control the context switching managed by the scheduler, or because the CPU load can produce memory access contentions. In practical applications, however, the calls to the objective function largely dominate any other part of the algorithm. The number of calls to the objective function is thus a robust proxy measure of the run time.

To measure the quality of solutions, the obvious choice is to rely on absolute values. However, those may vary a lot across problem instances and are read differently depending on whether one faces a minimization or a maximization problem. It may thus be useful to use the error against a known bound of the problem.

### Probability of attainment

For metaheuristics which are based on randomized processes (the vast majority of them), measuring time and quality is not enough to estimate their performance.
Indeed, if one runs the same "algorithm" several times, one will get different results, hence different performances. That is why it is more useful to consider that the "output" of an "algorithm" is not a single solution, but a random variable: a distribution of solutions.

If one defines an "algorithm" with fuzzy concepts, like "simulated annealing" or "genetic algorithm", the reason is obvious, because those terms encompass a large variety of possible implementations. But one should keep in mind that even a given implementation has (a lot of) parameters, and that metaheuristics are usually (very) sensitive to their parameter setting.

In order to have a good mental image of how to assess the performance of a solver, one should realize that we can only *estimate* the performances, considering *at least* run time, quality, *and the probability* of attaining a fixed target.

### Robustness and stability

Empirical evaluation
--------------------

> Proof-reality gap is huge, thus empirical performance evaluation is the gold standard.
>
> Empirical evaluation = scientific method.
>
> Basic rules of thumb:
>
> - randomized algorithms => repetition of runs,
> - sensitivity to parameters => design of experiments,
> - use statistical tools,

## Useful statistical tools

> Statistical tests:
>
> - classical null hypothesis: test equality of distributions,
> - beware of the p-value.
>
> How many runs?
>
> - not always "as many as possible",
> - maybe "as many as needed",
> - generally: 15 (min for non-parametric tests) -- 20 (min for parametric-gaussian tests).
>
> Use robust estimators: median instead of mean, Inter Quartile Range instead of standard deviation.

## Expected Empirical Cumulative Distribution Functions

> On Run Time: ERT-ECDF.
> $$ERTECDF(\{X_0,\dots,X_i,\dots,X_r\}, \delta, f, t) := \#\{x_t \in X_t \mid f(x_t^*) \geq \delta \}$$
>
> $$\delta \in \left[0, \max_{x \in \mathcal{X}}(f(x))\right]$$
>
> $$X_i := \left\{\left\{ x_0^0, \dots, x_i^j, \dots, x_p^u \mid p\in[1,\infty[ \right\} \mid u \in [0,\infty[ \right\} \in \mathcal{X}$$
>
> with $p$ the sample size, $r$ the number of runs, $u$ the number of iterations, $t$ the number of calls to the objective
> function.
>
> The number of calls to the objective function is a good estimator of time because it dominates all other times.
>
> The dual of the ERT-ECDF can easily be computed for quality (EQT-ECDF).
>
> 3D ERT/EQT-ECDF may be useful for terminal comparison.

## Other tools

> Convergence curves: do not forget the golden rule and show distributions:
>
> - quantile boxes,
> - violin plots,
> - histograms.

Algorithm Design
================

## Neighborhood

> Convergence definition(s):
>
> - strong,
> - weak.
>
> Neighborhood: subset of solutions attainable after an atomic transformation:
>
> - ergodicity,
> - quasi-ergodicity.
>
> Relationship to metric space in the continuous domain.

## Structure of problem/algorithms

> Structure of problems to exploit:
>
> - locality (basin of attraction),
> - separability,
> - gradient,
> - funnels.
>
> Structures with which to capture those structures:
>
> - implicit,
> - explicit,
> - direct.
>
> Silver rule: choose the algorithmic template that adheres the most to the problem model.
>
> - taking constraints into account,
> - iterate between problem/algorithm models.

## Grammar of algorithms

> Parameter setting < tuning < control.
>
> Portfolio approaches.
> Example: numeric low dimensions => Nelder-Mead Search is sufficient.
>
> Algorithm selection.
>
> Algorithms are templates in which operators are interchangeable.
> Most generic way of thinking about algorithms: grammar-based algorithm selection with parameters.
> Example: modular CMA-ES.

Design tools:

> Fitness landscapes: structure of problems as seen by an algorithm.
> Features: tools that measure one aspect of a fitness landscape.
>
> We can observe landscapes and learn which algorithm instance solves them best.
> Examples: SAT, TSP, BB.
>
> Toward automated solver design.
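The view of algorithms as templates with interchangeable operators can be sketched as follows (a minimal illustration, not an existing API; all names such as `metaheuristic` and `truncation` are made up): the main loop is fixed, and swapping the variation or selection operator instantiates a different algorithm.

```python
import random

def metaheuristic(init, variation, selection, fitness, iterations=50):
    """Fixed main loop; `variation` and `selection` are interchangeable
    operators, so this one template can instantiate many algorithms."""
    pop = init()
    for _ in range(iterations):
        # Variation produces new candidates; selection keeps a sample of them.
        pop = selection(pop + variation(pop), fitness)
    return min(pop, key=fitness)

rng = random.Random(0)
sphere = lambda x: sum(xi**2 for xi in x)

# One possible instantiation: Gaussian mutation + truncation selection,
# minimizing the 2-D sphere function over [-1, 1]^2.
init = lambda: [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(10)]
gauss_mutation = lambda pop: [[xi + rng.gauss(0, 0.1) for xi in x] for x in pop]
truncation = lambda pop, f: sorted(pop, key=f)[:10]

best = metaheuristic(init, gauss_mutation, truncation, sphere)
print(sphere(best))
```

Replacing `gauss_mutation` with, say, a crossover-based operator, or `truncation` with tournament selection, yields another algorithm instance without touching the loop, which is the idea behind grammar-based algorithm selection.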