Commit 093a285b authored by Grégoire GRZECZKOWICZ's avatar Grégoire GRZECZKOWICZ

Merge branch 'master' of gitlab.ensta.fr:Dreo/sho

parents c31277d3 74fd22e5
Metaheuristics (IA-308)
=======================
Compile as PDF: `pandoc -f markdown --toc -o LESSON.pdf LESSON.md`.
Introduction
============
> Metaheuristics are mathematical optimization algorithms solving $\argmin_{x \in X} f(x)$.
>
> Synonyms:
>
> - search heuristics,
> - evolutionary algorithms,
> - stochastic local search.
>
> The general approach is to only look at the solutions, by trial and error, without further information on the structure of the problem.
> Hence the problem is often labelled as "black-box".
>
> Link to NP-hardness/curse of dimensionality: easy to evaluate, hard to solve.
> Easy to evaluate = fast, but not as fast as the algorithm itself.
> Hard to solve, but not impossible.
Hard optimization
-----------------
Metaheuristics are algorithms which aim at solving "hard" mathematical optimization problems.
A mathematical optimization problem is defined by a "solution" $x$
and an "objective function" $f:x\mapsto\reals$ :
$$\argmin_{x \in X} f(x)$$
One can consider using $\argmax$ without loss of genericity[^argminmax].
Usually, the set $X$ is defined intensionally (by properties rather than by enumeration) and constraints on $x$ are managed separately.
For example, using a function $g:x\mapsto \{0,1\}$:
$$\argmin_{x\in[0,1]^n} f(x),\quad\text{s.t.}\quad g(x)=0$$
[^argminmax]: In the metaheuristics literature, $\argmax$ is often assumed for evolutionary algorithms, whereas $\argmin$ is often assumed for local search or simulated annealing.
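
To make this black-box setting concrete, here is a minimal random-search sketch (assuming a continuous domain $[0,1]^n$ and a hypothetical sphere-like objective; this is not the lesson's code base):

```python
import random

def f(x):
    """A stand-in black-box objective: only evaluations are available."""
    return sum(xi ** 2 for xi in x)

def random_search(f, n, budget):
    """Approximate argmin_{x in [0,1]^n} f(x) by pure trial and error."""
    best_x, best_f = None, float("inf")
    for _ in range(budget):
        x = [random.random() for _ in range(n)]  # draw a candidate solution
        fx = f(x)                                # one call to the objective function
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f

best_x, best_f = random_search(f, n=3, budget=1000)
```
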
Algorithmics
============
> Those algorithms are randomized and iterative (hence stochastic) and manipulate a sample (synonym: population)
> of solutions (s. individual) to the problem, each one being associated with its quality (s. cost, fitness).
>
> Thus, algorithms have a main loop, and articulate functions (called "operators") that manipulate the sample.
>
> Main design problem: exploitation/exploration compromise (s. intensification/diversification).
> Main design goal: raise the abstraction level.
> Main design tools: learning (s. memory) + heuristics (s. bias).
>
> Forget metaphors and use mathematical descriptions.
>
> Seek a compromise between complexity, performance and explainability.
>
> There is no better "method".
> Difference between model and instance, for problem and algorithm.
> No Free Lunch Theorem.
> But there can be a "better algorithm instance on a given set of problem instances".
>
> The better you understand it, the better the algorithm will be.
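
A minimal sketch of this "main loop + operators" structure, with hypothetical operator names (not the repository's API): the loop only chains functions that take a sample of solutions and return a new one.

```python
import random

def init(pop_size, dim):
    """Initialization operator: a sample (population) of random solutions."""
    return [[random.random() for _ in range(dim)] for _ in range(pop_size)]

def variation(sample, scale=0.1):
    """Variation operator: perturb each solution (exploration)."""
    return [[xi + random.gauss(0, scale) for xi in x] for x in sample]

def selection(sample, f, k):
    """Selection operator: keep the k best solutions (exploitation)."""
    return sorted(sample, key=f)[:k]

def metaheuristic(f, dim=3, pop_size=20, iters=100):
    """Main loop: articulate the operators, whatever they are."""
    sample = init(pop_size, dim)
    for _ in range(iters):
        sample = selection(variation(sample) + sample, f, pop_size)
    return min(sample, key=f)

best = metaheuristic(lambda x: sum(xi ** 2 for xi in x))
```
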
Problem modelization
====================
> Way to assess the quality: fitness function.
> Way to model a solution: encoding.
## Main models
> Encoding:
>
> - continuous (s. numeric),
> - discrete metric (integers),
> - combinatorial (graph, permutation).
>
> Fitness:
>
> - mono-objective,
> - multi-modal,
> - multi-objectives.
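
A toy illustration of these encodings on a hypothetical 4-dimensional problem; whatever the encoding, the fitness keeps the same black-box signature:

```python
import random

# Continuous (numeric) encoding: a vector in [0,1]^n.
x_num = [random.random() for _ in range(4)]

# Discrete metric encoding: integer coordinates on a grid.
x_int = [random.randrange(30) for _ in range(4)]

# Combinatorial encoding: a permutation (e.g. a tour).
x_perm = random.sample(range(4), 4)

# Whatever the encoding, the fitness maps a solution to a scalar quality.
def fitness(x):
    return float(sum(x))  # placeholder mono-objective quality
```
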
## Constraints management
> Main constraints management tools for operators:
>
> - penalization,
> - reparation,
> - generation.
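
As a sketch (hypothetical names), penalization wraps the objective so that infeasible solutions, here those with $g(x) \neq 0$, get a degraded quality; reparation and generation would instead modify or construct solutions directly.

```python
def penalized(f, g, weight=1e6):
    """Turn a constrained problem into an unconstrained one (minimization)."""
    def f_pen(x):
        return f(x) + weight * g(x)  # g(x) == 0 means feasible
    return f_pen

# Example: forbid solutions whose coordinates sum to more than 1.
f = lambda x: sum(xi ** 2 for xi in x)
g = lambda x: 0 if sum(x) <= 1 else 1
f_constrained = penalized(f, g)
```
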
Performance evaluation
======================
## What is performance
> Main performance axes:
>
> - time,
> - quality,
> - probability.
>
> Additional performance axes:
>
> - robustness,
> - stability.
>
> Golden rule: the output of a metaheuristic is a distribution, not a solution.
## Empirical evaluation
> The proof-reality gap is huge, thus empirical performance evaluation is the gold standard.
>
> Empirical evaluation = scientific method.
>
> Basic rules of thumb:
>
> - randomized algorithms => repetition of runs,
> - sensitivity to parameters => design of experiments,
> - use statistical tools,
> - design experiments to answer a single question,
> - test one thing at a time.
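
A minimal sketch of the repetition rule: since the algorithm is randomized, one configuration is run several times, each with its own seed, and the whole set of end-of-run qualities is kept (the run below is a placeholder random search):

```python
import random

def one_run(rng, budget=100, dim=3):
    """Placeholder run: random search on a sphere-like black box, returning its final quality."""
    f = lambda x: sum(xi ** 2 for xi in x)
    return min(f([rng.random() for _ in range(dim)]) for _ in range(budget))

# Randomized algorithm => repetition of runs, each with its own seed.
qualities = [one_run(random.Random(seed)) for seed in range(20)]
```
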
## Useful statistical tools
> Statistical tests:
>
> - classical null hypothesis: test equality of distributions.
> - beware of p-value.
>
> How many runs?
>
> - not always "as many as possible",
> - maybe "as many as needed",
> - generally: 15 (min for non-parametric tests) -- 20 (min for parametric-gaussian tests).
>
> Use robust estimators: median instead of mean, Inter Quartile Range instead of standard deviation.
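
A sketch of these tools with standard libraries, assuming two samples of end-of-run qualities (the arrays below are placeholders standing in for results of two algorithm variants):

```python
import numpy as np
from scipy import stats

qualities_a = np.random.default_rng(0).normal(1.0, 0.1, size=20)  # placeholder samples
qualities_b = np.random.default_rng(1).normal(1.1, 0.1, size=20)

# Non-parametric null-hypothesis test: are the two distributions equal?
stat, p_value = stats.mannwhitneyu(qualities_a, qualities_b, alternative="two-sided")

# Robust estimators: median and inter-quartile range.
median = np.median(qualities_a)
iqr = np.percentile(qualities_a, 75) - np.percentile(qualities_a, 25)
```
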
## Expected Empirical Cumulative Distribution Functions
> On Run Time: ERT-ECDF.
>
> $$ERTECDF(\{X_0,\dots,X_i,\dots,X_r\}, \delta, f, t) := \#\{x_t \in X_t | f(x_t^*) \geq \delta \}$$
>
> $$\delta \in \left[0, \max_{x \in \mathcal{X}}(f(x))\right]$$
>
> $$X_i := \left\{\left\{ x_0^0, \dots, x_i^j, \dots, x_p^u | p\in[1,\infty[ \right\} | u \in [0,\infty[ \right\} \in \mathcal{X}$$
>
> with $p$ the sample size, $r$ the number of runs, $u$ the number of iterations, $t$ the number of calls to the objective
> function.
>
> The number of calls to the objective function is a good estimator of time because it dominates all other times.
>
> The dual of the ERT-ECDF can be easily computed for quality (EQT-ECDF).
>
> 3D ERT/EQT-ECDF may be useful for terminal comparison.
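
A sketch of the run-time ECDF under common assumptions (maximization, each run logging its best-so-far objective value after every call to $f$): for a target $\delta$ and a budget $t$, take the formula's count of successful runs, here normalized by the number of runs $r$.

```python
import numpy as np

def ert_ecdf(best_so_far, delta, t):
    """Fraction of runs whose best quality after t objective calls reaches delta.

    best_so_far: array of shape (runs, budget), non-decreasing along axis 1
    (best value of f seen so far, assuming maximization).
    """
    return np.mean(best_so_far[:, t] >= delta)

# Placeholder data: 20 runs, 1000 objective calls each.
rng = np.random.default_rng(0)
best_so_far = np.maximum.accumulate(rng.random((20, 1000)), axis=1)
print(ert_ecdf(best_so_far, delta=0.99, t=500))
```
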
## Other tools
> Convergence curves: do not forget the golden rule and show distributions:
>
> - quantile boxes,
> - violin plots,
> - histograms.
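
For instance, with matplotlib, the distributions of end-of-run qualities (placeholder arrays below, one per algorithm variant) can be shown as quantile boxes, violins or histograms rather than a single averaged curve:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
qualities = [rng.normal(m, 0.1, size=20) for m in (1.0, 0.8, 0.7)]  # placeholder runs

fig, axes = plt.subplots(1, 3)
axes[0].boxplot(qualities)      # quantile boxes
axes[1].violinplot(qualities)   # violin plots
axes[2].hist(qualities[-1])     # histogram of the last variant
plt.show()
```
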
Algorithm Design
================
## Neighborhood
> Convergence definition(s):
>
> - strong,
> - weak.
>
> Neighborhood: subset of solutions attainable after an atomic transformation:
>
> - ergodicity,
> - quasi-ergodicity.
>
> Relationship to metric space in the continuous domain.
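
A sketch on bit strings (a hypothetical encoding, not necessarily the lesson's): the neighborhood of $x$ is the set of solutions reachable by one atomic transformation, here a single bit flip; since any solution can be reached by a finite chain of flips, the move is ergodic.

```python
def neighbors(x):
    """All solutions reachable from x by one atomic move (a single bit flip)."""
    return [x[:i] + [1 - x[i]] + x[i + 1:] for i in range(len(x))]

x = [0, 1, 1, 0]
print(neighbors(x))  # 4 neighbors at Hamming distance 1
```
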
## Structure of problem/algorithms
> Structure of problems to exploit:
>
> - locality (basin of attraction),
> - separability,
> - gradient,
> - funnels.
>
> Structure with which to capture those structures:
>
> - implicit,
> - explicit,
> - direct.
>
> Silver rule: choose the algorithmic template that adheres the most to the problem model:
>
> - take constraints into account,
> - iterate between problem/algorithm models.
## Grammar of algorithms
> Parameter setting < tuning < control.
>
> Portfolio approaches.
> Example: numeric problems in low dimension => Nelder-Mead search is sufficient.
>
> Algorithm selection.
>
> Algorithms are templates in which operators are interchangeable.
>
> Most generic way of thinking about algorithms: grammar-based algorithm selection with parameters.
> Example: modular CMA-ES.
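
A sketch of this template view, with hypothetical names: the algorithm is a template whose roles (variation, acceptance) are filled by interchangeable operators, so selecting an algorithm instance amounts to picking one entry per grammar rule plus parameter values.

```python
import random

# Grammar rules: each role can be filled by interchangeable operators.
VARIATIONS = {
    "gaussian": lambda x, s: [xi + random.gauss(0, s) for xi in x],
    "uniform":  lambda x, s: [xi + random.uniform(-s, s) for xi in x],
}
ACCEPTS = {
    "greedy": lambda new, old: new <= old,
    "always": lambda new, old: True,
}

def template(f, variation, accept, scale=0.1, dim=3, iters=100):
    """A (1+1) local-search template; operators are injected, not hard-coded."""
    x = [random.random() for _ in range(dim)]
    fx = f(x)
    for _ in range(iters):
        y = variation(x, scale)
        fy = f(y)
        if accept(fy, fx):
            x, fx = y, fy
    return x, fx

# One algorithm *instance* = template + one choice per rule + parameter values.
best = template(lambda x: sum(xi ** 2 for xi in x),
                VARIATIONS["gaussian"], ACCEPTS["greedy"], scale=0.05)
```
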
Parameter setting tools:
......@@ -216,6 +230,7 @@ Parameter setting tools:
Design tools:
- ParadisEO,
- IOH profiler,
- jMetal,
- Jenetics,
- ECJ,
......@@ -223,13 +238,13 @@ Design tools:
- HeuristicLab.
## Landscape-aware algorithms
> Fitness landscapes: structure of problems as seen by an algorithm.
> Features: tools that measure one aspect of a fitness landscape.
>
> We can observe landscapes, and learn which algorithm instance solves them best.
> Examples: SAT, TSP, BB.
>
> Toward automated solver design.
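
A sketch of one classical feature (not necessarily the one used in the lesson): the autocorrelation of fitness along a random walk, a proxy for ruggedness that can feed a model predicting which algorithm instance to use.

```python
import numpy as np

def random_walk_autocorrelation(f, step, x0, length=200, lag=1):
    """Ruggedness proxy: correlation of f between solutions `lag` steps apart."""
    xs = [x0]
    for _ in range(length - 1):
        xs.append(step(xs[-1]))  # one atomic move of the walk
    fs = np.array([f(x) for x in xs])
    return np.corrcoef(fs[:-lag], fs[lag:])[0, 1]

# Placeholder black box and move: sphere function, Gaussian step.
rng = np.random.default_rng(0)
feature = random_walk_autocorrelation(
    f=lambda x: float(np.sum(x ** 2)),
    step=lambda x: x + rng.normal(0, 0.1, size=x.shape),
    x0=rng.random(3),
)
```
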
\usepackage{amssymb}
\def\reals{\mathbb{R}}
% thin space, limits underneath in displays
\DeclareMathOperator*{\argmin}{arg\,min}
\DeclareMathOperator*{\argmax}{arg\,max}
metadata:
  title: "Metaheuristics Lesson"
  author: Johann Dreo
variables:
  documentclass: memoir
include-in-header:
  - docs/style_quote.tex
  - docs/style_paragraph.tex
  - docs/commands.tex
table-of-contents: true
toc-depth: 3
\usepackage{enumitem}
\setlist[itemize]{noitemsep, topsep=0pt}
\setlength{\parskip}{0.5em}
\usepackage{tcolorbox}
\definecolor{summary}{cmyk}{0.80, 0.13, 0.14, 0.04}
\newtcolorbox{myquote}{
  title=Key concepts,
  colback=summary!5!white,
  colframe=summary!75!black
}
% redefine the 'quote' environment to use this 'myquote' environment
\renewenvironment{quote}
{\begin{myquote}
\setlength{\parskip}{0.5em}
}
{\end{myquote}}
#!/bin/sh
pandoc -f markdown --defaults docs/config_pdf.yaml -o LESSON.pdf LESSON.md
......@@ -24,7 +24,7 @@ if __name__=="__main__":
help="Sensors' range (as a fraction of domain width, max is √2)")
can.add_argument("-w", "--domain-width", metavar="NB", default=30, type=int,
help="Domain width (a number of cells)")
help="Domain width (a number of cells). If you change this you will probably need to update `--target` accordingly")
can.add_argument("-i", "--iters", metavar="NB", default=100, type=int,
help="Maximum number of iterations")
......