Cross-entropy method
From Wikipedia, the free encyclopedia
The cross-entropy (CE) method is a Monte Carlo approach to combinatorial and continuous multi-extremal optimization and rare-event simulation. It has been successfully applied to static and noisy combinatorial optimization problems such as the traveling salesman problem, the quadratic assignment problem, DNA sequence alignment, the max-cut problem, and the buffer allocation problem. The method has also been applied to global optimization problems in which the objective function is continuous and has many local extrema. Other popular applications include rare-event simulation, where very small probabilities need to be accurately estimated, for example in network reliability analysis, queueing models, or performance analysis of telecommunication systems.
In a nutshell, the CE method consists of two phases:
- Generate a random data sample (trajectories, vectors, etc.) according to a specified mechanism.
- Update the parameters of the random mechanism based on the data to produce a "better" sample in the next iteration.
The generality of the method lies in the fact that the updating rules are often simple and explicit (and hence fast), and are "optimal" in a well-defined mathematical sense.
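As a concrete illustration of these two phases, the following Python sketch applies them to a toy problem that is not from the article (maximizing the number of ones in a binary vector, with independent Bernoulli sampling distributions); the problem and all parameter values are assumptions chosen only for illustration.

import numpy as np

# Toy illustration of the two CE phases (an assumption for illustration):
# maximize the number of ones in a binary vector by sampling from
# independent Bernoulli distributions and refitting them to elite samples.
rng = np.random.default_rng(0)
n, N, n_elite = 10, 50, 5            # problem size, sample size, elite count
p = np.full(n, 0.5)                  # parameters of the random mechanism

for _ in range(20):
    # Phase 1: generate a random sample according to the current mechanism.
    X = rng.random((N, n)) < p       # Bernoulli(p) samples, shape (N, n)
    scores = X.sum(axis=1)           # performance of each sample
    # Phase 2: update the parameters from the best-scoring (elite) samples,
    # so the next iteration produces a "better" sample.
    elite = X[np.argsort(scores)[-n_elite:]]
    p = elite.mean(axis=0)

print(p.round(2))                    # probabilities concentrate near all-ones

Each pass draws a sample from the current mechanism and refits the Bernoulli probabilities to the elite samples; this simple and explicit update is exactly the kind of rule that reappears in the sections below.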
Rare Event Estimation
Consider the weighted graph of the figure below, with random weights $X_1,\dots,X_5$.

[Figure: a weighted graph in which node A is connected to node B through edges with random weights $X_1,\dots,X_5$.]

Suppose the weights are independent and exponentially distributed random variables with means $u_1,\dots,u_5$, respectively. Denote the probability density function (pdf) of $\mathbf{X} = (X_1,\dots,X_5)$ by $f(\cdot;\mathbf{u})$; thus,

$$f(\mathbf{x};\mathbf{u}) = \exp\!\left(-\sum_{j=1}^{5}\frac{x_j}{u_j}\right)\prod_{j=1}^{5}\frac{1}{u_j}.$$
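In Python with NumPy, this model can be set up in a few lines (a sketch; the helper names are illustrative, and the mean vector u is the one used in the numerical example later in this article):

import numpy as np

rng = np.random.default_rng(0)
u = np.array([0.25, 0.4, 0.1, 0.3, 0.2])   # means of the exponential weights

def sample_weights(u, N, rng):
    """Draw N independent weight vectors X = (X_1, ..., X_5), X_j ~ Exp(mean u_j)."""
    return rng.exponential(u, size=(N, len(u)))

def pdf(x, u):
    """Joint pdf f(x; u) = exp(-sum_j x_j / u_j) * prod_j 1 / u_j."""
    return np.exp(-np.sum(x / u, axis=-1)) / np.prod(u)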
Let $S(\mathbf{X})$ be the total length of the shortest path from node A to node B. Of great interest is the estimation of the probability

$$\ell = \mathbb{P}(S(\mathbf{X}) \ge \gamma),$$

that is, the probability that the length of the shortest path $S(\mathbf{X})$ will exceed some fixed $\gamma$. A straightforward way to estimate $\ell$ is to use Crude Monte Carlo (CMC) simulation. That is, we draw a random sample $\mathbf{X}_1,\dots,\mathbf{X}_N$ from the distribution of $\mathbf{X}$ and use

$$\frac{1}{N}\sum_{i=1}^{N} I_{\{S(\mathbf{X}_i)\ge\gamma\}}$$

as the unbiased estimator of $\ell$. However, for large $\gamma$ the probability $\ell$ will be very small, and CMC requires a very large simulation effort: $N$ needs to be very large in order to estimate $\ell$ accurately, that is, to obtain a small relative error of, say, 0.01. A better way to perform the simulation is to use importance sampling (IS). That is, let $g$ be another probability density such that $g(\mathbf{x}) = 0$ implies $I_{\{S(\mathbf{x})\ge\gamma\}}\,f(\mathbf{x};\mathbf{u}) = 0$. Using the density $g$ we can represent $\ell$ as

$$\ell = \int I_{\{S(\mathbf{x})\ge\gamma\}}\,\frac{f(\mathbf{x};\mathbf{u})}{g(\mathbf{x})}\,g(\mathbf{x})\,d\mathbf{x} = \mathbb{E}_{g}\!\left[I_{\{S(\mathbf{X})\ge\gamma\}}\,\frac{f(\mathbf{X};\mathbf{u})}{g(\mathbf{X})}\right],$$
where the subscript $g$ means that the expectation is taken with respect to $g$, which is called the importance sampling (IS) density. An unbiased estimator of $\ell$ is

$$\hat\ell = \frac{1}{N}\sum_{i=1}^{N} I_{\{S(\mathbf{X}_i)\ge\gamma\}}\,W(\mathbf{X}_i),$$

where $\hat\ell$ is called the importance sampling (IS) or likelihood ratio (LR) estimator,

$$W(\mathbf{x}) = \frac{f(\mathbf{x};\mathbf{u})}{g(\mathbf{x})}$$

is called the likelihood ratio, and $\mathbf{X}_1,\dots,\mathbf{X}_N$ is a random sample from $g$; that is, $\mathbf{X}_1,\dots,\mathbf{X}_N$ are i.i.d. random vectors with density $g$. Suppose that $X_1,\dots,X_5$ are independent and exponentially distributed with means $v_1,\dots,v_5$. Then

$$W(\mathbf{x};\mathbf{u},\mathbf{v}) = \frac{f(\mathbf{x};\mathbf{u})}{f(\mathbf{x};\mathbf{v})} = \exp\!\left(-\sum_{j=1}^{5} x_j\left(\frac{1}{u_j}-\frac{1}{v_j}\right)\right)\prod_{j=1}^{5}\frac{v_j}{u_j}.$$
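A minimal sketch of this estimator in Python, assuming independent exponentials as above; the performance function S is supplied by the caller, since it depends on the graph of the figure:

import numpy as np

def likelihood_ratio(x, u, v):
    """W(x; u, v) = f(x; u) / f(x; v) for independent exponentials
    with mean vectors u (nominal) and v (importance sampling)."""
    return np.exp(-np.sum(x * (1.0 / u - 1.0 / v), axis=-1)) * np.prod(v / u)

def is_estimator(S, u, v, gamma, N, rng):
    """Unbiased IS estimate of ell = P(S(X) >= gamma), sampling X ~ f(.; v).

    S must accept an array of shape (N, len(v)) and return N performances.
    """
    X = rng.exponential(v, size=(N, len(v)))     # sample from the IS density
    W = likelihood_ratio(X, u, v)                # correct for the change of measure
    return np.mean((S(X) >= gamma) * W)

Crude Monte Carlo is recovered as the special case v = u, for which the likelihood ratio is identically 1.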
The main problem now is how to select the vector $\mathbf{v} = (v_1,\dots,v_5)$ that gives the most accurate estimate of $\ell$ for a given simulation effort. The CE method for rare-event simulation provides a fast and simple way to estimate a value of the parameter $\mathbf{v}$ that is optimal in a well-defined sense. To this end, without going into the details, a quite general CE algorithm for rare-event estimation is outlined next.
Algorithm
1. Define $\hat{\mathbf{v}}_0 := \mathbf{u}$. Set $t := 1$ (iteration counter).
2. Generate a random sample $\mathbf{X}_1,\dots,\mathbf{X}_N$ according to the pdf $f(\cdot;\hat{\mathbf{v}}_{t-1})$. Calculate the performances $S(\mathbf{X}_i)$ for all $i$, and order them from smallest to largest: $S_{(1)} \le \dots \le S_{(N)}$. Let $\hat\gamma_t$ be the sample $(1-\rho)$-quantile of the performances: $\hat\gamma_t := S_{(\lceil (1-\rho)N\rceil)}$, provided this is less than $\gamma$. Otherwise, put $\hat\gamma_t := \gamma$.
3. Use the same sample to calculate, for $j = 1,\dots,5$,
$$\hat v_{t,j} = \frac{\sum_{i=1}^{N} I_{\{S(\mathbf{X}_i)\ge\hat\gamma_t\}}\,W(\mathbf{X}_i;\mathbf{u},\hat{\mathbf{v}}_{t-1})\,X_{ij}}{\sum_{i=1}^{N} I_{\{S(\mathbf{X}_i)\ge\hat\gamma_t\}}\,W(\mathbf{X}_i;\mathbf{u},\hat{\mathbf{v}}_{t-1})}.$$
4. If $\hat\gamma_t = \gamma$, then proceed to Step 5; otherwise set $t := t+1$ and reiterate from Step 2.
5. Let $T$ be the final iteration. Generate a sample $\mathbf{X}_1,\dots,\mathbf{X}_{N_1}$ according to the pdf $f(\cdot;\hat{\mathbf{v}}_T)$ and estimate $\ell$ via the IS estimator
$$\hat\ell = \frac{1}{N_1}\sum_{i=1}^{N_1} I_{\{S(\mathbf{X}_i)\ge\gamma\}}\,W(\mathbf{X}_i;\mathbf{u},\hat{\mathbf{v}}_T).$$
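Putting the five steps together gives a compact implementation. The sketch below additionally assumes a concrete network, since the figure is not reproduced here: the classic five-edge bridge network, for which the shortest path length from A to B is S(x) = min(x1 + x4, x1 + x3 + x5, x2 + x5, x2 + x3 + x4). With a different graph, only S changes.

import numpy as np

def S(x):
    """Shortest A-to-B path length for an assumed five-edge bridge network."""
    return np.minimum.reduce([x[:, 0] + x[:, 3],
                              x[:, 0] + x[:, 2] + x[:, 4],
                              x[:, 1] + x[:, 4],
                              x[:, 1] + x[:, 2] + x[:, 3]])

def W(x, u, v):
    """Likelihood ratio W(x; u, v) for independent exponentials."""
    return np.exp(-np.sum(x * (1.0 / u - 1.0 / v), axis=-1)) * np.prod(v / u)

def ce_rare_event(u, gamma, N=1000, N1=10**5, rho=0.1, rng=None):
    rng = rng or np.random.default_rng(0)
    v = u.copy()                                     # Step 1: v_0 := u
    for _ in range(100):                             # t = 1, 2, ... (safety cap)
        X = rng.exponential(v, size=(N, len(u)))     # Step 2: sample from f(.; v_{t-1})
        perf = S(X)
        idx = int(np.ceil((1 - rho) * N)) - 1        # (1 - rho)-quantile order statistic
        gamma_t = min(np.sort(perf)[idx], gamma)
        w = (perf >= gamma_t) * W(X, u, v)           # indicator times likelihood ratio
        v = w @ X / w.sum()                          # Step 3: update each v_j
        if gamma_t >= gamma:                         # Step 4: stop once the level is hit
            break
    X = rng.exponential(v, size=(N1, len(u)))        # Step 5: final IS estimate with v_T
    return np.mean((S(X) >= gamma) * W(X, u, v)), v

u = np.array([0.25, 0.4, 0.1, 0.3, 0.2])
ell_hat, v_hat = ce_rare_event(u, gamma=2.0)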
Note that in Steps 2–4 the optimal IS parameter is estimated. In the final step (Step 5) this parameter is used to estimate the probability of interest. Note also that the algorithm assumes that the parameters $\rho$ (typically between 0.01 and 0.1), $N$ and $N_1$ are available in advance. As an example, consider the case where the parameter vector $\mathbf{u}$ is given by $(0.25, 0.4, 0.1, 0.3, 0.2)$ and the probability that the shortest path length exceeds $\gamma = 2$ has to be estimated. Crude Monte Carlo with $10^7$ samples gives an estimate with an estimated relative error (RE), that is, $\sqrt{\operatorname{Var}(\hat\ell)}/\ell$, of 0.165. With $10^8$ samples an estimate with RE 0.03 is obtained. In contrast, the CE algorithm with $N = 1000$ provides the estimated optimal parameter vector $\hat{\mathbf{v}}_T = (1.692, 1.901, 0.129, 0.712, 0.564)$. This $\hat{\mathbf{v}}_T$ is employed in the final importance sampling step of the algorithm with $N_1 = 10^5$ and gives an estimate with an estimated RE of 0.03. This simple example illustrates the advantage of the CE method over the naive CMC method: typically the simulation effort (as measured by computing time) is reduced dramatically, and the final estimate of the quantity of interest is more accurate.
Generic CE Algorithm
A generic CE algorithm can be formulated for estimation problems of the form $\ell = \mathbb{E}_{\mathbf{u}}[H(\mathbf{X})]$. Let $f(\cdot;\mathbf{v})$ be a member of a parametric family of distributions, and let $f(\cdot;\mathbf{u})$ be a reference distribution from the same parametric family. (In some contexts, such as rare-event estimation, there is a natural candidate for the reference distribution; in others, the reference distribution may correspond to an arbitrary initial distribution.) To proceed, estimate $\ell$ via importance sampling with

$$\hat\ell = \frac{1}{N}\sum_{i=1}^{N} H(\mathbf{X}_i)\,\frac{f(\mathbf{X}_i;\mathbf{u})}{f(\mathbf{X}_i;\mathbf{v})},$$

where $\mathbf{X}_1,\dots,\mathbf{X}_N$ is a random sample from $f(\cdot;\mathbf{v})$. The following generic algorithm iteratively approaches the member of the given parametric family that is closest, in the Kullback–Leibler sense, to the zero-variance importance sampling density, that is, the density $g^*$ for which the estimator $\hat\ell$ has zero variance.
1. Choose an initial parameter vector $\mathbf{v}^{(0)}$; set $t := 1$.
2. Generate a random sample $\mathbf{X}_1,\dots,\mathbf{X}_N$ from $f(\cdot;\mathbf{v}^{(t-1)})$.
3. Solve for $\mathbf{v}^{(t)}$, where
$$\mathbf{v}^{(t)} = \operatorname*{argmax}_{\mathbf{v}} \frac{1}{N}\sum_{i=1}^{N} H(\mathbf{X}_i)\,\frac{f(\mathbf{X}_i;\mathbf{u})}{f(\mathbf{X}_i;\mathbf{v}^{(t-1)})}\,\log f(\mathbf{X}_i;\mathbf{v}).$$
4. If convergence is reached then stop; otherwise, increase $t$ by 1 and reiterate from Step 2.
In some cases, the solution to Step 3 can be found analytically. Situations in which this occurs are:
- when $f$ belongs to the natural exponential family (a single-parameter member of the exponential family);
- when $f$ is discrete with finite support;
- when $H(\mathbf{x}) = I_{\{S(\mathbf{x})\ge\gamma\}}$ and $\mathbf{u} = \mathbf{v}^{(t-1)}$, in which case $\mathbf{v}^{(t)}$ corresponds to the maximum likelihood estimator based on those $\mathbf{X}_i$ with $S(\mathbf{X}_i)\ge\gamma$ (see the sketch below).
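The third case is worth spelling out: when the likelihood ratio is identically 1, the weighted maximization in Step 3 reduces to ordinary maximum likelihood estimation on the elite samples. A minimal Python sketch for independent exponential sampling distributions, where the MLE of the mean is the sample mean (the helper name is illustrative):

import numpy as np

def ce_update_exponential(X, perf, gamma_t):
    """Step 3 when H = indicator{S >= gamma_t} and u = v_{t-1}:
    the update is the MLE of the exponential means over the elite samples,
    i.e. the componentwise sample mean of those X_i with S(X_i) >= gamma_t."""
    elite = X[perf >= gamma_t]
    return elite.mean(axis=0)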
Example (Continuous Optimization)
Suppose the problem of interest is to locate $x^* = \operatorname*{argmax}_x S(x)$, where $S(x) = \mathrm{e}^{-(x-2)^2} + 0.8\,\mathrm{e}^{-(x+2)^2}$. Instead of proceeding directly, consider the estimation problem $\mathbb{P}_{\boldsymbol\theta}(S(X)\ge\gamma)$ for a given level $\gamma$ and parametric family $\{f(\cdot;\boldsymbol\theta)\}$ (the so-called associated stochastic problem). In this example, the parametric family is taken to be the Gaussian distributions, parameterized by the mean $\mu$ and variance $\sigma^2$ (so $\boldsymbol\theta = (\mu,\sigma^2)$ here). Hence, for a given $\gamma$, the goal is to find $\boldsymbol\theta$ so that the Kullback–Leibler divergence $D_{\mathrm{KL}}\!\left(I_{\{S(x)\ge\gamma\}}\,\middle\|\,f(\cdot;\boldsymbol\theta)\right)$ is minimized. Further, instead of considering this KL-divergence minimization problem directly, its sample version (stochastic counterpart) is considered. It turns out that the parameters minimizing the stochastic counterpart for this choice of target distribution and parametric family are simply the mean and variance of those samples whose objective function value is at least $\gamma$. This yields the following randomized algorithm for the problem.
Pseudo-code
mu := -6; sigma2 := 100; t := 0; maxits := 100   // Initialize parameters
N := 100; Ne := 10                               // Sample size and number of elite samples
epsilon := 1E-8                                  // Convergence tolerance on the variance (value chosen arbitrarily)
while t < maxits and sigma2 > epsilon            // While not converged and maxits not exceeded
    X := SampleGaussian(mu, sigma2, N)           // Obtain N samples from current sampling distribution
    S := exp(-(X-2)^2) + 0.8*exp(-(X+2)^2)       // Evaluate objective function at sampled points
    X := sort(X, S, descending)                  // Sort X by objective function values, best first
    mu := mean(X(1:Ne)); sigma2 := var(X(1:Ne))  // Update parameters from the Ne elite samples
    t := t + 1                                   // Increment iteration counter
return mu                                        // Return mean of final sampling distribution as solution
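A direct, runnable translation of the pseudocode into Python with NumPy follows (a sketch: SampleGaussian becomes a call to NumPy's normal generator, and the convergence tolerance epsilon, left unspecified above, is chosen arbitrarily here):

import numpy as np

rng = np.random.default_rng(0)
mu, sigma2 = -6.0, 100.0           # initial mean and variance
N, Ne = 100, 10                    # sample size and number of elite samples
epsilon, maxits = 1e-8, 100        # convergence tolerance (arbitrary) and iteration cap

t = 0
while t < maxits and sigma2 > epsilon:
    X = rng.normal(mu, np.sqrt(sigma2), N)                # sample from current distribution
    S = np.exp(-(X - 2)**2) + 0.8 * np.exp(-(X + 2)**2)   # objective at sampled points
    elite = X[np.argsort(S)[-Ne:]]                        # Ne best-performing samples
    mu, sigma2 = elite.mean(), elite.var()                # update sampling distribution
    t += 1

print(mu)   # close to 2, the global maximizer of S

Starting from mean -6 and a large variance, the elite samples concentrate around the higher peak of S, so mu converges to approximately 2.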
References
- De Boer, P.-T., Kroese, D.P., Mannor, S. and Rubinstein, R.Y. (2005). A Tutorial on the Cross-Entropy Method. Annals of Operations Research, 134(1), 19–67.
- Rubinstein, R.Y. and Kroese, D.P. (2004). The Cross-Entropy Method: A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation, and Machine Learning. Springer-Verlag, New York.