tag:blogger.com,1999:blog-81139782472724964352019-04-06T17:58:02.675+02:00Vita Genetic ProgrammingA scalable, high performance genetic programming / algorithms frameworkUnknownnoreply@blogger.comBlogger9125tag:blogger.com,1999:blog-8113978247272496435.post-89205043284143247032019-04-06T15:27:00.002+02:002019-04-06T15:27:41.265+02:00Completely revised Titanic classification example<div class="separator" style="clear: both; text-align: center;"><a href="https://2.bp.blogspot.com/-m7i5LDWwQh4/XKipIjEombI/AAAAAAAAALs/DQbzvHABNS4PJyy90_CdIdVWHg2wIqyRgCLcBGAs/s1600/titanic.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" data-original-height="516" data-original-width="1315" height="156" src="https://2.bp.blogspot.com/-m7i5LDWwQh4/XKipIjEombI/AAAAAAAAALs/DQbzvHABNS4PJyy90_CdIdVWHg2wIqyRgCLcBGAs/s400/titanic.jpg" width="400" /></a></div><b>Titanic: Machine Learning from Disaster</b> is a well known <i>getting started prediction competition</i> promoted by <a href="https://www.kaggle.com/">Kaggle</a>.<br /><br />The <a href="https://github.com/morinim/vita/wiki/titanic_tutorial">revised example</a> is a step by step guide illustrating the basic elements of classification with genetic programming and the Vita library.<br /><a name='more'></a><h2>Competition Description</h2>The sinking of the RMS Titanic is one of the most infamous shipwrecks in history. On April 15, 1912, during her maiden voyage, the Titanic sank after colliding with an iceberg, killing 1502 out of 2224 passengers and crew. This sensational tragedy shocked the international community and led to better safety regulations for ships.<br /><br />One of the reasons that the shipwreck led to such loss of life was that there were not enough lifeboats for the passengers and crew. Although there was some element of luck involved in surviving the sinking, some groups of people were more likely to survive than others, such as women, children, and the upper-class.<br /><br />In this challenge, we ask you to complete the analysis of what sorts of people were likely to survive. In particular, we ask you to apply the tools of machine learning to predict which passengers survived the tragedy.<br /><br />References<br /><br /><ul><li><a href="https://www.kaggle.com/c/titanic/">Kaggle competition</a></li><li><a href="https://github.com/morinim/vita/">Vita library</a></li><li><a href="https://github.com/morinim/vita/wiki/titanic_tutorial">Tutorial</a></li></ul>MMhttp://www.blogger.com/profile/03664940271122731908noreply@blogger.com0tag:blogger.com,1999:blog-8113978247272496435.post-86640134545177608692019-01-19T17:02:00.001+01:002019-01-19T17:02:59.881+01:00Slides from C++ Day (Pavia, Italy - 2018)Slides from the C++Day 2018 (Pavia - Italy) talk are available here: <a href="https://github.com/morinim/documents/tree/master/lessons_learned_developing_evolutionary_algorithms_in_cpp"><b class="final-path">Lessons Learned Developing Evolutionary Algorithms in C++</b></a><span class="final-path">.</span><br /><br /><hr /><div class="separator" style="clear: both; text-align: center;"><img border="0" data-original-height="157" data-original-width="320" src="https://4.bp.blogspot.com/-m68Ov9EqSws/W96xxc_PCkI/AAAAAAAAAKM/30i5ojcP0e0kSjShfzC6zZdP-5FumKU7gCLcBGAs/s320/cppday18.jpg" /></div><br />The <b>C++ Day</b> is a fall event dedicated to C++ development where professionals, students and companies meet and share experience. The C++ Day has been co-organized with the University of Pavia, in particular with the <a href="http://fisica.unipv.it/EN_index.php">Physics Department</a>.<br /><br />Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-8113978247272496435.post-86199541188782208052018-08-29T10:25:00.000+02:002018-08-29T10:25:44.478+02:00Nonogram puzzle<img alt="An example of a nonogram puzzle" border="0" src="https://github.com/morinim/vita/wiki/img/nonogram.png" style="width:100%" /><br /><br />Nonograms, also known as Japanese puzzles, are logic puzzles that are sold by many news paper vendors. The challenge is to fill a grid with black and white pixels in such a way that a given description for each row and column, indicating the lengths of consecutive segments of black pixels, is adhered to. Although the Nonograms in puzzle books can usually be solved by hand, the general problem of solving Nonograms is NP-hard.<br /><br /><a href="https://github.com/morinim/vita/wiki/nonogram_tutorial">Read more »</a>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-8113978247272496435.post-66335218131014882442018-01-03T16:51:00.003+01:002018-01-04T16:44:48.417+01:00Solving a Tetris-puzzle with GAs<img alt="The puzzle in the picture comes from a Stackoverflow's question (https://stackoverflow.com/q/47858717/3235496)" border="0" src="https://github.com/morinim/vita/wiki/img/polyomino.jpg" width="240" /><br /><br />The general challenge posed is to tile a given region with a given set of <a href="https://en.wikipedia.org/wiki/Polyomino">polyominoes</a>.<br /><br /><strong>TL;DR</strong><br /><br />GAs aren't the most appropriate tool for this kind of puzzle. GAs tend to produce good but sub-optimal solutions and this behaviour is acceptable only for some combinatorial optimization problems.<br /><br />Anyway the last example proposed finds a solution almost always (often in a short time) and GAs prove themselves a general, viable path especially when previous knowledge isn't available.<br /><br /><a href="https://github.com/morinim/vita/wiki/polyomino_tutorial">Read more »</a>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-8113978247272496435.post-18680335434677357322017-12-02T18:27:00.001+01:002018-01-03T16:44:06.186+01:00Hello World! Genetic Algorithms<img src="https://github.com/morinim/vita/wiki/img/weasel.png" alt="" width="126" style="float:left; margin-right:1em" /><br /><br /><p><strong>TL;DR</strong><br /><br />Writing a GAs system from scratch is a great learning experience but for real tasks don't reinvent the wheel.<br /><br />It's not just a matter of subtleties: issues in a fresh implementation can easily go unnoticed because GAs are fault-tolerant by their very nature… at the expense of performance.</p><br /><a href="https://github.com/morinim/documents/blob/master/ga_string_guessing/article.md">Read more »</a>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-8113978247272496435.post-54837035434610110032017-11-06T12:58:00.000+01:002018-01-04T16:45:04.265+01:00A genetic programming approach to Forex Expert Advisor generation<img alt="A genetic programming approach to Forex Expert Advisor generation" border="0" src="https://github.com/morinim/vita/wiki/img/forex.png" width="320" /><br /><br />An example of how to use Genetic Programming to evolve a simple software agent, more specifically a MQL5 Expert Advisor (EA), operating on the Forex exchange market.<br /><br /><a href="https://github.com/morinim/vita/wiki/forex_tutorial">Read more »</a>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-8113978247272496435.post-64237824725984122172017-05-24T17:02:00.000+02:002018-01-04T16:45:23.988+01:00Scheduling concurrent jobs for several machines<img border="0" src="https://github.com/morinim/vita/wiki/img/scheduling.png" width="320" alt="Scheduling concurrent jobs for several machines" /><br /><br />A department needs to run certain programs daily. They have access to a fixed number of machines (homogeneous machines, so each job time is independent of machine used) and have good approximations for the run time of each program.<br /><br />These times are only approximate and moreover we want a schedule that is robust (in the sense that a failure of some assumption such as program time will not do too much damage). Typically one might augment them by a modest percentage; we will simply assume that has already been done and take our set of job times as given.<br /><br /><a href="https://github.com/morinim/vita/wiki/scheduling_tutorial">Read more »</a>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-8113978247272496435.post-55547430665994960712016-10-01T19:08:00.003+02:002017-12-02T18:30:41.755+01:00Pathfinding via genetic algorithms<img border="0" src="https://github.com/morinim/vita/wiki/img/pathfinding.png" width="320" height="162" alt="Pathfinding via genetic algorithms" /><br /><br />The problem we want to solve is to get from a starting point to a goal point avoiding obstacles.<br /><br /><div style="border-top:1px solid black; border-bottom:1px solid black"><strong>NOTE</strong>. There are many efficient, <i>ad-hoc</i> algorithms (e.g. <a href="http://www.redblobgames.com/pathfinding/a-star/introduction.html">A*</a>) that should be preferred for real pathfinding tasks, but this is a good example of the flexibility of GA.<br /></div><br /><a href="https://github.com/morinim/vita/wiki/pathfinding_tutorial">Read more »</a><br /><br /><hr/><p style="text-align:center">(There's also a <a href="http://stackoverflow.com/q/38083172/3235496">related question / answer</a> on Stackoverflow)</p>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-8113978247272496435.post-18424123080589655302016-06-29T17:59:00.001+02:002017-12-02T18:54:17.850+01:00How to analyze the performance of an evolutionary algorithm experimentally?<script type="text/x-mathjax-config">MathJax.Hub.Config({tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]}});</script><script src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"></script><br /><br />The typical approach is executing several runs of the <abbr title="Evolutionary Algorithm">EA</abbr> and plot the <strong>average performance over time</strong> (average performance of best-of-run-individual NOT population average).<br /><br />A good rule of thumb is performing a minimum of 30 runs (of course 50-100 runs is better).<br /><br />The average performance is better than the best-value-achieved-in-a-set-of-runs but variance should also be taken into account.<br /><br />You can see two nice examples from <a href="http://www.randalolson.com/">Randy Olson's website</a>:<br /><br /><div style="width:100%; text-align:center"><img src="https://4.bp.blogspot.com/-jTuh2ERR6CQ/V3PqwhdNdaI/AAAAAAAAAcY/gMTJyP6xr-MvXlFuOZPwWG2N2S5ij8QFACLcB/s1600/average_fitness.png" style="width:80%" alt="Average fitness of both algorithms over several replicates" /><br /><br /><em>The average fitness of both algorithms over several replicates. From this graph, we would conclude that our algorithm performs better than the current best algorithm on average.</em><br /></div><br /><div style="width:100%; text-align:center"><img src="https://3.bp.blogspot.com/-Kf7doA6vIKA/V3Pqwl3_pkI/AAAAAAAAAcg/kOMTnnkzwS0pVlf4WsqBs6dEhqauah8NwCLcB/s1600/average_fitness_ci.png" style="width:80%" alt="Average fitness with a 95% confidence interval" /><br /><br /><em>The average fitness with a 95% confidence interval for each algorithm. This graph shows us that our algorithm does not actually perform better than the current best algorithm, and only appeared to perform better on average due to chance.</em><br /></div><br /><hr/>The basic breakdown of how to calculate a confidence interval for a population mean is as follows:<br /><br /><ol><li>Identify the <strong>sample mean</strong> $\bar{x}$. While $\bar{x}$ differs from $\mu$, population mean, they are still calculated the same way: $$\bar x = \sum {x_{i} \over n}$$</li><li>Identify the <strong>(corrected) sample standard deviation</strong> $s$: $$s = \sqrt{\frac{\sum_{i=1}^{n}{(x_i - \bar{x})^2}} {n-1}}$$<br />$s$ is an <a href="https://en.wikipedia.org/wiki/Unbiased_estimation_of_standard_deviation">estimation</a> of the population standard deviation $\sigma$.</li><li>Calculate the <strong>critical value</strong>, $t^*$, of the Student-t distribution. This value is dependent on the confidence level, $C$, and the number of observations, $n$.<br /><br /><br />The critical value is found from the <em>t-distribution</em> table (most statistical textbooks list it). In this table $t^*$ is written as $$t^*(\alpha, r)$$ where $r = n-1$ is the <strong>degrees of freedom</strong> (found by subtracting one from the number of observations) and $\alpha = {1-C \over 2}$ is the <strong>significance level</strong>.<br />A better way to a fully precise critical $t^*$ value is the statistical function implemented in spreadsheets (e.g. <a href="http://www.excelfunctions.net/Excel-T-Inv-2t-Function.html">T.INV.2T function</a>) / scientific computing environments (e.g. SciPy <a href="http://stackoverflow.com/a/19339372/3235496">stats.t.ppf</a>), language libraries (e.g. C++ and <a href="http://www.boost.org/doc/libs/1_61_0/libs/math/doc/html/math_toolkit/stat_tut/weg/st_eg/tut_mean_intervals.html">boost::math::students_t</a>).<br /></li><li>Plug the found values into the appropriate equation:<br />$$\left({\bar x} - t^{*}{\frac {s} {\sqrt n}}, {\bar x} + t^{*}{\frac {s} {\sqrt n}}\right)$$</li><li>The final step is to <a href="https://en.wikipedia.org/wiki/Confidence_interval#Meaning_and_interpretation">interpret the answer</a>. Since the found answer is an interval with an upper and lower bound it's appropriate to state that, based on the given data, the true mean of the population is between the <strong>lower bound</strong> and <strong>upper bound</strong> with the chosen confidence level.</li></ol><br /><hr/>The more the confidence intervals of two algorithms overlap, the more likely the algorithms are to perform the same (or we haven't sampled enough to discriminate between the two). If the 95% confidence intervals don't overlap, then the algorithm with the highest average performance performs significantly better.<br /><br />In EA, the source distribution is essentially never normal and what said so far formally applies only to a normal distribution!<br /><br />Indeed it still tells many things. The following table summarizes the performance of the t-intervals under four situations:<br /><br /><table border="1" style="margin-left:auto;margin-right:auto"><thead><tr><th></th><th>Normal curve</th><th>Not Normal curve</th></tr></thead> <tbody><tr><td>Small sample size (n<30)</td><td>Good</td><td>Poor</td></tr><tr><td>Larger sample size (n≥30)</td><td>Good</td><td>Fair</td></tr></tbody> </table><br />For more accurate answers, <strong>non-parametric</strong> statistics are the way to go (see <a href="http://www.cis.uoguelph.ca/~wineberg/publications/ECStat2004.pdf">An Introduction to Statistics for EC Experimental Analysis</a> by Mark Wineberg and Steffen Christensen for further details).<br /><br /><p style="text-align:center"><em>(First revision posted on <a href="http://cs.stackexchange.com/a/60104/50003">Computer Science - Stackexchange</a>)</em></p>Unknownnoreply@blogger.com0