Making a Training Dataset from Multiple Data Distributions

Over time, we may accumulate large amounts of data from several different populations, e.g., records of the spread of a virus across different countries. Yet what we wish to model is not any one of these populations. One might want a model of the spread of the virus that is robust across the different countries, or one that is predictive for a new location for which we have only limited data. We survey and formalize the objectives that these goals present for mixing different distributions into a training dataset, objectives that have historically been hard to optimize.