Making useful corner / triangle plots with gwnr

Often we find ourselves in possession of samples drawn from a multivariate probability distribution $p(\theta)$ whose full form is not known to us. These samples usually come from an inference algorithm that attempts to measure that distribution. A combined two-dimensional and one-dimensional visualization of the data is a useful tool for understanding the properties of $p(\theta)$ and the bivariate correlations hidden within it.

Contemporary libraries like seaborn, triangle and corner can make these figures, but each lacks one or more features that are often wanted when studying inferred posteriors for the source parameters of gravitational-wave sources: customizable 2D panels that show multiple analyses together, the ability to show prior distributions, and the ability to show contours of a mapped variable on the 2D panels. Our class gwnr.graph.CornerPlot aims to provide these features.

This function loads a file containing samples from a posterior probability distribution. The details of data storage in that file are not relevant to the purposes of this tutorial.

Initialize CornerPlot class with samples

The one required argument is:

  1. A 2D array-like data structure whose columns / rows stack independent samples of the different variables. The dimension with the smaller extent is assumed to index the variables, unless var_names is given a list whose length equals the extent of the data's larger dimension.
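The orientation rule described above can be sketched in a few lines of numpy. This is only an illustration of the stated rule, not the code used inside CornerPlot; the helper name orient_samples is hypothetical.

```python
import numpy as np


def orient_samples(data, var_names=None):
    """Return samples as an array of shape (n_samples, n_variables).

    Hypothetical helper illustrating the rule in the text; it is not
    part of gwnr.
    """
    data = np.asarray(data)
    if data.ndim != 2:
        raise ValueError("expected a 2D array of samples")
    n_rows, n_cols = data.shape
    if var_names is not None:
        # An explicit list of names decides which axis holds the variables.
        if len(var_names) == n_cols:
            return data
        if len(var_names) == n_rows:
            return data.T
        raise ValueError("var_names matches neither dimension of data")
    # Without names, the shorter axis is taken to index the variables.
    return data if n_cols <= n_rows else data.T


rng = np.random.default_rng(0)
samples = rng.normal(size=(3, 1000))  # 3 variables stored as rows
print(orient_samples(samples).shape)  # (1000, 3)
```

Passing var_names with as many entries as the longer dimension overrides the default, which matters when there are more variables than samples.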

Other useful though optional arguments are:

  1. var_type: a string specifying the variable type of each variable in order. E.g. if there are 3 parameters and all are continuous variables, provide 'ccc', while if the second of the 3 parameters is discrete with (without) order, provide 'coc' ('cuc'). The default choice is to assume all variables are continuous.
  2. var_names: a list of strings, equal in length to the number of columns in the data array. Defaults to whole numbers starting from zero.
  3. verbose: Show log messages. Defaults to True.
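To make the var_type convention concrete, here is a small sketch that expands such a string into one description per variable. The helper describe_var_types and the lookup table are hypothetical and only mirror the codes named above ('c', 'o', 'u'); they are not taken from gwnr.

```python
# Codes as described in the text: continuous, ordered discrete, unordered discrete.
VAR_TYPE_CODES = {
    "c": "continuous",
    "o": "ordered discrete",
    "u": "unordered discrete",
}


def describe_var_types(var_type, n_variables):
    """Expand a var_type string into one description per variable."""
    if var_type is None:
        # Default stated in the text: every variable is continuous.
        var_type = "c" * n_variables
    if len(var_type) != n_variables:
        raise ValueError("var_type needs one character per variable")
    unknown = sorted(set(var_type) - set(VAR_TYPE_CODES))
    if unknown:
        raise ValueError(f"unrecognised type codes: {unknown}")
    return [VAR_TYPE_CODES[code] for code in var_type]


print(describe_var_types("coc", 3))
```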

Scatter plots on 2D panels

Let's make a simple corner plot showing the distribution of two variables. The default visualization for the 1D panels marks only the $90\%$ credible intervals. We can enable the demarcation of median values explicitly by setting show_oned_median=True. The default visualization for the 2D panels is to show a scatter plot. Colors are chosen randomly, so a new color is picked every time this function is called, unless one is assigned manually by setting color to any color that matplotlib recognizes, e.g. b, r, g, etc.
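For reference, the quantities marked on a 1D panel can be computed directly from the samples. The sketch below assumes an equal-tailed interval; the text does not say whether CornerPlot uses equal-tailed or highest-density intervals, so treat that choice, and the helper name credible_interval, as assumptions.

```python
import numpy as np


def credible_interval(samples, level=0.9):
    """Equal-tailed credible interval and median of 1D samples.

    Assumption: an equal-tailed interval; a highest-density interval
    would differ for skewed distributions.
    """
    tail = 100.0 * (1.0 - level) / 2.0
    lower, median, upper = np.percentile(samples, [tail, 50.0, 100.0 - tail])
    return lower, median, upper


rng = np.random.default_rng(0)
draws = rng.normal(size=100_000)
lower, median, upper = credible_interval(draws, level=0.9)
print(f"90% interval: [{lower:.2f}, {upper:.2f}], median: {median:.2f}")
```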

Contours on 2D panels

We can change the 2D panels to show contours instead of scatters, by setting plot_type to contour (instead of its default value scatter):

The number of contours to be plotted can be controlled through the argument contour_levels, which by default includes contours for the $1\sigma$, $2\sigma$ and $90\%$ credible levels. Let us try adding our own:
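How contour_levels is turned into contour lines is not shown in this text. One common approach, sketched here with numpy only, is to histogram the samples and find the density threshold above which the bins enclose a requested fraction of the probability. The helper density_levels and the choice of a plain 2D histogram (rather than a kernel density estimate) are assumptions made for illustration.

```python
import numpy as np


def density_levels(x, y, masses=(0.5, 0.9), bins=40):
    """Bin-probability thresholds whose super-level sets enclose `masses`."""
    hist, _, _ = np.histogram2d(x, y, bins=bins)
    prob = hist / hist.sum()
    # Sort bin probabilities from densest to sparsest and accumulate them.
    sorted_prob = np.sort(prob.ravel())[::-1]
    cumulative = np.cumsum(sorted_prob)
    levels = []
    for mass in masses:
        # First position at which the accumulated probability reaches `mass`.
        index = min(np.searchsorted(cumulative, mass), sorted_prob.size - 1)
        levels.append(sorted_prob[index])
    return np.array(levels), prob


rng = np.random.default_rng(1)
x, y = rng.normal(size=(2, 20_000))
levels, prob = density_levels(x, y, masses=(0.5, 0.9))
for mass, level in zip((0.5, 0.9), levels):
    enclosed = prob[prob >= level].sum()
    print(f"target {mass:.2f}: bins above threshold enclose {enclosed:.3f}")
```

A larger enclosed fraction corresponds to a lower density threshold, so the levels must be sorted into increasing order before being handed to matplotlib's contour function.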

Let's plot another set of variables, the chirp mass and mass ratio for the source binary:

The font sizes are probably not big enough in the previous panel. The default is fontsize=18; we can increase it thus:
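For readers less familiar with these two parameters, the sketch below computes them from component masses using the standard chirp-mass definition $\mathcal{M}_c = (m_1 m_2)^{3/5} / (m_1 + m_2)^{1/5}$. The mass-ratio convention ($q \le 1$, lighter over heavier) is an assumption, since the convention used by the sample file is not stated here; the function names are illustrative.

```python
import numpy as np


def chirp_mass(m1, m2):
    """Chirp mass (m1 * m2)^(3/5) / (m1 + m2)^(1/5), same units as inputs."""
    m1 = np.asarray(m1, dtype=float)
    m2 = np.asarray(m2, dtype=float)
    return (m1 * m2) ** 0.6 / (m1 + m2) ** 0.2


def mass_ratio(m1, m2):
    """Mass ratio with the assumed convention q = lighter / heavier <= 1."""
    m1 = np.asarray(m1, dtype=float)
    m2 = np.asarray(m2, dtype=float)
    return np.minimum(m1, m2) / np.maximum(m1, m2)


m1 = np.array([36.0, 30.0])
m2 = np.array([29.0, 30.0])
print(chirp_mass(m1, m2))
print(mass_ratio(m1, m2))
```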

Adding a 3rd dimension on 2D panels

Sometimes we may want to visualize a third dimension on the 2D panels, in addition to the number density already shown there. We can add this by setting param_color to the name of that parameter (it must be part of the original dataset that the CornerPlot object was initialized with):

Alternatively, one can forgo the number density completely and simply show contours of the 3rd dimension.
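One way to picture what a third dimension on a 2D panel represents is the average of that parameter within each cell of the panel. The numpy sketch below computes such a grid, which could then be drawn as a heatmap or contoured. It illustrates the idea only; the helper binned_mean is hypothetical and is not how CornerPlot is known to work internally.

```python
import numpy as np


def binned_mean(x, y, z, bins=20):
    """Mean of z in each (x, y) bin; NaN where a bin holds no samples."""
    counts, x_edges, y_edges = np.histogram2d(x, y, bins=bins)
    # Re-use the same edges so the weighted sums align with the counts.
    sums, _, _ = np.histogram2d(x, y, bins=[x_edges, y_edges], weights=z)
    with np.errstate(invalid="ignore", divide="ignore"):
        mean = sums / counts
    return mean, x_edges, y_edges


rng = np.random.default_rng(3)
x, y = rng.normal(size=(2, 10_000))
z = x + 0.1 * rng.normal(size=x.size)  # third parameter correlated with x
mean, x_edges, y_edges = binned_mean(x, y, z)
print(mean.shape, np.nanmin(mean), np.nanmax(mean))
```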

In the example below it doesn't come out very well because we are simply showing the mass ratio as the 3rd dimension, which is either monotonically changing in the 2D panels or is (as it must be) randomly distributed.

Showing true values in case those are known

Multiple visualizations on the same corner plot

There are many use cases for showing multiple distributions on the same figure, for example when one wants to compare parameter recovery from different waveform models, or when one wants to compare different events with each other. The CornerPlot class makes it easy to combine multiple such data sets.

As the first use case, let us imagine that we want to show both credible-level contours and a scatter plot on the 2D panels marking a 3rd dimension (here taken to be the mass ratio $q$). We first draw the scatter plots on the 2D panels as well as the 1D histograms with the first of the two function calls below, and capture the returned figure and axis array. Then we use the same figure/axes and add contours on just the 2D panels, while disabling any drawing on the 1D panels using skip_oned_hists=True:
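The layering idea itself (draw once, keep the figure and axes, then draw again on the 2D panel only) can be sketched with plain matplotlib. This sketch deliberately does not use CornerPlot, whose call signature is not reproduced in this text; the synthetic data and all variable names are assumptions made for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(2)
x, y = rng.normal(size=(2, 5000))
q = rng.uniform(0.1, 1.0, size=x.size)  # stand-in for a third parameter

# First pass: 1D histograms on the diagonal, colored scatter on the 2D panel.
fig, axes = plt.subplots(2, 2, figsize=(6, 6))
axes[0, 0].hist(x, bins=40, histtype="step")
axes[1, 1].hist(y, bins=40, histtype="step")
points = axes[1, 0].scatter(x, y, c=q, s=2, cmap="viridis")
axes[0, 1].set_visible(False)  # upper triangle is left empty
fig.colorbar(points, ax=axes[1, 0], label="q")

# Second pass: re-use the same axes and touch only the 2D panel.
hist, x_edges, y_edges = np.histogram2d(x, y, bins=30)
x_mid = 0.5 * (x_edges[:-1] + x_edges[1:])
y_mid = 0.5 * (y_edges[:-1] + y_edges[1:])
# contour expects Z with shape (len(y), len(x)), hence the transpose.
axes[1, 0].contour(x_mid, y_mid, hist.T, levels=4, colors="k")

fig.savefig("layered_corner_sketch.png")
```

The 1D panels are untouched by the second pass, which is the effect that skip_oned_hists=True is described as having.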

For our second use case we first make a dummy posterior and show it alongside our original posterior set.
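How the dummy posterior is built is not shown in this text. One simple possibility, sketched below under that caveat, is to resample the original samples and then shift and widen them so the two sets are visibly different when overlaid. The helper make_dummy_posterior and its shift and scale parameters are hypothetical.

```python
import numpy as np


def make_dummy_posterior(samples, shift=0.5, scale=1.2, seed=0):
    """Shifted, widened copy of `samples` with shape (n_samples, n_variables).

    shift is in units of each variable's standard deviation; scale
    stretches the spread about the mean.
    """
    samples = np.asarray(samples, dtype=float)
    rng = np.random.default_rng(seed)
    mean = samples.mean(axis=0)
    std = samples.std(axis=0)
    # Resample rows with replacement so the copy is not a pointwise duplicate.
    rows = rng.integers(0, samples.shape[0], size=samples.shape[0])
    resampled = samples[rows]
    return mean + shift * std + scale * (resampled - mean)


rng = np.random.default_rng(4)
samples = rng.normal(loc=[30.0, 0.8], scale=[2.0, 0.05], size=(5000, 2))
dummy = make_dummy_posterior(samples)
print(samples.mean(axis=0), dummy.mean(axis=0))
```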

These can be labelled by providing a label and enabling the display of legends with legend=True. We only support and encourage placing the legend on 1D panels and, by default, only on the first one.

If for some reason one wants to have a legend on every 1D panel, that can be done by setting label_oned_hists to -1 for all 1D panels or to a list of integers indexing the intended set of panels (this can be a useful feature when the number of parameters is large).
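The two accepted forms of label_oned_hists described above (-1 for every panel, or a list of panel indices) can be illustrated with a small resolver. The helper panels_with_legend is hypothetical and only restates the behaviour described in the text; treating a single non-negative integer as one panel index is an added assumption.

```python
def panels_with_legend(label_oned_hists, n_variables):
    """Resolve label_oned_hists into a sorted list of 1D panel indices."""
    if isinstance(label_oned_hists, int):
        if label_oned_hists == -1:
            # -1 selects every 1D panel.
            return list(range(n_variables))
        # Assumption: a single index selects just that panel.
        label_oned_hists = [label_oned_hists]
    panels = sorted(set(label_oned_hists))
    for index in panels:
        if not 0 <= index < n_variables:
            raise ValueError(f"panel index {index} is out of range")
    return panels


print(panels_with_legend(-1, 3))
print(panels_with_legend([0, 2], 4))
```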

Combining some of these features, let us visualize two runs. For the first, we show contours of posterior density, alongside a scatterplot showing the number density of samples. For the second, we show a scatterplot showing the number density but also with a heatmap showing a 3rd dimension (here, simply, the mass ratio):

Clearly, this is not a realistic use case, only a demonstration of how different corner plots can be overlaid seamlessly.