gwnr
¶Often we find ourselves in possession of samples drawn from a multivariate probability distribution $p(\theta)$, whose full form is not known to us. These samples usually come from an inferencing algorithm that attempts to measure the same distribution. A combined two-dimensional and one-dimensional visualization of the data is a useful tool to understand the properties of $p(\theta)$ and various bivariate correlations hidden inside of it.
Contemporary libraries like seaborn
, triangle
and corner
have functionality to make these figures, but all of them lack one or the other aspect often desired while understanding inferred posteriors for source parameters of gravitational-wave sources - such as customizability of two-D panels to show multiple analyses together, ability to show the prior distributions, ability to show contours of a mapped variable on the two-dimensional panels. Our class gwnr.graph.CornerPlot
hopes to achieve the same.
import imp
from gwnr.graph import CornerPlot
import numpy as np
import pandas as pd
import h5py
import pycbc.pnutils as pnu
This function loads a file containing samples from a posterior probability distribution. The details of data storage in that file are not relevant to the purposes of this tutorial.
def load_bilby_posterior(file_name):
with open(file_name, "r") as fin:
params = fin.readline()
params = params.strip("\n").split(" ")
dd = np.loadtxt(fin)
post = pd.DataFrame(dd, columns=params)
return post
file_name = "/home/prayush/Downloads/GW170608_samples.dat"
!head -n3 /home/prayush/Downloads/GW170608_samples.dat
mass_ratio chirp_mass dec ra theta_jn psi eccentricity time_jitter 0.7507036259364714 8.27764145878667 0.6448370197412604 2.2597678207703207 2.3566637840825995 2.695118554979706 0.32307361859251554 0.00010649436756763144 0.911738531361221 8.474086398968682 0.9760107441668023 2.320093418306539 2.25488295206224 2.804353919883073 0.20164690895915888 -0.00017163009647279786
post = load_bilby_posterior(file_name)
post['mass1'], post['mass2'] =\
pnu.mchirp_q_to_mass1_mass2(post['chirp_mass'].to_numpy(), post['mass_ratio'].to_numpy())
post.head()
mass_ratio | chirp_mass | dec | ra | theta_jn | psi | eccentricity | time_jitter | mass1 | mass2 | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 0.750704 | 8.277641 | 0.644837 | 2.259768 | 2.356664 | 2.695119 | 0.323074 | 0.000106 | 10.996843 | 8.255370 |
1 | 0.911739 | 8.474086 | 0.976011 | 2.320093 | 2.254883 | 2.804354 | 0.201647 | -0.000172 | 10.196624 | 9.296655 |
2 | 0.998247 | 8.384037 | 1.338142 | 0.576523 | 1.659306 | 1.785259 | 0.244411 | 0.000096 | 9.639184 | 9.622284 |
3 | 0.607167 | 8.368590 | 0.588561 | 2.145443 | 2.007669 | 1.378058 | 0.024203 | 0.000055 | 12.413084 | 7.536812 |
4 | 0.885460 | 8.439551 | 1.362240 | 1.300832 | 1.886080 | 2.333544 | 0.219664 | -0.000060 | 10.306267 | 9.125789 |
post.to_numpy().shape
(5440, 10)
CornerPlot
class with samples¶The necessary argument is:
var_names
is given a list with the number of elements equal to the data's dimension with larger extent.Other useful though optional arguments are:
var_type
: a string specifying the variable type of each variable in order. E.g. if there are 3 parameters and all are continuous variables, provide 'ccc', while if the second of the 3 parameters is discrete with (without) order, provide 'coc' ('cuc'). The default choice is to assume all variables are continuous.var_names
: List of strings equal in length to the number of columns in data array. Defaults to using whole numbers starting from zero.verbose
: Show log messages. Defaults to True
.pl = CornerPlot(post.to_numpy(), var_names=post.columns.to_list())
Lets make a simple corner plot showing the distribution of two variables. The default visualization for the 1D panels is to only mark the $90\%$ credible intervals. We can enable the demarkation of median values explicitly by setting show_oned_median=True
. The default visualization for the 2D panels is show a scatter plot. Colors are chosen randomly, so every time one calls this function a new color is chosen, unless manually assigned by setting color
to one of the colors that matplotlib
recognizes, e.g. b
, r
, g
, etc.
axs = pl.draw(['ra', 'dec'], show_oned_median=True)
/home/prayush/local/venv/pycbc_inf/local/lib/python2.7/site-packages/gwnrtools-0.1.0-py2.7.egg/gwnrtools/graph/corner.py:246: MatplotlibDeprecationWarning: The hold function was deprecated in version 2.0. fig.hold(True) /home/prayush/local/venv/pycbc_inf/local/lib/python2.7/site-packages/matplotlib/axes/_axes.py:6571: UserWarning: The 'normed' kwarg is deprecated, and has been replaced by the 'density' kwarg. warnings.warn("The 'normed' kwarg is deprecated, and has been "
We can change the 2D panels to show contours instead of scatters, by setting plot_type
to contour
(instead of its default value scatter
):
axs = pl.draw(['ra', 'dec'], plot_type='contour', show_oned_median=True)
/home/prayush/local/venv/pycbc_inf/local/lib/python2.7/site-packages/matplotlib/contour.py:1004: UserWarning: The following kwargs were not used by contour: 'label' s)
The number of contours to be plotted can be controlled through the argument contour_levels
, which by default includes contours for $1-\sigma$, $2-\sigma$ and $90\%$ credible levels. Let us try adding our own:
axs = pl.draw(['ra', 'dec'], plot_type='contour', contour_levels=[10.0, 30.0, 50.0, 90.0, 95.0],
show_oned_median=True)
Lets plot another set of variables, the chirp mass and mass ratio for the source binary:
axs = pl.draw(['chirp_mass', 'mass_ratio'],
plot_type='contour',
contour_levels=[50.0, 90.0, 95.0],
show_oned_median=True)
axs = pl.draw(['chirp_mass', 'mass_ratio', 'ra', 'dec'],
plot_type='contour',
contour_levels=[50.0, 90.0, 95.0],
show_oned_median=True)
The fontsizes are probably not big enough in the previous panel. The default was to use fontsize=18
, we can increase it thus:
axs = pl.draw(['chirp_mass', 'mass_ratio', 'ra', 'dec'],
plot_type='contour',
contour_levels=[50.0, 90.0, 95.0],
show_oned_median=True,
fontsize=24)
Sometimes we may want to visualize a third dimension on the 2D panels in addition to the number density that we anyway find on those. We can add this by setting param_color
to the name of that parameter (must be part of the original dataset that the CornerPlot
object was initialized with):
axs = pl.draw(['chirp_mass', 'mass_ratio', 'ra', 'dec'],
plot_type='scatter',
param_color='mass_ratio',
contour_levels=[50.0, 90.0, 95.0],
show_oned_median=True,
fontsize=24)
Or if one wants, they can forego the number density completely and simply show contours of the 3rd dimension.
axs = pl.draw(['chirp_mass', 'mass_ratio', 'ra', 'dec'],
plot_type='contour',
param_color='mass_ratio',
contour_levels=[10.0, 30.0, 50.0, 60.0, 70.0, 80.0, 90.0, 95.0],
show_oned_median=True,
fontsize=24)
In the example below it doesn't come out very well because we are simply showing the mass ratio as the 3rd dimension, which is either monotonically changing in the 2D panels or is (as it must be) randomly distributed.
axs = pl.draw(['chirp_mass', 'mass_ratio', 'ra', 'dec'],
plot_type='scatter',
param_color='mass_ratio',
contour_levels=[50.0, 90.0, 95.0],
true_params_vals=[8.5, 0.8, 1.57, 0.75],
grid_twod_on=False,
show_oned_median=True,
fontsize=24)
There are many use cases for showing multiple distributions on the same figure, for e.g. when one wants to compare parameter recovery from diferent waveform models, or when one wants to compare different events with each other. The CornerPlot
class allows for trivial combination of multiple such data sets.
As the first use case, let us image if we want to show both credible-level contours as well as a scatter plot on the 2D panels marking a 3rd dimension (here taken to be the mass ratio $q$). We first draw the scatter plots on the 2D panels as well as the 1D histograms with the first of the two function calls below and catch the returned figure and axis array. Then we use the same figure/axes and add contours on just the 2D panels, while disabling any drawing on the 1D panels using skip_oned_hists=True
:
f, axs = pl.draw(['chirp_mass', 'mass_ratio', 'ra', 'dec'],
plot_type='scatter',
param_color='mass_ratio',
show_oned_median=True,
fontsize=24)
f, axs = pl.draw(['chirp_mass', 'mass_ratio', 'ra', 'dec'],
plot_type='contour',
contour_levels=[50.0, 90.0, 95.0],
skip_oned_hists=True,
fontsize=24,
fig=f,
axes_array=axs)
For our second use case we first make a dummy posterior and show it alongside our original posterior set.
post2 = post.copy()
# multiplying all samples by 1.2
post2['chirp_mass'] = post2['chirp_mass'] * 1.2
post2['mass_ratio'] = post2['mass_ratio'] * 1.2
pl2 = CornerPlot(post2.to_numpy(),
var_names=post.columns.to_list())
No handlers could be found for logger "numexpr.utils"
f, axs = pl.draw(['chirp_mass', 'mass_ratio'],
plot_type='contour',
contour_levels=[50.0, 90.0, 95.0],
show_oned_percentiles=-1,
show_oned_median=True,
fontsize=24)
f, axs = pl2.draw(['chirp_mass', 'mass_ratio'],
plot_type='contour',
contour_levels=[50.0, 90.0, 95.0],
show_oned_percentiles=-1, # disabled showing 90% credible intervals on 1D panels
show_oned_median=True,
fontsize=24,
fig=f,
axes_array=axs)
These can be labelled by providing a label
and enabling the display of legends with legend=True
. We only support and encourage placing the legend on 1D panels, and that too only the first one.
f, axs = pl.draw(['chirp_mass', 'mass_ratio'],
plot_type='contour',
contour_levels=[50.0, 90.0, 95.0],
show_oned_percentiles=-1,
show_oned_median=True,
label='run1', legend=True,
fontsize=24)
f, axs = pl2.draw(['chirp_mass', 'mass_ratio'],
plot_type='contour',
contour_levels=[50.0, 90.0, 95.0],
show_oned_percentiles=-1, # disabled showing 90% credible intervals on 1D panels
show_oned_median=True,
label='run2', legend=True,
fontsize=24,
fig=f,
axes_array=axs)
If for some reason one wants to have a legend on every 1D panel, that can be done by setting label_oned_hists
to -1
for all 1D panels or to a list of integers indexing the intended set of panels (this can be a useful feature when the number of parameters is large).
f, axs = pl.draw(['chirp_mass', 'mass_ratio'],
plot_type='contour',
contour_levels=[50.0, 90.0, 95.0],
show_oned_percentiles=-1,
show_oned_median=True,
label='run1', legend=True,
fontsize=24)
f, axs = pl2.draw(['chirp_mass', 'mass_ratio'],
plot_type='contour',
contour_levels=[50.0, 90.0, 95.0],
show_oned_percentiles=-1, # disabled showing 90% credible intervals on 1D panels
show_oned_median=True,
label='run2', legend=True,
label_oned_hists=-1,
fontsize=24,
fig=f,
axes_array=axs)
Combining some of these features, let us visualize two runs. For the first, we show contours of posterior density, alongside a scatterplot showing the number density of samples. For the second, we show a scatterplot showing the number density but also with a heatmap showing a 3rd dimension (here, simply, the mass ratio):
f, axs = pl.draw(['chirp_mass', 'mass_ratio', 'ra', 'dec'],
plot_type='scatter',
show_oned_median=True,
show_oned_percentiles=-1,
label='run1', legend=True,
fontsize=24)
f, axs = pl.draw(['chirp_mass', 'mass_ratio', 'ra', 'dec'],
plot_type='contour',
contour_levels=[50.0, 90.0, 95.0],
skip_oned_hists=True,
show_oned_percentiles=-1,
fontsize=24,
fig=f,
axes_array=axs)
f, axs = pl2.draw(['chirp_mass', 'mass_ratio', 'ra', 'dec'],
plot_type='scatter',
param_color='mass_ratio',
skip_oned_hists=False,
show_oned_median=True,
show_oned_percentiles=-1,
fontsize=24,
label='run2', legend=True,
fig=f,
axes_array=axs)
Clearly, this is not a realistic use case but only a demonstration of how different cornerplots can be overlaid seamlessly here.