Skip to content

πŸ“ˆ Get your plots right, all along your analysis pipeline πŸ“Š

License

Notifications You must be signed in to change notification settings

yzerlaut/datavyz

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

datavyz logo

datavyz

From data exploration to final figure production, get your plots right, all along your analysis pipeline.

A layer on top of matplotlib to achieve flexible & high-standard data visualization across mediums.

Part of the software suite for data science: analyz, datavyz, finalyz

Motivation / Principle

The motivation behind this extension of the matplotlib library is two-fold:

  • the default settings of matplotlib do not really match the relatively specific constraints (on fontsize, axes size, etc...) of figure production in scientific journals from publishers such as Nature/Springer, Elsevier, Cell Press, ... See the Cell Press figure guidelines for an example of such instructions.

  • a single graphical setting to display figures is not very practical in the process of scientific analysis. Such analysis is usually made of several steps and is performed on different mediums: notebooks with embedded figures, script that produce on-display figures, multipanel figures on A4 page, ...

Therefore, at the core is this library, lies the concept of a *Graphical Environment" (implemented as the graph_env class). It is a set of settings that will best adapt to the specific medium used (and will generalize to others). Create a graph environment associated to a specific visualization setting, below "screen":

from datavyz import graph_env
ge = graph_env('screen')

You then call all of your plotting functions relative to this environment, e.g.:

import numpy # to generate random data
ge.plot(Y=numpy.random.randn(4, 10), sY=numpy.random.randn(4, 10),
        xlabel='xlabel (xunit)', ylabel='ylabel (yunit)', title='datavyz demo plot')
ge.show()

pops up:

Installation

Using git to clone the repository and "pip" to install the package in your python environment:

pip install git+https://github.com/yzerlaut/datavyz.git

Multipanel figure demo

Building a complex multipanel figure such as the following:

is made easy with the module, the above figure is generated by the following code

from datavyz import ge # load the default graph environment ("manuscript")

# generate some random data
t = np.linspace(0, 10, 1e3)
y = np.cos(5*t)+np.random.randn(len(t))

# Panel 'a' - schematic (drawn with Inkscape for example)
fig11 = 'docs/schematic.svg' # just pass a filename

# Panel 'b' - time series plot
fig12, ax12 = ge.figure(axes_extents=(3,1)) # i.e. a wide plot
ax12.plot(t, y)
ge.set_plot(ax12, xlabel='xlabel (xunit)', ylabel='ylabel (yunit)')

# Panel 'c' - more time series plot
fig21, ax21 = ge.figure(axes_extents=(4,1)) # i.e. a very wide plot
ax21.plot(t[t>9], y[t>9], label='raw')
ax21.plot(t[t>9][1:], np.diff(y[t>9]), label='deriv.')
ax21.plot(t[t>9][1:-1], np.diff(np.diff(y[t>9])), label='2nd deriv.')
ge.set_plot(ax21, xlabel='xlabel (xunit)', ylabel='ylabel (yunit)')
ge.legend(ax21, ncol=3, loc=(.3,1.)) # put a legend

# Panel 'd' - scatter plot
fig31, ax31 = ge.scatter(t[::10], t[::10]+np.random.randn(100),
                         xlabel='ylabel (yunit)')

# Panel 'e' - bar plot
fig32, ax32 = ge.bar(np.random.randn(8),
                     COLORS=[ge.viridis(i/7) for i in range(8)],
                     xlabel='ylabel (yunit)')

# Panel 'f' - bar plot
fig33, ax33 = ge.pie([0.25,0.4,0.35], ext_labels=['Set 1', 'Set 2', 'Set 3'])

# Panel 'g' - bar plot
fig34, ax34 = ge.hist(np.random.randn(200))

ge.multipanel_figure([[fig11, fig12],
                      [fig21],
                      [fig31,fig32,fig33,fig34]],
                     LABELS=[['a','b'],
                             ['c'],
                             ['d', 'e', 'f', 'g']],
                     width='double-column', # can also be "single-column" or "one-and-a-half-column"
                     fig_name='docs/multipanel.svg',
                     grid=False, # switch to True to get the Grid position and pricesely place labels if necessary
                     autoposition=True)

Graphical environments

You can specifiy different environments corresponding to different visualization settings.

Below is an example settings.py file (see datavyz/settings.py), it defines two graphical environments: "manuscript" and "notebook".

ENVIRONMENTS = {} # dictionary storing the different environments

"""
MANUSCRIPT ENVIRONMENT
"""
ENVIRONMENTS['manuscript'] = {
    'fontsize':8,
    'default_color':'k',
    'single_plot_size':(22., 16.), # mm
    'hspace_size':10., # mm
    'wspace_size':14., # mm
    'left_size':16., # mm
    'right_size':4., # mm
    'top_size':7., # mm
    'bottom_size':13., # mm
    'background':'w',
    'facecolor':'w',
    'transparency':False,
    'dpi':150,
    'size_factor': 1.,
    'markersize':2.5,
}

"""
DARK NOTEBOOK ENVIRONMENT
"""
ENVIRONMENTS['dark_notebook'] = {
	'fontsize':13,
        'default_color':'lightgray',
        'single_plot_size':(28.*2., 20.*2.), # mm
        'hspace_size':12.*2., # mm
        'wspace_size':16.*2., # mm
        'left_size':20*2., # mm
        'right_size':4.*2., # mm
        'top_size':7.*2., # mm
        'bottom_size':19.*2., # mm
        'background':'none',
        'facecolor':'none',
        'transparency':True,
        'dpi':200,
}

We illustrate the use of those two environments in the demo notebook.

One first design an analysis in a convenient environment to explore data, here "dark_notebook" (integrated within the excellent Emacs IPython Notebook (EIN) interface that allows you to run Jupyter notebooks within Emacs). In a final step, one exports the figure, using the "manuscript" environment, for its inlusion in a report.

Size settings

Features

We document here the different plotting features covered by the library:

Pie plots

# building data
data = .5+np.random.randn(3)*.4

#plotting
fig, ax = ge.pie(data,
				 ext_labels = ['Data1', 'Data2', 'Data3'],
				 pie_labels = ['%.1f%%' % (100*d/data.sum()) for d in data],
				 ext_labels_distance=1.2,
				 explodes=0.05*np.ones(len(data)),
				 center_circle=0.2,
				 COLORS = [ge.tab20(x) for x in np.linspace(0,1,len(data))],
				 # pie_args=dict(rotate=90), # e.g. for rotation
				 legend=None) 
				 # set legend={} to have it appearing
fig.savefig('./docs/pie-plot.png', dpi=200)

Output:

Features plot

# data: breast cancer dataset from datavyz.klearn
from datavyz.klearn.datasets import load_breast_cancer
raw = load_breast_cancer()

# re-arange for plotting
data = {}
for feature, values in zip(raw['feature_names'], raw['data']):
	data[feature+'\n(log)'] = np.log(values)

# plotting
fig, AX = ge.features_plot(data, ms=3,
						   fig_args={'left':.1, 'right':.3, 'bottom':.1, 'top':.1,
									 'hspace':.4, 'wspace':.4})
fig.savefig('docs/features-plot.png', dpi=200)

Cross-correlation plot

Look at the cross-correlation between several joint measurements and estimate the signficance of the correlation:

# building random data
data = {}
for i in range(7):
	data['feature_%s'%(i+1)] = np.random.randn(30)

# plotting
fig = ge.cross_correl_plot(data,
                          features=list(data.keys())[:7])

fig.savefig('./docs/cross-correl-plot.png', dpi=200)

Output:

Bar plots

Classical bar plot

ge.bar(np.random.randn(5), yerr=.3*np.random.randn(5), bottom=-3, COLORS=ge.colors[:5])

Related sample measurements

fig, ax, pval = ge.related_samples_two_conditions_comparison(np.random.randn(10)+2., np.random.randn(10)+2.,
															 xticks_labels=['$\||$cc($V_m$,$V_{ext}$)$\||$', '$cc(V_m,pLFP)$'],
															 xticks_rotation=45, fig_args={'bottom':1.5, 'right':8.})
fig.savefig('docs/related-samples.png', dpi=200)

Unrelated sample measurements

fig, ax, pval = ge.unrelated_samples_two_conditions_comparison(np.random.randn(10)+2., np.random.randn(10)+2.,
															   xticks_labels=['$\||$cc($V_m$,$V_{ext}$)$\||$', '$cc(V_m,pLFP)$'],
															   xticks_rotation=45, fig_args={'bottom':1.5, 'right':8.})
fig.savefig('docs/unrelated-samples.png', dpi=200)

Line plots

Simple trace plot with X-and-Y bars for the labels

fig, ax = ge.plot(t, x,
                  fig_args=dict(figsize=(3,1), left=.4, bottom=.5),
                  bar_scale_args = dict(Xbar=.2,Xbar_label='0.2s',
                                        Ybar=20,Ybar_label='20mV ',
                                        loc='left-bottom'))

Scatter plots

from datavyz import graph_env
ge = graph_env('manuscript')

fig, ax = ge.scatter(Y=np.random.randn(4, 10),
                     sY=np.random.randn(4, 10),
                     xlabel='xlabel (xunit)',
                     ylabel='ylabel (yunit)',
                     title='datavyz demo plot')

Histogram plots

fig, ax = ge.hist(np.random.randn(100), xlabel='some value')
ge.savefig(fig, 'docs/hist-plot.png')

Surface plots

# BUILDING THE DATA
x, y = np.meshgrid(np.arange(1, 11), np.arange(1, 11))
z = np.sqrt(x*y)
x, y, z = np.array(x).flatten(),\
          np.array(y).flatten(),\
          np.array(z).flatten()*np.random.randn(len(z.flatten()))
index = np.arange(len(x))
np.random.shuffle(index)
x, y, z = x[index], y[index], z[index]

# PLOT
fig, ax, acb = ge.twoD_plot(x, y, z,
                            vmin=-7, vmax=7,
                            bar_legend={'label':'color',
                                        'color_discretization':20})
ge.set_plot(ax, xlabel='x-label (X)', ylabel='y-label (Y)')

Parallel coordinates plots

For multidimensional data:

# LOADING THE DATA
from sklearn.datasets import load_iris
dataset = load_iris()
    fig.savefig('docs/parallel-plot.png')

# PLOT
fig, ax = ge.parallel_plot(dataset['data'],
						  SET_OF_LABELS=['sepal length\n(cm)','sepal width\n(cm)',
										 'petal length\n(cm)', 'petal width\n(cm)'],
						  COLORS = [ge.viridis(x/2) for x in dataset['target']])
for i, name in enumerate(dataset['target_names']):
	ge.annotate(ax, name, ((i+1)/3., 1.1), color=ge.viridis(i/2), ha='right')

Components plots

Plotting the components extracted from of a high dimensional dataset. For example, we show below the 10 first components extracted from PCA in the "breast cancer" dataset of sklearn.

# LOADING THE DATA
from sklearn.datasets import load_breast_cancer
data = load_breast_cancer()

# PERFORMING PCA
from sklearn.decomposition import PCA as sklPCA
pca = sklPCA(n_components=10)
pca.fit_transform([data[key] for key in data])

# PLOT
fig, AX = ge.components_plot(pca.components_)

Time-frequency plots

Plotting the quantities following a time-frequency analysis (e.g. a wavelet transform below):

# import continuous wavelet transform
from analyz.freq_analysis.wavelet_transform import my_cwt 

dt, tstop = 1e-4, 1.
t = np.arange(int(tstop/dt))*dt

freq1, width1, freq2, width2, freq3, width3 = 10., 100e-3, 40., 40e-3, 70., 20e-3
data = 3.2+np.cos(2*np.pi*freq1*t)*np.exp(-(t-.5)**2/2./width1**2)+\
    np.cos(2*np.pi*freq2*t)*np.exp(-(t-.2)**2/2./width2**2)+\
    np.cos(2*np.pi*freq3*t)*np.exp(-(t-.8)**2/2./width3**2)

# Continuous Wavelet Transform analysis
freqs = np.linspace(1, 90, 40)
coefs = my_cwt(data, freqs, dt)

fig, AX = ge.time_freq_plot(t, freqs, data, coefs)    

ge.savefig(fig, 'docs/time-freq.png')

Insets

from datavyz import graph_env
ge = graph_env('manuscript')

y = np.exp(np.random.randn(100))
fig, ax = ge.plot(y, xlabel='time', ylabel='y-value')
sax = ge.inset(ax, rect=[.5,.8,.5,.4])
ge.hist(y, bins=10, ax=sax, axes_args={'spines':[]}, xlabel='y-value')
fig.savefig('docs/inset.png')
ge.show()

Twin axis

from datavyz import graph_env
ge = graph_env('manuscript')

fig, ax = ge.figure(figsize=(1.2,1), left=1., right=4.)
ax2 = ax.twinx()
ax.plot(np.log10(np.logspace(-2,3,100)), np.exp(np.random.randn(100)), 'o', ms=2, color=ge.blue)
ax2.plot(np.log10(np.logspace(-2,3,100)), np.exp(np.random.randn(100)), 'o', ms=1, color=ge.red)
ge.set_plot(ax2, ['right'], yscale='log', ylabel='blabal',
         tck_outward=2, ycolor=ge.red)
ge.set_plot(ax, ycolor=ge.blue, xcolor='k',
         yscale='log', ylabel='blabal', xscale='already-log10',
         tck_outward=2, xlabel='trying', ylabelpad=-5)

Annotations

fig, AX= ge.figure(axes=(1,10), figsize=(.5,.5), bottom=1.5)
for i, ax in enumerate(AX):
    ge.top_left_letter(ax, ge.int_to_roman(i+1))
    ge.matrix(np.random.randn(10,10), ax=ax)

sax = ge.arrow(fig, x0=0.04, y0=.2, dx=.93, dy=0.)
ge.annotate(fig, 'time', (.5, .17), ha='center')

Bar legends

Other custom features

About

πŸ“ˆ Get your plots right, all along your analysis pipeline πŸ“Š

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages