Jupyter: Plotting pivots & changing legend entries

A while ago I blogged about Project Jupyter. Over the last few days I have been working with it a lot, and I am still fascinated by its power.

Today I faced and solved two challenges that I would like to share here:
. plotting a pivot table
. changing legend entries

Assume we have the following dataframe:
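
The original table is not reproduced here, so the following is only a minimal stand-in with made-up numbers. The column names "Org", "Male employees" and "Female employees" come from the pivot code in this post; the "Department" column and all values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data: one row per department.
# Only the column names "Org", "Male employees" and "Female employees"
# are taken from the post; everything else is invented for illustration.
df = pd.DataFrame({
    "Org": ["Sales", "Sales", "Research", "Research", "Research"],
    "Department": ["Inside Sales", "Field Sales", "Lab A", "Lab B", "Lab C"],
    "Male employees": [12, 7, 5, 9, 4],
    "Female employees": [10, 8, 6, 3, 7],
})

print(df)
```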

Creating a pivot table is a piece of cake with the pandas pivot_table function applied to that dataframe:

Code:
import numpy as np
import pandas as pd
pivot = pd.pivot_table(df, index=["Org"], values=["Male employees", "Female employees"],
    aggfunc=[len, np.mean, np.min, np.max, np.sum])

 

This gets us
. the number of departments per org (= len of Female employees or len of Male employees)
. the sum of male and female employees per org (= sum of Female employees and sum of Male employees)
. as well as mean, min and max

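How to plot?
The pivot table is already stored as a new dataframe ‘pivot’, so we can simply call its plot method. Let’s say we want to plot the sum of male and female employees per org. First we need to get rid of the other statistics, which we don’t need for the plot. Selecting the ‘sum’ columns is more robust than dropping the rest by name, because the labels generated for np.min and np.max can differ between library versions (‘amin’/‘amax’ or ‘min’/‘max’). Then we plot:

 

Code:
import matplotlib.pyplot as plt
# keep only the "sum" columns (the column index keeps its two levels)
pivot.loc[:, ["sum"]].plot(kind="barh")
plt.show()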

 

The only problem here is that the legend entries of this plot look a bit cryptic: each one reads something like ‘(sum, Female employees)’. Here is some code to fix this:

 

Code:
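ax = plt.gca()
handles, labels = ax.get_legend_handles_labels()
# labels look like "(sum, Female employees)": keep the part after the last
# comma, then drop the closing parenthesis and the surrounding whitespace
new_labels = [l.split(",")[-1][:-1].strip() for l in labels]
ax.legend(handles, new_labels)
plt.show()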
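
As an alternative to parsing the label strings, the readable names can also be taken straight from the pivot's column index. The following is a self-contained sketch, not the notebook's code: the data is made up, only the "sum" aggregation is computed, and the Agg backend is chosen only so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data, invented for illustration
df = pd.DataFrame({
    "Org": ["Sales", "Sales", "Research"],
    "Male employees": [12, 7, 5],
    "Female employees": [10, 8, 6],
})

# A list of aggregation functions yields two-level column labels
pivot = pd.pivot_table(df, index=["Org"],
                       values=["Male employees", "Female employees"],
                       aggfunc=["sum"])

ax = pivot.plot(kind="barh")

# Each column label is a tuple such as ("sum", "Female employees");
# its last element is the readable name.
new_labels = [col[-1] for col in pivot.columns]
handles, _ = ax.get_legend_handles_labels()
ax.legend(handles, new_labels)
```

This avoids depending on how the tuple labels happen to be rendered as text, at the cost of needing the pivot object at hand when the legend is built.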

 

I have shared the entire notebook here.


How to print ipython notebooks without the source code

This is something I really need in order to create standard reports based on IPython notebooks: the ability to print a notebook without the source code and input prompts of its cells.

There are ways to do that, as discussed here on Stack Overflow, but all of them involve adding some ugly code to your notebook cells, tweaking the way the IPython server is started, or running nbconvert. These options might be out of your control if you use a cloud offering like Data Science Experience on IBM Cloud instead of your own IPython installation.

Here is how I achieve this:

I simply download my notebook as HTML.

Then I run this Python script to modify that HTML file so that prompts and code cells are gone:

FILE = "/somewhere/myHTMLFile.html"

with open(FILE, 'r') as html_file:
    content = html_file.read()

# Get rid of prompts and source code
content = content.replace("div.input_area {", "div.input_area {\n\tdisplay: none;")
content = content.replace(".prompt {", ".prompt {\n\tdisplay: none;")

# Overwrite the file with the tweaked content
with open(FILE, 'w') as html_file:
    html_file.write(content)

That script basically adds the CSS declaration ‘display: none’ to the style rules for ‘div.input_area’ and ‘.prompt’, which hides all code input areas and prompts.
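
To see the effect without touching a real export, the two replacements can be wrapped in a function and applied to a small string. This is only a sketch: the function name hide_code_and_prompts and the shortened stylesheet are made up for illustration, and a real exported notebook contains far more CSS.

```python
def hide_code_and_prompts(html: str) -> str:
    """Inject 'display: none;' into the two CSS rules the script targets."""
    html = html.replace("div.input_area {", "div.input_area {\n\tdisplay: none;")
    html = html.replace(".prompt {", ".prompt {\n\tdisplay: none;")
    return html


# Hypothetical, heavily shortened stand-in for the <style> section of an export
sample = (
    "<style>\n"
    "div.input_area {\n\tborder: 1px solid #cfcfcf;\n}\n"
    ".prompt {\n\tmin-width: 14ex;\n}\n"
    "</style>"
)

result = hide_code_and_prompts(sample)
print(result)
```

Note that the replacements silently do nothing if the exported file does not contain these exact rule headers, so it may be worth checking that the result differs from the input.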

The tweaked HTML page can now easily be printed to a PDF file, which gives me my standard report without any code or input prompts.

If you know what you are doing, you can add more CSS tweaks to that Python code, for example this one:

# For dataframe tables use Courier font family with smaller font size
content = content.replace(".dataframe thead","table.dataframe { font-size: 7px; font-family: Courier; }\n.dataframe thead")

To figure out things like that, I used the Firefox Inspector to determine the class names of DOM elements (for example, ‘table.dataframe’ is used to display dataframe tables in IPython), plus some CSS knowledge to achieve the manipulations I find useful, like reducing the font size of tables so that they fit on pages printed in portrait orientation.

Project Jupyter

Project Jupyter is an open source project that lets you run Python code in a web browser. It focuses on interactive data science and scientific computing, not only for Python but across all programming languages. It is a spin-off from IPython, which I blogged about here.
Typically you would have to install Jupyter and a full stack of Python packages on your computer and start the Jupyter server to get started.
But there is also an alternative available on the web where you can run IPython notebooks for free: https://try.jupyter.org/
This site does not allow you to save your projects permanently, but you can export projects, download notebooks, and upload notebooks from your local computer.
IPython notebooks are a great way to get started with Python and learn the language. They make it easy to run your script in small increments, aka cells, and they preserve the state of those increments. They also nicely integrate output into your workflow, including graphical plots created with packages like matplotlib.pyplot, and they come with a simple markup language for adding documentation to your scripts.
The possibilities are endless with IPython or Jupyter, whether you want to learn Python as a language or data analysis techniques.
I was inspired to get started with this again by this video on IBM developerWorks: “Use data science to up your game performance”. The tip about free Jupyter on the web comes from the book “Learning IPython for Interactive Computing and Data Visualization – Second Edition” by Cyrille Rossant.

Of course you can also sign up for a trial on IBM's Bluemix and start an IBM Data Science Experience project.