Jupyter notebooks and related files for processing data from the Barbara Curtis Adachi Bunraku (Japanese Puppet Theater) Collection.
online collection data / bunraku-online.ipynb
CakePHP site powered by a relational MySQL database | |
1 | Dump the MySQL tables to CSVs |
2 | Import the CSVs into IPython as pandas DataFrames |
3 | Merge relational data (from the CSV join tables) onto the DataFrames by type |
4 | Export the DataFrames as JSON records (and as CSVs, for archival purposes only) |
5 | Drop null key:value pairs from the JSON (jq, run from bash) |
6 | Convert the null-free JSON to YAML (PyYAML, run from bash) |
7 | Generate Jekyll collections (and pages) from the YAML using the Yaml-Splitter plugin |
Static Jekyll site powered by YAML data, with a JSON index for static search
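
Steps 2 through 5 of the pipeline can be sketched in a few lines of pandas. This is only an illustration, not the notebook's actual code: the table names (`plays`, `images`, `images_plays`), the column names, and the sample rows are all made up, and the null-dropping step is done here in Python rather than with jq.

```python
import io
import json

import pandas as pd

# Hypothetical stand-ins for the CSVs dumped from MySQL (step 1).
# Table names, column names, and values are assumptions for illustration.
PLAYS_CSV = """id,title,author
1,Play A,Author X
2,Play B,
"""
IMAGES_CSV = """id,filename
10,img_010.jpg
11,img_011.jpg
"""
# Hypothetical join table linking images to plays.
IMAGES_PLAYS_CSV = """image_id,play_id
10,1
11,1
"""


def load(csv_text):
    """Step 2: read one CSV into a DataFrame."""
    return pd.read_csv(io.StringIO(csv_text))


plays = load(PLAYS_CSV)
images = load(IMAGES_CSV)
images_plays = load(IMAGES_PLAYS_CSV)

# Step 3: resolve the join table, then collect image filenames per play.
linked = images_plays.merge(images, left_on="image_id", right_on="id")
files_per_play = (
    linked.groupby("play_id")["filename"].agg(list).rename("images").reset_index()
)
merged = plays.merge(
    files_per_play, how="left", left_on="id", right_on="play_id"
).drop(columns="play_id")

# Step 4: export as JSON records (missing values become JSON null).
records = json.loads(merged.to_json(orient="records"))

# Step 5: drop null key:value pairs (a Python stand-in for the jq step).
clean_records = [
    {key: value for key, value in record.items() if value is not None}
    for record in records
]

print(json.dumps(clean_records, indent=2))
```

The second record ends up with only `id` and `title`, because its missing author and missing image list are dropped rather than written out as nulls. That keeps the YAML produced in step 6 free of empty keys.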
total collection data / bunraku-full.ipynb
The data accessible on the original PHP site (and on the new Jekyll site) represents only about 60% of the information stored in the MySQL database. To preserve the rest for future use, I used a separate IPython notebook/pipeline that outputs CSVs and JSON without dropping the images/media marked 'offline'.
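
The difference between the two pipelines comes down to one filter. The sketch below assumes a hypothetical `status` field holding 'online' or 'offline'. The real column name and values in the database may differ.

```python
import json

# Hypothetical media records; the `status` field and its values are assumptions.
MEDIA = [
    {"id": 10, "filename": "img_010.jpg", "status": "online"},
    {"id": 11, "filename": "img_011.jpg", "status": "offline"},
    {"id": 12, "filename": "img_012.jpg", "status": "online"},
]


def select_media(records, keep_offline):
    """Return every record, or only those not marked 'offline'."""
    if keep_offline:
        return list(records)
    return [record for record in records if record["status"] != "offline"]


online_only = select_media(MEDIA, keep_offline=False)  # online-collection pipeline
full = select_media(MEDIA, keep_offline=True)          # total-collection pipeline

print(json.dumps({"online": len(online_only), "full": len(full)}))
```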
There is also a Jupyter notebook (bunraku-stats.ipynb) for generating matplotlib graphs and D3-specific (refactored) JSON for data visualization.
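
As a rough idea of what refactoring JSON for D3 can mean, the sketch below reduces flat records to the name/value pairs that many D3 chart examples expect. The `kind` field, the sample records, and the `to_d3_counts` helper are hypothetical and are not taken from the notebook.

```python
import json
from collections import Counter

# Hypothetical collection records; the `kind` field is an assumption.
RECORDS = [
    {"id": 1, "kind": "image"},
    {"id": 2, "kind": "image"},
    {"id": 3, "kind": "audio"},
    {"id": 4, "kind": "image"},
    {"id": 5, "kind": "video"},
    {"id": 6, "kind": "audio"},
]


def to_d3_counts(records, field):
    """Count records by `field` and shape the result as name/value pairs."""
    counts = Counter(record[field] for record in records if field in record)
    return [{"name": name, "value": n} for name, n in counts.most_common()]


d3_data = to_d3_counts(RECORDS, "kind")
print(json.dumps(d3_data))
```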