A self-documenting build automation tool
Sake is a way to easily design, share, build, and visualize workflows with intricate interdependencies. Sake is self-documenting because the instructions for building a project also serve as the documentation of the project's workflow. The first time it's run, sake will build all of the components of a project in an order that automatically satisfies all dependencies. For all subsequent runs, sake will only rebuild the parts of the project that depend on changed files. This cuts down on unnecessary re-building and lets the user concentrate on their work rather than memorizing the order in which commands have to be run.
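As a rough illustration of "only rebuild what depends on changed files", the sketch below records a content hash per file and reports which files differ from the previous run. It is not sake's actual implementation: the function names, the use of SHA-1, and the JSON store file are all assumptions made for the example.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path):
    """Return the SHA-1 hex digest of a file's contents."""
    return hashlib.sha1(Path(path).read_bytes()).hexdigest()


def changed_files(paths, store_path):
    """Return the entries of `paths` whose contents differ from the last run.

    `store_path` is a hypothetical JSON file mapping path -> digest; it is
    rewritten on every call so the next call compares against this one.
    """
    store = Path(store_path)
    previous = json.loads(store.read_text()) if store.exists() else {}
    current = {str(p): file_digest(p) for p in paths}
    store.write_text(json.dumps(current, indent=2))
    return [p for p in current if previous.get(p) != current[p]]
```

On a first run every file counts as changed, because nothing has been recorded yet. A build tool would then rerun only the targets whose dependencies appear in the returned list.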
Sake is free, open-source, cross-platform software under a very permissive license (MIT Expat) and is written in Python. Sake is stable.
Consider this example workflow that examines correlates of DUI arrests with various adolescent-related data by state.
    ---
    # Macros
    #! TEEN_STATS_URL=http://mathforum.org/workshops/sum96/data.collections/datalibrary/US_TeenStats.XL.zip.xls

    fetch teen stats:
        help: fetches various teen statstics from the web
        # no dependencies
        formula: >
            curl -o teenstats.xls $TEEN_STATS_URL;
        output:
            - teenstats.xls

    formatting:
        help: formatting and conversion steps
        convert teen stats to csv:
            help: uses gnumerics ssconvert to convert ugly xls to csv and cleans it
            dependencies:
                - teenstats.xls
                - convert.sh
            formula: >
                ./convert.sh;
            output:
                - teenstats.csv
        format dui stats:
            help: format raw (copy and pasted) dui/state data using perl
            dependencies:
                - rawdata.txt
            formula: >
                perl -pe 's/^(\D+)\s+([\d,]+)\s+([\d,]+)\s*/\1\t\2\t\3\n/'
                rawdata.txt | sed 's/,//g' > duistats.tsv;
            output:
                - duistats.tsv

    find correlates:
        help: calls R script that finds correlates of DUI arrest in various teen statistics
        dependencies:
            - duistats.tsv
            - teenstats.csv
            - dui-correlates.R
        formula: >
            ./dui-correlates.R
        output:
            - Rplots.pdf
            - lmcoeffs.txt
    ...
This is a Sakefile that builds and documents a process that formats two data files from the web and feeds them into an R script, which searches for correlations and ultimately produces an output table and a graphic.
The entire process can be performed, start to finish, by running the following command:
    sake
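To make "an order that satisfies all dependencies" concrete, here is a minimal sketch that derives one valid run order for the targets in the Sakefile above. It is an illustration, not sake's own code: the `TARGETS` table is transcribed by hand, `build_order` is a hypothetical helper, and the ordering is done with Python's standard `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# target -> (dependencies, outputs), transcribed by hand from the Sakefile
TARGETS = {
    "fetch teen stats": ([], ["teenstats.xls"]),
    "convert teen stats to csv": (["teenstats.xls", "convert.sh"],
                                  ["teenstats.csv"]),
    "format dui stats": (["rawdata.txt"], ["duistats.tsv"]),
    "find correlates": (["duistats.tsv", "teenstats.csv", "dui-correlates.R"],
                        ["Rplots.pdf", "lmcoeffs.txt"]),
}


def build_order(targets):
    """Order targets so each runs after the targets that produce its inputs."""
    # Map every output file to the target that creates it.
    producer = {out: name for name, (_, outs) in targets.items() for out in outs}
    # A target's predecessors are the producers of its dependencies; files
    # such as convert.sh that no target produces impose no ordering.
    graph = {
        name: {producer[dep] for dep in deps if dep in producer}
        for name, (deps, _) in targets.items()
    }
    return list(TopologicalSorter(graph).static_order())


print(build_order(TARGETS))
```

Any order this returns runs "fetch teen stats" before "convert teen stats to csv", and both formatting steps before "find correlates".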
The mandatory "help" fields are used internally by sake to produce the following output when 'sake help' is run:
    You can 'sake' one of the following...

    "find correlates":
      - calls R script that finds correlates of DUI arrest in various teen statistics

    formatting:
      - formatting and conversion steps

        "convert teen stats to csv":
          - uses gnumerics ssconvert to convert ugly xls to csv and cleans it

        "format dui stats":
          - format raw (copy and pasted) dui/state data using perl

    "fetch teen stats":
      - fetches various teen statstics from the web

    clean:
      - remove all targets' outputs and start from scratch

    visual:
      - output visual representation of project's dependencies
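The listing above can be derived mechanically from the "help" fields. The sketch below shows one way to do that for a Sakefile that has already been parsed into nested dictionaries. `SAKEFILE` and `help_lines` are hypothetical names, only part of the example is included, and the rule of quoting names that contain spaces is an assumption made to mimic the output above, not a documented sake behaviour.

```python
# Hypothetical parsed form of part of the example Sakefile (nested dicts).
SAKEFILE = {
    "find correlates": {
        "help": "calls R script that finds correlates of DUI arrest "
                "in various teen statistics",
        "dependencies": ["duistats.tsv", "teenstats.csv", "dui-correlates.R"],
    },
    "formatting": {
        "help": "formatting and conversion steps",
        "format dui stats": {
            "help": "format raw (copy and pasted) dui/state data using perl",
            "dependencies": ["rawdata.txt"],
        },
    },
}


def help_lines(node, indent=0):
    """Yield a heading and a '- help' line per target, recursing into groups."""
    pad = " " * indent
    for name, body in node.items():
        if not isinstance(body, dict):
            continue  # skip plain fields such as "help" or "dependencies"
        label = f'"{name}"' if " " in name else name
        yield f"{pad}{label}:"
        yield f"{pad}  - {body['help']}"
        # Nested targets (members of a group) are indented one level further.
        yield from help_lines(body, indent + 4)


print("\n".join(help_lines(SAKEFILE)))
```

Because the text comes straight from the fields that define each target, the help listing cannot drift out of date with the workflow.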
Finally, a visual representation of the dependency diagram can be produced, automagically, by running the following command:
    sake visual
This produces an image of the workflow's dependencies.
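For readers curious how such a diagram can be generated from the same information, the sketch below emits Graphviz DOT text for the example's target-to-target dependencies. This is only an illustration: the choice of DOT as the format, the `to_dot` helper, and the hand-written `EDGES` list are assumptions, not a description of what 'sake visual' does internally.

```python
def to_dot(edges):
    """Render (source, destination) pairs as a Graphviz DOT digraph."""
    lines = ["digraph workflow {"]
    for src, dst in edges:
        lines.append(f'    "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)


# Target-to-target edges read off the example Sakefile by hand.
EDGES = [
    ("fetch teen stats", "convert teen stats to csv"),
    ("convert teen stats to csv", "find correlates"),
    ("format dui stats", "find correlates"),
]

print(to_dot(EDGES))
```

The resulting text can be saved to a file and rendered to an image with the Graphviz `dot` program.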
This is a really simple example, sure, but it's easy to see that, even for the most labyrinthine of pipelines, a visualization like this can really help get a sense of all the actions involved in a workflow. The key points here are (a) that the writer of the workflow had to expend no extra effort to generate the 'help' output and the visualization of dependencies, and (b) that the documentation of the workflow occurs as a result of designing and writing it.
The coupling of writing the automation and documentation makes sake a sound choice for
This project has four dependencies:
The easiest way to install sake is via pip.
Assuming you have Python and easy_install installed, just run:
    [sudo] easy_install pip
    [sudo] pip install master-sake
OS-specific instructions are available in the documentation linked to below.
PDF documentation may be accessed here
HTML documentation may be accessed here
If you're having trouble using sake, have a question, or want to contribute, please email me at tony.fischetti@gmail.com