Sake is a way to easily design, share, build, and visualize workflows with intricate interdependencies. Sake is self-documenting because the instructions for building a project also serve as the documentation of the project's workflow. The first time it's run, sake will build all of the components of a project in an order that automatically satisfies all dependencies. For all subsequent runs, sake will only rebuild the parts of the project that depend on changed files. This cuts down on unnecessary re-building and lets the user concentrate on their work rather than memorizing the order in which commands have to be run.
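The idea of rebuilding only what depends on changed files can be sketched in a few lines of Python. This is an illustrative sketch, not sake's own code: this README does not say how sake detects changes, so comparing content digests (rather than, say, timestamps) is an assumption, and the function names `digest`, `file_digest`, and `changed_files` are hypothetical.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return a hex digest that identifies this exact file content."""
    return hashlib.sha1(data).hexdigest()


def file_digest(path: str) -> str:
    """Digest the current content of a file on disk."""
    with open(path, "rb") as fh:
        return digest(fh.read())


def changed_files(current: dict, recorded: dict) -> list:
    """Return the names that are new or whose digest differs from the previous run.

    `current` and `recorded` both map a file name to its digest.
    (Assumption: digests from the previous run were saved somewhere.)
    """
    return sorted(name for name, d in current.items() if recorded.get(name) != d)


# Digests saved after the previous run versus digests computed now.
recorded = {"rawdata.txt": digest(b"old"), "convert.sh": digest(b"same")}
current = {"rawdata.txt": digest(b"new"), "convert.sh": digest(b"same")}
print(changed_files(current, recorded))
```

Only targets that read one of the reported files, directly or through another target's output, would then need to be rebuilt.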
Sake is free, open source, cross-platform software released under a very permissive license (MIT Expat) and written in Python. Sake is in the beta stage of development.
Consider this example workflow that examines correlates of DUI arrests with various adolescent-related data by state.
    ---
    # Macros
    #! TEEN_STATS_URL=http://mathforum.org/workshops/sum96/data.collections/datalibrary/US_TeenStats.XL.zip.xls

    fetch teen stats:
        help: fetches various teen statstics from the web
        # no dependencies
        formula: >
            curl -o teenstats.xls $TEEN_STATS_URL;
        output:
            - teenstats.xls

    formatting:
        help: formatting and conversion steps
        convert teen stats to csv:
            help: uses gnumerics ssconvert to convert ugly xls to csv and cleans it
            dependencies:
                - teenstats.xls
                - convert.sh
            formula: >
                ./convert.sh;
            output:
                - teenstats.csv

        format dui stats:
            help: format raw (copy and pasted) dui/state data using perl
            dependencies:
                - rawdata.txt
            formula: >
                perl -pe 's/^(\D+)\s+([\d,]+)\s+([\d,]+)\s*/\1\t\2\t\3\n/' rawdata.txt |
                sed 's/,//g' > duistats.tsv;
            output:
                - duistats.tsv

    find correlates:
        help: calls R script that finds correlates of DUI arrest in various teen statistics
        dependencies:
            - duistats.tsv
            - teenstats.csv
            - dui-correlates.R
        formula: >
            ./dui-correlates.R
        output:
            - Rplots.pdf
            - lmcoeffs.txt
    ...
This is a Sakefile that builds and documents a process: it processes and formats two data files from the web and feeds them into an R script that searches for correlations and, ultimately, produces an output table and a graphic.
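To make the dependency logic concrete, here is a minimal Python sketch of how a build order could be derived from the "dependencies" and "output" fields of the example targets. It is not sake's implementation: the `TARGETS` dictionary is a hand-written, hypothetical stand-in for the parsed Sakefile, the function names are made up for illustration, and the standard-library `graphlib` module (Python 3.9+) stands in for whatever graph code sake actually uses.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical in-memory form of the example targets: files read and files written.
TARGETS = {
    "fetch teen stats": {"dependencies": [], "output": ["teenstats.xls"]},
    "convert teen stats to csv": {
        "dependencies": ["teenstats.xls", "convert.sh"],
        "output": ["teenstats.csv"],
    },
    "format dui stats": {"dependencies": ["rawdata.txt"], "output": ["duistats.tsv"]},
    "find correlates": {
        "dependencies": ["duistats.tsv", "teenstats.csv", "dui-correlates.R"],
        "output": ["Rplots.pdf", "lmcoeffs.txt"],
    },
}


def target_graph(targets):
    """Map each target to the set of targets whose outputs it reads."""
    producer = {f: name for name, t in targets.items() for f in t["output"]}
    return {
        name: {producer[f] for f in t["dependencies"] if f in producer}
        for name, t in targets.items()
    }


def build_order(targets):
    """Return target names so that every target comes after its prerequisites."""
    return list(TopologicalSorter(target_graph(targets)).static_order())


def targets_to_rebuild(targets, changed_files):
    """Return the targets affected by the changed files, in build order."""
    changed = set(changed_files)
    stale = []
    for name in build_order(targets):
        if changed & set(targets[name]["dependencies"]):
            stale.append(name)
            # Rebuilding a target changes its outputs, which may affect later targets.
            changed |= set(targets[name]["output"])
    return stale


print(build_order(TARGETS))
print(targets_to_rebuild(TARGETS, ["rawdata.txt"]))
```

With this data, a change to rawdata.txt marks only "format dui stats" and "find correlates" for rebuilding; the fetch and convert steps are left alone.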
The entire process can be performed, start to finish, by running the sake command with no arguments.
The mandatory "help" fields are used internally by sake to produce the following output when 'sake help' is run:
    You can 'sake' one of the following...

    "find correlates":
      - calls R script that finds correlates of DUI arrest in various teen statistics

    formatting:
      - formatting and conversion steps

        "convert teen stats to csv":
          - uses gnumerics ssconvert to convert ugly xls to csv and cleans it

        "format dui stats":
          - format raw (copy and pasted) dui/state data using perl

    "fetch teen stats":
      - fetches various teen statstics from the web

    clean:
      - remove all targets' outputs and start from scratch

    visual:
      - output visual representation of project's dependencies
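A listing like the one above can be produced by walking the targets and printing each "help" field. The following Python sketch shows the idea under stated assumptions: `EXAMPLE` is a hypothetical, hand-written excerpt of the parsed Sakefile, `render_help` is an illustrative name, and the real tool's ordering, spacing, and built-in entries (clean, visual) are not reproduced.

```python
# Hypothetical in-memory excerpt of the example Sakefile.
EXAMPLE = {
    "find correlates": {
        "help": "calls R script that finds correlates of DUI arrest in various teen statistics",
        "output": ["Rplots.pdf", "lmcoeffs.txt"],
    },
    "formatting": {
        "help": "formatting and conversion steps",
        "format dui stats": {
            "help": "format raw (copy and pasted) dui/state data using perl",
            "dependencies": ["rawdata.txt"],
        },
    },
}


def render_help(targets, indent=0):
    """Return help lines for each target; nested mappings are treated as sub-targets."""
    lines = []
    pad = " " * indent
    for name, body in targets.items():
        # Names containing spaces are quoted so they can be copied onto a command line.
        label = f'"{name}"' if " " in name else name
        lines.append(f"{pad}{label}:")
        lines.append(f"{pad}  - {body['help']}")
        # Fields such as "dependencies" or "output" are lists, not mappings, so they are skipped.
        subtargets = {k: v for k, v in body.items() if isinstance(v, dict)}
        lines.extend(render_help(subtargets, indent + 4))
    return lines


print("\n".join(render_help(EXAMPLE)))
```

Because the text comes straight from the "help" fields, the listing cannot drift out of date with respect to the build instructions.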
Finally, a visual representation of the dependency diagram can be produced, automagically, by running 'sake visual'. This produces the following image:

[Image: dependency diagram of the example workflow]
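For readers curious how such a diagram can be generated from the same information, here is a small Python sketch that emits Graphviz DOT text for the example workflow. It is an assumption-laden illustration: this README does not say which drawing tool sake uses, `GRAPH` is a hand-written mapping derived from the example Sakefile (each target mapped to the targets it depends on), and `to_dot` is a hypothetical helper.

```python
# Hypothetical dependency mapping for the example: target -> targets it depends on.
GRAPH = {
    "fetch teen stats": set(),
    "convert teen stats to csv": {"fetch teen stats"},
    "format dui stats": set(),
    "find correlates": {"convert teen stats to csv", "format dui stats"},
}


def to_dot(graph):
    """Return Graphviz DOT text with one edge from each prerequisite to its dependent."""
    lines = ["digraph workflow {"]
    for target in sorted(graph):
        lines.append(f'    "{target}";')
        for prereq in sorted(graph[target]):
            lines.append(f'    "{prereq}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)


print(to_dot(GRAPH))
```

The printed text can be saved to a file and rendered with any Graphviz-compatible tool.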
This is a really simple example, sure, but it's easy to see that, even for the most labyrinthine of pipelines, a visualization like this can really help give a sense of all the actions involved in a workflow. The key points here are (a) that the operator/writer of the workflow had to expend no extra effort to generate the 'help' output and the visualization of dependencies, and (b) that the documentation of the workflow occurs as a result of designing and writing it.
The coupling of writing the automation and documentation makes sake a sound choice for
This project has four dependencies:
The easiest way to install sake is via pip.
Assuming you have python and easy_install installed, just run:
    [sudo] easy_install pip
    [sudo] pip install master-sake
OS-specific instructions are available in the documentation linked below.
PDF documentation may be accessed here
HTML documentation may be accessed here
If you're having trouble using sake, have a question, or want to contribute, please email me at firstname.lastname@example.org