Readability

With Snakemake, data analysis workflows are defined via an easy-to-read, adaptable, yet
powerful specification language on top of Python. Each rule describes
a step in an analysis, defining how to obtain output files from input files.
Dependencies between rules are determined automatically.
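
To make the idea of automatically determined dependencies concrete, here is a minimal Python sketch of matching a requested file against output patterns with wildcards. It is an illustration only, not Snakemake's actual implementation: the `RULES` table and the helpers `pattern_to_regex`, `find_producer`, and `resolve` are hypothetical names, and each rule is simplified to a single input and a single output.

```python
import re

def pattern_to_regex(pattern):
    """Turn 'plots/{country}.hist.svg' into a regex with one named group per wildcard."""
    parts = re.split(r"\{(\w+)\}", pattern)
    # Even indices hold literal text, odd indices hold wildcard names.
    regex = "".join(
        re.escape(part) if i % 2 == 0 else f"(?P<{part}>.+)"
        for i, part in enumerate(parts)
    )
    return re.compile(regex + "$")

# Hypothetical rule table: rule name -> (input pattern, output pattern).
RULES = {
    "select_by_country": ("data/worldcitiespop.csv", "by-country/{country}.csv"),
    "plot_histogram": ("by-country/{country}.csv", "plots/{country}.hist.svg"),
}

def find_producer(target):
    """Return (rule name, concrete input file) for the first rule whose output matches."""
    for name, (input_pattern, output_pattern) in RULES.items():
        match = pattern_to_regex(output_pattern).match(target)
        if match:
            # Fill the input pattern with the wildcard values found in the target.
            return name, input_pattern.format(**match.groupdict())
    return None

def resolve(target):
    """Walk backwards from a target file and list the rules needed to build it."""
    chain = []
    while (step := find_producer(target)) is not None:
        name, target = step
        chain.append(name)
    return chain

print(resolve("plots/de.hist.svg"))
```

Requesting `plots/de.hist.svg` leads back through `plot_histogram` to `select_by_country`, and the walk stops at `data/worldcitiespop.csv` because no rule in the table produces it.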


Portability

By integration with the Conda package manager and container virtualization,
all software dependencies of each workflow step are automatically deployed upon execution.


Modularization

Rapidly implement analysis steps via direct script and Jupyter notebook integration. Easily create and employ
reusable tool wrappers and split your data analysis into well-separated modules.

configfile: "config.yaml"

rule all:
    input:
        expand(
            "plots/{country}.hist.pdf",
            country=config["countries"]
        )

rule select_by_country:
    input:
        "data/worldcitiespop.csv"
    output:
        "by-country/{country}.csv"
    conda:
        "envs/xsv.yaml"
    shell:
        "xsv search -s Country '{wildcards.country}' "
        "{input} > {output}"

rule plot_histogram:
    input:
        "by-country/{country}.csv"
    output:
        "plots/{country}.hist.svg"
    container:
        "docker://faizanbashir/python-datascience:3.6"
    script:
        "scripts/plot-hist.py"

rule convert_to_pdf:
    input:
        "{prefix}.svg"
    output:
        "{prefix}.pdf"
    wrapper:
        "0.47.0/utils/cairosvg"
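
The `expand(...)` call in `rule all` turns one filename pattern into the list of concrete target files. The following plain-Python sketch mimics that behaviour for illustration. `expand_sketch` is a hypothetical stand-in rather than Snakemake's own function, and the `config` dictionary with two country codes is an assumed example of what `config.yaml` might contain.

```python
from itertools import product

def expand_sketch(pattern, **wildcards):
    """Fill the pattern with every combination of the given wildcard values."""
    names = list(wildcards)
    return [
        pattern.format(**dict(zip(names, combo)))
        for combo in product(*(wildcards[name] for name in names))
    ]

# Hypothetical stand-in for the parsed config.yaml (values are assumptions).
config = {"countries": ["de", "fr"]}

targets = expand_sketch("plots/{country}.hist.pdf", country=config["countries"])
print(targets)  # ['plots/de.hist.pdf', 'plots/fr.hist.pdf']
```

Each resulting PDF is then matched by `convert_to_pdf` with `prefix` set to, for example, `plots/de.hist`, which in turn requests the SVG produced by `plot_histogram`.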


Transparency

Automatic, interactive, self-contained reports ensure full transparency from results
down to the steps, parameters, code, and software used.


Scalability

Workflows scale seamlessly from a single core to multicore machines, clusters, or the cloud
without modification of the workflow definition, and redundant
computations are avoided automatically.
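
One common way to avoid redundant computation is to compare file modification times: a step is skipped when all of its outputs exist and are newer than its inputs. The sketch below shows only that single criterion, under the assumption that timestamps are reliable. `needs_rerun` is a hypothetical helper, and Snakemake's real rerun logic is more involved than this.

```python
import os
import tempfile

def needs_rerun(inputs, outputs):
    """Return True if any output is missing or older than the newest input."""
    if any(not os.path.exists(path) for path in outputs):
        return True
    oldest_output = min(os.path.getmtime(path) for path in outputs)
    newest_input = max(
        (os.path.getmtime(path) for path in inputs), default=float("-inf")
    )
    return newest_input > oldest_output

with tempfile.TemporaryDirectory() as workdir:
    source = os.path.join(workdir, "input.csv")
    result = os.path.join(workdir, "output.csv")

    open(source, "w").close()
    print(needs_rerun([source], [result]))  # output does not exist yet

    open(result, "w").close()
    os.utime(source, (100, 100))  # input older than output
    os.utime(result, (200, 200))
    print(needs_rerun([source], [result]))

    os.utime(source, (300, 300))  # input modified after the output was written
    print(needs_rerun([source], [result]))
```

The explicit `os.utime` calls set fixed timestamps so the three cases (missing output, up-to-date output, stale output) do not depend on how fast the files are created.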

workstation

compute server

cluster

grid computing

cloud computing
