Why use Make

February 23, 2013, by Mike Bostock

I love Make. You can think of Make as just a tool for building large binaries or libraries (and it is, almost to a fault), but it's much more than that. Makefiles are machine-readable documentation that makes your workflow repeatable.

To clarify, this article isn't just about GNU Make; it's about the benefits of capturing workflows in a file-based, dependency-tracking build system, including modern alternatives such as Rake and Waf.

To illustrate with a recent example: yesterday, Kevin and I needed to update a six-month-old drought chart to accompany a new article on thin snowpack in the West. The article was already on the homepage, so we were in a hurry to republish the chart with new data.

Shamefully, I hadn't documented the data-transformation process, and it's painfully easy to forget details after six months: I had a mess of CSV and GeoJSON data files, but not the exact URL of the NCDC source; I was temporarily confused as to the correct Palmer drought metric (Drought Severity Index or Z Index?) and the corresponding categorical thresholds; finally, I had to resurrect the code to calculate the drought coverage area.

Despite these challenges, we republished the updated chart without much delay. But I thought about how much easier it could have been had I simply recorded the process the first time as a makefile. I could have just typed make in the terminal and been done!

#It's files all the way down

The beauty of Make is that it's just a rigorous way of recording what you're already doing. It doesn't fundamentally change how you do something, but it does encourage you to record every step of the process, allowing you (and your colleagues) to reproduce the entire process later.

The basic concept is that generated files depend on other files. When the generated files are missing or when the files they depend on have changed, the necessary files are recreated using a sequence of commands that you specify.
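
The rebuild rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, assuming staleness is judged by comparing file modification times, which is how GNU Make decides by default. The function name `needs_rebuild` is hypothetical and not part of Make.

```python
import os


def needs_rebuild(target, sources):
    """Return True if target is missing or older than any of its sources."""
    if not os.path.exists(target):
        return True  # a missing generated file must always be created
    target_mtime = os.path.getmtime(target)
    # Rebuild when any source was modified after the target was generated.
    return any(os.path.getmtime(src) > target_mtime for src in sources)
```

A build tool would call such a check for each generated file and run the commands you specified only when it returns True, which is why an unchanged workflow costs almost nothing to re-run.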

Let's say you're building a choropleth map of unemployment and you need a TopoJSON file of US counties. This file depends on the cartographic boundaries published by the U.S. Census Bureau. So your workflow might look like:

1. Download a Census Bureau zip archive.
2. Extract the shapefile from the archive.
3. Convert the shapefile to TopoJSON.

As a flowchart: download the zip archive → extract the shapefile → convert to TopoJSON.

In a slightly mind-bending maneuver, Make encourages you to express your workflow backward as dependencies between files, rather than forward as a sequential recipe. For example, the shapefile depends on the zip archive because you need to download the archive before you can extract the shapefile (obviously). So to express your workflow in a language that Make understands, consider the dependency graph instead: the TopoJSON file depends on the shapefile, which in turn depends on the zip archive.

This way of thinking can be uncomfortable at first, but it has its benefits. Unlike a linear script, a dependency graph is flexible and modular; for example, you can augment the makefile to derive multiple shapefiles from the same zip archive without repeated downloads. Capturing dependencies also drives efficiency: you can recreate generated files with minimal effort when something changes. A well-designed makefile allows you to iterate quickly while keeping generated files consistent and up-to-date.
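
To make the modularity claim concrete, here is a minimal Python sketch of a dependency graph in which two shapefiles are derived from one zip archive. The file names, the `RULES` table, and the `build` function are all hypothetical, and the "commands" merely record what would run. Real Make decides what is up to date by inspecting files on disk, whereas this sketch tracks it in memory for a single run.

```python
# Each generated file maps to the files it depends on.
# All names are hypothetical, for illustration only.
RULES = {
    "boundaries.zip": [],                 # downloaded; no local dependencies
    "counties.shp": ["boundaries.zip"],   # extracted from the archive
    "states.shp": ["boundaries.zip"],     # a second file from the same archive
    "counties.json": ["counties.shp"],    # converted to TopoJSON
}


def build(target, built=None, log=None):
    """Build target after its dependencies, building each file at most once."""
    built = set() if built is None else built
    log = [] if log is None else log
    if target in built:
        return log  # already built in this run; nothing to do
    for dependency in RULES[target]:
        build(dependency, built, log)
    log.append(target)  # stand-in for running the real command
    built.add(target)
    return log


log = []
built = set()
for goal in ["counties.json", "states.shp"]:
    build(goal, built, log)
print(log)
# ['boundaries.zip', 'counties.shp', 'counties.json', 'states.shp']
```

The archive appears only once in the log even though two files depend on it. A linear script would need extra bookkeeping to avoid the second download.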

#The syntax is not pretty

The ugly side of Make is its syntax and complexity; the full manual is a whopping 183 pages. Fortunately, you can skip most of it and start with explicit rules of the following form:

targetfile: sourcefile
	command

Here targetfile is the file you want...

