The first step in any data processing is getting the data! Here is how to read in and prepare common input formats using D3.js.
D3 can load a bunch of file types, and one of the most common is probably plain old CSV (comma-separated values).
Let's say you have a CSV file with some city data in it:
cities.csv:
city,state,population,land area
seattle,WA,652405,83.9
new york,NY,8405837,302.6
boston,MA,645966,48.3
kansas city,MO,467007,315.0
Use d3.csv to convert it into an array of objects:
d3.csv("/data/cities.csv").then(function(data) {
  console.log(data[0]);
});
=> {city: "seattle", state: "WA", population: "652405", land area: "83.9"}
You can see that the headers of the original CSV have been used as the property names for the data objects. Using d3.csv in this manner requires that your CSV file has a header row.
If you look closely, you can also see that the values associated with these properties are all strings. This is probably not what you want in the case of numbers. When loading CSVs and other flat files, you have to do the type conversion yourself.
We will see more of this in other tasks, but a simple way to do this is to use the + operator (unary plus). forEach can be used to iterate over the data array.
d3.csv("/data/cities.csv").then(function(data) {
  data.forEach(function(d) {
    d.population = +d.population;
    d["land area"] = +d["land area"];
  });
  console.log(data[0]);
});
=> {city: "seattle", state: "WA", population: 652405, land area: 83.9}
Dot notation is a useful way to access the properties of these data objects. However, if your headers have spaces in them, then you will need to use bracket notation as shown.
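As a quick illustration of the conversion above, here is a minimal sketch using a hand-written row shaped like the ones d3.csv produces (the row object is typed in by hand for the example, not loaded from the file). It also shows two edge cases of unary plus worth knowing about: an empty string becomes 0, and non-numeric text becomes NaN.

```javascript
// A row shaped like the ones d3.csv produces for cities.csv (all values are strings).
const row = { city: "seattle", state: "WA", population: "652405", "land area": "83.9" };

// Unary plus converts numeric strings to numbers.
const population = +row.population;   // dot notation works for simple keys
const landArea = +row["land area"];   // bracket notation is needed for keys with spaces

// Caveats: an empty string becomes 0 and non-numeric text becomes NaN.
const empty = +"";
const bad = +"n/a";

console.log(population, landArea, empty, Number.isNaN(bad));
// => 652405 83.9 0 true
```

If your file may contain blank or placeholder cells, it is probably worth checking for NaN (or for empty strings before converting) rather than trusting every value.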
This can also be done while the data is loading, by d3.csv directly. To do so, provide an accessor function to d3.csv; its return value becomes the data object for each row in our data array.
d3.csv("/data/cities.csv", function(d) {
  return {
    city: d.city,
    state: d.state,
    population: +d.population,
    land_area: +d["land area"]
  };
}).then(function(data) {
  console.log(data[0]);
});
=> {city: "seattle", state: "WA", population: 652405, land_area: 83.9}
In this form, you have complete control over the data objects and can rename properties (like land_area) and convert values (like population) willy-nilly. On the other hand, you have to be quite explicit about which properties to return. This may or may not be what you are into.
I typically allow D3 to load all the data, and then make modifications in a post-processing step, but it might be more effective for you to be more explicit with the modifications.
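One way to keep both options open is to put the conversion in a small named function. The sketch below is an assumption about how you might organize this, not something D3 requires: toCity is a hypothetical helper, and the sample rows are typed in by hand to stand in for the parsed file. The same function could be passed to d3.csv as the accessor, or applied afterwards with map.

```javascript
// Hypothetical helper: converts one raw row (all strings) into a typed object.
function toCity(d) {
  return {
    city: d.city,
    state: d.state,
    population: +d.population,
    land_area: +d["land area"]
  };
}

// Hand-written stand-in for the rows d3.csv would produce from cities.csv.
const rawRows = [
  { city: "seattle", state: "WA", population: "652405", "land area": "83.9" },
  { city: "boston", state: "MA", population: "645966", "land area": "48.3" }
];

// Post-processing step: apply the helper to every row.
const cities = rawRows.map(toCity);
console.log(cities[0]);

// With D3 loaded, the same helper can serve as the accessor:
// d3.csv("/data/cities.csv", toCity).then(function(data) { ... });
```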
CSV is probably the most common flat file format, but in no way the only one.
I often like to use TSV (tab-separated values) to get around the issue of numbers and strings often having commas in them.
D3 can parse TSV files with d3.tsv.
Here is animals.tsv, as an example:
name type avg_weight
tiger mammal 260
hippo mammal 3400
komodo dragon reptile 150
Loading animals.tsv with d3.tsv:
d3.tsv("/data/animals.tsv").then(function(data) {
  console.log(data[0]);
});
=> {name: "tiger", type: "mammal", avg_weight: "260"}
In fact, d3.csv and d3.tsv are only the tip of the iceberg. If you have a flat file with a non-standard delimiter, you can parse it too using d3.dsv!
For example, here is a pipe-delimited file called animals_piped.txt:
name|type|avg_weight
tiger|mammal|260
hippo|mammal|3400
komodo dragon|reptile|150
We first provide d3.dsv with the delimiter, in this case, a pipe (|), then read in our file:
d3.dsv("|", "/data/animals_piped.txt").then(function(data) {
  console.log(data[1]);
});
=> {name: "hippo", type: "mammal", avg_weight: "3400"}
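To make it clearer what these loaders do with the header row, here is a deliberately simplified sketch of delimiter-based parsing. parseDelimited is a hypothetical function written for this illustration; it is not how D3 is implemented, and it ignores quoting, escaped delimiters, and other details that d3.dsv handles for you.

```javascript
// Hypothetical, simplified parser: splits text into lines, then pairs each
// value with the matching header. No support for quoted fields.
function parseDelimited(text, delimiter) {
  const lines = text.trim().split(/\r?\n/);
  const headers = lines[0].split(delimiter);
  return lines.slice(1).map(function(line) {
    const values = line.split(delimiter);
    const row = {};
    headers.forEach(function(header, i) {
      row[header] = values[i]; // values stay strings, as with d3.dsv
    });
    return row;
  });
}

// The contents of animals_piped.txt, inlined so the sketch runs without a file.
const piped = [
  "name|type|avg_weight",
  "tiger|mammal|260",
  "hippo|mammal|3400",
  "komodo dragon|reptile|150"
].join("\n");

const animals = parseDelimited(piped, "|");
console.log(animals[1]);
// => {name: "hippo", type: "mammal", avg_weight: "3400"}
```

Passing "," or "\t" as the delimiter gives the same behavior for simple CSV or TSV text, which is roughly the relationship between d3.dsv and its two more specific siblings.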
For nested data, or for passing around data where you don't want to mess with data typing, it's hard to beat JSON.
JSON has become the language of the internet for good reason. It's easy to understand, write, and parse. And with d3.json, you too can harness its power.
Here is an example JSON file called employees.json:
[
  {
    "name": "Andy Hunt",
    "title": "Big Boss",
    "age": 68,
    "bonus": true
  },
  {
    "name": "Charles Mack",
    "title": "Jr Dev",
    "age": 24,
    "bonus": false
  }
]
Loading employees.json with d3.json:
d3.json("/data/employees.json").then(function(data) {
  console.log(data[0]);
});
=> {name: "Andy Hunt", title: "Big Boss", age: 68, bonus: true}
We can see that, unlike our flat file parsing, numeric types stay numeric. Indeed, a JSON value can be a string, a number, a boolean value, an array, or another object. This allows nested data to be dealt with easily.
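The difference in typing can be checked directly with JSON.parse, which is built into JavaScript. This is a small sketch on an inline string rather than a loaded file; the office field is a hypothetical addition, included only to show a nested object.

```javascript
// Inline JSON text; "office" is a made-up nested field for illustration.
const text = '[{"name": "Andy Hunt", "title": "Big Boss", "age": 68, "bonus": true,' +
  ' "office": {"city": "seattle", "floor": 3}}]';

const employees = JSON.parse(text);
const boss = employees[0];

console.log(typeof boss.age);    // => number
console.log(typeof boss.bonus);  // => boolean
console.log(boss.office.city);   // => seattle
```

No unary plus is needed here: the types are carried by the JSON text itself, and nested values are reached with ordinary dot notation.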
D3's basic loading mechanism is fine for one file, but starts to get messy as we nest multiple callbacks.
For loading multiple files, we can use Promises to wait for multiple data sources to be loaded.
Promise.all([
  d3.csv("/data/cities.csv"),
  d3.tsv("/data/animals.tsv")
]).then(function(data) {
  console.log(data[0][0]); // first row of cities
  console.log(data[1][0]); // first row of animals
});
=> {city: "seattle", state: "WA", population: "652405", land area: "83.9"}
{name: "tiger", type: "mammal", avg_weight: "260"}
Note that inside the all method we load two types of files, using two different loading functions, so this is an easy way to mix and match file types.
The promise returned by all resolves to an array of our data sources, in the order they were listed. The first item holds our cities; the second, our animals.
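The ordering is worth seeing in action: results come back in the order the promises were listed, not the order they finished. The sketch below uses a hypothetical fakeLoad function with timers in place of real d3.csv and d3.tsv requests, so it runs without any files. It also uses array destructuring to name each result, and a catch handler, since the combined promise rejects if any one load fails.

```javascript
// Hypothetical stand-in for a loader: resolves with `rows` after `ms` milliseconds.
function fakeLoad(rows, ms) {
  return new Promise(function(resolve) {
    setTimeout(function() { resolve(rows); }, ms);
  });
}

const loaded = Promise.all([
  fakeLoad([{ city: "seattle" }], 20),  // listed first, finishes last
  fakeLoad([{ name: "tiger" }], 5)      // listed second, finishes first
]);

loaded
  .then(function([cities, animals]) {
    console.log(cities[0]);   // first row of cities
    console.log(animals[0]);  // first row of animals
  })
  .catch(function(error) {
    // Runs if any of the loads fails.
    console.error("Loading failed:", error);
  });
```

With D3, the two fakeLoad calls would be replaced by d3.csv("/data/cities.csv") and d3.tsv("/data/animals.tsv"); the rest of the pattern stays the same.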