
Learn JS Data

Data cleaning, manipulation, and wrangling in JavaScript

Reading in Data

The first step in any data processing is getting the data! Here is how to parse and prepare common input formats using D3.js.

Parsing CSV Files

D3 supports a bunch of file types when loading data, and one of the most common is probably plain old CSV (comma-separated values).

Let's say you have a CSV file with some city data in it:

cities.csv:

city,state,population,land area
seattle,WA,652405,83.9
new york,NY,8405837,302.6
boston,MA,645966,48.3
kansas city,MO,467007,315.0

Use d3.csv to convert it into an array of objects:

d3.csv("/data/cities.csv").then(function(data) {
  console.log(data[0]);
});

=> {city: "seattle", state: "WA", population: "652405", land area: "83.9"}
This code is using d3.js

You can see that the headers of the original CSV have been used as the property names for the data objects. Using d3.csv in this manner requires that your CSV file has a header row.

If you look closely, you can also see that the values associated with these properties are all strings. This is probably not what you want in the case of numbers. When loading CSVs and other flat files, you have to do the type conversion yourself.

We will see more of this in other tasks, but a simple way to do this is to use the + operator (unary plus). forEach can be used to iterate over the data array.

d3.csv("/data/cities.csv").then(function(data) {
  data.forEach(function(d) {
    d.population = +d.population;
    d["land area"] = +d["land area"];
  });
  console.log(data[0]);
});

=> {city: "seattle", state: "WA", population: 652405, land area: 83.9}

Dot notation is a useful way to access the properties of these data objects. However, if your headers have spaces in them, then you will need to use bracket notation as shown.
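The conversion above relies on unary plus, which has a couple of edge cases worth knowing about: an empty string becomes 0 and non-numeric text becomes NaN. Here is a minimal sketch of a more defensive conversion. The helper name toNumber is made up for this example and is not part of D3.

```javascript
// Hypothetical helper: convert a string field from a flat file to a number,
// treating blank or non-numeric text as missing (null) instead of 0 or NaN.
function toNumber(value) {
  if (value === undefined || value === null || value.trim() === "") {
    return null; // unary plus alone would turn "" into 0
  }
  const n = +value;
  return Number.isNaN(n) ? null : n; // unary plus turns "n/a" into NaN
}

console.log(+"652405");        // 652405
console.log(+"");              // 0
console.log(+"n/a");           // NaN
console.log(toNumber(""));     // null
console.log(toNumber("83.9")); // 83.9
```

Whether a blank cell should become null, 0, or something else depends on your data, so treat this as a starting point.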

The conversion can also be done while the data loads, by d3.csv directly. Provide an accessor function as the second argument to d3.csv; it is called once per row, and its return value becomes the corresponding object in our data array.

d3.csv("/data/cities.csv", function(d) {
  return {
    city : d.city,
    state : d.state,
    population : +d.population,
    land_area : +d["land area"]
  };
}).then(function(data) {
  console.log(data[0]);
});

=> {city: "seattle", state: "WA", population: 652405, land_area: 83.9}

In this form, you have complete control over the data objects and can rename properties (like land_area) and convert values (like population) willy-nilly. On the other hand, you have to be quite explicit about which properties to return. This may or may not be what you are into.

I typically allow D3 to load all the data, and then make modifications in a post-processing step, but it might be more effective for you to be more explicit with the modifications.
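The two approaches do not have to be written twice. As a rough sketch, the conversion can live in one named function that works either as a post-processing step or as the accessor. The name convertCity and the rawRows array are stand-ins invented for this example; rawRows mimics the string-valued rows d3.csv would produce.

```javascript
// Hypothetical row-conversion function; property names match cities.csv.
function convertCity(d) {
  return {
    city: d.city,
    state: d.state,
    population: +d.population,
    land_area: +d["land area"]
  };
}

// Stand-in for the array of string-valued rows that d3.csv resolves to.
const rawRows = [
  { city: "seattle", state: "WA", population: "652405", "land area": "83.9" },
  { city: "boston", state: "MA", population: "645966", "land area": "48.3" }
];

// Post-processing style: convert after the data has loaded.
const cities = rawRows.map(convertCity);
console.log(cities[0]);
```

The same function could instead be passed as the accessor, as in d3.csv("/data/cities.csv", convertCity). D3 also ships d3.autoType, which can be passed as the accessor to infer types automatically, though it is worth checking its guesses against your data.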

Reading TSV Files

CSV is probably the most common flat file format, but in no way the only one.

I often like to use TSV (tab-separated values) to get around the issue of numbers and strings often having commas in them.

D3 can parse TSVs with d3.tsv.

Here is animals.tsv, as an example:

name    type    avg_weight
tiger    mammal    260
hippo    mammal    3400
komodo dragon    reptile    150

Loading animals.tsv with d3.tsv:

d3.tsv("/data/animals.tsv").then(function(data) {
  console.log(data[0]);
});

=> {name: "tiger", type: "mammal", avg_weight: "260"}
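To see why commas inside values are a nuisance, here is a small illustration using naive string splitting. This is not how D3 parses files (d3.csv and d3.tsv handle quoted fields); the hippo row with its weight written as "3,400" is a made-up variant of the data above.

```javascript
// Naive splitting, for illustration only.
const csvLine = "hippo,mammal,3,400";     // weight written as "3,400", unquoted
const tsvLine = "hippo\tmammal\t3,400";   // same row, tab-separated

const csvFields = csvLine.split(",");
const tsvFields = tsvLine.split("\t");

console.log(csvFields.length); // 4: the weight was split in two
console.log(tsvFields.length); // 3

// The comma still has to go before converting to a number.
const weight = +tsvFields[2].replace(/,/g, "");
console.log(weight); // 3400
```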

Reading Other Flat Files

In fact, d3.csv and d3.tsv are only the tip of the iceberg. If you have a flat file with a non-standard delimiter, you can parse it too using d3.dsv!

For example, here is a pipe-delimited file called animals_piped.txt:

name|type|avg_weight
tiger|mammal|260
hippo|mammal|3400
komodo dragon|reptile|150

We first provide d3.dsv with the delimiter, in this case, a pipe (|), then read in our file:

d3.dsv("|", "/data/animals_piped.txt").then(function(data){
  console.log(data[1]);
});

=> {name: "hippo", type: "mammal", avg_weight: "3400"}
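If it helps to picture what a delimiter-based parser does, here is a deliberately simplified sketch in plain JavaScript. It is not D3's implementation: the function name parseDelimited is invented for this example, and it ignores quoted fields and escaped delimiters, which a real parser has to handle.

```javascript
// Simplified sketch: split on newlines, use the first line as headers,
// and build one object per remaining line. No support for quoted fields.
function parseDelimited(text, delimiter) {
  const lines = text.trim().split(/\r?\n/);
  const headers = lines[0].split(delimiter);
  return lines.slice(1).map(function(line) {
    const values = line.split(delimiter);
    const row = {};
    headers.forEach(function(header, i) {
      row[header] = values[i]; // every value stays a string
    });
    return row;
  });
}

// Same content as animals_piped.txt, inlined as a string.
const piped = [
  "name|type|avg_weight",
  "tiger|mammal|260",
  "hippo|mammal|3400",
  "komodo dragon|reptile|150"
].join("\n");

const animals = parseDelimited(piped, "|");
console.log(animals[1]);
```

As with d3.dsv, the header row supplies the property names and every value comes back as a string.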

Reading JSON Files

For nested data, or for passing around data where you don't want to mess with data typing, it's hard to beat JSON.

JSON has become the language of the internet for good reason. It's easy to understand, write, and parse. And with d3.json - you too can harness its power.

Here is an example JSON file called employees.json:

[
 {"name":"Andy Hunt",
  "title":"Big Boss",
  "age": 68,
  "bonus": true
 },
 {"name":"Charles Mack",
  "title":"Jr Dev",
  "age":24,
  "bonus": false
 }
]

Loading employees.json with d3.json:

d3.json("/data/employees.json").then(function(data) {
  console.log(data[0]);
});

=> {name: "Andy Hunt", title: "Big Boss", age: 68, bonus: true}

We can see that, unlike our flat file parsing, numeric types stay numeric. Indeed, a JSON value can be a string, a number, a boolean, null, an array, or another object. This allows nested data to be dealt with easily.
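The same type behavior can be seen with the built-in JSON.parse, without loading a file. In this small sketch the nested office object is a made-up addition to the employee record, included only to show nesting.

```javascript
// Build a JSON string; the "office" field is hypothetical example data.
const text = JSON.stringify([
  { name: "Andy Hunt", age: 68, bonus: true, office: { city: "seattle", floor: 3 } }
]);

const employees = JSON.parse(text);

console.log(typeof employees[0].age);   // "number"
console.log(typeof employees[0].bonus); // "boolean"
console.log(employees[0].office.city);  // "seattle"
```

No unary plus is needed here, and nested values are reached with ordinary dot notation.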

Loading Multiple Files

D3's basic loading mechanism is fine for one file, but starts to get messy if we nest one load inside the callback of another.

For loading multiple files, we can use Promise.all to wait until every data source has loaded.

Promise.all([
  d3.csv("/data/cities.csv"),
  d3.tsv("/data/animals.tsv")
]).then(function(data) {
  console.log(data[0][0]);  // first row of cities
  console.log(data[1][0]);  // first row of animals
});

=> {city: "seattle", state: "WA", population: "652405", land area: "83.9"}
{name: "tiger", type: "mammal", avg_weight: "260"}

Note that inside Promise.all we load two types of files, using two different loading functions, so this is an easy way to mix and match file types.

The promise returned by Promise.all resolves to an array of our datasets, in the same order as the input array. The first item holds our cities; the second, our animals.
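Positional indexes like data[0] and data[1] get hard to read as more files are added. One possible way to tidy this up is to destructure the results into named variables. In this sketch, loadCities and loadAnimals are invented stand-ins for d3.csv(...) and d3.tsv(...) so that it runs without any files, and loadAll is likewise a made-up name.

```javascript
// Hypothetical stand-ins for d3.csv(...) and d3.tsv(...): each returns a
// promise that resolves to an array of row objects.
function loadCities() {
  return Promise.resolve([{ city: "seattle", state: "WA" }]);
}
function loadAnimals() {
  return Promise.resolve([{ name: "tiger", type: "mammal" }]);
}

// Resolves to an object with named datasets instead of positional indexes.
function loadAll() {
  return Promise.all([loadCities(), loadAnimals()]).then(function(results) {
    const [cities, animals] = results; // same order as the input array
    return { cities: cities, animals: animals };
  });
}

loadAll()
  .then(function(data) {
    console.log(data.cities[0].city);
    console.log(data.animals[0].name);
  })
  .catch(function(error) {
    // Promise.all rejects as soon as any one of the loads fails.
    console.error("Loading failed:", error);
  });
```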
