Fork me on GitHub

Learn JS Data

Data cleaning, manipulation, and wrangling in JavaScript

Reading in Data

The first step in any data processing is getting the data! Here is how to parse in and prepare common input formats using D3.js

Parsing CSV Files

D3 has a bunch of filetypes it can support when loading data, and one of the most common is probably plain old CSV (comma separated values).

Let's say you had a csv file with some city data in it:

cities.csv:

city,state,population,land area
seattle,WA,652405,83.9
new york,NY,8405837,302.6
boston,MA,645966,48.3
kansas city,MO,467007,315.0

Use d3.csv to convert it into an array of objects

d3.csv("/data/cities.csv", function(data) {
  console.log(data[0]);
});
=> {city: "seattle", state: "WA", population: "652405", land area: "83.9"}
This code is using d3.js

You can see that the headers of the original CSV have been used as the property names for the data objects. Using d3.csv in this manner requires that your CSV file has a header row.

If you look closely, you can also see that the values associated with these properties are all strings. This is probably not what you want in the case of numbers. When loading CSVs and other flat files, you have to do the type conversion.

We will see more of this in other tasks, but a simple way to do this is to use the + operator (unary plus). forEach can be used to iterate over the data array.

d3.csv("/data/cities.csv", function(data) {
  data.forEach(function(d) {
    d.population = +d.population;
    d["land area"] = +d["land area"];
  });
  console.log(data[0]);
});
=> {city: "seattle", state: "WA", population: 652405, land area: 83.9}
This code is using d3.js

Dot notation is a useful way to access the properties of these data objects. However, if your headers have spaces in them, then you will need to use bracket notation as shown.

This can also be done during the loading of the data, by d3.csv directly. This is done by providing an accessor function to d3.csv, who's return value will be the individual data objects in our data array.

d3.csv("/data/cities.csv", function(d) {
  return {
    city : d.city,
    state : d.state,
    population : +d.population,
    land_area : +d["land area"]
  };
}, function(data) {
  console.log(data[0]);
});
=> {city: "seattle", state: "WA", population: 652405, land_area: 83.9}
This code is using d3.js

In this form, you have complete control over the data objects and can rename properties (like land_area) and convert values (like population) willy-nilly. On the other hand, you have to be quite explicit about which properties to return. This may or may not be what you are into.

I typically allow D3 to load all the data, and then make modifications in a post-processing step, but it might be more effective for you to be more explicit with the modifications.

Reading TSV Files

CSV is probably the most common flat file format, but in no way the only one.

I often like to use TSV (tab separated files) - to get around the issues of numbers and strings often having commas in them.

D3 can parse TSV's with d3.tsv.

Here is animals.tsv, as an example:

name    type    avg_weight
tiger    mammal    260
hippo    mammal    3400
komodo dragon    reptile    150

Loading animals.tsv with d3.tsv:

d3.tsv("/data/animals.tsv", function(data) {
  console.log(data[0]);
});
=> {name: "tiger", type: "mammal", avg_weight: "260"}
This code is using d3.js

Reading Other Flat Files

In fact, d3.csv and d3.tsv are only the tip of the iceberg. If you have a non-standard delimited flat file, you can parse them too with D3!

The process takes a few more steps. First we need to use the d3.dsvFormat function to create a new parser that knows how to read our strangely delimited file type.

For example, if we had a pipe-delimited file called animals_piped.txt that looked like this:

name|type|avg_weight
tiger|mammal|260
hippo|mammal|3400
komodo dragon|reptile|150

We could create a pipe separated values (PSV) parser using d3.dsvFormat:

var psv = d3.dsvFormat("|");

// This parser can parse pipe-delimited text:
var output = psv.parse("first|last\nabe|lincoln")
console.log(output[0])
=> {first: "abe", last: "lincoln"}

Note that this handles the parsing of the text, once we have the text.

The next step is to use this custom parser to build a d3.request that can read in our pipe-delimited file and parse it:

And then use this to parse this pipe-delimited file.

d3.request("/data/animals_piped.txt")
  .mimeType("text/plain")
  .response(function(xhr) { return psv.parse(xhr.responseText) })
  .get(function(data) {
    console.log("pipe-delimited data:")
    console.log(data[1]);
  });
=> pipe-delimited data:
{name: "hippo", type: "mammal", avg_weight: "3400"}
This code is using d3.js

Reading JSON Files

For nested data, or for passing around data where you don't want to mess with data typing, its hard to beat JSON.

JSON has become the language of the internet for good reason. Its easy to understand, write, and parse. And with d3.json - you too can harness its power.

Here is an example JSON file called employees.json:

[
 {"name":"Andy Hunt",
  "title":"Big Boss",
  "age": 68,
  "bonus": true
 },
 {"name":"Charles Mack",
  "title":"Jr Dev",
  "age":24,
  "bonus": false
 }
]

Loading employees.json with d3.json:

d3.json("/data/employees.json", function(data) {
  console.log(data[0]);
});
=> {name: "Andy Hunt", title: "Big Boss", age: 68, bonus: true}
This code is using d3.js

We can see that, unlike our flat file parsing, numeric types stay numeric. Indeed, a JSON value can be a string, a number, a boolean value, an array, or another object. This allows nested data to be dealt with easily.

Loading Multiple Files

D3's basic loading mechanism is fine for one file, but starts to get messy as we nest multiple callbacks.

For loading multiple files, we can use d3.queue to wait for multiple data sources to be loaded.

d3.queue()
  .defer(d3.csv, "/data/cities.csv")
  .defer(d3.tsv, "/data/animals.tsv")
  .await(analyze);

function analyze(error, cities, animals) {
  if(error) { console.log(error); }

  console.log(cities[0]);
  console.log(animals[0]);
}
=> {city: "seattle", state: "WA", population: "652405", land area: "83.9"}
{name: "tiger", type: "mammal", avg_weight: "260"}
This code is using d3.js

Note that we defer the loading of two types of files - using two different loading functions - so this is an easy way to mix and match file types.

The callback function passed into await gets each dataset as a parameter, with the first parameter being populated if an error has occurred in loading the data.

It can be useful to output the error, if it is defined, so you catch data loading problems quickly.

To add another data file, simply add another defer and extend the input parameters for your callback!

Next Task

Combining Data

See Also