
Using Frictionless Data software to turn data into insight

By Serah Rono on Feb 16, 2018


Getting insight from data is not always a straightforward process. Data is often hard to find, archived in difficult-to-use formats, poorly structured and/or incomplete. These issues create friction and make it difficult to use, publish and share data. The Frictionless Data Initiative at Open Knowledge International aims to reduce friction in working with data, with the goal of making it effortless to transport data among different tools and platforms for further analysis.


In our session at Open Belgium, we will look at several sources of friction in working with data and how to alleviate them:

1) Content and structural errors
Tabular data provides a good basis for data manipulation and analysis. However, common structural errors (missing or extra values, blank or duplicate headers) and content errors such as missing fields often get in the way and lower the quality of our data. We will see how to:

- work with Frictionless Data validation tooling and libraries, both online and on the command line, to check data for these errors so they can be fixed before the data is shared or used elsewhere (see the sketch after this list),

- validate our data against a set schema where one is available, and

- set up automated, continuous validation in cases where our data is regularly updated.
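
As a taste of the validation tooling above, here is a minimal sketch using the goodtables Python library, the Frictionless Data validation library. The file name inspection-data.csv is a placeholder, and the exact report layout may differ slightly between library versions.

```python
# Minimal validation sketch with the goodtables library (pip install goodtables).
# "inspection-data.csv" is a placeholder file name.
from goodtables import validate

# Check the file for structural and content errors (extra values,
# blank or duplicate headers, values that do not match the schema, ...).
report = validate('inspection-data.csv')

if report['valid']:
    print('No errors found - the data is ready to publish or share.')
else:
    # The report groups errors per table, with a code and message for each.
    for table in report['tables']:
        for error in table['errors']:
            print(error['code'], '-', error['message'])
```

The same check can also be run with the goodtables command-line tool, or set up as continuous validation through the hosted goodtables service so that regularly updated data is checked on every change.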

2) Missing context in data
If you have just a CSV file, how do you know its contents? What about its source? Authors? License? There are some things you can infer, but that is unreliable; you need the context of the data. The Data Package specification provides a way to describe this context (i.e. the metadata) in a machine-readable way, so that not only humans but also software can read and parse it. In this section, we will see how to create a data package.
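A minimal sketch of creating a data package with the datapackage Python library (the Frictionless Data library for Data Packages) might look as follows; the file name and the metadata values are placeholders.

```python
# Minimal sketch with the datapackage library (pip install datapackage).
# The file name and metadata values below are placeholders.
from datapackage import Package

package = Package()

# Infer resource names, formats and a table schema from the CSV file itself.
package.infer('inspection-data.csv')

# Add the context a bare CSV cannot carry: title, license, and so on.
package.descriptor['title'] = 'Inspection data'
package.descriptor['licenses'] = [{'name': 'CC-BY-4.0',
                                   'title': 'Creative Commons Attribution 4.0'}]
package.commit()

# Write the metadata to datapackage.json, to travel alongside the data.
package.save('datapackage.json')
```

The resulting datapackage.json answers the questions above in a machine-readable way: what the file contains, who made it, and under which license it can be reused.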

3) Data transport among various tools for storage and analysis
We will share a couple of integrations that make it easier to use Frictionless Data software with popular data tools. There are libraries for importing Data Packages into R, SQL databases and BigQuery, and existing integrations with Pandas, Jupyter, Nteract and others.
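
As a small illustration of the Pandas side, the sketch below reads a Data Package resource and hands the rows to pandas for analysis. It uses the generic datapackage library rather than the dedicated Pandas plugin, and assumes a datapackage.json like the one created above.

```python
# Minimal sketch: loading a Data Package resource into a pandas DataFrame.
# Assumes datapackage.json sits next to the data it describes.
import pandas as pd
from datapackage import Package

package = Package('datapackage.json')

# Read the first resource as keyed rows (a list of dicts) and build a
# DataFrame, so the data arrives in pandas with the schema's column names.
resource = package.resources[0]
rows = resource.read(keyed=True)
df = pd.DataFrame(rows)

print(df.head())
```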

Our goal for this session is to show you how to streamline your data processes using Frictionless Data tooling and introduce you to our vibrant community of users and contributors as we make a case for the need to improve data quality in our ecosystem.
