By Serah Rono on Feb 16, 2018
Getting insight from data is not always a straightforward process. Data is often hard to find, archived in difficult-to-use formats, poorly structured, and/or incomplete. These issues create friction and make it difficult to use, publish and share data. The Frictionless Data initiative at Open Knowledge International aims to reduce friction in working with data, with the goal of making it effortless to transport data among different tools and platforms for further analysis.
In our session at Open Belgium, we will look at several sources of friction in working with data and how to alleviate them:
1) Content and Structural errors
Tabular data provides a good basis for data manipulation and analysis. However, common structural errors (missing or extra values, blank or duplicate headers) and content errors such as missing fields often get in the way and lower the quality of our data. We will see how to detect and fix these errors before they make their way downstream.
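To make the kinds of checks concrete, here is a minimal standard-library sketch of what structural validation looks like; the function name and sample data are illustrative assumptions, and in practice the Frictionless Data validation tooling performs these checks (and many more) for you.

```python
import csv
import io

def find_structural_errors(csv_text):
    """Scan CSV text for common structural errors: blank headers,
    duplicate headers, and rows with missing or extra values
    relative to the header row. Illustrative sketch only."""
    errors = []
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["empty file"]
    header = rows[0]
    seen = set()
    for i, name in enumerate(header):
        if not name.strip():
            errors.append(f"blank header in column {i + 1}")
        elif name in seen:
            errors.append(f"duplicate header '{name}' in column {i + 1}")
        seen.add(name)
    # Compare each data row's width against the header's width.
    for rownum, row in enumerate(rows[1:], start=2):
        if len(row) < len(header):
            errors.append(f"row {rownum}: missing value(s)")
        elif len(row) > len(header):
            errors.append(f"row {rownum}: extra value(s)")
    return errors

sample = "id,name,name,\n1,anne\n2,bob,brussels,extra,cell\n"
for err in find_structural_errors(sample):
    print(err)
```

Running this on the sample flags the duplicate and blank headers plus one short and one long row, which is exactly the class of problems that silently degrades analyses when left unchecked.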
2) Missing context in data
If all you have is a CSV file, how do you know what it contains? What about its source, authors, or license? You can infer some of this, but inference is unreliable; you need the context of the data. The Data Package specification provides a way to describe that context (i.e. metadata) in a machine-readable way, so that both humans and software can read it. In this section, we will see how to create a data package.
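As a sketch of what that context looks like, the snippet below writes a minimal `datapackage.json` descriptor next to a CSV resource; the dataset name, file name, and fields are made-up examples, but the descriptor keys (`name`, `title`, `licenses`, `resources`, `schema`) follow the Data Package and Table Schema specifications.

```python
import json

# Minimal Data Package descriptor for a single CSV resource.
# Dataset and file names below are illustrative assumptions.
descriptor = {
    "name": "belgian-museums",
    "title": "Museums in Belgium",
    "licenses": [{"name": "CC-BY-4.0"}],
    "resources": [
        {
            "name": "museums",
            "path": "museums.csv",
            "format": "csv",
            # The schema describes each column's name and type.
            "schema": {
                "fields": [
                    {"name": "id", "type": "integer"},
                    {"name": "city", "type": "string"},
                ]
            },
        }
    ],
}

# Shipping this file alongside the data gives any consumer, human
# or machine, the context the bare CSV lacks.
with open("datapackage.json", "w") as f:
    json.dump(descriptor, f, indent=2)
```

In practice you would usually build and validate such a descriptor with the Frictionless Data libraries rather than by hand, but the end result is the same small JSON file travelling with the data.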
3) Data transport among various tools for storage and analysis
We will share a couple of integrations that make it easier to use Frictionless Data software alongside popular data tools. There are libraries for importing Data Packages into R, SQL databases and BigQuery, and existing integrations with Pandas, Jupyter, Nteract and others.
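To illustrate the SQL direction, here is a small sketch that loads one tabular resource into SQLite using the resource's Table Schema to type the columns; the helper name, type map, and sample data are assumptions for the example, and the real integration libraries handle this generically for many databases.

```python
import csv
import io
import sqlite3

# Map Table Schema field types to SQLite column types (subset only).
TYPE_MAP = {"integer": "INTEGER", "number": "REAL", "string": "TEXT"}

def load_resource(conn, resource, csv_text):
    """Create a table for a Data Package resource and load its rows.
    Illustrative sketch, not the real integration library."""
    fields = resource["schema"]["fields"]
    cols = ", ".join(f'{f["name"]} {TYPE_MAP[f["type"]]}' for f in fields)
    conn.execute(f'CREATE TABLE {resource["name"]} ({cols})')
    placeholders = ", ".join("?" for _ in fields)
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        conn.execute(
            f'INSERT INTO {resource["name"]} VALUES ({placeholders})',
            [row[f["name"]] for f in fields],
        )

resource = {
    "name": "museums",
    "schema": {"fields": [{"name": "id", "type": "integer"},
                          {"name": "city", "type": "string"}]},
}
conn = sqlite3.connect(":memory:")
load_resource(conn, resource, "id,city\n1,Ghent\n2,Antwerp\n")
print(conn.execute("SELECT city FROM museums WHERE id = 2").fetchone()[0])
```

The point of the schema-driven approach is that the same descriptor that documents the data also drives the table definition, so the data arrives in the database correctly typed without hand-written DDL.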
Our goal for this session is to show you how to streamline your data processes with Frictionless Data tooling, introduce you to our vibrant community of users and contributors, and make a case for improving data quality across our ecosystem.