By Miel Vander Sande on Feb 08, 2017
Today's truth is tomorrow's lie, and the data world is no different. Facts on the Web grow and change over time, adding an interesting history to our datasets. This enables analyzing the evolution of query results, and possible use cases are not hard to imagine:
• How did Belgium's population change over the last 10 years?
• What modifications were made to the data model; are my applications still interoperable?
• How fast did certain topics in my dataset grow?
Answering these questions unlocks knowledge that was not previously accessible. To retrieve this knowledge from the Web, however, both the data and its history need to be queryable. Unfortunately for Linked Data, the growing number of published datasets still struggles to offer either.
For most organizations, the barrier is a lack of funds. Despite having much to gain, such data publishers simply can't afford the expensive infrastructure and maintenance needed to offer complex querying to the public, let alone the added complexity of going back in time. Therefore, we need to take a step back and look at current Linked Data publishing strategies. Most of these are centered around a SPARQL endpoint: the database is made public as is. However, as queries can be arbitrarily complex, the load on the server is very high. For this reason, most publishers resort to hosting data dumps for download or putting Linked Data documents online. From a consumer perspective, these are not particularly useful, as the data cannot be queried directly.
Hence, new sustainable strategies have to democratize Linked Data publishing further, as more available data increases its success and practicality. This means thinking more about 'good enough' solutions that, with a few compromises such as slower queries, enable us to get a lot more done.
In a collaboration between Los Alamos National Laboratory and Ghent University, a toolchain was composed that embodies this vision. A smart combination of the low-cost Triple Pattern Fragments API, HDT RDF archives, and the Memento versioning framework for HTTP resources allows under-resourced institutions to pragmatically publish Linked Data archives on the Web. During my session at Open Belgium, I will explain all the different parts, demonstrate the toolchain's value and, more importantly, show the audience how to deploy it in no time.
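To give a flavour of how these pieces fit together on the client side: a Triple Pattern Fragments interface is queried with simple `subject`/`predicate`/`object` parameters, and Memento adds time travel through the `Accept-Datetime` HTTP header (RFC 7089). The sketch below builds such a request; the endpoint URL and the resource URI are hypothetical placeholders, not part of the actual toolchain announcement.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.parse import urlencode

def tpf_request(endpoint, subject=None, predicate=None, obj=None, at=None):
    """Build the URL and headers for a triple-pattern request,
    optionally datetime-negotiated via Memento (RFC 7089)."""
    params = {}
    if subject:
        params["subject"] = subject
    if predicate:
        params["predicate"] = predicate
    if obj:
        params["object"] = obj
    url = endpoint + ("?" + urlencode(params) if params else "")
    headers = {"Accept": "text/turtle"}
    if at is not None:
        # Accept-Datetime uses the HTTP-date format, e.g.
        # "Thu, 01 Jan 2015 00:00:00 GMT"
        headers["Accept-Datetime"] = format_datetime(at, usegmt=True)
    return url, headers

# Hypothetical example: ask for triples about Belgium as they were in 2015.
url, headers = tpf_request(
    "http://example.org/fragments",          # placeholder TPF endpoint
    subject="http://dbpedia.org/resource/Belgium",
    at=datetime(2015, 1, 1, tzinfo=timezone.utc),
)
```

A Memento-aware server answers such a request with a redirect to the archived fragment closest to the requested datetime, so the client never needs to know how the versions are stored (here, as HDT files) on the server.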
Image ©Jason Powell