The gathering and deployment of project-related data and data sources is a crucial step in almost every academic (and non-academic) project, especially in projects whose partners come from different disciplines and follow differing concepts for data access, curation and storage. Better support for data acquisition and metadata integration in multidisciplinary teams is therefore an important goal of the COLABIS platform. Pilot Zero comprises a suite of web applications that ease data curation and provide solutions for data acquisition, management and publication.
The Data Management Platform (DMP), which is still in an alpha stage, helps teams manage and organize their data within the COLABIS platform. After a semi-automatic qualification and publication process, this data can be pushed to a public web catalogue. Via adapters, the DMP also harvests and tracks external data sources.
Important results and data products of COLABIS are published in a web-accessible catalogue. This catalogue is built on CKAN (short for Comprehensive Knowledge Archive Network), a portal solution for open-data platforms developed by the Open Knowledge Foundation. Last but not least, the catalogue also links to important external resources that are closely related to the data products and COLABIS applications.
With the 52°North Sensor Web REST Proxy API and web application, users can study the published data and perform time series analyses. At the current stage, we only provide a binding to point-based in-situ observations, but more is to come: the following pilots will deliver bindings for typical geographic data formats and add support for mobile sensor platforms. Internally, we use this API to create demo applications for our stakeholders.
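As a minimal sketch of the kind of time series analysis the web application supports, the following Python snippet parses one series and computes summary statistics and a moving average. It assumes that a `.../timeseries/{id}/getData` request returns JSON of the form `{"values": [{"timestamp": <epoch ms>, "value": <number>}]}`; this response shape, the sample data and the helper names are assumptions for illustration, not the documented COLABIS interface.

```python
import json
from statistics import mean

# Hypothetical sample of an assumed getData response: epoch-millisecond
# timestamps with numeric values; null marks a missing measurement.
SAMPLE_RESPONSE = """
{"values": [
  {"timestamp": 1420070400000, "value": 2.0},
  {"timestamp": 1420074000000, "value": 4.0},
  {"timestamp": 1420077600000, "value": 9.0},
  {"timestamp": 1420081200000, "value": null}
]}
"""


def parse_values(payload: str) -> list[tuple[int, float]]:
    """Return (timestamp_ms, value) pairs, skipping missing (null) values."""
    data = json.loads(payload)
    return [(int(v["timestamp"]), float(v["value"]))
            for v in data["values"] if v["value"] is not None]


def summarize(series: list[tuple[int, float]]) -> dict:
    """Basic descriptive statistics of a non-empty series."""
    values = [v for _, v in series]
    return {"count": len(values), "min": min(values),
            "max": max(values), "mean": mean(values)}


def moving_average(series: list[tuple[int, float]], window: int) -> list[float]:
    """Trailing moving average; returns len(series) - window + 1 values."""
    values = [v for _, v in series]
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]


series = parse_values(SAMPLE_RESPONSE)
print(summarize(series))          # {'count': 3, 'min': 2.0, 'max': 9.0, 'mean': 5.0}
print(moving_average(series, 2))  # [3.0, 6.5]
```

Skipping null values before computing statistics keeps gaps in the measurements from distorting the mean; a real analysis might instead interpolate or flag them.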
Data access in Pilot Zero is organized in two zones (see Figure 1). The private zone contains the Data Management Platform and restricts access to team members. It supports scientific data management tasks and organizes the various data sets, which come in different formats and diverse schemas. Scientists are experts in their fields but usually not specialists in data infrastructures, so the DMP helps them create and publish standardized data products. With a straightforward workflow for saving raw scientific data files and a wizard for annotating the data with the necessary metadata, the DMP eases the publication of well-defined data products.
Figure 1: The Workflow in Pilot Zero. Data is organized and maintained in a private data management platform. Subsequent qualification and publication steps transfer the data to a public catalogue and serve it to web applications for study and analysis.
The public zone provides access to well-defined data sets. CKAN is a well-established open-source software solution for open data and an ideal fit for COLABIS' public data portal. It also provides a lightweight interface for communication with third-party components and simplifies thematic, temporal and spatial searches for scientific data. To create urban observatory applications for monitoring and early warning, the REST proxy uses data served by CKAN to produce time series analyses and visualizations. Adapting 52°North's lightweight REST API developments to the COLABIS requirements makes it very simple to combine sensor data (or, more generally, observations) with the spatial component of maps.
The Data Management Platform (DMP) is a platform to collect, manage and publish existing data that will become accessible via the COLABIS platform. A collaborative, semi-automatic process of semantic enrichment can be applied to the uploaded data, while the system keeps track of the changes. All data satisfying the requirements of the COLABIS API can be published with a single click to grant public access to it. While the requirements for this qualification process are being specified iteratively, we are investigating methods to formalize them and minimize the effort required from the user. Furthermore, data adapters integrate not only existing datasets but also external data sources. Connecting further sources, such as open data providers, can significantly increase the added value of the COLABIS platform.
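The qualification step can be pictured as a completeness check over a data set's metadata, combined with a log of who changed what. The sketch below assumes a hypothetical list of required fields (`REQUIRED_FIELDS`) and a hypothetical `Dataset` class; the actual COLABIS requirements are still being specified iteratively, so this only illustrates the idea of gating one-click publication on a qualification check while tracking changes.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical minimum metadata a data set needs before publication.
REQUIRED_FIELDS = ("title", "description", "license",
                   "spatial_extent", "temporal_extent")


@dataclass
class Dataset:
    name: str
    metadata: dict = field(default_factory=dict)
    history: list = field(default_factory=list)  # audit trail of edits

    def annotate(self, key: str, value: str, user: str) -> None:
        """Set one metadata field and record who changed what, and when."""
        self.history.append({
            "field": key,
            "old": self.metadata.get(key),
            "new": value,
            "user": user,
            "at": datetime.now(timezone.utc).isoformat(),
        })
        self.metadata[key] = value

    def missing_fields(self) -> list[str]:
        """Required fields that are absent or empty."""
        return [f for f in REQUIRED_FIELDS if not self.metadata.get(f)]

    def is_publishable(self) -> bool:
        """Enable the single-click publish only when nothing is missing."""
        return not self.missing_fields()


ds = Dataset("air-temperature-2015")
ds.annotate("title", "Air temperature 2015", user="alice")
print(ds.missing_fields())
# ['description', 'license', 'spatial_extent', 'temporal_extent']
```

Returning the list of missing fields, rather than a bare yes/no, is what would let an annotation wizard tell the user exactly what is still needed.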
Currently, the focus of the DMP is mainly on researchers who want to:
The main component of the DMP acts as a file manager with some extensions related to the qualification process. Its modular design allows additional functions to be added without digging deep into the existing code. The reactive core and the use of cloud-based technologies keep both the storage and the platform itself highly scalable. Even though this is not essential at the current stage of the project, it will become increasingly useful as a variety of data sources are integrated later.
CKAN, short for Comprehensive Knowledge Archive Network, is open-source software for open data catalogues that stores the metadata of the data and not (necessarily) the data itself. Backed by the SOLR search engine, it makes data easy to find, including through spatial searches. CKAN allows users to find collections of scientific data quickly and easily, irrespective of their origin, discipline or community, and to get a quick overview of the available data by browsing the collections with standardized facets.
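To illustrate how a client might query such a catalogue, this sketch builds a request URL for CKAN's Action API (`/api/3/action/package_search`) that combines a free-text query, a SOLR filter query (`fq`), facet fields and a bounding box, and then reads facet counts from a response. The `ext_bbox` parameter is provided by the ckanext-spatial extension and only works if that extension is installed; the base URL, tags and coordinates are placeholders, not the COLABIS instance.

```python
import json
from urllib.parse import urlencode


def package_search_url(base_url, query, filters=None, facets=(), bbox=None, rows=10):
    """Build a GET URL for CKAN's package_search action."""
    params = {"q": query, "rows": rows}
    if filters:
        # SOLR filter query, e.g. {"tags": "climate"} -> "tags:climate"
        params["fq"] = " AND ".join(f"{k}:{v}" for k, v in filters.items())
    if facets:
        # CKAN expects facet.field as a JSON-encoded list
        params["facet.field"] = json.dumps(list(facets))
    if bbox:
        # ckanext-spatial: ext_bbox=minx,miny,maxx,maxy in WGS84 lon/lat
        params["ext_bbox"] = ",".join(str(c) for c in bbox)
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{urlencode(params)}"


def facet_counts(response):
    """Map a package_search response to {facet: {value: count}}."""
    facets = response["result"]["search_facets"]
    return {name: {item["name"]: item["count"] for item in facet["items"]}
            for name, facet in facets.items()}


url = package_search_url("https://ckan.example.org", "temperature",
                         filters={"tags": "climate"},
                         facets=["tags", "res_format"],
                         bbox=(13.5, 50.9, 14.0, 51.2))
print(url)
```

Because CKAN only holds metadata, the search result points to the data resources rather than containing the data itself.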
Our CKAN instance is primarily aimed at researchers and practitioners who want to:
CKAN is written in Python and has a JavaScript front end. We operate it with a PostgreSQL database engine and the PostGIS extension for geographic objects. Search in CKAN is powered by the SOLR search engine. CKAN's modular architecture is developer-friendly and allows us to customize and adapt the core framework to our needs. Its stable and powerful API allows third-party applications and services (such as the DMP and APPS) to be built around it.
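Because services such as the DMP build on CKAN's API, a publication request could look roughly like the following sketch, which prepares (but does not send) a call to CKAN's `package_create` action. The base URL, API token, organization and data set values are placeholders, and which fields the DMP actually sends is an assumption; the name check mirrors CKAN's rule that data set names are 2 to 100 lowercase ASCII letters, digits, `-` or `_`.

```python
import json
import re
from urllib.request import Request

# CKAN dataset names: 2-100 chars, lowercase ASCII letters, digits, "-" and "_".
NAME_PATTERN = re.compile(r"[a-z0-9_-]{2,100}")


def package_create_request(base_url, api_token, name, title, owner_org,
                           license_id, resources, tags=()):
    """Prepare (without sending) a POST to CKAN's package_create action."""
    if not NAME_PATTERN.fullmatch(name):
        raise ValueError(f"invalid CKAN dataset name: {name!r}")
    body = {
        "name": name,
        "title": title,
        "owner_org": owner_org,
        "license_id": license_id,
        "tags": [{"name": t} for t in tags],
        # Each resource points at a data file; CKAN stores its metadata.
        "resources": resources,
    }
    return Request(
        f"{base_url.rstrip('/')}/api/3/action/package_create",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": api_token},
        method="POST",
    )


# Placeholder values throughout; "colabis" is a hypothetical organization id.
req = package_create_request(
    "https://ckan.example.org", "YOUR-API-TOKEN",
    name="air-temperature-2015", title="Air temperature 2015",
    owner_org="colabis", license_id="cc-by",
    resources=[{"url": "https://data.example.org/air-temp.csv",
                "name": "Hourly values", "format": "CSV"}],
    tags=["climate"])
# urllib.request.urlopen(req) would submit it to a real CKAN instance.
```

Validating the name locally gives the user immediate feedback before the request reaches the server, which would reject it anyway.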
Data sets served through CKAN at COLABIS are open to all researchers and scientists free of charge. Please respect the license tags that come with the data sets and provide proper citations.
With “APPS” we refer to the COLABIS application layer. This comprises not only the applications we develop ourselves (e.g. our observation data viewer) but also the components that facilitate the creation of such applications. For example, our Sensor Web REST-API allows application developers to easily access different underlying data sources (e.g. the observation data available on the CKAN server or measurement data offered by OGC-compliant SOS servers), so that they do not have to deal with these sometimes complex interfaces.
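From a developer's point of view, the proxy hides whether a series originates from CKAN or from an SOS server: the same request shape works for both. The sketch below builds such a request URL, assuming the `timeseries/{id}/getData` resource and the `timespan` parameter (an ISO 8601 interval written as `start/end`) of 52°North's Series REST API; the base URL and series ID are placeholders.

```python
from datetime import datetime, timezone
from urllib.parse import quote, urlencode


def get_data_url(api_base: str, timeseries_id: str,
                 start: datetime, end: datetime) -> str:
    """Build a getData URL for one time series over [start, end]."""
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("use timezone-aware datetimes")
    if end <= start:
        raise ValueError("end must be after start")
    # timespan is an ISO 8601 interval written as "start/end"
    timespan = f"{start.isoformat()}/{end.isoformat()}"
    return (f"{api_base.rstrip('/')}/timeseries/{quote(timeseries_id, safe='')}"
            f"/getData?{urlencode({'timespan': timespan})}")


url = get_data_url("https://sensorweb.example.org/api/v1", "ts_42",
                   datetime(2015, 1, 1, tzinfo=timezone.utc),
                   datetime(2015, 1, 2, tzinfo=timezone.utc))
print(url)
```

Requiring timezone-aware datetimes avoids ambiguous local times in the interval, and `urlencode` escapes the `+` of the UTC offset so it is not decoded as a space.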
Features of the Sensor Web REST-API:
Features of the observation data viewer:
The different components of our APPS layer are targeted towards different types of users. The observation data viewer is intended for:
The underlying Sensor Web REST-API is especially useful for developers who would like to:
The observation data viewer is written in JavaScript. It has been designed in a modular manner so that future enhancements and adjustments are easy to make. Furthermore, the viewer has a responsive design, so it can be used on mobile phones and tablets as well as on regular desktop computers.
The Sensor Web REST-API acts as a proxy to underlying data sources. This component, which is developed in Java, can use, for example, CKAN servers as well as SOS servers as data sources. For this purpose, the proxy regularly analyses the content offered by these data sources and stores the collected data and metadata in its internal data store (a PostgreSQL database). Thanks to this cache, the Sensor Web REST-API can answer incoming queries with high performance, without having to contact the underlying sources for every request.
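The harvest-then-serve pattern described above can be sketched in a few lines of Python. Here an in-memory SQLite database stands in for the PostgreSQL store, and `StaticSource` is a hypothetical placeholder for real CKAN or SOS adapters; the actual proxy is written in Java, so this only illustrates the caching idea, not its implementation.

```python
import sqlite3
from typing import Iterable, Protocol

Observation = tuple[str, int, float]  # (series id, epoch ms, value)


class DataSource(Protocol):
    name: str

    def fetch(self) -> Iterable[Observation]: ...


class StaticSource:
    """Hypothetical stand-in for a CKAN or SOS adapter."""

    def __init__(self, name: str, rows: list[Observation]):
        self.name = name
        self._rows = rows

    def fetch(self) -> Iterable[Observation]:
        return list(self._rows)


class CachingProxy:
    def __init__(self, sources: Iterable[DataSource]):
        self.sources = list(sources)
        self.db = sqlite3.connect(":memory:")  # stand-in for PostgreSQL
        self.db.execute(
            "CREATE TABLE obs (source TEXT, series TEXT, ts INTEGER, value REAL, "
            "PRIMARY KEY (source, series, ts))")

    def harvest(self) -> None:
        """One harvesting cycle; a real proxy would run this on a schedule."""
        with self.db:  # commit all sources in one transaction
            for src in self.sources:
                self.db.executemany(
                    "INSERT OR REPLACE INTO obs VALUES (?, ?, ?, ?)",
                    [(src.name, s, ts, v) for s, ts, v in src.fetch()])

    def query(self, series: str, start: int, end: int) -> list[tuple[int, float]]:
        """Answer from the cache only; upstream sources are not contacted."""
        return self.db.execute(
            "SELECT ts, value FROM obs WHERE series = ? AND ts >= ? AND ts < ? "
            "ORDER BY ts", (series, start, end)).fetchall()


proxy = CachingProxy([
    StaticSource("ckan", [("air_temp", 1000, 2.5), ("air_temp", 2000, 3.0)]),
    StaticSource("sos", [("rainfall", 1000, 0.4)]),
])
proxy.harvest()
print(proxy.query("air_temp", 0, 5000))  # [(1000, 2.5), (2000, 3.0)]
```

The composite primary key together with `INSERT OR REPLACE` makes repeated harvesting cycles idempotent, so re-reading a source does not duplicate observations in the cache.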
All software components of the APPS layer are published by 52°North under open source licenses. Thus, every interested developer may use these tools