Gathering and deploying project-related data and data sources is a crucial step in almost every academic (and non-academic) project, especially in projects with partners from different disciplines and with differing concepts for data access, curation and storage. Better support for data acquisition and metadata integration in multi-disciplinary teams is an important goal of the COLABIS platform. Pilot Zero comprises a suite of web applications that ease data curation and provide solutions for data acquisition, management and publication.
The Data Management Platform (DMP), which is still in an alpha stage, helps teams manage and organize their data within the COLABIS platform. After a semi-automatic qualification and publication process, this data can be pushed to a public web catalogue. Via adapters, the DMP also harvests and tracks external data sources.
Important results and data products in COLABIS are published in a web-accessible catalogue. This catalogue is built on CKAN (short for Comprehensive Knowledge Archive Network), a portal solution for open-data platforms that was developed by the Open Knowledge Foundation. Last but not least, the catalogue also links to important external resources that are closely related to the data products and COLABIS applications.
With the 52°North Sensor Web REST Proxy API and Web Application, users can study the published data and perform time series analyses. At the current stage, we only provide a binding to point-based in-situ observations, but there is more to come. The following pilots will deliver bindings for typical geographic data formats and add support for mobile sensor platforms. Internally, we use this API to create demo applications for our stakeholders.
Data access in Pilot Zero is organized in two zones (see Figure 1). The private zone contains the Data Management Platform and restricts access to team members. It supports scientific data management tasks and organizes the various data sets, which come in different formats and diverse schemas. Because scientists are experts in their own fields but mostly not specialists in data infrastructures, the DMP helps them to create and publish standardized data products. With a straightforward workflow for saving raw scientific data files and a wizard for annotating the data with the necessary metadata, the DMP eases the publication of well-defined data products.
Figure 1: The Workflow in Pilot Zero. Data is organized and maintained in a private data management platform. Subsequent qualification and publication steps transfer the data to a public catalogue and serve it to web applications for study and analysis.
The public zone provides access to well-defined data sets. CKAN is a well-established open-source software solution for open data and an ideal fit for COLABIS' public data portal. It also provides a lightweight interface for communication with 3rd party components and simplifies the thematic, temporal and spatial search for scientific data. To create urban observatory applications for monitoring and early warning, the REST proxy uses data served by CKAN and creates time series analyses and visualizations. Adapting the 52°North developments for a lightweight REST API to the COLABIS requirements helps to combine sensor data (or, more generally, observations) with the spatial component of maps in a very simple way.
What is DMP?
The Data Management Platform (DMP) is a platform to collect, manage and publish existing data, which then becomes accessible via the COLABIS platform. A collaborative, semi-automatic process of semantic enrichment can be applied to the uploaded data, while the system keeps track of the changes. All data satisfying the needs of the COLABIS API can be published with a single click to grant public access to it. While the requirements for this qualification process are specified using an iterative strategy, we are investigating methods to formalize them and minimize the effort required from the user. Furthermore, data adapters are used to integrate not only existing datasets but also external data sources. Connecting different sources such as Open Data providers can significantly increase the added value of the COLABIS platform.
Upload existing scientific or sensor related data.
Integrate external data sources (e.g. Open Data).
Manage data while keeping track of all changes.
Apply a semi-automatic qualification process to semantically enrich data.
Publish qualified data to the COLABIS platform.
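The qualification step described above can be pictured as a check of required metadata combined with a change log. The following is only a minimal sketch under stated assumptions: the field names in REQUIRED_FIELDS, the Dataset class and both helper functions are hypothetical and do not reflect the actual DMP implementation or the real COLABIS API requirements.

```python
from dataclasses import dataclass, field

# Hypothetical metadata fields; the real COLABIS requirements are not listed here.
REQUIRED_FIELDS = ("title", "license", "spatial_extent", "temporal_extent")


@dataclass
class Dataset:
    """An uploaded data set with its metadata and a simple change history."""
    name: str
    metadata: dict = field(default_factory=dict)
    history: list = field(default_factory=list)

    def annotate(self, key, value):
        """Set a metadata field and record (key, old value, new value)."""
        self.history.append((key, self.metadata.get(key), value))
        self.metadata[key] = value


def missing_fields(dataset):
    """Return the required fields that are still empty or absent."""
    return [f for f in REQUIRED_FIELDS if not dataset.metadata.get(f)]


def is_qualified(dataset):
    """A data set may be published once no required field is missing."""
    return not missing_fields(dataset)
```

In this sketch, a wizard-like front end would call annotate() until missing_fields() returns an empty list, at which point the single-click publication could be offered.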
Who needs it?
Currently, the focus of the DMP is mainly on researchers who want to:
Integrate pre-existing datasets or external data sources into the COLABIS platform.
Enrich the data to fulfill the specified requirements, while keeping track of all changes.
Make data publicly available and hence make it usable in the COLABIS application layer.
How does it work?
The main component of the DMP acts as a file manager with some extensions related to the qualification process. Its modular design allows new functions to be added without digging deep into the existing code. The reactive core and the use of cloud-based technologies keep the storage as well as the platform itself highly scalable. Although this is not essential at the current stage of the project, it will become increasingly useful as a variety of different data sources are integrated later.
CKAN (short for Comprehensive Knowledge Archive Network) is open-source software for open data catalogues that stores the metadata of the data and not (necessarily) the data itself. The included Solr search engine makes it easy to find data, including through spatial searches. CKAN allows users to find collections of scientific data quickly and easily, irrespective of their origin, discipline or community, and to get a quick overview of the available data by browsing through the collections using standardized facets.
Based on data produced throughout the project and external services
Metadata mapped onto standardized facets
Supports faceted, geospatial and temporal metadata search
Allows searching and browsing based on keyword searches
Find related data
Preview a sample of the source as interactive charts, tables and maps
Discover a dataset's change history
Stable API for 3rd party services and applications
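As an illustration of how a 3rd party application might use this API, the sketch below builds a request URL for CKAN's package_search action, which accepts a free-text query (q), a filter query (fq) and a result limit (rows). The host name ckan.example.org and the helper build_search_url are placeholders, not the COLABIS endpoint, and the ext_bbox parameter is only understood when the CKAN spatial extension is installed, which is assumed here.

```python
from urllib.parse import urlencode


def build_search_url(base_url, keyword, tags=(), bbox=None, rows=10):
    """Build a CKAN package_search URL for keyword, tag and bounding-box search.

    bbox is (min_lon, min_lat, max_lon, max_lat); it is sent as ext_bbox,
    which assumes the spatial extension is enabled on the server.
    """
    params = {"q": keyword, "rows": rows}
    if tags:
        # Filter query: every listed tag must be present on the data set.
        params["fq"] = " AND ".join(f'tags:"{tag}"' for tag in tags)
    if bbox is not None:
        params["ext_bbox"] = ",".join(str(value) for value in bbox)
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{urlencode(params)}"


# Placeholder host; replace with the address of an actual CKAN instance.
url = build_search_url(
    "https://ckan.example.org/",
    "rainfall",
    tags=["water"],
    bbox=(13.5, 50.9, 14.0, 51.2),
    rows=5,
)
```

The resulting URL can be fetched with any HTTP client; the server answers with a JSON document describing the matching data sets.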
Who needs it?
Our CKAN instance is primarily aimed at researchers and practitioners who want to:
Quickly find useful data resources which they can use for their research purposes
Build new collections of data to address specific research questions
Browse and preview the available data sets
How does it work?
Who can use it?
Data sets served through CKAN at COLABIS are open to all researchers and scientists free of charge. Please respect the license tags that come with the data sets and provide proper citations.
With “APPS” we refer to the COLABIS application layer. This comprises not only the applications we are developing ourselves (e.g. our observation data viewer) but also the components that facilitate the creation of such applications. For example, our Sensor Web REST-API allows application developers to easily access different underlying data sources (e.g. the observation data available on the CKAN server or measurement data offered by OGC-compliant SOS servers), so that they do not have to deal with these sometimes complex interfaces.
Features of the Sensor Web REST-API:
Use CKAN and SOS servers as data sources
Harvest metadata from the configured data sources
Caching of data offered by the configured data sources (only for CKAN servers)
Convenient operations for application developers to discover and access observation data sets
Support of mobile sensors (experimental)
Features of the observation data viewer:
Diagram and table views for time series data
Track display for visualising mobile sensors and their measurements (experimental)
Data export functionality (CSV files)
Parameterised URL calls to start the viewer with pre-selected data sets
Comfortable navigation tools
Support of different device types (smartphones, tablets, desktop PCs)
Who needs it?
The different components of our APPS layer are targeted towards different types of users. The observation data viewer is intended for:
scientists who want to discover and explore available observation data
operators of sensor networks who want to get an overview of the collected measurements
The underlying Sensor Web REST-API is especially useful for developers who would like to:
build applications consuming observation data (e.g. data viewers, data analysis tools)
consume observation data from multiple sources (e.g. OGC Sensor Observation Service, CKAN) without caring about different data formats and interfaces
re-use functionality to interact with OGC Sensor Web Enablement servers without implementing a client for these rather complex standards
How does it work?
The Sensor Web REST-API acts as a proxy to underlying data sources. This component, which is developed in Java, can use CKAN servers as well as SOS servers as data sources, for example. For this purpose, the proxy regularly analyses the content offered by these data sources and puts the collected data and metadata into its internal data store (a PostgreSQL database). Using this cache, the Sensor Web REST-API can handle incoming queries with high performance.
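The harvest-and-cache idea can be sketched in a few lines. This is an illustration only and not the 52°North implementation: it is written in Python rather than Java, an in-memory dictionary stands in for the PostgreSQL store, the class name CachingProxy and its methods are hypothetical, and the cache is refreshed lazily when a query finds it stale instead of by a regular background job.

```python
import time


class CachingProxy:
    """Answers queries from a local cache that is refilled by harvesting."""

    def __init__(self, sources, max_age_seconds=3600, clock=time.monotonic):
        # sources maps a source name to a callable that returns
        # {series_id: [(timestamp, value), ...]} for that source.
        self._sources = sources
        self._max_age = max_age_seconds
        self._clock = clock
        self._cache = {}
        self._harvested_at = None

    def harvest(self):
        """Query every configured source and rebuild the cache."""
        cache = {}
        for name, fetch in self._sources.items():
            for series_id, observations in fetch().items():
                cache[(name, series_id)] = list(observations)
        self._cache = cache
        self._harvested_at = self._clock()

    def _is_stale(self):
        if self._harvested_at is None:
            return True
        return self._clock() - self._harvested_at > self._max_age

    def get_series(self, source, series_id):
        """Serve a time series from the cache, harvesting first if needed."""
        if self._is_stale():
            self.harvest()
        return self._cache.get((source, series_id), [])
```

Because queries are answered from the cache, the underlying sources are contacted only once per refresh interval, regardless of how many clients ask for the same time series.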
Who can use it?
All software components of the APPS layer are published by 52°North under open source licenses. Thus, every interested developer may use these tools.