"The growing availability of data on the Web provided by Web 2.0 applications and, more recently through Linked Data, brought the computational pattern expressed as ETL to reemerge in a scenario with additional complexity, where the number of data sources and the data heterogeneity that needs to be supported by ETL drastically increases. In this scenario, issues with data quality and trustworthiness may strongly impact the data utility for end-users. The barriers involved in building an ETL infrastructure under the complexity and scale of the available Web-based data supply scenario, demands the deﬁnition of strategies which can provide data quality warranties and also minimize the effort associated with data management.
In this context, provenance, the representation of the artifacts, processes and agents behind a piece of information, becomes a fundamental element of the data infrastructure. Provenance has a large spectrum of applications, including documentation and reproducibility, data quality assessment and trustworthiness, and consistency checking and semantic reconciliation. However, in an environment where data is produced and consumed by different systems, the representation of provenance must be interoperable across systems.
Standardization efforts towards convergence on a common provenance model produced the Open Provenance Model (OPM). OPM provides a basic description of provenance that allows interoperability at the level of workflow structure. This common provenance ground allows systems with different provenance representations to share at least workflow-level semantics (the causal dependencies between artifacts and processes, and the intervention of agents). OPM, however, is not intended to be a complete provenance model; it calls for the complementary use of additional provenance models to enable uses of provenance that require a higher level of semantic interoperability.
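The workflow-level semantics described above can be illustrated with a minimal sketch. It assumes a plain in-memory graph rather than any particular OPM serialization or library; the relation names follow the causal dependencies defined by OPM, while the class `ProvenanceGraph`, the helper `build_example`, and the node names (`raw.csv`, `clean.rdf`, `transform`, `etl-operator`) are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass, field

# OPM causal dependencies, mapped to the (effect kind, cause kind) each connects.
OPM_EDGES = {
    "used": ("process", "artifact"),
    "wasGeneratedBy": ("artifact", "process"),
    "wasControlledBy": ("process", "agent"),
    "wasTriggeredBy": ("process", "process"),
    "wasDerivedFrom": ("artifact", "artifact"),
}


@dataclass
class ProvenanceGraph:
    """Hypothetical in-memory container for an OPM-style provenance graph."""
    nodes: dict = field(default_factory=dict)  # node id -> kind
    edges: list = field(default_factory=list)  # (effect, relation, cause)

    def add_node(self, node_id, kind):
        if kind not in {"artifact", "process", "agent"}:
            raise ValueError(f"unknown node kind: {kind}")
        self.nodes[node_id] = kind

    def add_edge(self, effect, relation, cause):
        # Reject relations outside the set above and edges whose endpoints
        # have the wrong kind for the chosen relation.
        if relation not in OPM_EDGES:
            raise ValueError(f"unknown relation: {relation}")
        effect_kind, cause_kind = OPM_EDGES[relation]
        if self.nodes.get(effect) != effect_kind or self.nodes.get(cause) != cause_kind:
            raise ValueError(f"{relation} must link {effect_kind} -> {cause_kind}")
        self.edges.append((effect, relation, cause))

    def sources_of(self, artifact):
        """Artifacts used by the processes that generated `artifact` (one step back)."""
        processes = {c for e, r, c in self.edges
                     if e == artifact and r == "wasGeneratedBy"}
        return sorted(c for e, r, c in self.edges
                      if e in processes and r == "used")


def build_example():
    """Assumed toy ETL step: one process turns one input into one output."""
    g = ProvenanceGraph()
    g.add_node("raw.csv", "artifact")
    g.add_node("clean.rdf", "artifact")
    g.add_node("transform", "process")
    g.add_node("etl-operator", "agent")
    g.add_edge("transform", "used", "raw.csv")
    g.add_edge("clean.rdf", "wasGeneratedBy", "transform")
    g.add_edge("transform", "wasControlledBy", "etl-operator")
    return g
```

Calling `build_example().sources_of("clean.rdf")` walks one causal step back from the output artifact to its input. A graph like this records only that a process consumed and produced artifacts; it does not say what kind of transformation took place, which is the gap that a more specific vocabulary is meant to fill.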
Cogs is an ETL provenance vocabulary that extends the workflow semantics provided by OPM and Prov-O, allowing the description of ETL processes and objects. The Cogs vocabulary can also be used to describe data transformations in general, outside the scope of ETL tools and practices. The core objective of the vocabulary is to improve the semantic interoperability of data transformation provenance descriptors, building upon the OPM and Prov-O standardization efforts."