Data Strategy: Part 1 - Data Management is Fundamental

Press Review


In 2012, MIT Sloan published an article explaining the “Big Data Revolution.” The authors emphasized the necessity of utilizing the tremendous amounts of data being generated through digitization, the Internet, and the then-nascent Internet of Things (IoT). Fast forward to today. A few short months ago, we penned a paper entitled “Screwed: The Real Value of Data.” In it, we argued that it is no longer a question of whether data has value, but rather whether your organization can transform itself into one that leverages data to build its digital foundation for the future.

Instead of just leaving you there with this platitude of data being the “digital screws” of the future, we at INFORM have decided to put together a three-part series on effective data strategy. It will have a good deal of meat to it that will assist readers in actually understanding the foundation of data strategy and enable them to start down the road to establishing one. In our series, we will give you an introduction to what cutting-edge technology enables us to do, what the implications for maritime operations are, and, most importantly, give you an idea of how to access these benefits. In Part 1, we kick it off by having a look at data management.

What is Data Management in Layman’s Terms?

Data Management is a term that can be interpreted in various ways. Its definition spans technical concerns such as data sources, storage, connectivity, transfer, transformation, and modeling, as well as less technical ones like governance, security, and cataloging. So, what is it? Well – all of the above. Data Management describes the collection, refinement, and provisioning of data. It includes everything that has to happen between the creation of a data point and that data being made available in an appropriate form for consumption by data analytics, data science, artificial intelligence, operations research, and other advanced computing practices.

Why is Data Integration Relevant?

Spoiler alert: Speed is what is important here.

One key component of data management is data integration. The value of datasets is vastly increased if they are enriched with context, especially across system boundaries. If we can, for example, integrate information on inbound shipments with metadata from the port of origin, shipping line, container master data, handling equipment parameters, etc., we can start to form the much-discussed digital twin that is quickly becoming a focal point for many ports and terminals around the world.
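
As a minimal sketch of what such enrichment looks like in practice (all field names and values here are hypothetical, chosen only for illustration), joining inbound shipment records with container master data across a system boundary might be as simple as:

```python
# Data-integration sketch: enrich inbound shipment records with container
# master data from a second system. Field names are illustrative only.
shipments = [
    {"container_id": "MSKU1234567", "port_of_origin": "Shanghai", "eta": "2023-05-01"},
    {"container_id": "HLCU7654321", "port_of_origin": "Rotterdam", "eta": "2023-05-03"},
]

container_master = {
    "MSKU1234567": {"type": "40HC", "tare_kg": 3940},
    "HLCU7654321": {"type": "20GP", "tare_kg": 2230},
}

def enrich(shipments, master):
    """Attach master-data context to each shipment record."""
    enriched = []
    for s in shipments:
        record = dict(s)                             # copy, don't mutate the source
        record.update(master.get(s["container_id"], {}))
        enriched.append(record)
    return enriched

for row in enrich(shipments, container_master):
    print(row["container_id"], row["type"], row["tare_kg"])
```

In a real landscape the two sources would live in separate systems (TOS, carrier feeds, equipment databases), but the principle is the same: context is joined onto the raw event.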

Having associated information available for every part of port and terminal operations would dramatically increase transparency and control, and provide a solid foundation for making insight-based, split-second decisions. Speed is crucial. Insights gained in retrospect can help refine processes, but they do not help you identify problems in real time, and they certainly do not give you data-based options for resolving them.

Modern Tools Make Data Integration Straightforward

Up until recently, the entry cost and effort to implement solutions for Extract, Transform, and Load (ETL), as well as for storage (data warehouses especially), were a significant deterrent to implementing data integration as part of your data strategy. Add to this the lack of qualified personnel, and the challenges typically outweighed the ROI potential.

However, since 2012, many things have changed. “Big Data” has simply become “data.” No one bats an eye at millions of records anymore. Machine learning is ubiquitous in everyday life, be it in navigation, shopping, meal recommendations, or smart assistants. Computing power and storage have become highly accessible and extremely affordable through the spread of cloud-based computing and service models. The scope of data-driven projects used to be limited by the horsepower available in one’s on-premises servers. Nowadays, fully scalable resources are available through cloud providers like Amazon, Google, and Microsoft – to name just a few of the prominent players.

This has seen the cost of storing a terabyte of data in a cloud data warehouse drop to as low as 23 USD (20 EUR) per month. To put this into perspective, a consumer solid-state drive costs roughly five times as much and does not come with built-in enterprise-level security. The same goes for the ability to run analytics queries on the data. Cloud computing power is scaled to facilitate whatever complex calculation is thrown at it and is charged by the minute of usage. Gone are the days of paying for dormant CPUs that only spin up occasionally.

Another major development is the emergence of capable ETL tools that, most often, not only move data from the source to centralized storage (be it a data lake or a warehouse – more on that below) but also assess data quality (at a rudimentary level), create data models, and, in some cases, automatically create data marts for immediate consumption by data analytics solutions. Data being handled, transferred, or transformed at any point along this value-added chain is often referred to as “data in motion.” Capable contenders include Qlik Data Integration and TimeXtender, as well as proprietary data pipelines like Snowpipe (Snowflake) or Azure Data Factory (Microsoft Azure). Other tools come with built-in data catalogs that allow business users to simply “shop” for the data necessary to tackle the business challenges before them.
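
To make the “rudimentary data quality” step concrete, here is a small sketch of the kind of gate such a pipeline stage applies, assuming made-up rule names and records not tied to any of the products mentioned above:

```python
# Sketch of a rudimentary data-quality gate inside an ETL step.
# Required-field rules and sample records are illustrative assumptions.
def quality_check(records, required_fields):
    """Split records into those passing the gate and those rejected,
    recording which required fields were missing or empty."""
    passed, rejected = [], []
    for r in records:
        missing = [f for f in required_fields if r.get(f) in (None, "")]
        if missing:
            rejected.append((r, missing))
        else:
            passed.append(r)
    return passed, rejected

rows = [
    {"container_id": "MSKU1234567", "weight_kg": 24000},
    {"container_id": "", "weight_kg": 18000},   # fails: empty container_id
]
good, bad = quality_check(rows, ["container_id", "weight_kg"])
print(len(good), len(bad))  # prints: 1 1
```

Commercial tools layer far more on top (type checks, referential integrity, profiling), but the pattern is the same: validate in flight, quarantine what fails, and pass clean records downstream.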

This allows companies to approach data management in a more flexible and versatile fashion. In traditional systems, the design of the solution determines the necessary data model. Based on that data model, lengthy data architecture projects are needed before data analytics projects can even begin. If, at a later stage, additional fields or transformations are required, these changes can only be embedded after days, weeks, or even months of modeling. This greatly delays the benefits generated by the insights coming from that data, often to the point where the insights are obsolete.

Using ETL – and its more modern variant, Extract, Load, Transform (ELT), in which you move the data, store it, and only then transform it for each purpose – combined with data warehouse or data lake automation reduces the time, effort, and human resources required to react to new developments and requirements in the rapidly evolving context of data analytics and data science by up to a factor of ten.
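
The ELT ordering can be sketched in a few lines. This toy example uses an in-memory SQLite database as a stand-in for a cloud warehouse (table, column names, and the pound-to-kilogram conversion are illustrative assumptions): raw data is loaded first, and the transformation runs later, where the data already lives.

```python
import sqlite3

# ELT sketch: load raw rows first, transform inside the "warehouse" later.
# A real setup would target a cloud warehouse rather than SQLite.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE raw_moves (container_id TEXT, weight_lbs REAL)")

# L step: land the data exactly as the source delivers it (imperial units).
con.executemany(
    "INSERT INTO raw_moves VALUES (?, ?)",
    [("MSKU1234567", 52910.0), ("HLCU7654321", 39683.0)],
)

# T step: derive a purpose-built, metric view on demand, without reshaping
# the raw table or re-running the extraction.
con.execute("""
    CREATE VIEW moves_metric AS
    SELECT container_id, ROUND(weight_lbs * 0.453592, 1) AS weight_kg
    FROM raw_moves
""")

for row in con.execute("SELECT * FROM moves_metric"):
    print(row)
```

Because the raw data stays untouched, a new requirement (say, a different unit or an extra derived field) only needs a new view, not a re-engineered pipeline – which is where the claimed time savings come from.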
