Dr Mehmet Yildiz
Introduction to Big Data Platforms
2021-02-08
Big Data is ubiquitous. It relates to all aspects of our lives. While Big Data gives us intelligence and offers new capabilities, it also causes grief and concern for many of us. Reflecting these concerns, I have come across memes calling Big Data "Big Brother".
Big Data is a broad topic. Thus, I plan to share my experience by posting several articles in order of importance.
As a technologist, dealing with Big Data is my hobby, passion, and part of my profession. Data analytics and analysis are fuel for my mind. My passion helps me earn my living in technology, so I am grateful to share my expertise with aspiring data professionals and technology enthusiasts.
To introduce Big Data, I want to start with platforms and cover other aspects in my upcoming posts.
In this article, I introduce Big Data platforms to beginners. Data platforms are critical because every Big Data business solution requires a specific platform. A Big Data platform consists of several layers. These layers perform different functions, but they are interrelated.
Let me briefly introduce these layers with some practical examples.
Layer One
The first layer of the Big Data platform is the shared operational information zone.
The information zone consists of data types such as:
data in motion,
data at rest, and
data in several other forms.
The information zone also includes:
legacy data sources,
new data sources,
master data hubs,
reference data hubs, and
content repositories.
Layer Two
The second layer of the data platform is called processing. This substantial layer includes:
data ingestion,
operational information,
landing area,
analytics zone,
archive,
real-time analytics,
exploration,
integrated warehouse,
data lakes, and
data mart zones.
This layer needs a governance model for the metadata catalogue. The model should cover data security and disaster recovery of systems, storage, hosting, and other infrastructure components such as local processing and storage.
The critical infrastructure for Big Data platforms is Cloud computing and Edge computing, for both processing and storage. The IoT (Internet of Things) backbone also relates to this layer.
Layer Three
The third layer of the data platform is the analytics platform.
The analytics platform consists of:
functions,
processes, and
tools.
These functions, processes, and tools can include:
real-time analytics,
information planning,
forecasting,
decision making,
predictive analytics,
descriptive analytics,
prognostic analytics,
data discovery,
data visualisations,
executive dashboards, and
other analytics features as required in a particular Big Data solution.
This layer is also comprehensive and involves many practitioners such as data architects, data scientists, data specialists, implementers, and administrators.
In addition, substantial input may be required from business stakeholders such as executive decision-makers, the CDO (Chief Data Officer), the CMO (Chief Marketing Officer), and even the CFO (Chief Financial Officer).
Layer Four
The fourth layer of the data platform consists of outputs such as:
business processes,
decision-making schemes, and
points of interaction.
This layer of the data platform must be well-governed. Access needs to be provided under established controls to data platform professionals such as data scientists, data architects, and analytics experts, as well as to business users.
After introducing these essential layers, I want to highlight a critical point: level of schema.
Level of schema
The level of schema for the data platform is a crucial architectural and design consideration. We can classify the schema level under three categories:
no schema,
partially structured schema, and
fully structured schema.
Schema reflects the structure of data and databases. We can think of a schema as a blueprint for data management.
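To make the three schema levels concrete, here is a minimal Python sketch. The sensor payload, its comma-separated layout, and the names SensorReading and to_structured are assumptions made up for illustration; they do not come from any particular platform.

```python
import json
from dataclasses import dataclass

# No schema: an opaque payload. The platform stores it without knowing its structure.
raw_payload = b"2021-02-08,sensor-7,21.5"

# Partially structured schema: self-describing keys, but fields may be missing or vary.
partial_record = json.loads('{"sensor": "sensor-7", "reading": 21.5}')


# Fully structured schema: every field and its type is fixed up front.
@dataclass(frozen=True)
class SensorReading:
    day: str
    sensor: str
    reading: float


def to_structured(payload: bytes) -> SensorReading:
    """Apply a schema to a schemaless payload (hypothetical CSV-like layout)."""
    day, sensor, reading = payload.decode("utf-8").split(",")
    return SensorReading(day=day, sensor=sensor, reading=float(reading))


structured = to_structured(raw_payload)
```

The further we move towards a fully structured schema, the more the platform can check and optimise, at the cost of flexibility when new kinds of data arrive.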
Related to platforms, another critical point is data processing levels.
Data processing levels
Data processing levels are another architectural consideration.
The processing levels could be:
raw data,
validated data,
transformed data, and
calculated data.
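One way to picture these four levels is as stages of a small pipeline. The sketch below is only an illustration in Python: the temperature readings, the Celsius-to-Fahrenheit conversion, and the function names validate, transform, and calculate are all assumed for the example.

```python
from statistics import mean

# Raw data: values exactly as ingested (hypothetical temperature readings as text).
raw = ["21.5", "22.0", "n/a", "23.5"]


def validate(values):
    """Validated data: keep only the entries that parse as numbers."""
    valid = []
    for value in values:
        try:
            valid.append(float(value))
        except ValueError:
            continue  # drop malformed entries such as "n/a"
    return valid


def transform(celsius):
    """Transformed data: convert each Celsius reading to Fahrenheit."""
    return [c * 9 / 5 + 32 for c in celsius]


def calculate(values):
    """Calculated data: derive an aggregate (the average) from the transformed values."""
    return mean(values)


validated = validate(raw)
transformed = transform(validated)
calculated = calculate(transformed)
```

Keeping each level as a separate step makes it easier to decide which zone of the platform stores which version of the data.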
Another structural classification of data in data platforms relates to business relevance.
Business relevance
We can categorise the business relevance of data as:
external data,
personal data,
departmental data, and
enterprise data.
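As a small illustration, a metadata catalogue could tag each dataset with one of these categories. The Python sketch below assumes a hypothetical catalogue; the dataset names and the helper datasets_with are invented for the example.

```python
from enum import Enum


class Relevance(Enum):
    """The four business relevance categories listed above."""
    EXTERNAL = "external"
    PERSONAL = "personal"
    DEPARTMENTAL = "departmental"
    ENTERPRISE = "enterprise"


# Hypothetical catalogue entries: dataset name -> relevance category.
catalogue = {
    "public_weather_feed": Relevance.EXTERNAL,
    "analyst_scratch_notes": Relevance.PERSONAL,
    "marketing_campaign_stats": Relevance.DEPARTMENTAL,
    "customer_master": Relevance.ENTERPRISE,
}


def datasets_with(relevance):
    """Return the names of all datasets tagged with the given category, sorted."""
    return sorted(name for name, tag in catalogue.items() if tag is relevance)
```

A tag like this can then inform governance decisions, for example which datasets need enterprise-wide access controls.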
Understanding the functions and components of a Big Data platform can be useful for all stakeholders of a Big Data solution in business organizations. While business executives like the CIO, CISO, CDO, CMO, and CFO need to understand these layers at a high level, data architects, data scientists, data specialists, implementers, testers, and administrators need to understand them in more detail.