Dr Mehmet Yildiz

How To Deal With Big Data For Artificial Intelligence?

2020-12-20

An architectural approach and informed perspective on Big Data management by an expert in the field.


Image by Gerd Altmann from Pixabay

I am a technologist and love dealing with data.

Are you a data architect who wants to engage in the artificial intelligence arena? This article can be a good starting point for you.

My goal is to increase the data literacy and intelligence of my readers. I have authored several industry whitepapers and academic papers and published technical books in the field.

In this article, I provide a pragmatic architectural overview of the Big Data management lifecycle, covering its distinct phases and essential definitions together with critical solution considerations, based on architectural experience acquired from many successful and failed projects.

I learnt that architecting the Big Data solution lifecycle pragmatically and with rigour can substantially contribute to the delivery of quality Artificial Intelligence (AI) and Cognitive solutions, especially in enterprise modernisation and digital transformation programs. These transformative programs are empowered by Big Data and AI, integrated with other emerging technology domains such as Cloud, Edge, IoT, Blockchain, and Mobile technologies.

Let’s be mindful that the context of this article is Big Data Architects, not Data Scientists. Hence, upfront, I want readers to review the content and understand the key messages from a solution architect’s perspective, not from a data scientist’s. The roles of a Big Data Architect and a Data Scientist in the Big Data lifecycle are quite different and require different involvement. However, I’d welcome views from data scientists to extend the topic and highlight their expectations of Big Data solution architects, as this extended view can be synergistic.

As a key enabler of artificial intelligence, cognitive computing, and subsets of AI such as machine learning, deep learning, expert systems, and neural networks, Big Data solutions constitute a critical business-focused domain at the global level. Therefore, understanding the Big Data lifecycle and architecting Big Data solutions with pragmatic rigour is a compelling capability for AI professionals and AI business stakeholders.

To maintain clarity, I want to start with defining data architecture at a high level. You can find various definitions of data architecture in the data management body of knowledge, textbooks, and user-generated content.

Data architecture is an established domain in the data science discipline. In this article, I offer my interpretation, which suits the context, content, and purpose of my intended message. At the highest level, data architecture is the process of collecting data from multiple data sources and manipulating data sets, practices, and platforms from a current state to a future state using established frameworks and models.

The architectural framework for data management includes describing the structure of the source data, its manipulation process, and the structure of the target data for future use in order to create business insights from the data solutions. The architectural term ‘description’ is the keyword in this definition; hence it needs to be understood for architecting data solutions.

The architectural description refers to describing the lifecycle of how data is collected, processed, stored, used, and archived. A Big Data solution architect can take accountability for creating the architectural description from a current state to a target state.

The term ‘manipulation’ is also critical. It refers to the process of moving data and changing data structures, data items, data groups, and data stores. The manipulation process also includes major architectural activities such as integrating data artefacts into the application landscape, as well as communications, interactions, data flow, analysis, source and target locations, and data consumption profiles.

Let’s understand what Big Data is

One significant fact is that Big Data is ubiquitous. Big Data is different from traditional data. The main differences come from characteristics such as volume, velocity, variety, veracity, value, and the overall complexity of data sets in a data ecosystem. Understanding these V-words provides useful insights into the nature of Big Data.

There are many definitions of Big Data in industry and academia; however, the most succinct yet comprehensive definition, and the one I agree with, comes from Gartner: “Big data is high-volume, high-velocity, and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making”. The only keyword missing from this definition is ‘veracity’. I’d also add that these characteristics are interrelated and interdependent.

Let me provide brief definitions of these V-words for readers who are new to the Big Data domain.

Volume refers to the size or amount of data sets in terabytes, petabytes or exabytes. However, there are no specific definitions to determine the threshold for Big Data volumes.

Velocity refers to the speed of producing data. Big Data sources generate high-speed data streams coming from real-time devices such as mobile phones, social media, IoT sensors, edge gateways, and the Cloud infrastructure.

Variety refers to multiple sources and forms of data. Data sources include structured transactional data; semi-structured data, such as website or system logs; and unstructured data, such as video, audio, animation, and pictures.

Veracity means the quality of the data. Since volume and velocity are enormous in Big Data, maintaining veracity can be very challenging. It is essential to have quality output to make sense of data for business insights.

Value follows from veracity and is the primary purpose of Big Data. The objective of Big Data solutions is to create business insights and gain business value. Value can be created with an innovative and creative architectural approach and input from all the stakeholders of a Big Data solution.

Overall complexity refers to the larger number of data attributes and the difficulty of extracting the desired value due to large volume, wide variety, enormous velocity, and the veracity required to create the desired business value.

Even though architecturally similar to the traditional data, Big Data requires newer methods and tools to deal with these particular characteristics. It is important to highlight that the traditional methods and tools are not adequate to process big data.

The process, in this context, refers to capturing a substantial amount of data from multiple sources, then storing, analysing, searching, transferring, sharing, updating, visualising, and governing huge volumes of data manifesting in petabytes or even exabytes in sizable business organisations.

Ironically, the main concern or aim of Big Data is not the amount of data but the advanced analytics techniques needed to produce business value out of these complex and large volumes of data. Advanced analytics, in this context, refers to approaches such as descriptive, predictive, prescriptive, and diagnostic analytics.

At the highest level, descriptive analytics deals with what is happening right now based on incoming data. Predictive analytics refers to what might happen in the future. Prescriptive analytics deals with actions to be taken. Diagnostic analytics asks the question of why something happened. Each analytics type serves different scenarios and use cases.

Data is architected and managed layer by layer

As Big Data architects, we use a top-down approach to start the solution description on a layer by layer basis. There are three layers that we need to consider from an architectural standpoint: conceptual, logical, and physical.

The first layer for description is the conceptual, representing the business entities for data.

The second layer is logical, describing the relationship between objects.

The third layer is physical, representing the data mechanisms and functionality.

Now, let’s look at the lifecycle management covering these layers.

An overview of Big Data lifecycle management

As Big Data solution architects, we must understand the lifecycle, as we are engaged in all its phases as technical leaders. Our roles and responsibilities may differ in different phases; however, we need to stay on top of lifecycle management from an end-to-end perspective.

From an architectural solutions perspective, based on my experience and input obtained from industry publications, a typical Big Data solution, similar to the traditional data lifecycle, can include a dozen distinct phases in the overall data lifecycle solution.

Big Data solution architects are engaged in all phases of the lifecycle, providing different input and producing different output for each phase. These phases may be implemented under various names in different data solution teams. There is no rigorous universal systemic approach to the Big Data lifecycle as the field is still evolving. The learnings from traditional data management are transferred and enhanced for particular solution use cases.

For awareness and guiding purposes to the aspiring Big Data architects, I propose the following distinct phases:

Phase 1: Foundations
Phase 2: Acquisition
Phase 3: Preparation
Phase 4: Input and Access
Phase 5: Processing
Phase 6: Output and Interpretation
Phase 7: Storage
Phase 8: Integration
Phase 9: Analytics and Visualisation
Phase 10: Consumption
Phase 11: Retention, Backup, and Archival
Phase 12: Destruction

Let me provide you with an overview of each phase with some guiding points. You can customise the names of these phases based on the requirements and organisational data practice of your Big Data solutions. The key point is that they are not set in stone.

Phase 1: Foundations

In the data management process, the foundation phase includes various aspects such as understanding and validating data requirements, solution scope, roles and responsibilities, data infrastructure preparation, technical and non-technical considerations, and understanding the data rules of an organisation.

This phase requires a detailed plan facilitated ideally by a data solution project manager with substantial input from the Big Data solution architect and some data domain specialists.

A Big Data solution project includes details such as plans, funding, commercials, resourcing, risks, assumptions, issues, and dependencies in a project definition report (PDR). Project Managers compile and author the PDR; however, the solution overview in this critical artefact is provided by the Big Data Architect.

Phase 2: Data Acquisition

Data acquisition refers to collecting data. Data sets can be obtained from various sources. These sources can be internal or external to the business organisation. Data sources can be structured, such as data transferred from a data warehouse, a data mart, or various transaction systems; semi-structured, such as weblogs or system logs; or unstructured, such as media files consisting of video, audio, and pictures.
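To make this phase more concrete, here is a minimal PySpark sketch of acquiring data from structured, semi-structured, and unstructured sources. It assumes a Spark environment is available; the paths and source names are hypothetical and purely for illustration.

# Hypothetical acquisition sketch: load three source types into Spark DataFrames.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("acquisition").getOrCreate()

orders = spark.read.parquet("/landing/orders/")    # structured: warehouse or transaction extract
web_logs = spark.read.json("/landing/weblogs/")    # semi-structured: web and system logs
emails = spark.read.text("/landing/emails/")       # unstructured: raw text kept as-is for now

print(orders.count(), web_logs.count(), emails.count())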

Even though data collection is conducted by various data specialists and database administrators, the Big Data architect has a substantial role in facilitating this phase optimally. For example, data governance, security, privacy, and quality controls start with the data collection phase. Therefore, the Big Data architect takes technical and architectural leadership of this phase.

The lead Big Data solution architect, in liaison with enterprise and business architects, leads and documents the data collection strategy, user requirements, architectural decisions, use cases, and technical specifications in this phase. For comprehensive solutions in sizable business organisations, the lead Big Data architect can delegate some of these activities to various domain architects and data specialists.

Phase 3: Data Preparation

In the data preparation phase, the collected data, in its raw format, is cleaned or cleansed; these two terms are used interchangeably in the data practices of various business organisations.

In the data preparation phase, data is rigorously checked for inconsistencies, errors, and duplicates. Redundant, duplicated, incomplete, and incorrect data are removed. The objective is to have clean and useable data sets.
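As a rough illustration of these cleansing activities, the following PySpark sketch removes duplicates, drops incomplete rows, and discards obviously incorrect values. The column names are invented for the example and do not come from a specific solution.

# Hypothetical cleansing sketch; order_id, customer_id, order_total, and email
# are placeholder column names.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
raw = spark.read.json("/landing/orders/")

cleaned = (
    raw.dropDuplicates(["order_id"])                           # remove duplicate records
       .dropna(subset=["customer_id", "order_total"])          # drop incomplete rows
       .filter(F.col("order_total") >= 0)                      # discard incorrect values
       .withColumn("email", F.lower(F.trim(F.col("email"))))   # normalise formatting
)
cleaned.write.mode("overwrite").parquet("/prepared/orders/")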

The Big Data solution architect facilitates this phase. However, most data cleaning tasks, due to granularity of activities, can be performed by data specialists who are trained in data preparation and cleaning techniques.

Phase 4: Data Input and Access

Data input refers to sending data to planned target data repositories, systems, or applications. For example, we can send the clean data to determined destinations such as a CRM (Customer Relationship Management) application, a data lake for data scientists, or a data warehouse for use by specific departments. In this phase, data specialists transform the raw data into a useable format.

Data access refers to accessing data using various methods. These methods can include the use of relational databases, flat files, or NoSQL databases. NoSQL is more relevant and widely used for Big Data solutions in various business organisations.
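As a simple sketch of the two access styles, the snippet below runs a relational query with the Python standard library and a document query with a MongoDB client. The database file, connection string, collection, and fields are hypothetical, and the pymongo package is assumed to be installed.

# Hypothetical data access sketch: relational versus NoSQL (document) access.
import sqlite3
from pymongo import MongoClient  # assumes pymongo is installed

# Relational access: a plain SQL query against a local database file
conn = sqlite3.connect("crm.db")
rows = conn.execute("SELECT customer_id, segment FROM customers LIMIT 5").fetchall()
print(rows)

# NoSQL access: a document query against a MongoDB collection
client = MongoClient("mongodb://localhost:27017/")
interactions = client["crm"]["interactions"].find({"channel": "web"}).limit(5)
print(list(interactions))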

Even though the Big Data solution architect leads this phase, they usually delegate the detailed activities to data specialists and database administrators who can perform the input and access requirements in this phase.

Phase 5: Data Processing

The data processing phase starts with the raw form of data. We then convert the data into a readable format, giving it form and context. After completing this activity, we can interpret the data using the data analytics tools selected in our business organisation.

We can use common Big Data processing tools such as Hadoop MapReduce, Impala, Hive, Pig, and Spark SQL. The most common real-time data processing tool in most of my solutions was HBase, and the near real-time data processing tool was Spark Streaming. There are many open-source and proprietary tools on the market.
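To show what a processing step with one of these tools can look like, here is a small Spark SQL sketch that aggregates the prepared data. The table and column names carry on from the earlier hypothetical examples rather than from a real project.

# Hypothetical Spark SQL processing sketch over the prepared data.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.read.parquet("/prepared/orders/").createOrReplaceTempView("orders")

daily_revenue = spark.sql("""
    SELECT order_date,
           SUM(order_total) AS revenue,
           COUNT(*)         AS order_count
    FROM orders
    GROUP BY order_date
    ORDER BY order_date
""")
daily_revenue.show(5)
daily_revenue.write.mode("overwrite").parquet("/processed/daily_revenue/")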

Data processing also includes activities such as data annotation, data integration, data aggregation, and data representation. Let me summarise them for your awareness.

Data annotation refers to labelling the data. For example, once the data sets are labelled, they can be ready for machine learning activities.

Data integration aims to combine data existing in different sources, and it aims to provide a unified view of data to the data consumers.

Data representation refers to the way data is processed, transmitted, and stored. These three essential functions depict the representation of data in the lifecycle.

Data aggregation aims to compile data from databases to combined datasets to be used for data processing.

In the data processing phase, data may change its format based on consumer requirements. Processed data can be used in various data outputs in data lakes, in enterprise networks, and connected devices.

We can further analyse the data sets with advanced processing techniques using various tools such as Spark MLlib, Spark GraphX, and several other machine learning tools.

Big Data processing requires the involvement of various team members with different skill sets. While the lead Big Data solution architect leads the processing phase, most of the tasks are performed by data specialists, data stewards, data engineers, and data scientists. The Big Data solution architect facilitates the end-to-end process for this phase.

Phase 6: Data Output and Interpretation

In the data output phase, the data is in a format which is ready for consumption by the business users. We can transform data into useable formats such as plain text, graphs, processed images, or video files.

The output phase proclaims the data ready for use and sends the data to the next phase for storing. This phase, in some data practices and business organisations, is also called data ingestion. For example, the data ingestion process aims to import data for immediate or future use, or to keep it in a database format.

The data ingestion process can run in real time or in batches. Some standard Big Data ingestion tools commonly used in my solutions were Sqoop, Flume, and Spark Streaming. These are popular open-source tools.
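For illustration only, here is a minimal Spark Structured Streaming sketch of near real-time ingestion from a Kafka topic. The broker address, topic name, and landing paths are placeholders, and the Spark Kafka connector is assumed to be available.

# Hypothetical near real-time ingestion sketch with Spark Structured Streaming.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
          .option("subscribe", "clickstream")                # placeholder topic
          .load())

# Land the raw payloads as Parquet files for the storage and processing phases
query = (events.selectExpr("CAST(value AS STRING) AS payload", "timestamp")
         .writeStream
         .format("parquet")
         .option("path", "/ingest/clickstream/")
         .option("checkpointLocation", "/ingest/_checkpoints/clickstream/")
         .start())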

One of the activities in this phase is interpreting the ingested data. This activity requires analysing the ingested data and extracting information or meaning from it to answer the questions related to the Big Data business solution.

Phase 7: Data Storage

Once we complete the data output phase, we store data in designed and designated storage units. These units are part of the data platform and infrastructure design considering all non-functional architectural aspects such as capacity, scalability, security, compliance, performance and availability.

The infrastructure can consist of storage area network (SAN), network-attached storage (NAS), or direct-attached storage (DAS) formats. Data and database administrators can manage stored data and allow access to the defined user groups.

Big Data storage can include underlying technologies such as database clusters, relational data storage, or extended data storage, e.g. HDFS and HBase, which are open-source systems.

In addition, file formats such as text, binary, and specialised formats such as SequenceFile, Avro, and Parquet must be considered in the data storage design phase.
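As a small sketch of the storage decision, the snippet below persists a processed data set as partitioned Parquet files. The paths and the partition column are hypothetical; the right partitioning choice depends on the expected query patterns.

# Hypothetical storage sketch: persist processed data as partitioned Parquet files.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
daily_revenue = spark.read.parquet("/processed/daily_revenue/")

(daily_revenue.write
 .mode("overwrite")
 .partitionBy("order_date")               # partition column chosen for illustration only
 .parquet("/warehouse/daily_revenue/"))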

Phase 8: Data Integration

In traditional models, once the data is stored, it ends the data management process. However, for Big Data, there may be a need for the integration of stored data to different systems for various purposes.

Data integration is a complex and essential architectural consideration in the Big Data solution process. Big Data architects are engaged to architect and design the use of various data connectors for the integration of Big Data solutions. There may be use cases and requirements for many connectors such as ODBC, JDBC, Kafka, DB2, Amazon S3, Netezza, Teradata, Oracle, and many more, based on the data sources used in the solution.
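To illustrate one connector-based integration pattern, here is a hedged Spark JDBC sketch that reads a reference table and joins it with data already in the platform. The connection URL, table, and credentials are placeholders, and the appropriate JDBC driver is assumed to be available on the Spark classpath.

# Hypothetical integration sketch: pull a reference table over JDBC and enrich
# platform data with it. All connection details are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

customers = (spark.read.format("jdbc")
             .option("url", "jdbc:postgresql://dbhost:5432/crm")
             .option("dbtable", "public.customers")
             .option("user", "reporting_user")
             .option("password", "***")
             .load())

orders = spark.read.parquet("/prepared/orders/")
enriched = orders.join(customers, on="customer_id", how="left")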

Some data models may require integration of data lakes with a data warehouse or data marts. There may also be application integration requirements for Big Data solutions.

For example, some integration activities may comprise integrating Big Data with dashboards, Tableau, websites, or various data visualisation applications. This activity may overlap with the next phase, which is data analytics.

Phase 9: Data Analytics & Visualisation

Integrated data can be useful and productive for data analytics and visualisation.

Data analytics is a significant component of Big Data management process. This phase is critical because this is where business value is gained from Big Data solutions. Data visualisation is one of the key functions of this phase.

We can use many productivity tools for analytics and visualisation based on the requirements of the solution. In my Big Data solutions, the most commonly used tools were Scala, Python, and R notebooks. Python was selected as the most productive tool, touching almost all aspects of data analytics, especially to empower machine learning initiatives.
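As a tiny, notebook-style illustration of this phase, the sketch below pulls a small aggregate into pandas and plots it. In practice the analytics team would use far richer models and visualisation tooling; the file path and columns are hypothetical.

# Hypothetical analytics and visualisation sketch for a Python notebook.
import pandas as pd
import matplotlib.pyplot as plt

daily_revenue = pd.read_parquet("/warehouse/daily_revenue/")  # keep this to small aggregates

daily_revenue.plot(x="order_date", y="revenue", kind="line", title="Daily revenue")
plt.tight_layout()
plt.show()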

In your business organisation, there may be a team responsible for data analytics led by a chief data scientist. Big Data solution architects have a limited role in this phase; however, they work closely with the data scientists to ensure the analytics practice and platforms are aligned with business goals. The Big Data solution architects need to ensure the phases of the lifecycle are completed with architectural rigour.

Phase 10: Data Consumption

Once data analytics takes place, the data is turned into information ready for consumption by internal or external users, including customers of the business organisation.

Data consumption requires architectural input for policies, rules, regulations, principles, and guidelines. For example, data consumption can be based on a service provision process. Data governance bodies create regulations for service provision.

The lead Big Data Solution Architect leads and facilitates the creation of these policies, rules, principles, and guidelines using an architectural framework selected by the business organisation.

Phase 11: Retention, Backup, & Archival

We know that critical data must be backed up for protection and meeting industry compliance requirements. We need to use established data backup strategies, techniques, methods, and tools. The Big Data solution architect must identify, document, and obtain approval for the retention, backup, and archival decisions.

The Big Data solution architect may delegate the detailed design of this phase to an infrastructure architect assisted by several data, database, storage, and recovery domain specialists.

Some data may need to be archived for a defined period of time for regulatory or other business reasons. The data retention strategy must be documented and approved by the governing body, especially by enterprise architects, and implemented by the infrastructure architects and storage specialists.

Phase 12: Data Destruction

There may be regulatory requirements to destroy particular types of data after a certain amount of time. These requirements may change based on the industry to which the business organisation belongs.

Even though there is a chronological order to lifecycle management, when producing Big Data solutions some phases may slightly overlap and can be run in parallel.

The life cycle proposed in this article is only a guideline for awareness of the overall process. You can customise the process based on the structure of the data solution team, unique data platforms, data solution requirements, use cases, and dynamics of the owner organisation, its departments, or the overall enterprise ecosystem.

Now that we have covered an overview of the lifecycle phases, let me provide a high-level understanding of the Big Data solution components.

Big Data Solution Components

Big Data solution architecture begins with an understanding of the Big Data process on an end-to-end basis. Understanding the solution components can help us and other stakeholders see the big picture of the Big Data process. We can categorise the Big Data process under two broad categories: the first is Data Management, and the second is Data Analytics.

Data management includes multiple activities as described in the lifecycle, such as data acquisition, extraction, cleansing, annotation, processing, integration, aggregation, and representation.

Data Analytics includes activities such as data modelling, data analysis, data interpretation, and data visualisation.

As Big Data solution architects, we need to understand the critical components in the lifecycle such as data types, principles, platforms, quality specifications, governance, security, privacy, analytics, semantics, patterns, data lakes, data swamps, and traditional data warehouse concepts.

These are the fundamentals, and there may be several other components based on the solution's use cases and user requirements.

Let me briefly mention the data types as fundamental considerations in architecting Big Data solutions for your awareness.

Data Types

We can categorise data types as structured, semi-structured, and unstructured. Structured data is traditionally well-managed, relatively more straightforward, and not a big concern of the overall data management process.

However, the challenge is related to semi-structured and more importantly dealing with unstructured data. These two are critical considerations for Big Data solutions. These two data types can add real business value in obtaining the desired information and consuming them for business insights.

The primary concern with semi-structured data is that it does not strictly conform to standards. We can implement semi-structured data with XML (Extensible Markup Language), a textual language for exchanging data on the World Wide Web. XML uses user-defined data tags that make documents machine-readable.
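A minimal example with the Python standard library shows how user-defined tags make a small XML fragment machine-readable; the tags and values are invented for illustration.

# Hypothetical semi-structured data example: parsing a small XML fragment.
import xml.etree.ElementTree as ET

xml_fragment = """
<orders>
  <order id="1001"><customer>Ada</customer><total>42.50</total></order>
  <order id="1002"><customer>Lin</customer><total>17.00</total></order>
</orders>
"""

root = ET.fromstring(xml_fragment)
for order in root.findall("order"):
    print(order.get("id"), order.findtext("customer"), float(order.findtext("total")))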

Clickstream data is another example of semi-structured data. For example, this type of data provides online shops with comprehensive data sets about users' behaviour and browsing patterns. This data type is widespread and relevant to Big Data analytics for generating business insights.

Unstructured data is the concern of text analytics, which aims to extract the required information from textual data. Some textual data examples are blogs, articles, emails, documents, news, and other forms of content in social network sites.

Text analytics can include computational linguistics, machine learning, and traditional statistical analysis. Text analytics focuses on converting massive volumes of machine- or human-generated text into meaningful structures to create business insights and support business decision-making.

We can use various text analytics techniques. For example, information extraction is a text analytics technique that extracts structured data from unstructured text.

Text summarization is a common technique which can automatically create a condensed summary of a document or selected groups of documents. This technique is particularly useful for blogs, articles, news items, product documents, and scientific papers.

NLP (Natural Language Processing) is a sophisticated text analytics technique interfaced through questions and answers in natural language. NLP is commonly used by consumer products such as Apple's Siri and Amazon's Alexa.

One of the growing text analytics techniques is sentiment analysis. It aims to analyse people's views about individuals, publications, products, or services and is commonly used for marketing purposes. One example of sentiment analysis is the use of the microblogging site Twitter: we can analyse massive numbers of tweets to obtain positive, negative, or neutral sentiment for a business product or service.
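As a deliberately simple sketch of the idea, the snippet below scores a couple of invented tweets against a tiny hand-made lexicon. Production sentiment analysis would use trained models rather than word lists; this only shows the mechanics.

# Hypothetical lexicon-based sentiment sketch; the word lists and tweets are invented.
POSITIVE = {"great", "love", "excellent", "fast"}
NEGATIVE = {"bad", "slow", "broken", "terrible"}

def sentiment(text: str) -> str:
    words = set(text.lower().split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

tweets = [
    "Love the new release, support was fast",
    "The app is slow and the update feels broken",
]
for tweet in tweets:
    print(sentiment(tweet), "-", tweet)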

In addition to text analytics, unstructured data can also be analysed from human speech. This is referred to as speech analytics or audio analytics in some data management publications. Speech analytics is commonly used in call centres to improve customer satisfaction and meet specific regulatory requirements.

Another form of unstructured data analysis is picture and video content analysis. These are still in their infancy, but there is a trend towards new techniques for analysing photos and video content for information insights. The machine learning and deep learning domains have a substantial focus on this type of analytics.

Due to the relatively large size of videos, this is not as easy as text analytics. One of the critical business applications of video content analysis is in the security domain commonly used in data generated by the CCTV cameras, automated security, and surveillance systems.

Let me briefly mention the data principles from an architectural point of view.

Data Principles

Data management process requires consideration of established principles. There are country or geography level principles produced by governing bodies. For example, the most popular ones are the GDPR (General Data Protection Regulation) and the CCPA (California Consumer Privacy Act).

In recent years, GDPR has become particularly prominent in the media. GDPR is a European Union regulation on data protection and privacy for all individual citizens of the EU and the European Economic Area.

To give you an idea on the data management principles, GDPR offers the following seven principles. These principles sound universal as they are widely repeated in data management publications.

1. Lawfulness, fairness, and transparency
2. Purpose limitation
3. Data minimisation
4. Accuracy
5. Storage limitations
6. Integrity and confidentiality
7. Accountability

We don’t need to go into details for each principle here, as they can be reviewed on the GDPR site. These principles are common sense and straightforward for data professionals to understand.

These principles cover significant aspects of data management in an organisation. As Big Data architects, we need to consider these principles and apply them to our Big Data solution governance model.

There may also be principles developed by our organisations’ governing bodies in addition to data management policies, processes, procedures, and guidelines. Our Big Data solution governance model must incorporate these principles.

Data Quality Specifications

Data quality is vital to the end goal of data usage for architectural, technical, governance, security, compliance, and user consumption purposes. Higher quality in data specifications yields better results for Big Data solutions.

As Big Data architects, we must consider the critical data quality factors such as data elements being complete, unique, current, and conforming. The data quality specifications can be developed using system-generated reports, auditing, and issues raised by users.

Completeness, in terms of data quality, refers to making sure that the necessary elements of data are available throughout the lifecycle of the data management process.

The uniqueness of data refers to having no duplicates of the data elements.

The data currency refers to being up-to-date. Obsolete data is meaningless and useless.

In addition, conformance requires us to confirm that the data elements are specific to their domains.
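As a rough sketch of how these four checks can be expressed in code, the pandas snippet below computes simple completeness, uniqueness, currency, and conformance measures. The data set, column names, and thresholds are hypothetical.

# Hypothetical data quality checks: completeness, uniqueness, currency, conformance.
import pandas as pd

df = pd.read_parquet("/warehouse/customers/")  # placeholder data set

completeness = df[["customer_id", "email"]].notna().mean()            # share of populated values
duplicate_ids = df.duplicated(subset=["customer_id"]).sum()           # uniqueness: should be zero
stale_rows = (df["last_updated"] < pd.Timestamp("2020-01-01")).sum()  # currency: out-of-date rows
bad_codes = (~df["country_code"].isin(["AU", "GB", "US"])).sum()      # conformance to allowed values

print(completeness, duplicate_ids, stale_rows, bad_codes, sep="\n")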

Big Data quality can be gauged by using relevant data sources and optimised analytical models and by obtaining favourable results that translate into data consumer experience and profitability for the enterprise. These results can be tangible or intangible.

Another important aspect of a Big Data solution is understanding data platforms.

Big Data Platforms

Every Big Data solution requires a specific platform. A Big Data platform consists of several layers. The first layer is the shared operational information zone, which consists of data types such as data in motion, data at rest, and data in several other forms. It includes legacy data sources, new data sources, master data hubs, reference data hubs, and content repositories.

The second layer of the data platform is the processing layer. This substantial layer includes data ingestion, operational information, the landing area, the analytics zone, the archive, real-time analytics, exploration, the integrated warehouse, data lakes, and data mart zones. This layer needs a governance model for the metadata catalogue, covering data security and disaster recovery of systems, storage, hosting, and other infrastructure components such as local and Cloud processing and storage.

The third layer of the data platform is the analytics platform. It consists of functions, processes, and tools such as real-time analytics, information planning, forecasting, decision making, predictive analytics, data discovery, visualisation, dashboards, and other analytics features as required in a particular Big Data solution.

The fourth layer of the data platform consists of outputs such as business processes, decision-making schemes, and points of interaction. This layer also needs to be well governed, with access provided through established controls for data platform professionals such as data scientists, data architects, analytics experts, and business users.

The level of schema in the data platform is a crucial architectural consideration. We can classify the level of schema into three categories: no schema, partial schema, and full schema. The schema reflects the structure of data and databases; we can think of a schema as a blueprint for data management.

Some examples of no schema are video, audio, and picture files and social media feeds; partial schema covers email, instant messaging logs, system logs, and call centre logs; and full schema covers structured sensor data and relational transaction data.

Data processing levels are another architectural consideration. The processing levels can be raw data, validated data, transformed data, and calculated data.

Other structural classifications of data in data platforms relate to business relevance. We can categorise the business relevance of data as external data, personal data, departmental data, and enterprise data.

Let me touch on Big Data governance, as it is one of the most important architectural considerations.

Big Data Governance

Data governance is a critical factor for Big Data solutions. The Big Data governance system needs to consider essential factors such as security, privacy, trust, operability, conformance, agility, usability, innovation, and transformation of data. These factors may result in competing demands on Big Data solution architects. For example, innovation and conformance sit at opposite ends of the spectrum and hence require critical architectural trade-offs.

It is also vital, at a fundamental level, that a data governance infrastructure is established and evolved for adoption not only at the program solution level but also at the enterprise level. We need to work closely with the Enterprise Architects to address data governance concerns.

Data governance must take different stakeholders across various data platforms into consideration. For example, data architects are responsible for developing the governance of Big Data models; data scientists are accountable for the governance of analytics. Business stakeholders are responsible for the governance of the business models that produce business results for the data platforms in question.

Big Data governance is a broad area and covers data components, scope, requirements handling, strategy, architectural decisions, design, development, analysis, tests, processing, implementation, stakeholder relationships, input, output, business goals, business insights, and several other aspects of data management and analytics process.

To conclude the article, as a final point, I want to touch on the importance of business vocabulary for Big Data solutions from an architectural standpoint.

Business Vocabulary

Business vocabulary is a critical aspect of the data management process. We must define the business vocabulary to maintain a shared understanding of Big Data pertinent to business analytics. Business vocabulary is also called business glossary in some methods and can be customised based on the various factors at organisational level.

The business vocabulary describes the business content supported by the data models. More importantly, from an architectural perspective, this vocabulary can be a crucial input to the metadata catalogue.

Business vocabulary provides consistent terms to be used by the whole organisation. In many organisations, the business units own the business vocabulary. Usually, in many organisations, business users maintain this vocabulary; however, enterprise architects and the Big Data solution architects lead and facilitate the governance of business vocabulary.

Conclusion

In this article, I provided a high-level view and quick introduction to the Big Data solution lifecycle with an emphasis on architectural rigour while developing business solutions. Big Data solutions constitute critical success factors for artificial intelligence (AI). I am confident enough to make the bold statement that without architecting Big Data solutions with methodical rigour balanced with a speedy, pragmatic approach, it is not possible to produce effective, competitive, and sustainable AI solutions.

AI solutions are substantially dependent on volume, velocity, variety, and veracity of data. Big Data solutions revolve around dealing with these four key V characteristics plus generating market insights and business value.

I shared my Big Data solution experience in one of my latest books, titled Big Data for Enterprise Architects, introducing the critical role of a Big Data solution architect to Enterprise Architects so that they can understand the implications and impact of the topic for rapidly changing business organisations, reflected from practical and agile settings rather than the traditional views and theoretical approaches provided in textbooks.

I plan to cover AI data management practice and associated lessons learnt from the cognitive transformation programs in one of my upcoming articles. In the meantime, your input to extend the scope I provided in this article can be useful and synergistic.

I aim to increase the data literacy and intelligence of my readers.
