The evolution of smart data pipelines

The potential of artificial intelligence (AI) and machine learning (ML) to derive and drive new sources of customer, product, service, operational, environmental, and societal value is nearly limitless. If your organization is to compete in the economy of the future, AI must be at the core of your business operations.

Kearney’s “The Impact of Analytics in 2020” study highlights the untapped profitability and business impact for organizations looking for justification to accelerate their data science (AI/ML) and data management investments:

  • If explorers were as effective as leaders, their profitability could increase by 20%
  • If followers were as effective as leaders, their profitability could increase by 55%
  • If laggards were as effective as leaders, their profitability could increase by 81%

Data is not only a major organizational challenge; the business, operational, and societal impact of getting it right can be staggering. The godfather of artificial intelligence, Andrew Ng, has pointed out that data and data management are the obstacles to empowering organizations and society to realize the potential of artificial intelligence and machine learning:

“The model and the code for many applications are basically a solved problem. Now that the models have advanced to a certain point, we’ve got to make the data work as well.” — Andrew Ng

Data is at the heart of training AI and ML models. High-quality, trusted data orchestrated through efficient and scalable pipelines is what allows AI to deliver these compelling business and operational results. Just as a healthy heart needs oxygen and a reliable blood flow, a steady stream of cleansed, accurate, enriched, and trusted data is essential to the AI/ML engines.

For example, consider a CIO with a team of 500 data engineers managing more than 15,000 extract, transform, and load (ETL) jobs that move data across hundreds of special-purpose data repositories (data marts, data warehouses, data lakes, and data lakehouses). They perform these tasks against extremely tight service level agreements (SLAs) across the organization’s operational and customer-facing systems in order to support a growing, diverse set of data consumers. It looks as if Rube Goldberg had become a data architect (Figure 1).

Figure 1: Rube Goldberg data architecture

The resulting debilitating spaghetti architecture of one-off, special-purpose, static ETL programs used to move, cleanse, align, and transform data greatly impedes the “time to insight” organizations need to fully exploit the unique economic characteristics of data, “the world’s most valuable resource” according to The Economist.

The emergence of smart data pipelines

The purpose of a data pipeline is to automate and scale common, repetitive data acquisition, transformation, movement, and integration tasks. A properly constructed data pipeline strategy can accelerate and automate the processing involved in gathering, cleansing, transforming, enriching, and moving data to downstream systems and applications. As the volume, variety, and velocity of data continue to grow, the need for data pipelines that can scale linearly within cloud and hybrid cloud environments becomes ever more critical to business operations.

A data pipeline is a set of data processing activities that integrates operational and business logic to perform advanced sourcing, transformation, and loading of data. A data pipeline can run on a schedule, in real time (streaming), or be triggered by a predetermined rule or set of conditions.
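As a rough illustration, the Python sketch below shows a minimal batch pipeline with distinct extract, transform, and load stages. The file, table, and column names are hypothetical, and in practice a scheduler (cron, an orchestrator, etc.) or an event trigger would invoke the final function.

```python
"""Minimal ETL pipeline sketch (illustrative only; names are hypothetical)."""
import csv
import sqlite3
from datetime import datetime, timezone


def extract(path: str) -> list[dict]:
    # Pull raw rows from a source file (a stand-in for an operational system).
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def transform(rows: list[dict]) -> list[tuple]:
    # Cleanse and enrich: drop incomplete records, normalize casing,
    # and stamp each row with a load timestamp.
    loaded_at = datetime.now(timezone.utc).isoformat()
    return [
        (r["customer_id"], r.get("region", "").strip().upper(),
         float(r["amount"]), loaded_at)
        for r in rows
        if r.get("customer_id") and r.get("amount")
    ]


def load(records: list[tuple], db_path: str = "warehouse.db") -> None:
    # Write the curated records to a downstream analytic store.
    with sqlite3.connect(db_path) as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS sales "
            "(customer_id TEXT, region TEXT, amount REAL, loaded_at TEXT)"
        )
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", records)


def run_pipeline(source_path: str) -> None:
    # A scheduler or event trigger would call this on a schedule,
    # on arrival of new data, or when a rule or condition fires.
    load(transform(extract(source_path)))


if __name__ == "__main__":
    run_pipeline("daily_sales.csv")  # hypothetical source file
```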

In addition, logic and algorithms can be built into data pipelines to create “smart” data pipelines. Smart data pipelines are reusable and extensible economic assets that can be specialized to particular source systems and perform the data transformations necessary to support the unique data and analytic requirements of the target systems or applications.

As machine learning and AutoML become more prevalent, data pipelines will grow increasingly intelligent. Pipelines can move data between advanced enrichment and transformation modules, where neural networks and machine learning algorithms create more sophisticated transformations and enrichments, including segmentation, regression analysis, clustering, and the creation of advanced indices and propensity scores.
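To make this concrete, here is a minimal, hypothetical sketch of an in-pipeline enrichment step that uses k-means clustering (via scikit-learn) to attach a customer segment label to each record as it flows downstream. The feature names and segment count are assumptions, not part of any particular product.

```python
"""Illustrative ML enrichment step inside a pipeline (assumed column names)."""
import numpy as np
from sklearn.cluster import KMeans


def enrich_with_segments(records: list[dict], n_segments: int = 4) -> list[dict]:
    # Derive a customer segment from spend and frequency features,
    # then attach the label to each record before it moves downstream.
    features = np.array(
        [[r["total_spend"], r["purchase_frequency"]] for r in records], dtype=float
    )
    labels = KMeans(n_clusters=n_segments, n_init=10, random_state=0).fit_predict(features)
    return [{**r, "segment": int(label)} for r, label in zip(records, labels)]


if __name__ == "__main__":
    sample = [
        {"customer_id": "a1", "total_spend": 120.0, "purchase_frequency": 3},
        {"customer_id": "b2", "total_spend": 950.0, "purchase_frequency": 12},
        {"customer_id": "c3", "total_spend": 80.0, "purchase_frequency": 2},
        {"customer_id": "d4", "total_spend": 1020.0, "purchase_frequency": 15},
    ]
    for row in enrich_with_segments(sample, n_segments=2):
        print(row)
```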

Finally, AI can be integrated into data pipelines so that they continuously learn and adapt based on the source systems, the required data transformations and enrichments, and the evolving business and operational requirements of the target systems and applications.

For example, a smart data pipeline in healthcare could analyze the grouping of diagnosis-related group (DRG) codes to ensure the consistency and completeness of DRG submissions, and detect fraud as the pipeline moves DRG data from the source systems to the analytic systems.
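A hedged sketch of what such a check might look like: an isolation forest (scikit-learn) scores each submission as it passes through the pipeline and flags outliers for human review. The field names and contamination rate are illustrative only.

```python
"""Illustrative in-pipeline anomaly check on DRG submissions (hypothetical fields)."""
import numpy as np
from sklearn.ensemble import IsolationForest


def flag_suspect_submissions(submissions: list[dict]) -> list[dict]:
    # Score each submission by billed amount and length of stay; mark
    # outliers for review before the data lands in the analytic system.
    features = np.array(
        [[s["billed_amount"], s["length_of_stay"]] for s in submissions], dtype=float
    )
    scores = IsolationForest(contamination=0.05, random_state=0).fit_predict(features)
    return [{**s, "suspect": score == -1} for s, score in zip(submissions, scores)]
```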

Realizing business value

Chief data officers and chief data analytics officers face the challenge of unlocking the business value of their data: applying data to the business to drive quantifiable financial impact.

The ability to deliver high-quality, trusted data to the right data consumers at the right time to enable more timely and accurate decisions will be a key differentiator for today’s data-rich companies. A Rube Goldberg system of ETL scripts and disparate, analytics-centric, special-purpose repositories hinders an organization’s ability to achieve that goal.

Learn more about smart data pipelines in the Dell Technologies ebook “Modern Enterprise Data Pipelines.”

This content was produced by Dell Technologies. It was not written by the editors of MIT Technology Review.
