Well, let's have an OPEN and HONEST conversation about the Status Quo and the Future of SQL Query Engines for Big Data.


In September 2021, Matt Turck published a (long!) post, Red Hot: The 2021 Machine Learning, AI and Data (MAD) Landscape, and provided a macro view of the MAD ecosystem for 2021. When I first saw the insanely packed ecosystem map from Matt's article, I immediately felt the PAINs and STRUGGLEs of CIOs and Sales Teams. CIOs might feel deeply frustrated when thinking through the pros and cons of each option. Sales team leaders might be cursing a lot, thinking hard about how to survive the never-ending sales cycles as more competitors enter this market. However, this is just the beginning of how innovative the big data ecosystem could become. As data warehouses and lakehouses penetrate every organization in the world, this landscape will without a doubt become even more bloated.


Red Hot: The 2021 Machine Learning, AI and Data (MAD) Landscape


As we analyze and keep track of the top players on the list over the years, we find each vendor has its unique value and market fit. For example, Dremio and Databricks fully embrace the next-gen, no-copy lakehouse architecture with loads of technical innovations and breakthroughs within their offerings; the ClickHouse community is booming and ready to scale in the global market; Presto/Starburst is known for running interactive federated queries; Apache Druid is favored for its lightning-fast analytical query performance and real-time capability; PingCAP offers a groundbreaking solution that handles mixed transactional and analytical workloads; Kyligence/Kylin modernizes and reinvents OLAP multi-dimensional cubes for cloud applications. Some niche startups, such as Firebolt, have become more visible recently, claiming second-level query response time. Overall, all the disruptors are creating new value and breaking the status quo for the analytics community. They are working hard to move the needle for customers across industries.


What does this phenomenon indicate?


My answer sounds a bit negative: none of them is able to satisfy the diverse needs and evolving use cases across industries, from real-time analytics, OLTP, and OLAP to a hybrid of a few or more.


What’s worse?


In today's market, each vendor has developed its own unique value proposition by focusing on solving a specific challenge arising from a specific use case for a defined segment of buyers. It is almost impossible for a single company to build a perfect all-in-one product and take market share from its competitors. We have to admit the fact that:


There is no one-size-fits-all solution!


Thus, every company must buy a different query engine for each specific use case. Looking into the future, this trend will continue, whether it is for avoiding vendor lock-in or filling gaps that mainstream cloud vendors are unwilling to fill. Each company will end up adopting more than one data analytics product, and unavoidably, a new kind of data silo will be created, with data held in different systems.


Data Silos?


No, we definitely don't want to bring that back as we are reinventing the wheel in many businesses for a better future.


There is something wrong with our market.

Why does this dilemma occur? Fundamentally, we believe there is a mindset flaw in all the players involved in the war of the modern data stack. Each player is thinking from their own perspective instead of putting their customers first. "Your win is not equal to customers' success."

Customers don't want data silos created by each query engine they buy. But in reality, customers also understand one size does NOT fit all!

So, how do we get out of this dilemma?


Here is our proposal:


All vendors need to collaboratively re-imagine and re-engineer the Next Generation of SQL Query Engine for the benefit of all parties:


A Unified Query Access Point

on Top of Decentralized Query Engines/Data Sources


For end data users, this middle layer creates a single access point through which they can access data silos transparently;

For tech vendors, they can play to their best strength and focus on solving their well-defined problems;

For buyers/companies, they can get the best out of all vendors without worrying about the integration work.

On top of that, we need to add more value to this layer for our customers: this middle layer should be super performant, scalable, and LOW-COST.

This is our belief regarding what the future should look like:

The future SQL query engine should provide a unified query access point on top of decentralized data sources and support high-concurrency, low-latency, real-time data access at LOW COST.


Here is our attempt:


Let me walk you through the underlying logic of how Kyligence designs its query engine to match future needs.



Performance and Cost


The Exponential Growth of Data is Independent of Cost & Query Performance


First of all, we firmly believe that performance and cost are the two major factors customers care most about. Therefore, we took the multi-dimensional database concept and built a modernized, distributed multi-dimensional database that can fit into any kind of data lake. The major benefits offered by dimensional cubes are as follows. First, performance gains and high concurrency: query results are preprocessed beforehand (in other words, heavy computation completes offline) and are ready to serve downstream data consumers. So, at query run time, compute power is mainly used for retrieving query results and sending them back to users. This is the secret of why the Kyligence engine can handle a large volume of concurrent queries without sacrificing performance. Second, cost cuts: precomputed query results, aka indexes, will be reused as much as possible and can be refreshed by segments and partitions. This will help customers save loads of dollars for rainy days in the long run.
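To make the precompute-then-look-up idea concrete, here is a minimal sketch in Python. It is not Kyligence's implementation: the fact rows, the two dimensions (region, product), the use of a day as the "segment", and the function names are all assumptions made up for illustration. It only shows the shape of the trade-off described above: aggregate offline per segment, answer queries with lookups, and rebuild a single segment when its data changes.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (segment/day, region, product, sales amount)
ROWS = [
    ("2021-09-01", "EU", "A", 10.0),
    ("2021-09-01", "US", "A", 5.0),
    ("2021-09-01", "EU", "B", 7.0),
    ("2021-09-02", "EU", "A", 3.0),
]
DIMS = ("region", "product")


def build_segment_index(rows):
    """Precompute SUM(amount) for every combination of dimensions in one segment."""
    index = {}
    for n in range(len(DIMS) + 1):
        for group in combinations(range(len(DIMS)), n):
            totals = defaultdict(float)
            for row in rows:
                dims = row[1:3]
                key = tuple(dims[i] for i in group)
                totals[key] += row[3]
            index[tuple(DIMS[i] for i in group)] = dict(totals)
    return index


def build_cube(rows):
    """Offline step: split rows by segment and build one index per segment."""
    by_segment = defaultdict(list)
    for row in rows:
        by_segment[row[0]].append(row)
    return {seg: build_segment_index(seg_rows) for seg, seg_rows in by_segment.items()}


def query(cube, group_by, key):
    """Query-time step: answer SUM(amount) with lookups only, no raw-row scan."""
    return sum(idx[group_by].get(key, 0.0) for idx in cube.values())


cube = build_cube(ROWS)
eu_total = query(cube, ("region",), ("EU",))

# Refresh a single segment; the other segment's index is reused untouched.
cube["2021-09-02"] = build_segment_index([("2021-09-02", "EU", "A", 4.0)])
eu_after_refresh = query(cube, ("region",), ("EU",))
```

The heavy loop runs once per segment in `build_cube`, while `query` does one dictionary lookup per segment, which is why many concurrent queries stay cheap in this toy model.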

For more technical details of how this works, please read this blog.


A Unified Query Access Point on Top of Decentralized Data Sources



A modernized OLAP cube sits between data applications/users and decentralized data sources as a unified query access point. It serves as a thin layer that lets users connect to different data sources without learning how to connect to each one.

Kyligence can query across data sources, including HDFS, Hive, RDBMS, and other cloud storage. This is not the same as the concept of federated queries.

To illustrate this feature: some customers create a separate project for each data source in the Kyligence platform. In doing so, end-users from different business units can access data models built on top of each source directly through BI tools. Kyligence also makes it easier for the DevOps team to control data access in one place.
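The project-per-source pattern can be sketched as a thin routing layer. The snippet below is an illustration under stated assumptions, not Kyligence's API: the `UnifiedAccessPoint` class, its methods, and the project names are hypothetical, and in-memory SQLite databases stand in for the real sources (HDFS, Hive, RDBMS, and so on).

```python
import sqlite3


class UnifiedAccessPoint:
    """Hypothetical single entry point; each project maps to one backing source."""

    def __init__(self):
        self._projects = {}   # project name -> connection to its source
        self._grants = set()  # (user, project) pairs allowed to read

    def register(self, project, conn):
        self._projects[project] = conn

    def grant(self, user, project):
        self._grants.add((user, project))

    def query(self, user, project, sql, params=()):
        # Access control is enforced here, in one place, for every source.
        if (user, project) not in self._grants:
            raise PermissionError(f"{user} may not read {project}")
        return self._projects[project].execute(sql, params).fetchall()


def make_source(table, rows):
    """Create an in-memory SQLite database standing in for a real data source."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE {table} (name TEXT, amount REAL)")
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)
    return conn


gateway = UnifiedAccessPoint()
gateway.register("sales", make_source("orders", [("EU", 10.0), ("US", 5.0)]))
gateway.register("finance", make_source("ledger", [("rent", 2.0)]))
gateway.grant("alice", "sales")

sales_total = gateway.query("alice", "sales", "SELECT SUM(amount) FROM orders")
```

Users only ever talk to `gateway`; which engine or store sits behind a project is hidden from them, and granting or revoking access never touches the sources themselves.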



Furthermore, the Kyligence AI-Augmented Engine can detect commonly issued queries and automate index building to boost query performance and avoid wasting compute power on processing the same queries over and over again.
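How the AI-Augmented Engine does this internally is not described here, so the following is only a sketch of the general idea under simple assumptions: normalize each query in a log to its "shape" (literals replaced by placeholders), count the shapes, and flag the ones that recur as candidates for a precomputed index. The function names, the threshold, and the sample log are hypothetical.

```python
import re
from collections import Counter


def normalize(sql):
    """Reduce a query to its shape: lowercase, literals replaced, whitespace collapsed."""
    sql = sql.strip().lower()
    sql = re.sub(r"'[^']*'", "?", sql)          # string literals
    sql = re.sub(r"\b\d+(\.\d+)?\b", "?", sql)  # numeric literals
    return re.sub(r"\s+", " ", sql)


def recommend_indexes(query_log, min_count=2):
    """Return query shapes seen at least min_count times, most frequent first."""
    counts = Counter(normalize(q) for q in query_log)
    return [shape for shape, n in counts.most_common() if n >= min_count]


# Hypothetical query log: the first two differ only in a literal and in casing.
LOG = [
    "SELECT region, SUM(amount) FROM orders WHERE year = 2020 GROUP BY region",
    "select region, sum(amount) from orders where year = 2021 group by region",
    "SELECT * FROM orders WHERE id = 7",
]
hot_shapes = recommend_indexes(LOG)
```

A real engine would weigh query cost and storage budget as well as frequency; counting shapes is just the simplest signal to start from.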


Real-time Streaming Data


This feature is currently in the beta stage. With this functionality in place, users can easily combine streaming and batch data in a single data model, no coding required.


Use Case #1: Data Governance in Lakehouse Era



Multi-dimensional Model, a tidy box of wide tables, eliminates duplications and minimizes cost/query with smart acceleration.

Another benefit of using Kyligence's modernized OLAP cube technology is that it helps you manage, eliminate and reuse ETL pipelines. I know it is hard to make sense of that in the abstract, so allow me to put it in context:

First, you can think of a Kyligence OLAP cube as a box of flat tables, aka indexes. Now, a simple use case will illustrate how it works:

In 2021, one of Kyligence's customers faced the huge challenge of managing a flat table explosion. The issue was originally caused by the fact that their internal teams were not used to reusing flat tables created by data pipelines owned by other teams. After adopting Kyligence as a data management tool, all teams started collaborating and creating shared cubes within the Kyligence platform. Kyligence cubes then automatically generate "flat tables" for all teams and intelligently manage the reuse and lifecycles of those "flat tables". This is part of their solution to reduce 1000k flat tables to a reasonable number.

For more on this issue, read Stop the Madness! 1000k data warehouse tables got created out of 6k source tables in 2.5 years.


Use Case #2: Data Mesh in Practice


Domain-oriented decentralized data ownership and architecture


If you understand the Data Mesh concept, you might find a great match between Kyligence and the idea of "data infrastructure as a central, shared service platform" that Data Mesh calls for.

Kyligence Governed Data Marts matches Data Domain from Data Mesh;

Kyligence Unified Query Access Point on Top of Decentralized Data Sources matches Decentralized Data Ownership and Architecture from Data Mesh;

…, etc.

I often work with businesses that like to organize their data into domain-oriented projects and cubes. These businesses then use the Kyligence Platform as a shared data infrastructure for all business people across teams. Let's discuss this topic in detail in upcoming blogs.









By admin
