This post continues our series on comparing massively parallel processing systems and engines. In a previous article I laid out the testing principles our team follows and shared results from both real-world production scenarios and synthetic benchmarks. That piece sparked discussion — some found the evidence convincing, others questioned whether the results were objective. As promised, here are the results of a benchmark run under the widely accepted TPC-DS standard. Today you'll find out whether switching methodologies changed anything.

Introduction
TPC-DS (Transaction Processing Performance Council – Decision Support) is the industry-standard benchmark for measuring the performance of Decision Support Systems (DSS). In plain terms, it tests how well a given system — a database or a big data platform — handles complex analytical queries under conditions that resemble real production workloads.
The standard was developed by the Transaction Processing Performance Council (TPC) to evaluate systems that process large data volumes and execute complex analytical tasks. TPC-DS models the operations of a retail company — sales, inventory, and marketing — and uses that model to generate a dataset and a query set.
Key characteristics of TPC-DS
Realistic workload: TPC-DS is considered to simulate an analytical workload close to real production use, making its results more meaningful for assessing how systems behave in practice.
Varied, complex queries: The benchmark includes 99 distinct analytical SQL queries that span a wide range of complexity — reporting, OLAP, and data mining. They exercise a system's ability to handle joins, aggregations, and filtering.
Scalable data volumes: TPC-DS can generate datasets ranging from a few gigabytes to hundreds of terabytes, making it possible to test systems at whatever scale factor appropriately matches the hardware under test.
Why TPC-DS matters
TPC-DS is regarded as an objective, standardized way to compare the performance of data analysis platforms. Many database and system vendors publish TPC-DS results to demonstrate what their products can do. The variety of queries in the standard frequently exposes strengths and weaknesses not just of the system as a whole, but of individual subsystems such as the query optimizer and the cardinality estimator. This is why many vendors also use TPC-DS internally when working on optimizer strategies or other engine changes. The wider the adoption of a benchmark, the more credible its results become.
Limitations of the benchmark
TPC-DS models a retail business — sales, inventory, and marketing. Some organizations, particularly in finance, consider it inapplicable to their context and develop their own approaches. That was precisely the situation I described in the previous article.
Another limitation, in my view, is that the standard is weighted toward BI, ROLAP, ad hoc, and light-to-medium ETL scenarios (relative to the scale factor's data volume). It is less representative of heavy ETL workloads, which are characterized not only by query complexity but also by the materialization of results.
Test environment
- Tests were run on cloud IaaS infrastructure
- Managed services used:
  - S3 Storage
  - Managed Kubernetes, where the lakehouse platform's compute engines were deployed
- TPC-DS scale-factor 10,000 (~10 TB uncompressed)
- Data generation performed via Spark
- File format: Parquet, ZSTD compression (ratio ~3), table format: Iceberg
- Compressed data volume: ~2 TB
- Data was partitioned
- Impala engine version: 4.5
- Trino engine version: 459
All queries were run as-is from the benchmark specification without any modification to query text. Both processing engines were tuned for maximum performance and maximum utilization of all available hardware resources. Between iterations the systems were restarted to clear local caches. Each engine collected and analyzed statistics independently.
Compute resources were allocated according to the principles established in the first article: the total RAM of the compute cluster must be significantly less than the dataset size, since that configuration reflects real production conditions. It bears repeating: never trust benchmarks where the dataset size is comparable to or smaller than available RAM.
Worker node configuration: 32 vCores, 256 GB RAM, local 100 GB SSD for spill and cache operations. Four worker nodes in total: 128 vCores, 1,024 GB RAM. Trino also had a dedicated coordinator node, but we did not count its resources. Impala operates perfectly well without a dedicated coordinator up to a certain load level — one we did not approach in this test.
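The sizing rule above can be checked with a few lines of arithmetic. This is a minimal sketch using only the figures quoted in this article; the variable names are illustrative, and decimal units (1 TB = 1,000 GB) are an assumption.

```python
# Worker node specification from the article.
WORKERS = 4
VCORES_PER_WORKER = 32
RAM_GB_PER_WORKER = 256

# Dataset sizes from the article (assumption: decimal units, 1 TB = 1,000 GB).
RAW_DATASET_GB = 10_000        # ~10 TB uncompressed
COMPRESSED_DATASET_GB = 2_000  # ~2 TB as Parquet with ZSTD

total_vcores = WORKERS * VCORES_PER_WORKER
total_ram_gb = WORKERS * RAM_GB_PER_WORKER

# How many times larger the dataset is than the cluster's total RAM.
raw_to_ram = RAW_DATASET_GB / total_ram_gb
compressed_to_ram = COMPRESSED_DATASET_GB / total_ram_gb

print(f"{total_vcores} vCores, {total_ram_gb} GB RAM")
print(f"raw data / RAM = {raw_to_ram:.1f}x, compressed data / RAM = {compressed_to_ram:.1f}x")
```

Under these assumptions the raw dataset is close to ten times the cluster's RAM, and even the compressed files are about twice its size.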
Iteration 1
The first iteration ran in single-stream mode: all 99 queries executed sequentially. Results from this mode cannot be used to assess real production performance, but they are useful as an initial calibration pass and for smoothing out configuration rough edges.
The full table with all queries in order appears in the appendix at the end of the article. For easier analysis we split the queries into three groups by Impala execution time: "Simple" (up to 10 seconds), "Medium" (over 10 and up to 100 seconds), and "Heavy" (over 100 seconds).
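The grouping can be sketched as a small function. This assumes that buckets are assigned by the single-stream Impala time and that a query at exactly 10 seconds counts as "Simple", which is how the tables in this article are arranged; the function name is illustrative.

```python
def complexity_group(impala_seconds: float) -> str:
    """Assign a query to a group by its single-stream Impala run time."""
    if impala_seconds <= 10:
        return "Simple"
    if impala_seconds <= 100:
        return "Medium"
    return "Heavy"

# A few single-stream Impala timings taken from the tables in this article.
impala_times = {"query12": 1, "query86": 10, "query37": 11, "query74": 93, "query93": 102}

groups = {name: complexity_group(sec) for name, sec in impala_times.items()}
print(groups)
```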
Table. Execution time — "Simple" queries
| Query | Impala, sec | Trino, sec |
| --- | --- | --- |
| query12 | 1 | 7 |
| query20 | 1 | 9 |
| query21 | 1 | 2 |
| query41 | 1 | 1 |
| query52 | 1 | 7 |
| query42 | 2 | 6 |
| query55 | 2 | 9 |
| query56 | 2 | 6 |
| query73 | 2 | 6 |
| query83 | 2 | 5 |
| query92 | 2 | 2 |
| query32 | 3 | 4 |
| query40 | 3 | 30 |
| query77 | 3 | 8 |
| query10 | 4 | 7 |
| query58 | 4 | 9 |
| query61 | 4 | 11 |
| query69 | 4 | 7 |
| query8 | 5 | 12 |
| query19 | 6 | 10 |
| query53 | 6 | 14 |
| query68 | 6 | 14 |
| query3 | 7 | 63 |
| query33 | 7 | 7 |
| query89 | 7 | 18 |
| query98 | 7 | 17 |
| query26 | 8 | 21 |
| query30 | 8 | 17 |
| query43 | 8 | 14 |
| query5 | 9 | 92 |
| query60 | 9 | 12 |
| query63 | 9 | 15 |
| query84 | 9 | 12 |
| query90 | 9 | 13 |
| query1 | 10 | 15 |
| query39 | 10 | 12 |
| query49 | 10 | 37 |
| query86 | 10 | 23 |

Table. "Medium complexity" queries
| Query | Impala, sec | Trino, sec |
| --- | --- | --- |
| query37 | 11 | 19 |
| query62 | 11 | 31 |
| query66 | 11 | 18 |
| query48 | 12 | 87 |
| query6 | 13 | 10 |
| query15 | 13 | 9 |
| query46 | 13 | 18 |
| query79 | 13 | 25 |
| query25 | 14 | 36 |
| query45 | 14 | 10 |
| query2 | 15 | 57 |
| query80 | 16 | 131 |
| query34 | 17 | 18 |
| query59 | 17 | 108 |
| query13 | 18 | 121 |
| query36 | 18 | 31 |
| query22 | 19 | 27 |
| query7 | 20 | 31 |
| query29 | 20 | 69 |
| query82 | 20 | 37 |
| query17 | 21 | 78 |
| query18 | 21 | 26 |
| query35 | 21 | 18 |
| query71 | 21 | 23 |
| query85 | 21 | 70 |
| query94 | 21 | 30 |
| query81 | 22 | 14 |
| query31 | 26 | 39 |
| query51 | 26 | 30 |
| query70 | 27 | 83 |
| query95 | 30 | 170 |
| query91 | 31 | 4 |
| query27 | 32 | 24 |
| query54 | 32 | 48 |
| query96 | 38 | 15 |
| query99 | 45 | 64 |
| query50 | 47 | 367 |
| query16 | 49 | 69 |
| query38 | 49 | 109 |
| query87 | 50 | 104 |
| query76 | 54 | 124 |
| query44 | 74 | 213 |
| query57 | 75 | 190 |
| query97 | 88 | 147 |
| query74 | 93 | 194 |

Table. "Heavy" queries.
| Query | Impala, sec | Trino, sec |
| --- | --- | --- |
| query93 | 102 | 487 |
| query65 | 121 | 137 |
| query28 | 126 | 350 |
| query88 | 126 | 112 |
| query11 | 144 | 357 |
| query47 | 156 | 368 |
| query9 | 170 | 325 |
| query24 | 194 | 497 |
| query75 | 219 | 310 |
| query4 | 262 | 798 |
| query64 | 310 | 203 |
| query72 | 333 | 65 |
| query14 | 416 | 2778 |
| query78 | 710 | 573 |
| query23 | 1007 | 3436 |
| query67 | 1324 | 878 |

Table. Overall efficiency — single stream
| | Impala | Trino |
| --- | --- | --- |
| Total test duration | ~ 2 hours 1 min | ~ 4 hours 17 min |
| Compute cost | $10 | $22 |
Compute cost was calculated using the cloud provider's pricing calculator: monthly cost of four compute nodes / 30 days / 24 hours / 3,600 sec × test duration in seconds. The dedicated Trino coordinator was excluded from the cost calculation.
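The proration can be sketched as follows, assuming a 30-day month of 24-hour days. The monthly price used here is hypothetical, since the article does not state the provider's actual rate; the function names are illustrative as well.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # 30-day month, as in the article's formula

def compute_cost(monthly_cluster_cost_usd: float, duration_seconds: float) -> float:
    """Prorate a monthly rental price to the duration of one test run."""
    return monthly_cluster_cost_usd / SECONDS_PER_MONTH * duration_seconds

def to_seconds(hours: int, minutes: int) -> int:
    """Convert a duration given in hours and minutes to seconds."""
    return hours * 3600 + minutes * 60

# HYPOTHETICAL monthly price for the four compute nodes (not from the article).
monthly_price = 3600.0

# Single-stream durations reported in the table above.
impala_cost = compute_cost(monthly_price, to_seconds(2, 1))
trino_cost = compute_cost(monthly_price, to_seconds(4, 17))
print(f"Impala: ${impala_cost:.2f}, Trino: ${trino_cost:.2f}")
```

Because both engines ran on the same four nodes, the cost ratio equals the duration ratio regardless of the actual price.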

Iteration 2
Results in hand — can we draw conclusions? Not yet. Staying true to our own principles, we only make decisions based on scenarios that reflect production conditions — concurrent load. For the second iteration we ran TPC-DS with two streams executing simultaneously. Stream composition and query ordering followed the benchmark specification. The key idea is that queries across streams are arranged so they simulate users running different queries against different data, rather than executing the same query in parallel across multiple sessions.
Total test duration was measured as the elapsed time from the start of the run to the completion of the last query across both streams. Compute cluster resources remained unchanged from the single-stream tests. Since both engines operated under concurrent load, we ran multiple tuning iterations to find resource queue and cluster settings that minimized total elapsed time with zero failures across all 198 queries. Any run where even a single query failed was discarded and configuration tuning continued. For Trino this involved iterative tuning of retry_policy, query_max_total_memory, query_max_memory_per_node, and query_max_memory. For Impala the relevant settings were mem_limit and mt_dop.
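The measurement rule can be illustrated with a minimal harness. This is a sketch, not the harness we used: `run_query` is a hypothetical stub standing in for a real query submission, the stream orderings are invented for illustration (the real ones come from the benchmark specification), and a thread pool is just one convenient way to run streams side by side.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_query(stream_id: int, query_name: str) -> bool:
    """Hypothetical stand-in for submitting one query; returns True on success."""
    time.sleep(0.001)  # placeholder for real execution time
    return True

def run_stream(stream_id: int, queries: list[str]) -> list[bool]:
    """Run one stream's queries sequentially, in the order given."""
    return [run_query(stream_id, q) for q in queries]

def run_benchmark(streams: dict[int, list[str]]) -> tuple[float, bool]:
    """Run all streams concurrently; return (elapsed seconds, run is valid)."""
    started = time.monotonic()
    with ThreadPoolExecutor(max_workers=len(streams)) as pool:
        futures = [pool.submit(run_stream, sid, qs) for sid, qs in streams.items()]
        results = [ok for f in futures for ok in f.result()]
    # The clock stops only after the last query of the slowest stream finishes.
    elapsed = time.monotonic() - started
    # A single failed query invalidates the whole run.
    return elapsed, all(results)

# Illustrative orderings only (assumption, not the specification's ordering).
streams = {
    0: ["query1", "query2", "query3"],
    1: ["query3", "query1", "query2"],
}
elapsed, valid = run_benchmark(streams)
print(f"elapsed={elapsed:.3f}s valid={valid}")
```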
Table. Overall efficiency — 2 streams
| | Impala | Trino |
| --- | --- | --- |
| Total test duration | ~ 4 hours 1 min | ~ 11 hours 43 min |
| Compute cost | $20 | $59 |
As load increased the two engines diverged. Impala degraded by approximately 2x relative to the single-stream run; Trino degraded by 2.7x.
Iteration 3
We continued increasing load and ran the benchmark with 4 concurrent streams on the same hardware. The system received 396 queries as input. The same pass/fail criteria applied — the test was considered successful only if all 396 queries completed without errors or crashes.
Table. Overall efficiency — 4 streams
| | Impala | Trino |
| --- | --- | --- |
| Total test duration | ~ 8 hours 11 min | ~ 34 hours 2 min |
| Compute cost | $42 | $173 |
Impala degraded proportionally to the 4x load increase; Trino degraded by 8x.
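The degradation factors follow directly from the reported durations. A minimal sketch, with illustrative names, that divides each multi-stream duration by the same engine's single-stream duration:

```python
def minutes(hours: int, mins: int) -> int:
    """Convert a duration given in hours and minutes to minutes."""
    return hours * 60 + mins

# Total test durations reported in the article, keyed by number of streams.
durations = {
    "Impala": {1: minutes(2, 1), 2: minutes(4, 1), 4: minutes(8, 11)},
    "Trino": {1: minutes(4, 17), 2: minutes(11, 43), 4: minutes(34, 2)},
}

# Slowdown relative to the single-stream run of the same engine.
slowdown = {
    engine: {streams: total / by_streams[1] for streams, total in by_streams.items()}
    for engine, by_streams in durations.items()
}

for engine, factors in slowdown.items():
    print(engine, {streams: round(f, 1) for streams, f in factors.items()})
```

With the rounded durations from the tables this gives about 2.0x and 4.1x for Impala, and about 2.7x and 7.9x for Trino.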

Now imagine this cluster runs this workload on a scheduled daily basis for a full year — 365 runs, one per day. What does the accumulated compute cost look like?
| | Impala | Trino |
| --- | --- | --- |
| Annual compute cost, USD | $15,155 | $63,010 |
The difference: ~$48,000 — on 10 TB of raw data across four compute nodes. Now extrapolate that to 100 TB and 40 nodes.
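The annual figures can be cross-checked against the 4-stream table. A minimal sketch using only the numbers reported above; dividing the annual totals by 365 suggests they were computed from unrounded per-run costs, which is why they differ slightly from 365 times the rounded $42 and $173.

```python
RUNS_PER_YEAR = 365

# Annual figures as reported in the article, USD.
annual_cost = {"Impala": 15_155, "Trino": 63_010}

difference = annual_cost["Trino"] - annual_cost["Impala"]

# Implied cost of a single 4-stream run.
per_run = {engine: cost / RUNS_PER_YEAR for engine, cost in annual_cost.items()}

print(f"difference: ${difference:,}")
print({engine: round(cost, 2) for engine, cost in per_run.items()})
```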
What about GreenPlum?
Having covered the lakehouse engine comparison, it was time to bring GreenPlum into the picture. To do that we had to rebuild the test environment: GreenPlum is a traditional shared-nothing MPP system that runs on dedicated local disks rather than on top of object storage. For a fair comparison, all lakehouse services also had to run without relying on cloud-managed services — making the setup more representative of an on-premises installation.
Table. GreenPlum hardware
| Node type | vCPU | RAM, GB | Disk |
| --- | --- | --- | --- |
| Master × 1 | 16 | 64 | 1 SSD × 1.5 TB |
| Segment × 4 | 32 | 256 | 4 SSD × 2 TB |
| Total | 144 | 1,088 | 16 SSD storage layer |
A GreenPlum standby master was not needed for load testing — it contributes nothing to computation and only adds cost.
Table. Alphyn Lakehouse hardware
| Node type | vCPU | RAM, GB | Disk |
| --- | --- | --- | --- |
| MinIO host × 4 | 8 | 24 | 4 × 2 TB |
| Worker host × 4 | 32 | 256 | 1 × 100 GB |
| Total | 152 | 1,120 | 16 SSD storage layer |
Sizing followed a principle of equal compute resources and equal disk subsystems, even though the two system topologies differ fundamentally.
Alphyn Lakehouse was deployed on cloud IaaS infrastructure using a decoupled architecture: an isolated S3 cluster based on MinIO and an isolated compute cluster running Impala 4.4.1. The critical constraint was that the number and type of disks in the storage layer had to match exactly, using the highest-performance disks available from the cloud provider. GreenPlum was deployed on cloud VMs with equally high-performance storage.
Test parameters:
- TPC-DS scale-factor 10,000 (~10 TB uncompressed)
- GreenPlum tuned for maximum performance:
  - Optimal physical data model including partitioning, compression, storage format selection, and so on
  - Resource management configured for concurrent workloads
- For Impala, data remained in Parquet + Iceberg with ZSTD compression, same as in previous tests
Pass/fail criteria were unchanged: in single-stream mode all 99 queries must complete successfully; in 4-stream mode all 396 queries must complete without errors or crashes.
Table. Benchmark results.
| | Alphyn Lakehouse (Impala 4.4.1 + S3 MinIO), 1 stream | Alphyn Lakehouse (Impala 4.4.1 + S3 MinIO), 4 streams | OSS GreenPlum 6.27.1, 1 stream | OSS GreenPlum 6.27.1, 4 streams |
| --- | --- | --- | --- | --- |
| Elapsed time | ~ 2 hours 8 min | ~ 8 hours 48 min | ~ 13 hours 20 min | ~ 53 hours 20 min |
| Compute cost, USD | $26 | $109 | $155 | $621 |
Compute cost was calculated using the same methodology: monthly IaaS node rental cost (public price list) / 30 days / 24 hours / 3,600 sec × elapsed time in seconds.


Once again the data confirms what we see in practice: Alphyn Lakehouse, even with physically separated storage and compute, delivers roughly 6x better cost efficiency than GreenPlum. Translating performance into cost reveals the difference in total cost of ownership from a capital expenditure standpoint. In reality the gap is even wider, since maintenance and licensing costs are derived from hardware sizing, and a GreenPlum cluster also occupies significantly more datacenter floor space and draws substantially more power. MPP systems that rely on full table scans have been obsolete for a decade.
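The ratios behind this comparison can be reproduced from the results table. A minimal sketch with illustrative names; it uses the rounded figures as published, so the last digit may differ from a calculation on unrounded costs.

```python
def minutes(hours: int, mins: int) -> int:
    """Convert a duration given in hours and minutes to minutes."""
    return hours * 60 + mins

# (elapsed minutes, compute cost in USD) as reported in the results table.
lakehouse = {"1 stream": (minutes(2, 8), 26), "4 streams": (minutes(8, 48), 109)}
greenplum = {"1 stream": (minutes(13, 20), 155), "4 streams": (minutes(53, 20), 621)}

# GreenPlum relative to the lakehouse: (time ratio, cost ratio) per mode.
ratios = {}
for mode in lakehouse:
    lh_time, lh_cost = lakehouse[mode]
    gp_time, gp_cost = greenplum[mode]
    ratios[mode] = (gp_time / lh_time, gp_cost / lh_cost)

for mode, (time_ratio, cost_ratio) in ratios.items():
    print(f"{mode}: time {time_ratio:.2f}x, cost {cost_ratio:.2f}x")
```

On these figures GreenPlum takes a little over 6x longer in both modes, while the cost ratio lands between about 5.7x and 6.0x because the hourly price of the two clusters differs.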
Now let's apply the same methodology to annual compute cost. Assuming the cluster runs this workload on a scheduled basis for 12 months — 365 runs:
| | Alphyn Lakehouse / Impala | GreenPlum 6 |
| --- | --- | --- |
| Annual compute cost, USD | $39,737 | $226,488 |
Difference: ~$187,000.
The gap is significant. Put differently: on just 10 TB of data across a four-node cluster, an organization can redirect ~$187K from hardware budgets toward the data team — and end up with more functionality delivered on time.
What other benchmark standards exist — and are they worth using?
What other open benchmarks can be used for concurrent load testing beyond TPC-DS? One option is TPC-H. However, despite also targeting analytical systems, TPC-H has several weaknesses compared to TPC-DS:
- Simpler data model:
  - Uses a small normalized schema of 8 tables, rather than the multi-fact snowflake schema of 24 tables used in TPC-DS
- Limited query set:
  - Contains only 22 queries which, while analytical, are less complex and less varied than TPC-DS
- Lower optimizer demand:
  - TPC-H is considered less demanding on the query optimizer due to the relative simplicity of its queries, so it may not fully expose the capabilities of sophisticated query optimizers in modern MPP systems
Recently I have seen ClickBench used increasingly often for comparative engine testing. This is a performance benchmark developed by the ClickHouse team using a real dataset and relatively simple queries characteristic of ClickHouse's target use case.
ClickBench is excellent for evaluating ClickHouse performance in the scenarios it was designed for: fast filtered aggregation over a single object. But ClickBench should not be used to select a primary platform engine. Among its 43 SQL queries, none contains a JOIN.
Upcoming testing plans
We are currently benchmarking StarRocks, another processing engine that is part of the Alphyn Lakehouse platform. We also plan to benchmark Spark 4 versus Spark 3.5 and compare their performance against other MPP SQL engines in ELT pipeline workloads.
We are also developing a methodology for objectively load-testing engines on "fast read access to a materialized data mart" scenarios. If you have ideas, we'd love to hear them.
A note to readers and the community
Our team is committed to objective, comparative technology testing — often an internal competitive process rather than a comparison against external market offerings. Our goal is to give customers a system that meets their performance and functionality expectations, and to calculate hardware sizing accurately, since it directly determines total cost of ownership. When choosing between technologies, users and platform owners should be guided not only by what feels convenient, but by a clear understanding of what that convenience will cost.
If you have doubts about any of the results presented here, reproduce them yourself. Share your observations, invite discussion, or reach out about joint testing. Thank you for reading.
Appendix
Table: TPC-DS query execution times. Single stream. In benchmark query order.
| Query | Impala, sec | Trino, sec | Query | Impala, sec | Trino, sec |
| --- | --- | --- | --- | --- | --- |
| query1 | 10 | 15 | query51 | 26 | 30 |
| query2 | 15 | 57 | query52 | 1 | 7 |
| query3 | 7 | 63 | query53 | 6 | 14 |
| query4 | 262 | 798 | query54 | 32 | 48 |
| query5 | 9 | 92 | query55 | 2 | 9 |
| query6 | 13 | 10 | query56 | 2 | 6 |
| query7 | 20 | 31 | query57 | 75 | 190 |
| query8 | 5 | 12 | query58 | 4 | 9 |
| query9 | 170 | 325 | query59 | 17 | 108 |
| query10 | 4 | 7 | query60 | 9 | 12 |
| query11 | 144 | 357 | query61 | 4 | 11 |
| query12 | 1 | 7 | query62 | 11 | 31 |
| query13 | 18 | 121 | query63 | 9 | 15 |
| query14 | 416 | 2778 | query64 | 310 | 203 |
| query15 | 13 | 9 | query65 | 121 | 137 |
| query16 | 49 | 69 | query66 | 11 | 18 |
| query17 | 21 | 78 | query67 | 1324 | 878 |
| query18 | 21 | 26 | query68 | 6 | 14 |
| query19 | 6 | 10 | query69 | 4 | 7 |
| query20 | 1 | 9 | query70 | 27 | 83 |
| query21 | 1 | 2 | query71 | 21 | 23 |
| query22 | 19 | 27 | query72 | 333 | 65 |
| query23 | 1007 | 3436 | query73 | 2 | 6 |
| query24 | 194 | 497 | query74 | 93 | 194 |
| query25 | 14 | 36 | query75 | 219 | 310 |
| query26 | 8 | 21 | query76 | 54 | 124 |
| query27 | 32 | 24 | query77 | 3 | 8 |
| query28 | 126 | 350 | query78 | 710 | 573 |
| query29 | 20 | 69 | query79 | 13 | 25 |
| query30 | 8 | 17 | query80 | 16 | 131 |
| query31 | 26 | 39 | query81 | 22 | 14 |
| query32 | 3 | 4 | query82 | 20 | 37 |
| query33 | 7 | 7 | query83 | 2 | 5 |
| query34 | 17 | 18 | query84 | 9 | 12 |
| query35 | 21 | 18 | query85 | 21 | 70 |
| query36 | 18 | 31 | query86 | 10 | 23 |
| query37 | 11 | 19 | query87 | 50 | 104 |
| query38 | 49 | 109 | query88 | 126 | 112 |
| query39 | 10 | 12 | query89 | 7 | 18 |
| query40 | 3 | 30 | query90 | 9 | 13 |
| query41 | 1 | 1 | query91 | 31 | 4 |
| query42 | 2 | 6 | query92 | 2 | 2 |
| query43 | 8 | 14 | query93 | 102 | 487 |
| query44 | 74 | 213 | query94 | 21 | 30 |
| query45 | 14 | 10 | query95 | 30 | 170 |
| query46 | 13 | 18 | query96 | 38 | 15 |
| query47 | 156 | 368 | query97 | 88 | 147 |
| query48 | 12 | 87 | query98 | 7 | 17 |
| query49 | 10 | 37 | query99 | 45 | 64 |
| query50 | 47 | 367 | - | - | - |
About Alphyn.AI
We build the Alphyn Lakehouse, a Kubernetes-native, high-performance, multi-engine lakehouse for any enterprise data and analytical workload — from agentic AI and BI to structured and unstructured data. Built entirely on open standards and an open architecture, Alphyn Lakehouse is a sovereign, on-premises solution for regulated enterprises across the GCC and the wider MENA region.