Benchmarking MPP Systems and Engines: TPC-DS Results

This post continues our series on comparing massively parallel processing systems and engines. In a previous article I laid out the testing principles our team follows and shared results from both real-world production scenarios and synthetic benchmarks. That piece sparked discussion — some found the evidence convincing, others questioned whether the results were objective. As promised, here are the results of a benchmark run under the widely accepted TPC-DS standard. Today you'll find out whether switching methodologies changed anything.

Introduction

TPC-DS (Transaction Processing Performance Council – Decision Support) is the industry-standard benchmark for measuring the performance of Decision Support Systems (DSS). In plain terms, it tests how well a given system — a database or a big data platform — handles complex analytical queries under conditions that resemble real production workloads.

The standard was developed by the Transaction Processing Performance Council (TPC) to evaluate systems that process large data volumes and execute complex analytical tasks. TPC-DS models the operations of a retail company — sales, inventory, and marketing — and uses that model to generate a dataset and a query set.

Key characteristics of TPC-DS

Realistic workload: TPC-DS is designed to simulate an analytical workload close to real production use, which makes its results more meaningful for assessing how systems behave in practice.

Varied, complex queries: The benchmark includes 99 distinct analytical SQL queries spanning four classes: reporting, ad hoc, iterative OLAP, and data mining. Together they exercise a system's ability to handle joins, aggregations, and filtering.

Scalable data volumes: TPC-DS can generate datasets ranging from a few gigabytes to hundreds of terabytes, so a system can be tested at a scale factor that matches the hardware under test.

Why TPC-DS matters

TPC-DS is regarded as an objective, standardized way to compare the performance of data analysis platforms. Many database and platform vendors publish TPC-DS-based results to demonstrate what their products can do. The variety of queries in the standard frequently exposes strengths and weaknesses not just of the system as a whole, but of individual subsystems such as the query optimizer and the cardinality estimator. This is why many vendors also use TPC-DS internally when working on optimizer strategies or other engine changes. The wider the adoption of a benchmark, the more credible its results become.

Limitations of the benchmark

TPC-DS models a retail business — sales, inventory, and marketing. Some organizations, particularly in finance, consider it inapplicable to their context and develop their own approaches. That was precisely the situation I described in the previous article.

Another limitation, in my view, is that the standard is weighted toward BI, ROLAP, ad hoc, and light-to-medium ETL scenarios (relative to the scale factor's data volume). It is less representative of heavy ETL workloads, which are characterized not only by query complexity but also by the materialization of results.

Test environment

Tests were run on cloud IaaS infrastructure
Managed services used:
- S3 Storage
- Managed Kubernetes, where the lakehouse platform's compute engines were deployed
TPC-DS scale-factor 10,000 (~10 TB uncompressed)
Data generation performed via Spark
File format: Parquet, ZSTD compression (ratio ~3), table format: Iceberg
Compressed data volume: ~2 TB
Data was partitioned
Impala engine version: 4.5
Trino engine version: 459

All queries were run as-is from the benchmark specification without any modification to query text. Both processing engines were tuned for maximum performance and maximum utilization of all available hardware resources. Between iterations the systems were restarted to clear local caches. Each engine collected and analyzed statistics independently.

Compute resources were allocated according to the principles established in the first article: the total RAM of the compute cluster must be significantly less than the dataset size, since that configuration reflects real production conditions. It bears repeating: never trust benchmarks where the dataset size is comparable to or smaller than available RAM.

Worker node configuration: 32 vCores, 256 GB RAM, local 100 GB SSD for spill and cache operations. Four worker nodes in total: 128 vCores, 1,024 GB RAM. Trino also had a dedicated coordinator node, but we did not count its resources. Impala operates perfectly well without a dedicated coordinator up to a certain load level — one we did not approach in this test.
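The sizing rule above can be checked with simple arithmetic. The sketch below uses only figures stated in this article (four workers with 256 GB RAM each, ~10 TB of raw data, ~2 TB after compression). The function name and the decimal conversion 1 TB = 1,000 GB are assumptions made for illustration.

```python
def ram_to_data_ratio(nodes: int, ram_per_node_gb: float, dataset_gb: float) -> float:
    """Return total cluster RAM as a fraction of the dataset size."""
    return nodes * ram_per_node_gb / dataset_gb


GB_PER_TB = 1000  # assumption: decimal terabytes

# Four workers with 256 GB each, compared against raw and compressed data volumes.
raw_ratio = ram_to_data_ratio(4, 256, 10 * GB_PER_TB)
compressed_ratio = ram_to_data_ratio(4, 256, 2 * GB_PER_TB)

print(f"RAM vs raw data: {raw_ratio:.2f}")
print(f"RAM vs compressed data: {compressed_ratio:.2f}")
```

A ratio well below 1 means the dataset cannot simply be held in memory, so the engines have to read from storage and spill as they would in production.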

Iteration 1

The first iteration ran in single-stream mode: all 99 queries executed sequentially. Results from this mode cannot be used to assess real production performance, but they are useful as an initial calibration pass and for smoothing out configuration rough edges.

The full table with all queries in order appears in the appendix at the end of the article. For easier analysis we split the queries into three groups by their Impala execution time: "Simple" (up to 10 seconds), "Medium" (11–100 seconds), and "Heavy" (over 100 seconds).
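The grouping can be expressed as a small function. This is a minimal sketch, not the tooling used for the article: the function name is hypothetical, and the inclusive upper bounds follow the tables below, where queries that take exactly 10 seconds on Impala are listed as "Simple" and the first "Medium" query takes 11 seconds.

```python
def complexity_group(impala_seconds: int) -> str:
    """Assign a query to a group by its single-stream Impala time."""
    if impala_seconds <= 10:
        return "Simple"
    if impala_seconds <= 100:
        return "Medium"
    return "Heavy"


# A few (query, Impala seconds) pairs taken from the tables in this article.
samples = {"query12": 1, "query86": 10, "query37": 11, "query74": 93, "query93": 102}
groups = {name: complexity_group(seconds) for name, seconds in samples.items()}

for name, group in groups.items():
    print(f"{name}: {group}")
```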

Table. Execution time — "Simple" queries

Query	Impala, sec	Trino, sec
query12	1	7
query20	1	9
query21	1	2
query41	1	1
query52	1	7
query42	2	6
query55	2	9
query56	2	6
query73	2	6
query83	2	5
query92	2	2
query32	3	4
query40	3	30
query77	3	8
query10	4	7
query58	4	9
query61	4	11
query69	4	7
query8	5	12
query19	6	10
query53	6	14
query68	6	14
query3	7	63
query33	7	7
query89	7	18
query98	7	17
query26	8	21
query30	8	17
query43	8	14
query5	9	92
query60	9	12
query63	9	15
query84	9	12
query90	9	13
query1	10	15
query39	10	12
query49	10	37
query86	10	23

Chart. Comparative execution time for "simple" queries. Lower is better.

Table. "Medium complexity" queries

Query	Impala, sec	Trino, sec
query37	11	19
query62	11	31
query66	11	18
query48	12	87
query6	13	10
query15	13	9
query46	13	18
query79	13	25
query25	14	36
query45	14	10
query2	15	57
query80	16	131
query34	17	18
query59	17	108
query13	18	121
query36	18	31
query22	19	27
query7	20	31
query29	20	69
query82	20	37
query17	21	78
query18	21	26
query35	21	18
query71	21	23
query85	21	70
query94	21	30
query81	22	14
query31	26	39
query51	26	30
query70	27	83
query95	30	170
query91	31	4
query27	32	24
query54	32	48
query96	38	15
query99	45	64
query50	47	367
query16	49	69
query38	49	109
query87	50	104
query76	54	124
query44	74	213
query57	75	190
query97	88	147
query74	93	194

Chart. Comparative execution time for "medium complexity" queries. Lower is better.

Table. "Heavy" queries.

Query	Impala, sec	Trino, sec
query93	102	487
query65	121	137
query28	126	350
query88	126	112
query11	144	357
query47	156	368
query9	170	325
query24	194	497
query75	219	310
query4	262	798
query64	310	203
query72	333	65
query14	416	2778
query78	710	573
query23	1007	3436
query67	1324	878

Chart. Comparative execution time for "heavy" queries. Lower is better.

Table. Overall efficiency — single stream

	Impala	Trino
Total test duration	~ 2 hours 1 min	~ 4 hours 17 min
Compute cost	$10	$22

Compute cost was calculated using the cloud provider's pricing calculator: monthly cost of four compute nodes / 30 days / 24 hours / 3,600 sec × test duration in seconds. The dedicated Trino coordinator was excluded from the cost calculation.
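As a worked illustration, the cost calculation amounts to dividing a monthly price by the number of seconds in a 30-day month (30 × 24 × 3,600) and multiplying by the run's elapsed seconds. The monthly price below is a hypothetical round number chosen only to make the arithmetic easy to follow. It is not the provider's actual rate, which the article does not state.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # 30-day billing month


def compute_cost(monthly_cluster_price: float, elapsed_seconds: float) -> float:
    """Prorate a monthly cluster price to the duration of one test run."""
    return monthly_cluster_price / SECONDS_PER_MONTH * elapsed_seconds


# Hypothetical monthly price for the four worker nodes (assumption, not a quoted rate).
EXAMPLE_MONTHLY_PRICE = 3600.0

# Single-stream Impala run from the table above: about 2 hours 1 minute.
elapsed = 2 * 3600 + 1 * 60
run_cost = compute_cost(EXAMPLE_MONTHLY_PRICE, elapsed)
print(f"Cost of one run: ${run_cost:.2f}")
```

Because the cost is linear in elapsed time, any slowdown under concurrent load translates directly into a proportionally higher bill for the same hardware.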

Chart. Total test duration — all queries. Lower is better.

Iteration 2

Results in hand — can we draw conclusions? Not yet. Staying true to our own principles, we only make decisions based on scenarios that reflect production conditions — concurrent load. For the second iteration we ran TPC-DS with two streams executing simultaneously. Stream composition and query ordering followed the benchmark specification. The key idea is that queries across streams are arranged so they simulate users running different queries against different data, rather than executing the same query in parallel across multiple sessions.

Total test duration was measured as the elapsed time from the start of the run to the completion of the last query across both streams. Compute cluster resources remained unchanged from the single-stream tests. Since both engines operated under concurrent load, we ran multiple tuning iterations to find resource queue and cluster settings that minimized total elapsed time with zero failures across all 198 queries. Any run in which even a single query failed was discarded, and configuration tuning continued. For Trino this involved iterative tuning of retry_policy, query_max_total_memory, query_max_memory_per_node, and query_max_memory; for Impala the relevant settings were mem_limit and mt_dop.
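The measurement rule described above (streams run concurrently, elapsed time ends when the last query finishes, any failure invalidates the run) can be sketched as a small harness. This is an illustrative outline under stated assumptions, not the harness used for the article: `run_streams` and `run_query` are hypothetical names, `run_query` stands in for whatever client call submits SQL to the engine, and the two query orderings in the usage part are made up rather than taken from the benchmark specification.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def run_streams(
    streams: Sequence[Sequence[str]],
    run_query: Callable[[str], None],
) -> float:
    """Run all streams concurrently and return elapsed seconds for the whole run.

    Queries inside a stream run sequentially. Any exception raised by
    run_query propagates to the caller, so a run with even one failed
    query yields no result and must be repeated.
    """

    def run_one_stream(queries: Sequence[str]) -> None:
        for query in queries:
            run_query(query)

    started = time.monotonic()
    with ThreadPoolExecutor(max_workers=len(streams)) as pool:
        futures = [pool.submit(run_one_stream, stream) for stream in streams]
        for future in futures:
            future.result()  # re-raises a failure from that stream, if any
    # Reached only when every stream has finished without errors.
    return time.monotonic() - started


# Usage with a stand-in runner that only records what it was asked to execute.
executed = []
streams = [["query1", "query2"], ["query2", "query1"]]
elapsed = run_streams(streams, executed.append)
print(f"{len(executed)} queries completed in {elapsed:.3f} s")
```

Threads are sufficient here because the harness only waits on the engine. The heavy work happens on the cluster, not in the client process.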

Table. Overall efficiency — 2 streams

	Impala	Trino
Total test duration	~ 4 hours 1 min	~ 11 hours 43 min
Compute cost	$20	$59

As load increased the two engines diverged. Impala degraded by approximately 2x relative to the single-stream run; Trino degraded by 2.7x.

Iteration 3

We continued increasing load and ran the benchmark with 4 concurrent streams on the same hardware. The system received 396 queries as input. The same pass/fail criteria applied — the test was considered successful only if all 396 queries completed without errors or crashes.

Table. Overall efficiency — 4 streams

	Impala	Trino
Total test duration	~ 8 hours 11 min	~ 34 hours 2 min
Compute cost	$42	$173

Impala's elapsed time grew roughly in proportion to the 4x load increase (about 4.1x the single-stream time); Trino's grew by about 8x.
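The degradation factors can be recomputed from the elapsed times reported in the three efficiency tables. The sketch below divides each run's duration by the same engine's single-stream duration. The durations in the article are approximate, so the factors are approximate as well, and the helper name is hypothetical.

```python
def minutes(hours: int, mins: int) -> int:
    """Convert an 'H hours M min' duration to whole minutes."""
    return hours * 60 + mins


# Elapsed times from the 1-, 2- and 4-stream tables, keyed by stream count.
elapsed = {
    "Impala": {1: minutes(2, 1), 2: minutes(4, 1), 4: minutes(8, 11)},
    "Trino": {1: minutes(4, 17), 2: minutes(11, 43), 4: minutes(34, 2)},
}

# Slowdown relative to the same engine's single-stream run.
slowdown = {
    engine: {streams: round(t / runs[1], 2) for streams, t in runs.items()}
    for engine, runs in elapsed.items()
}

for engine, factors in slowdown.items():
    print(engine, factors)
```

A factor equal to the stream count means the engine scaled linearly with load. A factor above it means concurrency cost more than the added work alone.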

Chart. Total duration under concurrent load. Lower is better.

Now imagine this cluster runs this workload on a scheduled daily basis for a full year — 365 runs, one per day. What does the accumulated compute cost look like?

	Impala	Trino
Annual compute cost, USD	$15,155	$63,010

The difference: ~$48,000 — on 10 TB of raw data across four compute nodes. Now extrapolate that to 100 TB and 40 nodes.

What about GreenPlum?

Having covered the lakehouse engine comparison, it was time to bring GreenPlum into the picture. To do that we had to rebuild the test environment: GreenPlum is a traditional shared-nothing MPP system that runs on dedicated local disks rather than on top of object storage. For a fair comparison, all lakehouse services also had to run without relying on cloud-managed services — making the setup more representative of an on-premises installation.

Table. GreenPlum hardware

Node type	vCPU	RAM, GB	Disk
Master × 1	16	64	1 SSD × 1.5 TB
Segment × 4	32	256	4 SSD × 2 TB
Total	144	1,088	16 SSD storage layer

A GreenPlum standby master was not needed for load testing — it contributes nothing to computation and only adds cost.

Table. Alphyn Lakehouse hardware

Node type	vCPU	RAM, GB	Disk
MinIO host × 4	8	24	4 × 2 TB
Worker host × 4	32	256	1 × 100 GB
Total	152	1,120	16 SSD storage layer

Sizing followed a principle of equal compute resources and equal disk subsystems, even though the two system topologies differ fundamentally.

Alphyn Lakehouse was deployed on cloud IaaS infrastructure using a decoupled architecture: an isolated S3 cluster based on MinIO and an isolated compute cluster running Impala 4.4.1. The critical constraint was that the number and type of disks in the storage layer had to match exactly, using the highest-performance disks available from the cloud provider. GreenPlum was deployed on cloud VMs with equally high-performance storage.

Test parameters:

TPC-DS scale-factor 10,000 (~10 TB uncompressed)
GreenPlum tuned for maximum performance:
- Optimal physical data model including partitioning, compression, storage format selection, and so on
- Resource management configured for concurrent workloads
For Impala, data remained in Parquet + Iceberg with ZSTD compression, same as in previous tests

Pass/fail criteria were unchanged: in single-stream mode all 99 queries must complete successfully; in 4-stream mode all 396 queries must complete without errors or crashes.

Table. Benchmark results.

	Alphyn Lakehouse — Impala 4.4.1 + S3 MinIO		OSS GreenPlum 6.27.1
	1 stream	4 streams	1 stream	4 streams
Elapsed time	~ 2 hours 8 min	~ 8 hours 48 min	~ 13 hours 20 min	~ 53 hours 20 min
Compute cost, USD	$26	$109	$155	$621

Compute cost was calculated using the same methodology: monthly IaaS node rental cost (public price list) / 30 days / 24 hours / 3,600 sec × elapsed time in seconds.

Chart. Comparison of Alphyn Lakehouse and OSS GreenPlum. Lower is better.

Fig. GreenPlum resource utilization graphs. TPC-DS 4 streams.

Once again the data confirms what we see in practice: Alphyn Lakehouse, even with physically separated storage and compute, completes the benchmark more than 6x faster than GreenPlum and delivers roughly 6x better cost efficiency (5.7x to 6x by the compute costs above). Translating performance into cost reveals the difference in total cost of ownership from a capital expenditure standpoint. In reality the gap is even wider, since maintenance and licensing costs are derived from hardware sizing, and a GreenPlum cluster also occupies significantly more datacenter floor space and draws substantially more power. MPP systems that rely on full table scans have been obsolete for a decade.

Now let's apply the same methodology to annual compute cost. Assuming the cluster runs this workload on a scheduled basis for 12 months — 365 runs:

	Alphyn Lakehouse / Impala	GreenPlum 6
Annual compute cost, USD	$39,737	$226,488

Difference: ~$187,000.

The gap is significant. Put differently: on just 10 TB of data across a four-node cluster, an organization can redirect ~$187K from hardware budgets toward the data team — and end up with more functionality delivered on time.

What other benchmark standards exist — and are they worth using?

What other open benchmarks can be used for concurrent load testing beyond TPC-DS? One option is TPC-H. However, despite also targeting analytical systems, TPC-H has several weaknesses compared to TPC-DS:

Simpler data model:
- Uses a simpler schema with only 8 tables, rather than the multi-fact snowflake schema of TPC-DS with its 24 tables
Limited query set:
- Contains only 22 queries which, while analytical, are less complex and less varied than TPC-DS
Lower optimizer demand:
- TPC-H is considered less demanding on the query optimizer due to the relative simplicity of its queries, so it may not fully expose the capabilities of sophisticated query optimizers in modern MPP systems

Recently I have seen ClickBench used increasingly often for comparative engine testing. This is a performance benchmark developed by the ClickHouse team using a real dataset and relatively simple queries characteristic of ClickHouse's target use case.

ClickBench is excellent for evaluating ClickHouse performance in the scenarios it was designed for: fast filtered aggregation over a single object. But ClickBench should not be used to select a primary platform engine. None of its 43 SQL queries contains a JOIN.

Upcoming testing plans

We are currently benchmarking StarRocks, another processing engine that is part of the Alphyn Lakehouse platform. We also plan to benchmark Spark 4 versus Spark 3.5 and compare their performance against other MPP SQL engines in ELT pipeline workloads.

We are also developing a methodology for objectively load-testing engines on "fast read access to a materialized data mart" scenarios. If you have ideas, we'd love to hear them.

A note to readers and the community

Our team is committed to objective, comparative technology testing — often an internal competitive process rather than a comparison against external market offerings. Our goal is to give customers a system that meets their performance and functionality expectations, and to calculate hardware sizing accurately, since it directly determines total cost of ownership. When choosing between technologies, users and platform owners should be guided not only by what feels convenient, but by a clear understanding of what that convenience will cost.

If you have doubts about any of the results presented here, reproduce them yourself. Share your observations, invite discussion, or reach out about joint testing. Thank you for reading.

Appendix

Table: TPC-DS query execution times. Single stream. In benchmark query order.

Query	Impala, sec	Trino, sec	Query	Impala, sec	Trino, sec
query1	10	15	query51	26	30
query2	15	57	query52	1	7
query3	7	63	query53	6	14
query4	262	798	query54	32	48
query5	9	92	query55	2	9
query6	13	10	query56	2	6
query7	20	31	query57	75	190
query8	5	12	query58	4	9
query9	170	325	query59	17	108
query10	4	7	query60	9	12
query11	144	357	query61	4	11
query12	1	7	query62	11	31
query13	18	121	query63	9	15
query14	416	2778	query64	310	203
query15	13	9	query65	121	137
query16	49	69	query66	11	18
query17	21	78	query67	1324	878
query18	21	26	query68	6	14
query19	6	10	query69	4	7
query20	1	9	query70	27	83
query21	1	2	query71	21	23
query22	19	27	query72	333	65
query23	1007	3436	query73	2	6
query24	194	497	query74	93	194
query25	14	36	query75	219	310
query26	8	21	query76	54	124
query27	32	24	query77	3	8
query28	126	350	query78	710	573
query29	20	69	query79	13	25
query30	8	17	query80	16	131
query31	26	39	query81	22	14
query32	3	4	query82	20	37
query33	7	7	query83	2	5
query34	17	18	query84	9	12
query35	21	18	query85	21	70
query36	18	31	query86	10	23
query37	11	19	query87	50	104
query38	49	109	query88	126	112
query39	10	12	query89	7	18
query40	3	30	query90	9	13
query41	1	1	query91	31	4
query42	2	6	query92	2	2
query43	8	14	query93	102	487
query44	74	213	query94	21	30
query45	14	10	query95	30	170
query46	13	18	query96	38	15
query47	156	368	query97	88	147
query48	12	87	query98	7	17
query49	10	37	query99	45	64
query50	47	367	-	-	-

About Alphyn.AI

We build the Alphyn Lakehouse, a Kubernetes-native, high-performance, multi-engine lakehouse for any enterprise data and analytical workload — from agentic AI and BI to structured and unstructured data. Built entirely on open standards and an open architecture, Alphyn Lakehouse is a sovereign, on-premises solution for regulated enterprises across the GCC and the wider MENA region.

Learn more at alphyn.ai and follow us on LinkedIn.
