Engineering

Terabytes of Data from Teradata to Trino — Efficient Transfer Method

How to efficiently extract terabytes from Teradata into Trino using AMP-aware multi-threaded reads and push-down optimizations in Alphyn Lakehouse.

by Alphyn.ai Engineering Team·13 min read

Terabytes of Data from Teradata to Trino — Efficient Transfer Method

The Lakehouse architectural principle assumes that you operate on all data loaded into the system. But sometimes you need to perform ad hoc analysis outside its perimeter, because the necessary data is not in the Lakehouse platform for various reasons. In such cases, federated access comes to the rescue. The standard engine for this task is Trino. It can extract data from external DBMS systems and in some cases can even push down certain computations to the source system side. The key requirement is having a suitable connector for the target DBMS that can work with it efficiently.

Transfer terabytes from Teradata in multiple streams without significantly increasing the load on the source — using AMP-aware extraction and push-down optimizations.

Recently, a new Trino Teradata Connector was added to Alphyn Lakehouse. It allows users to pull the necessary data slices from Teradata within ad hoc queries and solves the efficient data transfer problem: you can transfer terabytes in multiple streams without significantly increasing the load on the source.

In this article we will cover:

  • How to organize efficient multi-threaded access to Teradata:
    • Where mistakes are commonly made
    • What the correct solution looks like
  • What capabilities the Alphyn Trino Teradata Connector provides:
    • Multi-threaded transfer
    • Push-down optimizations

A preliminary note: We do not recommend using the Alphyn Trino Teradata Connector specifically for data migration tasks into the Lakehouse platform or for regular data loading. For those purposes we have a dedicated efficient batch data transfer tool — Alphyn Flex Loader — which when working with Teradata uses (at the user's or administrator's choice) either the Native Object Storage (NOS) mechanism or Teradata Parallel Transporter (TPT). The Trino-based access issue typically arises for our clients in federated application tasks such as data reconciliation, data quality checking applications, source system data profiling, and similar use cases.

Efficient Client Access to Teradata

A Brief Look at Teradata Architecture

A detailed deep-dive into Teradata is not among the goals of this article, but to make the technical details understandable and allow the overall picture to come together, we'll briefly touch on the key architectural aspects. Many readers may be familiar with Greenplum, so we'll also draw parallels with it.

Teradata is an MPP DBMS based on a shared-nothing architecture: each worker node has its own dedicated compute resources (CPU, RAM, Disk) for processing tasks. Worker nodes in Teradata are called AMPs (Access Module Processors — the full name is rarely used). Table records are distributed across AMPs for storage and processing. AMPs are connected via a common interconnect network called BYNET. Users connect and submit queries to Parsing Engine (PE) components — this is where query parsing, query plan construction, task dispatch to AMPs, execution monitoring, response assembly, and delivery to the client all occur.

Teradata was designed long ago, when RAM was much scarcer and significantly more expensive. Therefore the DBMS was originally built to actively use the disk subsystem for storing intermediate computations and prepared result sets — this approach continues to the present day. In query plans you can frequently see the system writing to Spool — that is what this refers to.

For those familiar with Greenplum, here is a quick analogy:

  • Master Server maps to Parsing Engine (PE), of which there can be several; each PE can handle approximately 100+ sessions
  • Segment maps to AMP. On a sufficiently powerful physical Segment Host you typically run several Segment Servers, but on a similarly powerful Teradata Node there are always more AMPs — so the total AMP count is usually higher in Teradata, often in the hundreds or thousands. Each AMP has fewer resources at its disposal and stores/processes less data than a single Segment
  • Distributed Key maps to Primary Index. Data is distributed across worker nodes by hashing the fields in the distribution key. The goal is usually to achieve even distribution across worker nodes on large tables (small tables typically use a different storage strategy). There are many differences in the details of the data distribution mechanism, but we'll skip those here
  • Spill maps to Spool. Unlike Greenplum, which tries to keep computations in RAM until it runs out, Teradata immediately writes to Spool

Where the Common Mistake Is Made

In practice, we have encountered several solutions for working with Teradata and subsequent loading into Hadoop or Lakehouse, and many of them shared the same problem with efficient data extraction from Teradata. And not only because some extracted hundreds of GB in a single thread. More critically, when extraction runs in N threads, it creates N times the load on the source system.

For parallel extraction in such cases, multiple queries were formed, each designed to process its own subset of records. Two key variants exist:

SELECT ... FROM table WHERE hashbucket(hashrow(key_field)) MOD 4 = 0;
SELECT ... FROM table WHERE hashbucket(hashrow(key_field)) MOD 4 = 1;
SELECT ... FROM table WHERE hashbucket(hashrow(key_field)) MOD 4 = 2;
SELECT ... FROM table WHERE hashbucket(hashrow(key_field)) MOD 4 = 3;
SELECT ... FROM table WHERE hashamp(hashbucket(hashrow(key_field))) MOD 4 = 0;
SELECT ... FROM table WHERE hashamp(hashbucket(hashrow(key_field))) MOD 4 = 1;
SELECT ... FROM table WHERE hashamp(hashbucket(hashrow(key_field))) MOD 4 = 2;
SELECT ... FROM table WHERE hashamp(hashbucket(hashrow(key_field))) MOD 4 = 3;

Each of these queries will indeed return its own non-overlapping subset of records (for simplicity we'll leave out the case of concurrent modifications). But what work does the DBMS have to perform to accomplish this?

For processing, each AMP scans all records and checks which ones satisfy the condition hash(key_field) MOD 4 = X. Each AMP writes those records to Spool. Then each AMP delivers those records from Spool to the client. In other words, during one such query we scan the entire table, spend additional CPU computing predicates for all rows, and only after all that does any resource savings begin.

But we won't have just one such query — there will be N of them. That means extraction in N threads leads to N full table scans: by increasing the number of threads, we linearly increase the load on the source system. Want to extract a 10 TB table from Teradata into the Lakehouse but a single thread is too slow? Use 10 threads and force the system to do 10 times the work — instead of scanning 10 TB through the I/O subsystem, you push 100 TB, plus burn a lot of CPU computing predicates.

As a result, this method scales very poorly. Extraction in N threads produces:

  • N-fold increase in I/O operations in Teradata
  • N-fold increase in CPU load in Teradata
  • Minimal reduction in extraction time. The time to prepare data on the Teradata side does not decrease; the only gain is from reducing the data volume per thread on the receiving side.

What Is Required for Efficient Extraction

Let's consider what we know at the point when we want to extract data from Teradata:

  • The table data is already distributed across hundreds of AMPs (in large installations, even thousands)
  • Each AMP holds approximately an equal portion of the data
  • Since there are quite a few portions (hundreds to thousands), there is no need to split them further during extraction — in fact, the opposite is true: data from multiple AMPs can be extracted within a single thread (connection/session)

You can draw a parallel with table partitioning, which is widely used in various DBMS systems including Teradata. Partitioning places data in separate portions (partitions) and allows reading just one specific partition rather than scanning everything. But what we want here is to read the data portion on one specific AMP within one extraction thread.

Teradata actually has a mechanism that allows reading data from a specific AMP — you just need to specify its sequence number. You can also specify several AMPs at once, which is very convenient for building various workloads for different scenarios.

Knowing the number of AMPs, we can distribute the extraction task across N threads. For example, with 10 worker processes on the Lakehouse side, a Teradata installation with 1,000 AMPs can be distributed as follows:

  • AMPs [1, 2, 3, ... 100] --> worker 1
  • AMPs [101, 102, 103, ... 200] --> worker 2
  • ...
  • AMPs [901, 902, 903, ... 1000] --> worker 10

With this approach, each AMP reads its own portion of data exactly once, regardless of the number of threads. The scaling problems described earlier disappear:

  • I/O operations are constant for the table, regardless of thread count (instead of N-fold increase)
  • CPU load on AMPs is constant for the table, regardless of thread count (instead of N-fold increase)
  • Time decreases linearly, provided all queries actually run in parallel (instead of minimal reduction)

To close this section, a note on the Skew issue. At the extreme, one query will read data from one AMP — meaning all computation is performed on a single AMP, which corresponds to 100% Skew. Administrators usually try to avoid such heavy skews, so they sometimes raise questions about such queries. The explanation is: if you sum all queries, each of which extracted its own data from one AMP (and therefore showed 100% Skew metrics), the combined total across all queries will show 0-1-2% Skew — which corresponds to the skew of the source table itself.

Trino Teradata Connector in Alphyn Lakehouse

The Teradata Connector ships as part of the Alphyn Lakehouse distribution. No additional steps are required for installation — you can immediately proceed to configuring a catalog that allows Trino to connect to a Teradata cluster:

CREATE CATALOG teradata_cluster USING teradata
WITH (
  "connection-url" = 'jdbc:teradata://hostname/DATABASE=mydb,DBS_PORT=1025',
  -- Using credential passthrough
  "user-credential-name" = 'td_user',
  "password-credential-name" = 'td_password'

  -- Alternative: without user credential passthrough,
  -- configure a shared technical account
  -- "connection-user" = 'user',
  -- "connection-password" = 'password'
);

Immediately after this, you can send queries from Trino that will interact with Teradata:

SELECT * FROM teradata_cluster.db_name.table_name WHERE ...;

SELECT ...
FROM lakehouse_db.lakehouse_table
LEFT JOIN teradata_cluster.db_name.table_name ON ...
WHERE ...;

During execution of these queries, Trino will read data from Teradata. By default, reading occurs in a single thread, but this is easy to change:

SET SESSION teradata_cluster.max_scan_parallelism = 4;

The connector will independently determine the number of AMPs in Teradata and distribute the extraction workload across Trino Workers. Each AMP scans its own data exactly once regardless of the number of extraction threads. If max_scan_parallelism is set higher than the number of AMPs, the thread count will be capped at the AMP count — no more. In practice, 4-8-16 threads is sufficient for fast extraction depending on table size — you are very far from hitting this limit.

The connector also supports an additional filtering step: if Trino needs data for only one day, it makes sense to filter out unnecessary data at the source side and not transfer it at all — Trino can push down certain expressions to the source to optimize the volume transferred.

Supported Push-down Optimizations

WHERE Predicate Push-down

To filter the dataset at the source side, Trino can push down certain operators and functions (and their combinations via AND / OR). The Teradata Connector currently supports:

  • Comparison operators: =, <>, <, <=, >, >=
  • Arithmetic operations on integer types: +, -, *, /, MOD
  • Unary minus for integer types
  • LIKE operator (with optional ESCAPE)
  • NULL checks: IS NULL, IS NOT NULL
  • Logical negation: NOT
  • NULLIF function

For example:

SELECT dt, client_id, phone_number
FROM teradata_cluster.db_name.table_name
WHERE dt = '2026-01-01' AND phone_number IS NOT NULL;

Aggregate Function Push-down

When aggregating, we usually reduce the dataset size, so push-down can also reduce the volume of transferred data. Ultimately, much less data needs to be transferred, which can favorably affect execution time — especially when the final result set shrinks by orders of magnitude (10x, 100x, 1000x).

However, since the original user query in Trino can contain any aggregations that Trino knows about, it is important that their equivalents also exist in the DBMS, and that the connector understands how to translate them. Currently the Teradata Connector supports the standard set: COUNT / COUNT(DISTINCT), SUM, MIN, MAX, AVG, and the less commonly known STDDEV_SAMP, STDDEV_POP, VARIANCE, VAR_POP. Therefore the following Trino query will be pushed down for computation in Teradata:

SELECT report_dt, COUNT(*), AVG(salary)
FROM teradata_cluster.db_name.table_name
GROUP BY report_dt;

Importantly, if even one aggregation function in the query is not supported for push-down, all aggregations in that query will be executed in Trino. If detailed data must be transferred for the computations anyway, it is more efficient to do a single computation pass over them in Trino than to do two passes (one in Trino and one in another DBMS).

If aggregation push-down is not needed, it can be disabled:

SET SESSION teradata_cluster.aggregation_pushdown_enabled = false;

JOIN Push-down

An important disclaimer: JOIN push-down works only for objects within the same catalog. If the query uses objects from different catalogs, Trino will load the data from each object locally and perform the join itself.

It is also important that the join conditions do not contain Trino-specific functions that have no equivalent in Teradata.

Otherwise, this is also a good option for reducing the final result set size before transferring to Trino — especially when the data volume shrinks significantly.

However, if the join is expected to increase the number of records, it is better not to push down such a join. You can disable it like this:

SET SESSION teradata_cluster.join_pushdown_enabled = false;

Limitations of Push-down Computations in Teradata

We discussed fast multi-threaded extraction from Teradata where each thread picks up already-prepared data from one or several AMPs. Adding a simple WHERE filter to this is not a problem — it can be applied on each AMP independently.

However, when it comes to JOINs or aggregations, the initial conditions are different. The system now needs to do much more complex work, which may include:

  • Redistribution of one or more tables across AMPs
  • Duplication of a table across all AMPs
  • Additional sorting of records on each AMP for one or more tables
  • Building a hash table

These computations cannot be confined to a single AMP — they require data from other AMPs for correct operation. Therefore each extraction thread will engage not just one or a few specified AMPs, but all of them simultaneously. We discussed the related problem at the beginning of the article.

What are the options in this case?

  1. "Push-down Priority." Push down the join/aggregation and extract in a single thread
  2. "Multi-threaded extraction." Without push-down — disable them
  3. "Materialization." Run CREATE TABLE AS SELECT with the needed join/aggregation in Teradata, then perform multi-threaded extraction

The exact prescription depends on many specific conditions, but here are general recommendations for several cases:

  • If the final result set is small — "Push-down Priority"
  • If the final result set is significantly reduced, usually stick with "Push-down Priority" or "Materialization" when there is appropriate sandbox space in Teradata
  • If the result set will be approximately the same size after the join/aggregation — "Multi-threaded extraction", since the transfer volume is the same regardless

Conclusion

In closing, here are the key ideas and takeaways:

  • For efficient multi-threaded data transfer from Teradata to the client side, each thread must work with its own subset of AMPs. Each AMP should be engaged exactly once, regardless of the number of threads.
  • In some cases it may be more efficient to execute computations at the source side to reduce the result set size. Trino can push down certain computations, but the connector must also support them.
  • When pushing down computations, it is important to understand how they will execute at the source — some combinations should not be mixed with multi-threaded extraction. But this can still be worthwhile, especially when the result is a substantially smaller dataset.
teradatatrinodata-integrationlakehousemppfederation

Get the latest posts in your inbox

Subscribe to our blog and get the latest posts delivered to your inbox.

By clicking "Subscribe" you agree to receive Alphyn communications. We respect your privacy.