How to create Partitioned primary index in Teradata?
Contents
Partitioned primary index in Teradata
Partitioning is an indexing technique in Teradata which helps to improve the query performance on the large tables. In general, the rows are distributed among the AMP(Access Module Processor) based on primary index. Then within each AMP, the records are sorted by row Id which is generated by Teradata.
If we create the partitioned primary index(PPI) for a table, the rows are sorted by partition value in AMP. Row distribution among the AMP still happens based on the primary index only. Using PPI, we can avoid full table scan in Teradata especially on Range Queries.
Example: Table without partitioned column in Teradata
Lets look at the Customer transaction table below which doesn’t have the PPI column. Here the primary index column is Customer_Id. So the records are distributed among the AMP based on the Customer_Id.
Assume that we are going to run the delete query with the filter condition of Transaction date.
1 2 3 4 5 6 7 |
DELETE FROM Banking_DB.Customer_Transaction WHERE Transaction_Date BETWEEN CAST('2019-01-01' AS DATE) AND CAST('2019-05-31' AS DATE); |
Since we don’t have the index on Transaction date, PE will instruct AMP’s to do a full table scan.
Example: Table with partitioned column in Teradata
Lets create the Partitioned by column for the transaction date in Customer_Transaction table and then insert the required records.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
CREATE SET TABLE Banking_DB.Customer_Transaction,NO FALLBACK , NO BEFORE JOURNAL, NO AFTER JOURNAL, CHECKSUM = DEFAULT, DEFAULT MERGEBLOCKRATIO ( Customer_Id DECIMAL(18,0), Amount DECIMAL(18,0), Transaction_Date DATE FORMAT 'yyyy-mm-dd') PRIMARY INDEX (Customer_Id) PARTITION BY RANGE_N(Transaction_Date BETWEEN DATE '1999-10-01' AND DATE '2020-12-31' EACH INTERVAL '1' MONTH, NO RANGE OR UNKNOWN); |
As we defined the PPI for the Transaction_Date with INTERVAL ‘1’ MONTH, partitions are created for each month within the AMP. Each partitioning level is defined independently using a RANGE_N expression.
Now if we run the delete query on this table, each AMP reads records from their respective partition(monthly). Here Parsing engine will not instruct AMP to do a full table scan.
1 2 3 4 5 6 7 |
DELETE FROM Banking_DB.Customer_Transaction WHERE Transaction_Date BETWEEN CAST('2019-01-01' AS DATE) AND CAST('2019-05-31' AS DATE); |
- AMP1 – reads January and February month partition. It doesn’t need other months data as per the delete query.
- AMP2 – reads January,March & May month partition. It doesn’t need other months data as per the delete query.
Create Partitioned primary index in Teradata
Lets consider that Online payment industry in which millions of transactions happens for each day. If we bring those data into Teradata system, we need to define the partitioned primary index for the table to improve the performance of the query.we can create the partitions for each day, month and year as below.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
CREATE SET TABLE Banking_DB.Customer_Transaction,NO FALLBACK , NO BEFORE JOURNAL, NO AFTER JOURNAL, CHECKSUM = DEFAULT, DEFAULT MERGEBLOCKRATIO ( Customer_Id DECIMAL(18,0), Amount DECIMAL(18,0), Transaction_Date DATE FORMAT 'yyyy-mm-dd') PRIMARY INDEX ( Customer_Id) PARTITION BY RANGE_N(Transaction_Date BETWEEN DATE '1999-10-01' AND DATE '2008-12-31' EACH INTERVAL '1' YEAR , DATE '2009-01-01' AND DATE '2016-12-31' EACH INTERVAL '1' MONTH , DATE '2017-01-01' AND DATE '2022-12-31' EACH INTERVAL '1' DAY , NO RANGE OR UNKNOWN); |
- For historical records, we can specify range with the interval of 1 year partitions.
- Intermediate records – It is good to have range with the interval of 1 month partitions
- Recent records(frequently accessing data) – we can specify range with the interval of 1 day partitions.
Recommended Articles