How to Perform Incremental Load In Informatica IICS

When managing large data sets in ETL tasks, a simpler and more efficient approach is to process only the data that has been newly added or modified since the last execution, rather than reprocessing the entire dataset in every cycle.

Handling only the newly added or modified data typically results in shorter processing times, reduces the risk of failures, and maintains the historical accuracy of the data.

Let’s explore how this approach can be implemented within the framework of Informatica Cloud (IICS).

Incremental data loading refers to the practice of loading only the data that has been newly created or updated in the source system into the target system.

This differs from a full data load, where the entirety of the data is processed with each loading cycle.

Uses of Incremental Data Loading.

  • Focusing on selective data from the source system minimizes the overhead on the ETL process, thereby reducing the overall runtime.
  • Any ETL load process can fail for a variety of reasons. Because only selective data is processed, the likelihood and impact of such failures are reduced.
  • Preserving the historical accuracy of the data makes it straightforward to track what was processed during specific time periods.

How to Implement Incremental Load in IICS.

There exist several approaches to implementing Incremental Data Loading in Informatica Cloud, with each method offering its own unique advantages.

Before delving into the specifics of each method, let’s establish the source and target configurations that will be utilized in the examples outlined below.

Let's Create Source and Target Tables for Demo.

The table EMPLOYEES serves as the source, containing employee information. The fields Date_Created and Date_Modified default to SYSTIMESTAMP, meaning the system timestamp is automatically inserted whenever a record is created.

CREATE TABLE EMPLOYEES (
  EMPLOYEE_ID   NUMBER,
  EMP_NAME      VARCHAR2(255),
  SALARY        NUMBER,
  DEPARTMENT_ID NUMBER,
  ACTIVE_FLAG   VARCHAR2(255),
  DATE_CREATED  TIMESTAMP DEFAULT SYSTIMESTAMP,
  DATE_MODIFIED TIMESTAMP DEFAULT SYSTIMESTAMP
);

However, it’s essential for the Date_Modified to update whenever the record is modified. To accomplish this, let’s implement a straightforward trigger for that field.

CREATE OR REPLACE TRIGGER DATE_MODIFIED_TRIGGER
BEFORE INSERT OR UPDATE ON EMPLOYEES
FOR EACH ROW
BEGIN
  -- Stamp the current system timestamp on every insert or update
  :new.DATE_MODIFIED := SYSTIMESTAMP;
END;
/

Afterward, whenever a record is created, the system timestamp is loaded into both Date_Created and Date_Modified. However, when a record is updated, Date_Created remains unchanged while Date_Modified is updated to the current timestamp.

Let’s add some sample data to the table.

INSERT INTO EMPLOYEES (EMPLOYEE_ID, EMP_NAME, SALARY, DEPARTMENT_ID, ACTIVE_FLAG) VALUES (1000, 'Ramu', 5400, 10, 'Y');
INSERT INTO EMPLOYEES (EMPLOYEE_ID, EMP_NAME, SALARY, DEPARTMENT_ID, ACTIVE_FLAG) VALUES (1001, 'Krishna', 23000, 20, 'Y');
INSERT INTO EMPLOYEES (EMPLOYEE_ID, EMP_NAME, SALARY, DEPARTMENT_ID, ACTIVE_FLAG) VALUES (1002, 'Gopal', 5000, 20, 'Y');
INSERT INTO EMPLOYEES (EMPLOYEE_ID, EMP_NAME, SALARY, DEPARTMENT_ID, ACTIVE_FLAG) VALUES (1003, 'Debarchan', 61000, 30, 'Y');
INSERT INTO EMPLOYEES (EMPLOYEE_ID, EMP_NAME, SALARY, DEPARTMENT_ID, ACTIVE_FLAG) VALUES (1004, 'Alex', 2100, 30, 'Y');
INSERT INTO EMPLOYEES (EMPLOYEE_ID, EMP_NAME, SALARY, DEPARTMENT_ID, ACTIVE_FLAG) VALUES (1005, 'Mark', 3900, 30, 'Y');
INSERT INTO EMPLOYEES (EMPLOYEE_ID, EMP_NAME, SALARY, DEPARTMENT_ID, ACTIVE_FLAG) VALUES (1006, 'Dom', 2700, 30, 'Y');
INSERT INTO EMPLOYEES (EMPLOYEE_ID, EMP_NAME, SALARY, DEPARTMENT_ID, ACTIVE_FLAG) VALUES (1007, 'Furious', 3600, 30, 'Y');
INSERT INTO EMPLOYEES (EMPLOYEE_ID, EMP_NAME, SALARY, DEPARTMENT_ID, ACTIVE_FLAG) VALUES (1008, 'Kale', 1500, 30, 'Y');

SELECT * FROM EMPLOYEES;

placeholder

NOTE: The Date_Created and Date_Modified fields in the EMPLOYEES table are automatically populated.

Now, let’s create a target table to load the data and monitor the incremental changes.

CREATE TABLE EMPLOYEES_TGT (
  EMPLOYEE_ID   NUMBER,
  EMP_NAME      VARCHAR2(255),
  SALARY        NUMBER,
  DEPARTMENT_ID NUMBER,
  ACTIVE_FLAG   VARCHAR2(255),
  DATE_CREATED  TIMESTAMP,
  DATE_MODIFIED TIMESTAMP
);

NOTE: Unlike the source table, the Date_Created and Date_Modified do not have default values assigned.

Implementing Incremental Data Loading Using IICS System Variables.

IICS provides access to the following system variables, which can be used as data filter variables to filter newly inserted or updated records.

$LastRunDate returns only the date of the last successful execution of the task.

$LastRunTime returns the timestamp of the last successful execution of the task.

To learn more about these system variables, refer to this article.

 

Mapping Design For Implementing Incremental Load.

placeholder

Choose the EMPLOYEES table as the source and specify the filter condition on the Date_Modified field as demonstrated below.

placeholder

The data in the Oracle table EMPLOYEES is stored in IST (Indian Standard Time), but the system variable $LastRunTime is stored in GMT (Greenwich Mean Time).

Therefore, if the data we are processing is not stored in GMT timezone, we need to apply the appropriate conversion factor to the variable.

Since IST is five and a half hours ahead of GMT, the Oracle conversion factor of 5.5/24 (5.5 hours expressed as a fraction of a day) is added.

For the initial run, the default value of $LastRunTime is '1970-01-01 00:00:00' (the low-end date), so every record passes the filter.
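
The resulting source filter would look roughly like the sketch below. The TO_DATE format mask is an assumption inferred from the default value quoted above, so verify it against the actual format of $LastRunTime in your environment.

DATE_MODIFIED > TO_DATE('$LastRunTime', 'YYYY-MM-DD HH24:MI:SS') + 5.5/24

On every subsequent run, $LastRunTime resolves to the timestamp of the previous successful run, so only records created or modified after that point are picked up.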

In the Target transformation, select EMPLOYEES_TGT as the target table as shown below. Map the source fields in the Field Mapping section and save the mapping.

placeholder

Explanation

After saving and triggering the mapping, all 9 records from the EMPLOYEES table will be loaded into the EMPLOYEES_TGT table.

placeholder

Internally, Informatica IICS triggers the following query, which can be verified in the session log.

placeholder
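
For the initial run, the generated query is roughly of the form below (an illustrative sketch; the exact column list, aliases, and format mask in your session log may differ):

SELECT EMPLOYEE_ID, EMP_NAME, SALARY, DEPARTMENT_ID, ACTIVE_FLAG, DATE_CREATED, DATE_MODIFIED
FROM EMPLOYEES
WHERE DATE_MODIFIED > TO_DATE('1970-01-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS') + 5.5/24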

Now let's update a few records in the source table and rerun the mapping to see how it behaves.

UPDATE EMPLOYEES SET ACTIVE_FLAG = 'N' WHERE EMPLOYEE_ID = 1002;

We can notice that Date_Modified is updated for EMPLOYEE_ID 1002 in the EMPLOYEES table.

placeholder
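
A quick check against the source confirms the change; the query below is an illustrative sketch:

SELECT EMPLOYEE_ID, ACTIVE_FLAG, DATE_CREATED, DATE_MODIFIED
FROM EMPLOYEES
WHERE EMPLOYEE_ID = 1002;

When the mapping is rerun, only this record satisfies the source filter, because its Date_Modified is now later than the $LastRunTime recorded for the previous successful run.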

FAQs

What is Incremental Loading?

Incremental Loading is a data integration technique used to update a target dataset with only the changes that have occurred in the source since the last load, rather than reloading the entire dataset each time.

Why is Incremental Loading beneficial?

Incremental Loading reduces the time and resources required for data integration processes by only processing and transferring the changed data, thus improving efficiency and reducing the risk of errors.

How does Incremental Loading work in IICS?

Incremental Loading in IICS typically involves comparing the source and target datasets using key fields or timestamps to identify the changed records. Then, only the changed records are extracted, transformed (if necessary), and loaded into the target dataset.

What are the benefits of Incremental Loading?

  • Faster data integration processes: By processing only the changed data, Incremental Loading reduces processing time.
  • Reduced resource consumption: Less data to process means lower resource utilization, such as network bandwidth and computational resources.
  • Improved data consistency: Incremental Loading ensures that the target dataset is consistently updated with the latest changes from the source.

What are the common methods of Incremental Loading?

  • Timestamp-based Incremental Loading: Using a timestamp field to identify records that have been updated or added since the last load.
  • Key-based Incremental Loading: Comparing unique identifiers (keys) between the source and target datasets to identify changed records (see the sketch after this list).
  • CDC (Change Data Capture) Incremental Loading: Leveraging CDC mechanisms provided by databases to capture and process only changed data.
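
Below is a minimal SQL sketch of the key-based idea, using the demo EMPLOYEES and EMPLOYEES_TGT tables created earlier. It only detects newly added rows; detecting updated rows would additionally require comparing column values or a timestamp.

-- New rows: source keys that do not yet exist in the target
SELECT s.*
FROM EMPLOYEES s
WHERE NOT EXISTS (
  SELECT 1
  FROM EMPLOYEES_TGT t
  WHERE t.EMPLOYEE_ID = s.EMPLOYEE_ID
);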

What are the best practices for implementing Incremental Loading?

  • Select appropriate key fields or timestamps for identifying changed records.
  • Implement error handling and logging to track the status of Incremental Loading processes.
  • Regularly monitor and optimize Incremental Loading performance to ensure efficient data integration.

Can Incremental Loading be applied to different types of sources and targets?

Yes, Incremental Loading can be applied to various types of data sources and targets supported by IICS, including databases, files, cloud applications, and more.

What are the challenges of Incremental Loading?

  • Data consistency: Ensuring that the Incremental Loading process captures all changes accurately and maintains data consistency.
  • Performance overhead: Depending on the volume and frequency of changes, Incremental Loading may introduce additional overhead compared to full data loads.
  • Complexity: Implementing and maintaining Incremental Loading processes may require additional configuration and monitoring efforts.
