Informatica Scenario Based Interview Questions
1. What is INFORMATICA? Why do we need it?
Answer: INFORMATICA is a software development company that offers data integration solutions for ETL, data virtualization, master data management, data quality, data replication, ultra messaging, etc.
Some of the popular INFORMATICA products are:
- INFORMATICA PowerCenter
- INFORMATICA PowerConnect
- INFORMATICA PowerMart
- INFORMATICA PowerExchange
- INFORMATICA PowerAnalyzer
- INFORMATICA Data Quality
We need INFORMATICA when working with data systems that contain data on which certain operations must be performed, along with a set of rules. INFORMATICA facilitates operations like cleaning and modifying data from structured and unstructured data systems.
2. What is a Filter Transformation and why is it an Active one?
Answer: A Filter transformation is an Active and Connected transformation that can filter rows in a mapping.
Only the rows that meet the Filter Condition pass through the Filter transformation to the next transformation in the pipeline. TRUE and FALSE are the implicit return values from any filter condition we set. If the filter condition evaluates to NULL, the row is assumed to be FALSE.
The numeric equivalent of FALSE is zero (0) and any non-zero value is the equivalent of TRUE.
As an ACTIVE transformation, the Filter transformation may change the number of rows passed through it. A filter condition returns TRUE or FALSE for each row that passes through the transformation, depending on whether a row meets the specified condition. Only rows that return TRUE pass through this transformation. Discarded rows do not appear in the session log or reject files.
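These semantics can be sketched in Python (a simplified illustration of the rule only, not PowerCenter code; the row shape and SALARY condition are hypothetical):

```python
def evaluate_filter(condition_result):
    """Filter-transformation semantics: NULL is treated as FALSE,
    zero is FALSE, and any non-zero numeric value is TRUE."""
    if condition_result is None:   # NULL evaluates to FALSE
        return False
    return condition_result != 0   # 0 -> FALSE, any non-zero -> TRUE

# Hypothetical rows; the filter condition is simply the SALARY value itself
rows = [{"salary": 5000}, {"salary": 0}, {"salary": None}]
passed = [r for r in rows if evaluate_filter(r["salary"])]
print(passed)  # [{'salary': 5000}]
```

Rows that evaluate to FALSE or NULL are simply dropped, which is why the transformation is Active: the output row count can differ from the input row count.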
3. Does the Informatica Aggregator Transformation support only aggregate expressions?
Answer: Apart from aggregate expressions, the Informatica Aggregator also supports non-aggregate expressions and conditional clauses.
4. How will the document be delivered to me?
Answer: The Master Informatica Question and Answer Set™ will be delivered in PDF format to the email address you provide during the checkout process. Therefore, please ensure that you provide your email address accurately during the buying/checkout process. Delivery generally takes place within 8 hours of purchase.
5. What kind of support is available if I have difficulty understanding the question or answer?
Answer: If you need any assistance to understand the questions or answers of the Master Informatica Question & Answer Set™, or if you have any doubts, please feel free to contact our forum. In the forum, you can post your questions with the MIQA tag to draw the attention of our experts, who will happily provide personalized guidance for each and every query you have.
6. Is Master Informatica Question & Answer set™ copyrighted?
Answer: The Master Informatica Question and Answer Set™ is copyrighted and only intended to be used by the buyer of the material. Any breach of the copyright will be subject to legal prosecution. You are explicitly prohibited from copying or distributing the material in full or in part for any purpose other than personal use.
7. What Informatica product line does Master Informatica Question and Answer set™ cover?
Answer: We cover Informatica PowerCenter® Mapping Designer, Workflow Manager, Repository Manager, Workflow Monitor and PowerCenter® Administration Console (version in
8. What is the difference between the Source Qualifier transformation's Source Filter and the Filter transformation?
Answer: The differences are:
- The Source Qualifier transformation filters rows while reading from a source; the Filter transformation filters rows from within a mapping.
- The Source Qualifier transformation can only filter rows from relational sources; the Filter transformation filters rows coming from any type of source system at the mapping level.
- The Source Qualifier limits the row set extracted from a source; the Filter transformation limits the row set sent to a target.
- The Source Qualifier reduces the number of rows used throughout the mapping and hence provides better performance. To maximize session performance with the Filter transformation, include it as close to the sources as possible to filter out unwanted data early in the flow of data from sources to targets.
- The filter condition in the Source Qualifier transformation only uses standard SQL, as it runs in the database; the Filter transformation can define a condition using any statement or transformation function that returns either a TRUE or FALSE value.
9. What are some examples of Informatica ETL programs?
Answer: Some basic Informatica programs are:
Mappings: A mapping is designed in the Designer and defines the ETL process. It reads data from the original sources, applies transformation logic to that data, and writes the transformed data to the targets.
Workflows: A workflow is a collection of tasks that describes the runtime ETL process. Workflows are designed in the Workflow Manager.
Task: This is a set of actions, commands, or functions that are executable. How an ETL process behaves during runtime can be defined by a sequence of different tasks.
10. Which development components of Informatica have the highest usage?
Answer: There are many development components in Informatica. However, these are the most widely used of them:
Expression: This can be used to transform data row by row using functions.
Lookups: They are extensively used to join data.
Sorter and Aggregator: These are the right tools for sorting data and aggregating it.
Java transformation: Java transformation is the choice of developers when they want to invoke variables, Java methods, third-party APIs, and built-in Java packages.
Source qualifiers: Many people use this component to convert source data types to the equivalent Informatica data types.
Transaction control: If you want to create transactions and have absolute control over rollbacks and commits, count on this component to bail you out.
11. What are the uses of ETL tools?
Answer: ETL tools are used for performing actions such as:
Extracting data from sources such as database tables or files.
Transforming the data received from different sources in an organized way. Notable sources include SAP solutions, Teradata, and web services.
Loading the transformed data into a target, such as a data warehouse.
12. What is a Joiner Transformation and why is it an Active one?
- A Joiner is an Active and Connected transformation used to join source data from the same source system or from two related heterogeneous sources residing in different locations or file systems.
- The Joiner transformation joins sources with at least one matching column. The Joiner transformation uses a condition that matches one or more pairs of columns between the two sources.
- The two input pipelines include a master pipeline and a detail pipeline, or a master branch and a detail branch.
- The master pipeline ends at the Joiner transformation, while the detail pipeline continues to the target.
- In the Joiner transformation, we must configure the transformation properties namely Join Condition, Join Type and Sorted Input option to improve Integration Service performance.
- The join condition contains ports from both input sources that must match for the Integration Service to join two rows. Depending on the type of join selected, the Integration Service either adds the row to the result set or discards the row.
- The Joiner transformation produces result sets based on the join type, condition, and input data sources. Hence it is an Active transformation.
13. State the limitations where we cannot use a Joiner in the mapping pipeline.
Answer: The Joiner transformation accepts input from most transformations. However, the following are the limitations:
Joiner transformation cannot be used when either of the input pipelines contains an Update Strategy transformation.
Joiner transformation cannot be used if we connect a Sequence Generator transformation directly before the Joiner transformation.
14. Define the various Join Types of Joiner Transformation?
- In a normal join, the Integration Service discards all rows of data from the master and detail source that do not match, based on the join condition.
- A master outer join keeps all rows of data from the detail source and the matching rows from the master source. It discards the unmatched rows from the master source.
- A detail outer join keeps all rows of data from the master source and the matching rows from the detail source. It discards the unmatched rows from the detail source.
- A full outer join keeps all rows of data from both the master and detail sources.
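The four join types can be illustrated with a small Python sketch (the item tables and key values here are made up for illustration):

```python
# Hypothetical sources keyed by ITEM_NO
master = {1: "Pen", 2: "Book"}    # master source: ITEM_NO -> ITEM_NAME
detail = {2: 150.0, 3: 80.0}      # detail source: ITEM_NO -> PRICE

keys = set(master) | set(detail)
normal_join = {k for k in keys if k in master and k in detail}  # matching rows only
master_outer = {k for k in keys if k in detail}  # all detail rows + matching master rows
detail_outer = {k for k in keys if k in master}  # all master rows + matching detail rows
full_outer = keys                                # all rows from both sources

print(sorted(normal_join), sorted(master_outer), sorted(detail_outer), sorted(full_outer))
# [2] [2, 3] [1, 2] [1, 2, 3]
```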
15. What is INFORMATICA Worklet?
Answer: A Worklet works like a Mapplet with the feature of reusability; the difference is that a worklet is applied at the workflow level rather than the mapping level. A worklet saves its logic and tasks in a single place so they can be reused.
A worklet is defined as a group of tasks, either reusable or non-reusable, at the workflow level. It can be added to as many workflows as required. With its reusability feature, much time is saved: reusable logic is developed once and placed wherever it is needed.
In the INFORMATICA PowerCenter environment, Mapplets are considered one of the most advantageous features. Mapplets are created in the Mapplet Designer and are a part of the Designer tool.
- A Mapplet basically contains a set of transformations that are designed to be reused in multiple mappings.
- Mapplets are said to be reusable objects which simplify mapping by:
- Including multiple transformations and source definitions.
- Not requiring connections to all input and output ports.
- Accepting data from sources and passing it to multiple transformations.
- Overall, when mapping logic needs to be reused, the logic should be placed in a Mapplet.
16. What is Code page Compatibility?
Answer: Data can be transferred from one code page to another provided that both code pages have compatible character sets; in that case, no data failure can occur.
17. Write the prerequisite tasks to achieve the session partition?
Answer: In order to perform session partitioning, one needs to configure the session to partition the source data and install the Informatica server on a machine with multiple CPUs.
18. Which files are created by the Informatica server during session runs?
Answer: The files created during session runs are the error log, the bad (reject) file, the workflow log, and the session log.
19. How to implement Security Measures using Repository manager?
Answer: There are 3 ways to implement security measures.
- Folder Permission within owner, groups, and users.
- Locking (Read, Write, Retrieve, Save and Execute).
- Repository Privileges viz.
- Browse Repository.
- Use Workflow Manager(To create session and batches and set its properties).
- Workflow Operator(To execute Session and batches).
- Use Designer, Admin Repository(Allows any user to create and manage repository).
- Admin User(Allows the user to create a repository server and set its properties).
- SuperUser(All the privileges are granted to the user).
20. What are the types of groups in router transformation?
- Input group
- Output group
- Default group
21. How can we store previous session logs?
Answer: If you run the session in timestamp mode, the session log will not overwrite the previous session logs.
- Go to Session Properties –> Config Object –> Log Options
- Select the properties as follows:
- Save session log by –> SessionRuns
- Save session log for these runs –> Change the number that you want to save the number of log files (Default is 0)
- If you want to save all of the log files created by every run, select Save session log by –> Session TimeStamp
- You can find these properties in the session/workflow Properties.
22. Suppose we configure Sorter transformations in the master and detail pipelines with the following sorted ports in order: ITEM_NO, ITEM_NAME, PRICE. When we configure the join condition, what are the guidelines we need to follow to maintain the sort order?
Answer: If we have sorted both the master and detail pipelines in order of the ports say ITEM_NO, ITEM_NAME and PRICE we must ensure that:
- Use ITEM_NO in the First Join Condition.
- If we add a Second Join Condition, we must use ITEM_NAME.
- If we want to use PRICE as a Join Condition apart from ITEM_NO, we must also use ITEM_NAME in the Second Join Condition.
- If we skip ITEM_NAME and join on ITEM_NO and PRICE, we will lose the input sort order and the Integration Service fails the session.
23. How does an Expression Transformation differ from an Aggregator Transformation?
Answer: An Expression Transformation performs a calculation on a row-by-row basis. An Aggregator Transformation performs calculations on groups.
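The distinction can be sketched in Python (the column names and bonus rule are hypothetical):

```python
rows = [{"dept": "HR", "sal": 100}, {"dept": "HR", "sal": 200}, {"dept": "IT", "sal": 300}]

# Expression-style: a row-by-row calculation produces one output per input row
bonuses = [r["sal"] / 10 for r in rows]

# Aggregator-style: a calculation per group produces one output per group
totals = {}
for r in rows:
    totals[r["dept"]] = totals.get(r["dept"], 0) + r["sal"]

print(bonuses)  # [10.0, 20.0, 30.0]
print(totals)   # {'HR': 300, 'IT': 300}
```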
24. What is Incremental Aggregation?
Answer: We can enable the session option Incremental Aggregation for a session that includes an Aggregator Transformation. When the Integration Service performs incremental aggregation, it passes only the changed source data through the mapping and uses the historical cache data to perform the aggregate calculations incrementally.
For reference check Implementing Informatica Incremental Aggregation.
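Conceptually, incremental aggregation merges only the changed source rows into a persisted aggregate cache instead of recomputing all groups. A minimal Python sketch of the idea (the cache layout and column names are hypothetical):

```python
def incremental_aggregate(cache, changed_rows):
    """Merge only the changed rows into the historical aggregate cache
    (here: a running SUM of sales per region)."""
    for region, amount in changed_rows:
        cache[region] = cache.get(region, 0) + amount
    return cache

cache = {"EAST": 1000, "WEST": 500}        # historical cache from previous runs
new_rows = [("EAST", 200), ("NORTH", 50)]  # only the changed source data this run
print(incremental_aggregate(cache, new_rows))
# {'EAST': 1200, 'WEST': 500, 'NORTH': 50}
```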
25. What are the performance considerations when working with Aggregator Transformation?
- Filter the unnecessary data before aggregating it. Place a Filter transformation in the mapping before the Aggregator transformation to reduce unnecessary aggregation.
- Improve performance by connecting only the necessary input/output ports to subsequent transformations, thereby reducing the size of the data cache.
- Use Sorted input which reduces the amount of data cached and improves session performance.
26. What will happen if the SELECT list COLUMNS in the Custom override SQL Query and the OUTPUT PORTS order in SQ transformation do not match?
Answer: A mismatch, or a change in the order of the selected columns relative to the connected transformation output ports, may result in session failure.
27. What are the different ways to implement parallel processing in Informatica?
Answer: We can implement parallel processing using various types of partition algorithms:
Database partitioning: The Integration Service queries the database system for table partition information. It reads partitioned data from the corresponding nodes in the database.
Round-Robin Partitioning: Using this partitioning algorithm, the Integration service distributes data evenly among all partitions. It makes sense to use round-robin partitioning when you need to distribute rows evenly and do not need to group data among partitions.
Hash Auto-Keys Partitioning: The PowerCenter Server uses a hash function to group rows of data among partitions. When hash auto-keys partitioning is used, the Integration Service uses all grouped or sorted ports as a compound partition key. You can use hash auto-keys partitioning at or before Rank, Sorter, and unsorted Aggregator transformations to ensure that rows are grouped properly before they enter these transformations.
Hash User-Keys Partitioning: Here, the Integration Service uses a hash function to group rows of data among partitions based on a user-defined partition key. You can individually choose the ports that define the partition key.
Key Range Partitioning: With this type of partitioning, you can specify one or more ports to form a compound partition key for a source or target. The Integration Service then passes data to each partition depending on the ranges you specify for each port.
Pass-through Partitioning: In this type of partitioning, the Integration Service passes all rows from one partition point to the next partition point without redistributing them.
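Round-robin and hash-key distribution can be sketched in Python (a toy model; in PowerCenter the Integration Service performs this internally):

```python
def round_robin(rows, n):
    """Distribute rows evenly across n partitions, ignoring keys."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts

def hash_partition(rows, n, key):
    """Group rows so that rows with equal keys land in the same partition."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[key]) % n].append(row)
    return parts

rows = [{"dept": d, "id": i} for i, d in enumerate(["HR", "HR", "IT", "IT"])]
print(round_robin(rows, 2))             # even spread, keys ignored
print(hash_partition(rows, 2, "dept"))  # both "HR" rows share one partition
```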
28. Mention a few design and development best practices for Informatica?
- Mapping design tips: Standards – sticking to consistent standards is beneficial in the long run. This includes naming conventions, descriptions, environment settings, parameter files, documentation, among others.
- Reusability – in order to react quickly to potential changes, use Informatica components like mapplets, worklets, and reusable transformations.
- Scalability – when designing and developing mappings, it is a good practice to keep volumes in mind. This includes caching, queries, partitioning, and initial vs. incremental loads.
- Simplicity – it is recommended to create multiple mappings instead of a few complex ones. Use Staging Area and try to keep the processing logic as clear and simple as possible.
- Modularity – use the modular design technique (common error handling, reprocessing).
- Mapping development best practices
- Source Qualifier – use shortcuts, extract only the necessary data, and limit the columns and rows read from the source. Try to use the default query options (User Defined Join, Filter) instead of a SQL Query override, which may impact database resources and prevent the use of partitioning and push-down.
- Expressions – use local variables to limit the number of redundant calculations, avoid datatype conversions, reduce invoking external scripts (coding outside of Informatica), provide comments, use operators (||, +, /) instead of functions. Keep in mind that numeric operations are generally faster than string operations.
- Filter – use the Filter transformation as close to the source as possible. If multiple filters need to be applied, usually it’s more efficient to replace them with Router.
- Aggregator – use sorted input, also use as early (close to the source) as possible and filter the data before aggregating.
- Joiner – try to join the data in the Source Qualifier wherever possible, and avoid outer joins. It is good practice to designate the source with fewer rows as the Master source.
- Lookup – relational lookup should only return ports that meet the condition. Call Unconnected Lookup in expression (IIF). Replace large lookup tables with joins whenever possible. Review the database objects and add indexes to database columns when possible. Use Cache Calculator in session to eliminate paging in lookup cache.
29. What is Enterprise Data Warehousing?
Answer: When a large amount of data is assembled at a single access point, it is called Enterprise Data Warehousing. This data can be reused and analyzed at regular intervals or as needed.
Considered as the central database or say a single point of access, Enterprise data warehousing provides a complete global view and thus helps in decision support.
It can be more understood from the following points which define its features:
- All important business information stored in this unified database can be accessed from anywhere across the organization.
- Although the time required is more, periodic analysis on this single source always produces better results.
- Security and integrity of data are never compromised while making it accessible across the organization.
30. How can we use Batches?
Answer: Batches are collections of sessions used to migrate data from source to target on a server. A batch can contain a large number of sessions, but more sessions cause more network traffic, whereas a batch with fewer sessions can be moved rapidly.
31. What is a Sequence Generator Transformation?
Answer: A Sequence Generator transformation is a Passive and Connected transformation that generates numeric values. It is used to create unique primary key values, replace missing primary keys, or cycle through a sequential range of numbers. By default this transformation contains ONLY two OUTPUT ports, namely CURRVAL and NEXTVAL. We cannot edit or delete these ports, nor can we add ports to this unique transformation. We can create approximately two billion unique numeric values, with the widest range being from 1 to 2147483647.
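The NEXTVAL behavior, including the Cycle option, can be modeled with a small Python class (a simplified sketch; the property names mirror the transformation's options):

```python
class SequenceGenerator:
    """Toy model of the Sequence Generator transformation's NEXTVAL port."""
    def __init__(self, start=1, increment=1, end=2147483647, cycle=False):
        self.start, self.increment, self.end, self.cycle = start, increment, end, cycle
        self.next_value = start

    def nextval(self):
        value = self.next_value
        if value > self.end:
            if not self.cycle:          # without Cycle, the sequence is exhausted
                raise OverflowError("sequence exhausted")
            value = self.start          # with Cycle, wrap back to the Start Value
        self.next_value = value + self.increment
        return value

seq = SequenceGenerator(start=100)
print([seq.nextval() for _ in range(3)])  # [100, 101, 102]
```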
32. What is Decode in INFORMATICA?
Answer: To understand Decode in an easy way, let’s consider it as similar to the CASE statement in SQL. It is basically the function that is used by an expression transformation in order to search a specific value in a record.
There can be unlimited searches within the Decode function where a port is specified for returning result values. This function is usually used in cases where it is required to replace nested IF statements or to replace lookup values by searching in small tables with constant values.
Decode is a function that is used within Expression Transformation. It is used just like CASE Statement in SQL to search a specific record.
Below is a simple example of a CASE in SQL:
CASE
  WHEN EMPLOYEE_AGE <= 20 THEN 'Young'
  WHEN EMPLOYEE_AGE > 20 AND EMPLOYEE_AGE <= 40 THEN 'Knowledgeable'
  WHEN EMPLOYEE_AGE > 40 AND EMPLOYEE_AGE <= 60 THEN 'Wise'
  ELSE 'Very Wise'
END AS EMPLOYEE_WISDOM
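The first-match-wins behavior of DECODE can be sketched with a hypothetical Python helper (this is an illustration, not Informatica expression syntax):

```python
def decode(value, *pairs, default=None):
    """DECODE-style lookup: compare value against each search term in
    order and return the paired result for the first match."""
    it = iter(pairs)
    for search, result in zip(it, it):  # consume pairs as (search, result)
        if value == search:
            return result
    return default

# e.g. the equivalent of DECODE(STATUS, 'A', 'Active', 'I', 'Inactive', 'Unknown')
print(decode("A", "A", "Active", "I", "Inactive", default="Unknown"))  # Active
print(decode("X", "A", "Active", "I", "Inactive", default="Unknown"))  # Unknown
```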
33. What is Mapping?
Answer: Mapping is a collection of source and targets which are linked with each other through certain sets of transformations such as Expression Transformation, Sorter Transformation, Aggregator Transformation, Router Transformation, etc.
34. What is Rank Transformation?
1. This is a type of Active transformation which allows you to find either the top performers or the bottom performers.
2. The Rank transformation is created with the following types of ports:
i. Input Port (I)
ii. Output Port (O)
iii. Rank Port (R)
iv. Variable Port (V)
35. What are the transformations that cannot be placed between the sort origin and the Joiner transformation so that we do not lose the input sort order?
Answer: The best option is to place the Joiner transformation directly after the sort origin to maintain sorted data. However, do not place any of the following transformations between the sort origin and the Joiner transformation:
XML Parser transformation
XML Generator transformation
Mapplet [if it contains any one of the above-mentioned transformations]
36. What is the format of INFORMATICA objects in a repository? What are the databases that INFORMATICA can connect to on Windows?
Answer: INFORMATICA objects can be written in XML format.
Following is the list of databases that INFORMATICA can connect to:
- SQL Server
- MS Access
- MS Excel
37. How does the Aggregator Transformation handle NULL values?
Answer: By default, the Aggregator transformation treats null values as NULL in aggregate functions. However, we can configure the Integration Service to treat null values in aggregate functions as NULL or as zero.
38. What differs when we choose Sorted Input for the Aggregator Transformation?
Answer: The Integration Service creates index and data cache files in memory to process the Aggregator transformation. If the Integration Service requires more space than allocated by the index and data cache sizes in the transformation properties, it stores overflow values in cache files, i.e., it pages to disk. One way to increase session performance is to increase the index and data cache sizes in the transformation properties. However, when we check Sorted Input, the Integration Service processes the Aggregator transformation in memory and does not use cache files.
39. Under what conditions will selecting Sorted Input in the Aggregator still not boost session performance?
- The Incremental Aggregation session option is enabled.
- The aggregate expression contains nested aggregate functions.
- Source data is data-driven.
40. Suppose we have a Source Qualifier transformation that populates two target tables. How do you ensure TGT2 is loaded after TGT1?
Answer: In the Workflow Manager, we can configure Constraint-based load ordering for a session. The Integration Service orders the target load on a row-by-row basis. For every row generated by an active source, the Integration Service loads the corresponding transformed row first to the primary key table, then to the foreign key table.
Hence if we have one Source Qualifier transformation that provides data for multiple target tables having primary and foreign key relationships, we will go for Constraint-based load ordering.
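The row-by-row ordering can be sketched in Python (the target names TGT1/TGT2 follow the question; the loader callbacks are hypothetical):

```python
def constraint_based_load(rows, load_primary, load_foreign):
    """Load each row to the primary-key target first, then to the
    foreign-key target, mirroring constraint-based load ordering."""
    for row in rows:
        load_primary(row)  # TGT1: primary-key table loaded first
        load_foreign(row)  # TGT2: foreign-key table loaded second

load_order = []
constraint_based_load(
    [{"order_id": 1}],
    lambda r: load_order.append(("TGT1", r["order_id"])),
    lambda r: load_order.append(("TGT2", r["order_id"])),
)
print(load_order)  # [('TGT1', 1), ('TGT2', 1)]
```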
41. How can we create Index after completion of Load Process?
Answer: Command Tasks are used to create the index. Command Task scripts can be used in a session of the workflow to create an index after the load completes.
42. What are Sessions?
Answer: A session is a set of instructions used while moving data from source to target. We can partition a session into several parts to improve server performance.
After creating a session, we can use the Server Manager or the command-line program pmcmd to stop or start the session.
43. What is the benefit of Session Partitioning?
Answer: Session partitioning allows the Integration Service to split a workflow's data into multiple partitions for better performance. These partitions are then used to perform Extraction, Transformation, and Loading in parallel.
44. What is a predefined event?
Answer: Predefined event is the file-watch event. It waits for a specific file to arrive at a specific location.
45. What is Joiner Transformation in INFORMATICA?
Answer: With the help of Joiner Transformation, you can make use of Joins in INFORMATICA.
It is based on two sources namely:
- Master Source
- Detail Source
The following joins can be created using the Joiner transformation, as in SQL:
- Normal Join
- Full Outer Join
- Master Outer Join (Right Outer Join)
- Detail Outer Join (Left Outer Join)
46. What is Aggregator Transformation in INFORMATICA?
Answer: The Aggregator Transformation is an Active and Connected transformation. It works like the GROUP BY clause in SQL and is useful to perform aggregate calculations on groups in INFORMATICA PowerCenter. It performs aggregate calculations on data using aggregate functions viz. SUM, AVG, MAX, and MIN.
47. What is the Sequence Generator Transformation in INFORMATICA?
Answer: The Sequence Generator Transformation is a Passive and Connected transformation. Its basic use is to generate integer values with the NEXTVAL and CURRVAL ports.
48. What is Rank Index in Rank Transformation?
Answer: The Designer automatically creates a Rank Index port for each Rank transformation. The rank index port stores the ranking position for each row. The Rank transformation evaluates rows from top to bottom and then assigns the Rank Index.