Informatica Advanced Interview Questions And Answers
1. What is Informatica PowerCenter?
Answer: InformaticaPowerCenter is one of the Enterprise Data Integration products developed by Informatica Corporation. InformaticaPowerCenter is an ETL tool used for extracting data from the source, transforming and loading data into the target. The Extraction part involves understanding, analyzing and cleaning of the source data.
The transformation part involves cleaning of the data more precisely and modifying the data as per the business requirements.
The loading part involves assigning the dimensional keys and loading them into the warehouse.
2. What is Mapplet in INFORMATICA?
Answer: Mapplet is a reusable object in INFORMATICA that contains a certain set of rules for transformation and transformation logic that can be used in multiple mappings. Mapplet is created in the Mapplet Designer in the Designer Tool.
3. What is the Session task and Command task?
Answer: Session Task is a set of instructions that are to be applied while transferring data from source to target using Session Command. Session Command can be either a pre-session command or post-session command.
Command Task is a specific task that allows one or multiple shell commands of UNIX to run in Windows during the workflow.
4. What is a Standalone command task?
Answer: The standalone command task can be used to run Shell Command anywhere and anytime in the workflow.
5. What is Workflow? What are the components of the Workflow Manager?
Answer: Workflow is the way of a manner in which the task should be implemented. It is a collection of instructions that inform the server about how to implement the task.
Given below are the three major components of the Workflow Manager:
6. Describe the scenarios where we go for Joiner transformation instead of Source Qualifier transformation?
Answer: While joining Source Data of heterogeneous sources as well as to join flat files we will use the Joiner transformation. Use the Joiner transformation when we need to join the following types of sources:
Join data from different Relational Databases.
Join data from different Flat Files.
Join relational sources and flat files.
7. Describe the impact of several join conditions and join order in a Joiner Transformation?
Answer: We can define one or more conditions based on equality between the specified master and detail sources. Both ports in a condition must have the same datatype.
If we need to use two ports in the join condition with non-matching datatypes we must convert the datatypes so that they match. The Designer validates datatypes in a join condition.
Additional ports in the join condition increase the time necessary to join two sources.
The order of the ports in the join condition can impact the performance of the Joiner transformation. If we use multiple ports in the join condition, the Integration Service compares the ports in the order we specified.
8. What are the different LookUp Caches?
INFORMATICA Lookup Can be either cached or uncached. It is divided into five parts.
- Static Cache
- Dynamic Cache
- Persistent Cache
- Shared Cache
Static Cache remains as it is without change while a session is running.
Dynamic Cache keeps updating frequently while a session is running.
9. What is Source Qualifier Transformation in INFORMATICA?
Answer: Source Qualifier Transformation is useful in Mapping, whenever we add relational flat files it is automatically created. It is an active and connected transformation that represents those rows which are read by integration service. ( tableau training videos )
10. Suppose we have two Source Qualifier transformations SQ1 and SQ2 connected to Target tables TGT1 and TGT2 respectively. How do you ensure TGT2 is loaded after TGT1?
Answer: If we have multiple Source Qualifier transformations connected to multiple targets, we can designate the order in which the Integration Service loads data into the targets.
In the Mapping Designer, We need to configure the Target Load Plan based on the Source Qualifier transformations in mapping to specify the required loading order. ( data science online training )
11. Write the different tools in the workflow manager?
The different tools in the workflow manager are:
- Task Developer
- Task Designer
- Workflow Designer
12. Differentiate between a repository server and a powerhouse?
Answer: Repository server mainly guarantees repository reliability and uniformity while the powerhouse server tackles the execution of many procedures between the factors of the server’s database repository.
13. What do you understand by a term domain?
Answer: Domain is the term in which all interlinked relationships and nodes are undertaken by the sole organizational point.
14. What is the use of an aggregator cache file?
Answer: If extra memory is needed aggregator provides extra cache files for keeping the transformation values. It also keeps the transitional value that is there in the local buffer memory. ( hadoop training online )
15. What is the Advantage of Informatica?
Answer: Its GUI tool, Coding in any graphical tool is generally faster than hand-code scripting.
Can communicate with all major data sources (mainframe/RDBMS/Flat Files/XML/VSM/SAP etc).
Can handle very large/huge data very effectively?
Users can apply Mappings, extract rules, cleansing rules, transformation rules, aggregation logic, and loading rules are in separate objects in an ETL tool. Any change in any of the objects will give the minimum impact of another object.
Reusability of the object (Transformation Rules)
Informatica has different “adapters” for extracting data from packaged ERP applications (such as SAP or PeopleSoft).
Availability of resources in the market.
It can be run on Window and Unix environment.
16. What are pre and post-session shell commands?
Answer: Command task can be called the pre or post-session shell command for a session task. One can run it as pre-session command r post-session success command or post-session failure command.
17. How we can create indexes after completing the loan process?
Answer: With the help of the command task at session-level we can create indexes after the loading procedure. ( devops training )
18. What are the advantages of using Informatica as an ETL tool over Teradata?
Answer: First up, Informatica is a data integration tool, while Teradata is an MPP database with some scripting (BTEQ) and fast data movement (load, FastLoad, Parallel Transporter, etc) capabilities. Informatica over Teradata1) Metadata repository for the organization’s ETL ecosystem.
Informatica jobs (sessions) can be arranged logically into worklets and workflows in folders.
This leads to an ecosystem that is easier to maintain and quicker for architects and analysts to analyze and enhance.2) Job monitoring and recovery-
Easy to monitor jobs using Informatica Workflow Monitor.
Easier to identify and recover in case of failed jobs or slow running jobs.
Ability to restart from failure row/step.3) InformaticaMarketPlace- one-stop-shop for lots of tools and accelerators to make the SDLC faster, and improve application support.4) Plenty of developers in the market with varying skill levels and expertise5) Lots of connectors to various databases, including support for Teradata mode, trump, FastLoad, and Parallel Transporter in addition to the regular (and slow) ODBC drivers. Some ‘exotic’ connectors may need to be procured and hence could cost extra. Examples – Power Exchange for Facebook, Twitter, etc which source data from such social media sources.6) Surrogate key generation through shared sequence generators inside Informatica could be faster than generating them inside the database.7) If the company decides to move away from Teradata to another solution, then vendors like Infosys can execute migration projects to move the data, and change the ETL code to work with the new database quickly, accurately and efficiently using automated solutions.8) Pushdown optimization can be used to process the data in the database.9) Ability to code ETL such that processing load is balanced between ETL server and the database box – useful if the database box is aging and/or in case the ETL server has a fast disk/ large enough memory & CPU to outperform the database in certain tasks.10) Ability to publish processes as web services.Teradata over Informatica
Cheaper (initially) – No initial ETL tool license costs (which can be significant), and lower OPEX costs as one doesn’t need to pay for yearly support from Informatica Corp.
A great choice if all the data to be loaded is available as structured files – which can then be processed inside the database after an initial stage load.
Good choice for a lower complexity ecosystem
Only Teradata developers or resources with good ANSI/Teradata SQL / BTEQ knowledge required to build and enhance the system.
19. How to elaborate PowerCenter Integration Service?
Answer: Integration Services control the workflow and execution of PowerCenter processes.
There are three components of INFORMATICA Integration Services as shown in the below figure.
Powercenter Integration Service
Integration Service Process: It is called as the preserver, Integration Service can start more than one processes to monitor the workflow.
Load Balancing: Load Balancing refers to distributing the entire workload across several nodes in the grid. Load Balancer conducts different tasks that include commands, sessions, etc.
Data Transformation Manager(DTM): Data Transformation Manager allows to perform the following data transformations:
Active: To change the number of rows in the output.
Passive: Cannot change the number of rows in the output.
Connected: Link to the other transformation.
Unconnected: No link to other transformations.
20. What is PowerCenter on Grid?
Answer: INFORMATICA has the feature of Grid Computing which can be utilized for the largest data scalability to the performance. The grid feature is used for load balancing and parallel processing.
PowerCenter domains contain a set of multiple nodes to configure the workload and then run it on the Grid.
Informatica- PowerCenter on Grid…
A domain is a foundation for efficient service administration served by the PowerCenter.
Node is an independent physical machine that is logically represented for running the PowerCenter environment.
21. Name the different lookup cache(s)?
Answer: Informatica lookups can be cached or un-cached (no cache). Cached lookups can be either static or dynamic. A lookup cache can also be divided as persistent or non-persistent based on whether Informatica retains the cache even after completing the session run or if it deletes it.
- Static cache
- Dynamic cache
- Persistent cache
- Shared cache
22. What are the various types of transformation?
- Aggregator transformation
- Expression transformation
- Filter transformation
- Joiner transformation
- Lookup transformation
- Normalizer transformation
- Rank transformation
- Router transformation
- Sequence generator transformation
- Stored procedure transformation
- Sorter transformation
- Update strategy transformation
- XML source qualifier transformation
23. When do you use SQL override in a lookup transformation?
Answer: You should override the lookup query in the following circumstances:
Override the ORDER BY clause. Create the ORDER BY clause with fewer columns to increase performance. When you override the ORDER BY clause, you must suppress the generated ORDER BY clause with a comment notation.
Note: If you use pushdown optimization, you cannot override the ORDER BY clause or suppress the generated ORDER BY clause with a comment notation.
A lookup table name or column names contains a reserved word. If the table name or any column name in the lookup query contains a reserved word, you must ensure that they are enclosed in quotes.
Use parameters and variables. Use parameters and variables when you enter a lookup SQL override. Use any parameter or variable type that you can define in the parameter file. You can enter a parameter or variable within the SQL statement, or use a parameter or variable as the SQL query. For example, you can use a session parameter, $ParamMyLkpOverride, as the lookup SQL query, and set $ParamMyLkpOverride to the SQL statement in a parameter file. The designer cannot expand parameters and variables in the query override and does not validate it when you use a parameter or variable. The integration service expands the parameters and variables when you run the session. ( data science training online )
A lookup column name contains a slash (/) character. When generating the default lookup query, the designer and integration service replace any slash character (/) in the lookup column name with an underscore character. To query lookup column names containing the slash character, override the default lookup query, replace the underscore characters with the slash character, and enclose the column name in double-quotes.
Add a WHERE clause. Use a lookup SQL override to add a WHERE clause to the default SQL statement. You might want to use the WHERE clause to reduce the number of rows included in the cache. When you add a WHERE clause to a Lookup transformation using a dynamic cache, use a Filter transformation before the Lookup transformation to pass rows into the dynamic cache that match the WHERE clause.
Note: The session fails if you include large object ports in a WHERE clause.
Other. Use a lookup SQL override if you want to query lookup data from multiple lookups or if you want to modify the data queried from the lookup table before the Integration Service caches the lookup rows. For example, use TO_CHAR to convert dates to strings.
24. What is the difference between a repository server and a powerhouse?
Answer: Repository server controls the complete repository which includes tables, charts, and various procedures, etc.
A powerhouse server governs the implementation of various processes among the factors of the server’s database repository.
25. What are the transformations that are not supported in Mapplet?
Answer: Normalizer, Cobol sources, XML sources, XML Source Qualifier transformations, Target definitions, Pre- and post-session Stored Procedures, Other Mapplets.
26. Describe Data Concatenation?
Answer: Data concatenation is the bringing of different pieces of the record together.
27. Differentiate between sessions and batches?
Answer: Session is a set of commands for the server to move data to the target.
Batch is a set of tasks that can include one or more tasks.
28. What are data-driven sessions?
Answer: When you configure a session using an updated strategy, the session property data-driven instructs the Informatica server to use the instructions coded in mapping to flag the rows for insert, update, delete or reject. This is done by mentioning DD_UPDATE or DD_INSERT or DD_DELETE in the update strategy transformation.
“Treat source rows as” property in session is set to “Data-Driven” by default when using an update strategy transformation in a mapping.
29. What is the need for an ETL tool?
Answer: The problem comes with traditional programming languages where you need to connect to multiple sources and you have to handle errors. For this, you have to write complex code. ETL tools provide a ready-made solution for this. You don’t need to worry about handling these things and can concentrate only on coding the required part. ( python training )
30. Which is the T/R that builts only single cache memory?
Answer: Rank can build two types of cache memory. But the sorter always built only one cache memory.
– The cache is also called Buffer.
31. What is a Snowflake Schema?
A large denormalized dimension table is splitted into multiple normalized dimensions.
Select Query performance increases.
Maintenance cost increases due to more no. of tables.
32. What is Mapping Debugger?
– A debugger is a tool. By using this we can identify records are loaded or not and correct data is loaded or not from one T/R to other T/R.
– Session succeeded but records are not loaded. In this situation, we have to use the Debugger tool.
33. What is a Repository Manager?
Answer: It is a GVI based administrative client which allows performing the following administrative tasks:
1. Create, edit and delete folders.
2. Assign users to access the folders with reading, write and execute permissions.
3. Backup and Restore repository objects.
34. Scheduling a Workflow?
1. A schedule is automation of running the workflow at a given date and time.
2. There are 2 types of schedulers:
(i) Reusable scheduler
(ii) Non Reusable scheduler
(i) Reusable scheduler:-
A reusable scheduler can be assigned to multiple workflows.
(ii) Non-Reusable scheduler:-
– A nonreusable scheduler is created specifically for the workflow.
– A nonreusable scheduler can be converted into a reusable scheduler.
The following are the 3rd party schedulers:
1. Cron (Unix based scheduling process)
3. Control M
6. WLM (work hard manager)
– 99% of the production people will do the scheduling.
– Before we run the workflow manually. Through scheduling, we run workflow this is called Auto Running.
35. What is Workflow Monitor?
i. It is a GUI based client application that allows users to monitor ETL objects running an ETL Server.
ii. Collect runtime statistics such as:
a. No. of records extracted.
b. No. of records loaded.
c. No. of records rejected.
d. Fetch session loge. Throughput
– Complete information can be accessed from the workflow monitor.
– For every session, one log file is created.
36. If Informatica has its scheduler why using third party scheduler?
Answer: The client uses various applications (mainframes, oracle apps use Tivoli scheduling tool) and integrate different applications & scheduling that applications it is very easy by using third party schedulers.
37. What is a Dimensional Model?
1. Data Modeling:- It is a process of designing the database by fulfilling business requirements specifications.
2. A Data Modeler (or) Database Architect Designs the warehouse Database using a GUI based data modeling tool called “Erwin”.
3. Erwin is a data-modeling tool from Computer Associates (A).
4. Dimensional modeling consists of following types of schemas designed for Datawarehouse:
a. Star Schema.
b. Snowflake Schema.
c. Gallery Schema.
5. A schema is a data model that consists of one or more tables.
38. What are the new features of Informatica 9.x at the developer level?
Answer: From a developer’s perspective, some of the new features in Informatica 9.x are as follows:
Now Lookup can be configured as an active transformation – it can return multiple rows on a successful match
Now you can write SQL override on an un-cached lookup also. Previously you could do it only on cached lookup
You can control the size of your session log. In a real-time environment, you can control the session log file size or time
Database deadlock resilience feature – this will ensure that your session does not immediately fail if it encounters any database deadlock, it will now retry the operation again. You can configure several retry attempts.
39. Suppose we do not group by on any ports of the aggregator what will be the output?
Answer: If we do not group values, the Integration Service will return only the last row for the input rows.
40. Give one example for each of Conditional Aggregation, Non-Aggregate expression, and Nested Aggregation?
Answer: Use conditional clauses in the aggregate expression to reduce the number of rows used in the aggregation. The conditional clause can be any clause that evaluates to TRUE or FALSE.
SUM( SALARY, JOB = CLERK )
Use non-aggregate expressions in the group by ports to modify or replace groups.
IIF( PRODUCT = Brown Bread, Bread, PRODUCT )
The expression can also include one aggregate function within another aggregate function, such as:
MAX( COUNT( PRODUCT )
41. What is a Rank Transform?
Answer: Rank is an Active Connected Informatica transformation used to select a set of top or bottom values of data.
42. How does a Rank Transform differ from Aggregator Transform functions MAX and MIN?
Answer: Like the Aggregator transformation, the Rank transformation lets our group information. The Rank Transform allows us to select a group of top or bottom values, not just one value as in the case of Aggregator MAX, MIN functions.
43. What are the restrictions of Union Transformation?
1. All input groups and the output group must have matching ports. The precision, datatype, and scale must be identical across all groups.
2. We can create multiple input groups, but only one default output group.
3. The Union transformation does not remove duplicate rows.
4. We cannot use a Sequence Generator or Update Strategy transformation upstream from a Union transformation.
5. The Union transformation does not generate transactions.
44. What is Persistent Lookup Cache?
Answer: Lookups are cached by default in Informatica. Lookup cache can be either non-persistent or persistent. The Integration Service saves or deletes lookup cache files after a successful session run based on whether the Lookup cache is checked as persistent or not.