Informatica ETL Testing Interview Questions and Answers
Are you looking for Informatica ETL Testing interview questions as a fresher or an experienced professional? You are in the right place. There are plenty of opportunities at various reputed companies in this field. According to industry analysis, Informatica ETL has a market share of approximately 3.0%, so you have a good chance of moving ahead in your career in Informatica ETL development. SVR Technologies presents the latest Informatica ETL interview questions, which will benefit you in answering questions in the interview and achieving your goal of a career as an Informatica MDM Developer.
Informatica ETL identifies the significant data within a company and comprises a suite of technologies covering data integration, data quality, and business process management. Informatica MDM is a single solution that is easy to extend and flexible enough to solve these challenges. This question set is specifically intended to check the data-management and business-process-management skills of an Informatica ETL professional, as per industry norms. Our Informatica ETL skills test is formulated and verified by subject-matter experts.
Informatica ETL Testing Interview Questions
Informatica MDM is a comprehensive program that allows an enterprise to link all of its significant data to a single file, also known as the master file, providing a common point of reference. When done in a proper way, MDM helps streamline the process of data sharing between departments and employees.
SVR Technologies presents real-time, scenario-based Informatica ETL interview questions with answers for both experienced candidates and freshers. The following questions and answers will help you succeed in your Informatica ETL job interview.
1. What is ETL?
Answer: ETL stands for Extract, Transform, and Load. It consolidates the scattered data of an organization that works with different departments: data is extracted from each source, transformed into a common format, and loaded into a target database or warehouse.
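The three steps can be sketched in a few lines of Python. This is an illustrative toy example only; the department feeds, field names, and target are all invented for the sketch.

```python
# A minimal ETL sketch (illustrative only): extract rows from two
# hypothetical department feeds with different schemas, transform
# them into one homogeneous format, and load them into one target.

def extract():
    # Hypothetical source data from two departments
    sales = [{"emp": "Ann", "amount": "1200"}]
    hr = [{"name": "Bob", "salary": "900"}]
    return sales, hr

def transform(sales, hr):
    # Consolidate the differing schemas into one common record shape
    rows = [{"employee": r["emp"], "value": int(r["amount"])} for r in sales]
    rows += [{"employee": r["name"], "value": int(r["salary"])} for r in hr]
    return rows

def load(rows, target):
    # "Load" here is simply appending to an in-memory target list
    target.extend(rows)

warehouse = []
load(transform(*extract()), warehouse)
print(len(warehouse))  # 2 consolidated rows
```

In a real pipeline the extract step would read from databases or flat files and the load step would write to warehouse tables, but the shape of the work is the same.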
2. What is Datamart?
Answer: A Data Mart is a subset of a data warehouse that provides data for reporting and analysis on a section, unit, or department such as the Sales Dept, HR Dept, etc. The Data Mart is sometimes also called an HPQS (Higher Performance Query Structure).
3. What is a Power Center Repository?
Answer: The Power Center Repository allows you to share metadata across repositories to create a data mart domain. In a data mart domain, you can create a single global repository to store metadata used across an enterprise and several local repositories to share the global metadata as needed.
4. What is the difference between Data Warehousing and Data Mining?
Answer: In data mining, operational data is analyzed using statistical procedures and clustering methods to detect previously unknown patterns and trends. Data warehousing, by contrast, performs summarization of the data from many sources so it can be used for online analytical processing and business intelligence.
A data warehouse may make use of a data mining engine so that analysis of the data can be carried out faster.
5. What is the Factless Fact table?
Answer: A fact table that does not contain any facts (measures) is called a factless fact table. Generally, when we need to combine two data marts, one data mart will have a factless fact table and the other a common fact table.
6. Explain how to mine an OLAP cube?
An OLAP cube can be mined with an extension of data mining that slices the data of interest from the source cube.
When mining a cube, the case table is defined by a dimension of the cube.
7. What is Data Acquisition Process?
Answer: It is the process of extracting data from different source systems (operational databases), integrating the data, transforming it into a homogeneous format, and loading it into the target warehouse database. This is simply called ETL (Extraction, Transformation, and Loading). The data acquisition process design is named differently by different ETL vendors:
Informatica —-> Mapping
Data Stage —-> Job
Abinitio —-> Graph
8. What is the worklet?
Answer: A worklet is a group of sessions. To execute a worklet, we have to create a workflow.
9. Explain what factless fact schema is and what is Measures?
Answer: A fact table without measures is known as a factless fact table. It can be used to count occurrences of events. For example, it is used to record an event such as the employee count in a company.
The numeric data based on columns in a fact table is known as Measures.
10. What are the types of OLAP?
DOLAP: An OLAP tool that works with desktop databases is called DOLAP. Example: Cognos EP 7 Series, Business Objects, MicroStrategy.
ROLAP: An OLAP tool that works with relational databases is called ROLAP. Example: Business Objects, MicroStrategy, Cognos ReportNet, and BRIO.
MOLAP: An OLAP tool that creates multi-dimensional structures called cubes is called MOLAP. Example: Cognos ReportNet.
HOLAP: An OLAP tool that combines the features of ROLAP and MOLAP is called HOLAP. Example: Cognos ReportNet.
11. Explain what are the differences between Unconnected and Connected lookup?
Connected Lookup:
- Participates directly in the mapping data flow
- Can return multiple values (multiple columns from the same row)
- Is connected to other transformations and returns values to the pipeline
- Can use a static or dynamic cache
- Supports user-defined default values
- Can insert rows into a dynamic lookup cache
- Is used when the lookup is part of the mapping flow rather than called from an expression transformation
Unconnected Lookup:
- Returns only one output port
- Cannot be connected to another transformation; it is called from an expression
- Uses only a static cache
- Does not support user-defined default values
- Designates one return port and returns one column from each row
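As a rough analogy (plain Python, not Informatica syntax), the two lookup styles behave like the two functions below. The reference table, keys, and default values are invented for illustration.

```python
# A hypothetical reference table keyed by customer id
lookup_table = {101: {"name": "Ann", "city": "Austin"}}

def connected_lookup(cust_id):
    # Connected: sits in the pipeline and can return multiple columns;
    # user-defined default values are supported for missing keys
    row = lookup_table.get(cust_id, {"name": "N/A", "city": "N/A"})
    return row["name"], row["city"]

def unconnected_lookup(cust_id):
    # Unconnected: called like a function from an expression and
    # returns a single value (one return port)
    row = lookup_table.get(cust_id)
    return row["city"] if row else None

print(connected_lookup(101))    # ('Ann', 'Austin')
print(unconnected_lookup(999))  # None
```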
12. What does the Top-Down Approach say?
Answer: The Top-Down Approach was coined by Bill Inmon. According to this approach, we first implement the Enterprise Data Warehouse by extracting data from the individual departments, and from the Enterprise Data Warehouse we develop subject-oriented databases called “Data Marts”.
13. What is data cleaning?
Data cleaning is also known as data scrubbing or data cleansing.
Data cleansing is a process that ensures a set of data is correct and accurate. Data accuracy, consistency, and integration are checked during data cleaning. It can be applied to a single set of records or to multiple sets of data that need to be merged.
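A simple cleansing pass might look like the sketch below. The record set and the validity rule (non-negative age) are hypothetical; real cleansing rules come from the business requirements.

```python
# Illustrative data-cleansing pass: trim whitespace, normalize case,
# drop duplicates, and reject rows that fail a simple accuracy check.

raw = [
    {"email": " Ann@Example.com ", "age": "34"},
    {"email": "ann@example.com",   "age": "34"},   # duplicate after cleanup
    {"email": "bob@example.com",   "age": "-5"},   # fails the validity check
]

def clean(records):
    seen, out = set(), []
    for r in records:
        email = r["email"].strip().lower()   # normalize the key field
        age = int(r["age"])
        if age < 0 or email in seen:         # accuracy + duplicate checks
            continue
        seen.add(email)
        out.append({"email": email, "age": age})
    return out

print(clean(raw))  # one valid, de-duplicated record survives
```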
14. What are the Fact Tables?
Answer: A Fact Table is a table that contains summarized numerical (facts) and historical data. This Fact Table has a foreign key-primary key relation with a dimension table. The Fact Table maintains the information in 3rd normal form.
15. What is Transformation?
Answer: A transformation is a repository object that generates, modifies, or passes data. Transformations in a mapping represent the operations the PowerCenter Server performs on the data. Data passes into and out of transformations through ports that you link in a mapping or mapplet. Transformations can be active or passive. An active transformation can change the number of rows that pass through it. A passive transformation does not change the number of rows that pass through it.
16. What are Update Strategy’s target table options?
Update as Update: Updates each row flagged for update if it exists in the target table.
Update as Insert: Inserts a new row for each update.
Update else Insert: Updates the row if it exists; otherwise inserts it.
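The three options can be sketched in plain Python (not Informatica syntax) over a list of hypothetical (key, value) rows:

```python
# Target table modeled as a list of (key, value) rows
target = [(1, "v1")]

def update_as_update(rows, key, value):
    # Only rows whose key already exists are changed
    return [(k, value if k == key else v) for k, v in rows]

def update_as_insert(rows, key, value):
    # Every update is written as a brand-new row
    return rows + [(key, value)]

def update_else_insert(rows, key, value):
    # Upsert: update when the key exists, otherwise insert
    if any(k == key for k, _ in rows):
        return update_as_update(rows, key, value)
    return rows + [(key, value)]

print(update_else_insert(target, 1, "v2"))  # [(1, 'v2')]
print(update_else_insert(target, 2, "v2"))  # [(1, 'v1'), (2, 'v2')]
```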
17. Where do we use connected and unconnected lookups?
Answer: If only one return port is needed, we can use an unconnected lookup; more than one return port is not possible with an unconnected lookup. If more than one return port is required, go for a connected lookup.
If you need a dynamic cache, i.e. where your data changes dynamically during the session, you can go for a connected lookup. If your data is static and will not change while the session loads, you can go for an unconnected lookup.
18. Compare ETL & Manual development?
ETL – Extracting data from multiple sources (e.g. flat files, XML, COBOL, SAP, etc.) is simpler with the help of tools.
Manual – Loading data from anything other than flat files and Oracle tables needs more effort.
ETL – High and clear visibility of logic.
Manual – Complex and less user-friendly visibility of logic.
ETL – Contains metadata, and changes can be done easily.
Manual – No metadata concept, and changes need more effort.
ETL – Error handling, log summaries, and load progress make life easier for the developer and maintainer.
Manual – Needs maximum effort from the maintenance point of view.
ETL – Can handle historic data very well.
Manual – As the data grows, the processing time degrades.
19. Explain these terms Session, Worklet, Mapplet, and Workflow?
- Mapplet: A reusable set of transformations
- Worklet: Represents a specific set of tasks (a group of sessions)
- Workflow: A set of instructions that tells the server how to execute tasks
- Session: A set of parameters that tells the server how to move data from sources to targets
20. What is cube grouping?
Answer: A set of similar cubes created by a transformer is known as a cube group. Cube groups are generally used to create smaller cubes that are based on the data at a particular level of a dimension.
21. What is the difference between ETL tool and OLAP tools?
Answer: An ETL tool is meant for extracting data from legacy systems and loading it into a specified database, with some data-cleansing along the way. Examples: Informatica, DataStage, etc.
An OLAP tool is meant for reporting and multi-dimensional analysis of the data once it has been loaded into the warehouse. Examples: Business Objects, Cognos, etc.
22. What is a Star schema design?
Answer: A star schema is defined as a logical database design in which there will be a centrally located fact table that is surrounded by at least one or more dimension tables. This design is best suited for Data Warehouse or Data Mart.
23. Mention what are the types of data warehouse applications and what is the difference between data mining and data warehousing?
The types of data warehouse applications are:
- Info Processing
- Analytical Processing
- Data Mining
Data mining can be defined as the process of extracting hidden predictive information from large databases and interpreting the data, while data warehousing may make use of a data mine for faster analytical processing of the data. Data warehousing is the process of aggregating data from multiple sources into one common repository.
24. Explain what are Cubes and OLAP Cubes?
Cubes are data-processing units composed of fact tables and dimensions from the data warehouse. They provide multi-dimensional analysis.
OLAP stands for Online Analytical Processing, and an OLAP cube stores large amounts of data in a multi-dimensional form for reporting purposes. It consists of facts, called measures, categorized by dimensions.
25. What is Denormalization?
Answer: Denormalization means allowing redundant (duplicate) data in a table so that queries need fewer joins. The dimension table follows the denormalization method, together with the surrogate-key technique.
26. What are snapshots? What are materialized views & where do we use them? What is a materialized view?
Answer: A materialized view is a view in which the data is also physically stored in temporary tables. With the ordinary View concept in a database, we store only the query, and each time the view is called it extracts the data from the base tables. In a materialized view, the result data itself is stored in temporary tables.
27. What are data modeling and data mining?
Answer: Data Modeling is a technique used to define and analyze the requirements of data that supports an organization’s business process. In simple terms, it is used for the analysis of data objects to identify the relationships among these data objects in any business.
Data Mining is a technique used to analyze datasets to derive useful insights/information. It is mainly used in retail, consumer goods, telecommunication and financial organizations that have a strong consumer orientation to determine the impact on sales, customer satisfaction, and profitability.
28. What is Metadata?
Answer: Data about data is called Metadata. The Metadata contains the definition of data.
29. What are the features of the Informatica Repository Server?
Features of the Informatica Repository Server:
1. Informatica client applications and the Informatica Server access the repository database tables through the Repository Server.
2. An Informatica client connects to the Repository Server through its hostname/IP address and port number.
3. The Repository Server can manage multiple repositories on different machines on the network.
4. For each repository database registered with the Repository Server, it configures and manages a Repository Agent process.
5. The Repository Agent is a multi-threaded process that performs the operations needed to retrieve, insert, and update metadata in the repository database tables.
30. Define slowly changing dimensions (SCD)?
An SCD is a dimension whose data changes very slowly,
e.g. a city or an employee.
The row for this data in the dimension can either be replaced completely without any track of the old record, OR a new row can be inserted, OR the change can be tracked.
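The three options above correspond to SCD Types 1, 2, and 3 and can be sketched as follows (plain Python over a hypothetical employee-city dimension; field names are invented):

```python
# dim is a hypothetical dimension table: one row per employee version
dim = [{"emp_id": 7, "city": "Pune", "current": True}]

def scd_type1(rows, emp_id, new_city):
    # Type 1: overwrite in place; no history is kept
    for r in rows:
        if r["emp_id"] == emp_id:
            r["city"] = new_city
    return rows

def scd_type2(rows, emp_id, new_city):
    # Type 2: expire the old row and insert a new current row (full history)
    for r in rows:
        if r["emp_id"] == emp_id and r["current"]:
            r["current"] = False
    rows.append({"emp_id": emp_id, "city": new_city, "current": True})
    return rows

def scd_type3(rows, emp_id, new_city):
    # Type 3: keep limited history in an extra column
    for r in rows:
        if r["emp_id"] == emp_id:
            r["prev_city"], r["city"] = r["city"], new_city
    return rows
```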
31. What is a Data warehouse?
Answer: A Data warehouse is a subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management’s decision-making process.
32. What are the types of Fact Tables?
The types of Fact Tables are:
1. Cumulative Fact Table: This type of fact table generally describes what happened over a period of time. It contains additive facts.
2. Snapshot Fact Table: This type of fact table deals with a particular point in time. It contains non-additive and semi-additive facts.
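The additive/semi-additive distinction is easy to see with a small example (hypothetical account data): a sales amount can be summed across every dimension, but an account balance must not be summed across time.

```python
# Two facts per day for one account: sales (additive) and balance (semi-additive)
facts = [
    {"day": 1, "account": "A", "balance": 100, "sales": 10},
    {"day": 2, "account": "A", "balance": 120, "sales": 15},
]

total_sales = sum(f["sales"] for f in facts)    # 25  -- valid aggregation
wrong_total = sum(f["balance"] for f in facts)  # 220 -- NOT a real balance
end_balance = facts[-1]["balance"]              # 120 -- snapshot at the latest day
print(total_sales, end_balance)
```

Summing the balance column across days gives 220, a number with no business meaning; the correct aggregate over time is a snapshot (latest, average, etc.).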
33. What does a Mapping document contain?
The Mapping document contains the following information:
Source Definition – the source from which the data has to be extracted
Target Definition – the target to which the data has to be loaded
Business Logic – the logic that has to be implemented in the staging area
34. Explain what is partitioning, hash partitioning and round-robin partitioning?
To enhance performance, data is subdivided; this is called partitioning. Partitioning enables the Informatica Server to create multiple connections to various sources and targets.
The kinds of partitions are:
- Round-robin partitioning: Informatica distributes the data evenly among all partitions. It is appropriate when the number of rows to be processed in each partition should be approximately the same.
- Hash partitioning: the Informatica Server applies a hash function to the partitioning key in order to group the data among partitions. It is used when you need to ensure that groups of rows with the same partitioning key land in the same partition.
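The difference between the two schemes can be sketched in plain Python (Informatica does this internally; the row data here is invented):

```python
# Round-robin vs. hash partitioning over 3 partitions
rows = [{"key": k} for k in ["a", "b", "a", "c", "b", "a"]]
n = 3

# Round-robin: rows are dealt out evenly, regardless of their content
round_robin = [[] for _ in range(n)]
for i, row in enumerate(rows):
    round_robin[i % n].append(row)

# Hash: a hash of the partitioning key picks the partition, so all rows
# with the same key always land in the same partition
hashed = [[] for _ in range(n)]
for row in rows:
    hashed[hash(row["key"]) % n].append(row)

print([len(p) for p in round_robin])  # [2, 2, 2] -- even spread
```

Round-robin guarantees an even spread; hash guarantees co-location of equal keys (useful before an aggregation), but the partition sizes may be uneven.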
35. What does the Bottom-Up Approach or Ralph Kimball Approach say?
Answer: The Bottom-Up Approach was coined by Ralph Kimball. According to this approach, we first develop subject-oriented databases called “Data Marts” and then integrate all the Data Marts to develop the Enterprise Data Warehouse.
36. Explain what staging area is and what is the purpose of a staging area?
Answer: The data staging area is where you hold data temporarily on the data warehouse server. Data staging includes the following steps:
- Source data extraction and data transformation (restructuring)
- Data transformation (data cleansing, value transformation)
- Surrogate key assignments
37. What are the various tools? Name a few?
A few of the ETL tools available are:
Cognos Decision Stream
Oracle Warehouse Builder
Business Objects XI (Extreme Insight)
SAP Business Warehouse
SAS Enterprise ETL Server
Along with the above, we need to include these tools: Informatica, Ab Initio, DataStage.
38. What is real-time data-warehousing?
In real-time data-warehousing, the warehouse is updated every time the system performs a transaction.
It reflects real-time business data.
This means that when the query is fired in the warehouse, the state of the business at that time will be returned.
39. Do we need an ETL tool? When do we go for the tools in the market?
An ETL tool is used to Extract (E) data from multiple source systems (like RDBMS, flat files, mainframes, SOAP, XML, etc.), Transform (T) it based on business requirements, and Load (L) it into target locations (like tables, files, etc.).
The need for an ETL tool:
An ETL tool is typically required when data is scattered across different systems (like RDBMS, flat files, mainframes, SOAP, XML, etc.).
40. What is the Snow Flake schema design?
Answer: In a Snowflake design, the dimension table (de-normalized table) is further divided into one or more normalized dimension tables to organize the information in a better structural format. To design a snowflake schema, we should first design the star schema.
41. Mention what is the advantage of using DataReader Destination Adapter?
Answer: The benefit of using the DataReader Destination Adapter is that it populates an ADO recordset (consisting of records and columns) in memory and exposes the data from the DataFlow task by implementing the DataReader interface, so that other applications can consume the data.
42. What is data warehousing?
Answer: A data warehouse can be considered as a storage area where relevant data is stored irrespective of the source.
Data warehousing merges data from multiple sources into an easy and complete form.
43. In what case do you use dynamic cache and static cache in connected and unconnected transformations?
- A dynamic cache is used when you have to update the master table or handle slowly changing dimensions (SCD) Type 1
- A static cache is used for flat files
44. Using SSIS ( SQL Server Integration Service) what are the possible ways to update the table?
To update a table using SSIS the possible ways are:
- Use a SQL command
- Use a staging table
- Use Cache
- Use the Script Task
- Use the full database name for updating if MSSQL is used
45. Explain what operations ETL testing includes?
ETL testing includes:
- Verify whether the data is transforming correctly according to business requirements
- Verify that the projected data is loaded into the data warehouse without any truncation and data loss
- Make sure that the ETL application reports invalid data and replaces it with default values
- Make sure that data loads at the expected time frame to improve scalability and performance
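The first three checks can be automated with simple assertions, as in this sketch (the source/target rows, column width, and default value "UNKNOWN" are hypothetical):

```python
# Minimal ETL test checks: row-count reconciliation, truncation check,
# and default-value substitution for invalid data.

source = [{"id": 1, "name": "Ann"}, {"id": 2, "name": None}]
target = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "UNKNOWN"}]

# 1. No data loss: every source row reached the target
assert len(source) == len(target)

# 2. No truncation: loaded values fit the target column width (assumed 50)
assert all(len(r["name"]) <= 50 for r in target)

# 3. Invalid data was replaced with the default value
assert target[1]["name"] == "UNKNOWN"
print("all ETL checks passed")
```

In practice these checks would run as queries against the source and target databases rather than in-memory lists, but the assertions are the same.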