Ab Initio Interview Questions And Answers

1. Explain what de-partition is in Ab Initio?
Answer: De-partitioning is done to read data from multiple flows or operations and is used to re-join data records from different flows. Several de-partition components are available, including Gather, Merge, Interleave, and Concatenate.

2. Explain what is SANDBOX?
Answer: A sandbox is a collection of graphs and related files that are saved in a single directory tree and that behave as a group for navigation, version control, and migration.

3. What do you mean by overflow errors?
Answer: While processing data, bulky calculations are common, and they do not always fit in the memory allocated for them. For example, if a value requiring more than 8 bits is stored in an 8-bit field, an overflow error results.

4. What is data encoding?
Answer: Data needs to be kept confidential in many cases, and encoding is one way to achieve this. It simply makes sure that information remains in a form that no one other than the sender and the receiver can understand.

5. What is the use of aggregation when we have rollup? As we know, the rollup component in Ab Initio is used to summarize a group of data records, so where would we use aggregation?
Answer: Aggregation and Rollup can both summarize data, but Rollup is much more convenient to use, and understanding how a particular summarization happens is much more explanatory with Rollup than with Aggregate. Rollup can also do some other things, like input and output filtering of records. Aggregate and Rollup perform the same action, but Rollup can display intermediate results in main memory, whereas Aggregate does not support intermediate results.

6. What does dependency analysis mean in Ab Initio?
Answer: Dependency analysis answers questions regarding data lineage: where does the data come from, and which applications produce and depend on this data, etc.

7. Describe the elements you would review to ensure multiple scheduled batch jobs do not collide with each other?
Answer: Review the dependencies between the jobs, because every job may depend on another job. For example, if the second job should execute only when the first job completes successfully, the schedule must enforce that ordering; otherwise the jobs can collide.

8. How do you create a repository in Ab Initio for a stand-alone system (local NT)?
Answer: If you are installing Ab Initio on a stand-alone machine, it is not necessary to create the repository; the installer creates it automatically under the Ab Initio folder (wherever you install Ab Initio).

9. Describe the process steps you would perform when defragmenting a data table. Does this table contain mission-critical data?
Answer:
There are several ways to do this:
1) We can move the table within the same or another tablespace and rebuild all the indexes on the table; the move reclaims the fragmented space in the table:
alter table table_name move;
analyze table table_name compute statistics;  -- captures the updated statistics

2) A reorg can be done by taking a dump (export) of the table, truncating the table, and importing the dump back into the table.
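For mission-critical data, either approach should be run in a maintenance window with a verified backup in hand. A hedged sketch of approach 2 using Oracle Data Pump, where the credentials, table name, directory object, and dump file are all hypothetical; table_exists_action=truncate empties the table before loading the rows back, which folds the truncate step into the import:

    expdp scott/tiger tables=orders directory=DATA_PUMP_DIR dumpfile=orders.dmp
    impdp scott/tiger tables=orders directory=DATA_PUMP_DIR dumpfile=orders.dmp table_exists_action=truncate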

10. Why might you create a stored procedure with the with recompile option?
Answer: Recompile is useful when the tables referenced by the stored procedure undergo a lot of modification/deletion/addition of data. Due to the heavy modification activity, the execution plan becomes outdated and hence the stored procedure's performance goes down. If we create the stored procedure with the recompile option, SQL Server won't cache a plan for it and it will be recompiled every time it is run.
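A minimal T-SQL sketch, with a hypothetical procedure and table:

    CREATE PROCEDURE dbo.usp_recent_orders
        @customer_id INT
    WITH RECOMPILE            -- SQL Server will not cache a plan for this procedure
    AS
    BEGIN
        SELECT order_id, order_date
        FROM dbo.orders       -- a heavily modified table; a fresh plan is built on every call
        WHERE customer_id = @customer_id;
    END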

11. State the first_defined function with an example?
Answer: This function is similar to the NVL() function in the Oracle database.
It returns the first non-NULL value among the values passed to it and assigns it to the variable.
Example: A set of variables, say v1, v2, v3, v4, v5, v6, are assigned NULL.
Another variable num is assigned the value 340 (num = 340).
num = first_defined(NULL, v1, v2, v3, v4, v5, v6, num)
The result of num is 340, since num is the first non-NULL argument.

12. Explain PDL with an example?
Answer: PDL (Parameter Definition Language) is used to make a graph behave dynamically.
Suppose there is a need for a dynamic field to be added to a predefined DML while executing the graph.
Then a graph-level parameter can be defined,
and that parameter is utilized while embedding the DML in the output port.
For example: define a parameter named mystring with the value string(" | ") name;
Use ${mystring} at the time of embedding the DML in the out port,
and use $ substitution as the interpretation option.

13. Describe the evaluation order of parameters?
Answer:
Following is the order of evaluation:

  • The host setup script will be executed first
  • All common (that is, included) parameters are evaluated
  • All Sandbox parameters are evaluated
  • The project script – project-start.ksh is executed
  • All formal parameters are evaluated
  • Graph parameters are evaluated
  • The Start Script of the graph is executed

14. Explain the Sort component in Ab Initio?
Answer: The Sort component in Ab Initio re-orders data. It has two parameters, "Key" and "Max-core".
Key: This parameter determines the collation order of the sort.
Max-core: This parameter controls how often the sort component dumps data from memory to disk.

15. Explain the methods to improve the performance of a graph?
Answer:
The following are the ways to improve the performance of a graph:

• Make sure that a limited number of components are used in a particular phase
• Use the optimum value of max-core for sort and join components
• Utilize the minimum number of sort components
• Utilize the minimum number of sorted join components and replace them by in-memory join/hash join, if needed and possible
• Restrict only the needed fields in sort, reformat, join components
• Use phasing or flow buffers with merge or sorted joins
• Use sorted join when the two inputs are huge; otherwise use hash join

16. What are the types of data processing you are familiar with?
Answer: The very first one is the manual data approach, in which the data is processed without depending on a machine, and thus it contains several errors. In the present time, this technique is not generally followed, or only limited data is processed with this approach. The second type is mechanical data processing, in which mechanical devices play the important roles; this approach is adopted when the data is a combination of different formats. The next approach is electronic data processing, which is regarded as the fastest and is widely adopted in the current scenario. It has top accuracy and reliability.

17. Explain the difference between the truncate and delete commands?
Answer:
Truncate:
It is a DDL command, used to remove all rows from a table or cluster. Since it is a DDL command, it auto-commits and a rollback can't be performed. It is faster than delete.

Delete:
It is a DML command, generally used to delete records from a table. The rollback command can be performed to retrieve the deleted rows; to make the deletion permanent, the "commit" command should be used.
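A short SQL sketch of the difference (Oracle-style transaction semantics; orders is a hypothetical table):

    DELETE FROM orders WHERE status = 'CANCELLED';   -- DML: can still be undone
    ROLLBACK;                                        -- the deleted rows are restored

    DELETE FROM orders WHERE status = 'CANCELLED';
    COMMIT;                                          -- the deletion is now permanent

    TRUNCATE TABLE orders;                           -- DDL: auto-commits, no rollback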

18. What is BROADCASTING and REPLICATE?
Answer: Broadcast – Takes data from multiple inputs, combines it, and sends a copy to all the output ports.
E.g. – You have 2 incoming flows (this can be data parallelism or component parallelism) on the Broadcast component, one with 10 records and the other with 20 records. Then all the outgoing flows (there can be any number of flows) will have 10 + 20 = 30 records each.
Replicate – Replicates the data of a particular partition and sends it out to multiple out ports of the component, but maintains the partition integrity.

E.g. – Your incoming flow to Replicate has a data parallelism level of 2, with one partition having 10 records and the other having 20 records. Now suppose you have 3 output flows from Replicate; then each output flow will have 2 data partitions with 10 and 20 records respectively.

19. We know the rollup component in Ab Initio is used to summarize a group of data records, so why do we use aggregation?
Answer:
• Aggregation and Rollup, both are used to summarize the data.
• Rollup is much better and convenient to use.
• Rollup can perform some additional functionality, like input filtering and output filtering of records.
• Aggregate does not display the intermediate results in main memory, whereas Rollup can.
• Analyzing a particular summarization is much simpler with Rollup than with Aggregate.

20. How is data processed, and what are the fundamentals of this approach?
Answer: There are certain activities which require the collection of data, and processing largely depends on that collection in many cases. The fact is, data needs to be stored and analyzed before it is processed. This task depends on some major factors, which are:

1. Collection of Data
2. Presentation
3. Final Outcomes
4. Analysis
5. Sorting

These are also regarded as the fundamentals that can be trusted to keep up the pace in this matter.

21. What are the factors on which storage of data depends?
Answer:

It depends on the sorting and filtering applied. In addition to this, it largely depends on the software one uses.

22. What do you mean by data sorting?
Answer: It is not always the case that data is in a well-defined sequence; usually it is a random collection of objects. Sorting is nothing but arranging the data items in a desired set or sequence.

23. When running a stored procedure definition script how would you guarantee the definition could be rolled back in the event of problems?
Answer:
There are quite a few factors that determine the approach, such as what type of version control is used, the size of the change, the impact of the change, and whether it is a new procedure or a replacement for an existing one.
If it is new, then just drop the wrong one.
If it is a replacement, consider how big the change is and what the possible impact will be. Depending on that, you can back up the entire database, create a script of the original procedure before changing it, or simply edit the file back to the original and reapply it. You may also rename the old procedure as old, then work on the new one, and so on.

A few issues to keep in mind are synonyms, dependencies, grants, and any job calling the procedure at the time of the change. In a nutshell, the scenario can vary and the solution can vary as well.

24. Have you used the rollup component? Describe how?
Answer: If the user wants to group records on particular field values, then rollup is the best way to do that. Rollup is a multi-stage transform function and it contains the following mandatory functions:
1. initialize
2. rollup
3. finalize
Also, you need to declare a temporary variable if you want to get the count for a particular group.

For each group, rollup first calls the initialize function once, then calls the rollup function for each record in the group, and finally calls the finalize function once after the last rollup call.
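A minimal expanded-rollup sketch in DML that counts the records in each group; the temporary type and the field names (city, count) are illustrative, and the real record formats depend on your input DML:

    type temporary_type =
      record
        decimal(8) count;
      end;

    temp :: initialize(in) =
    begin
      temp.count :: 0;
    end;

    temp :: rollup(temp, in) =
    begin
      temp.count :: temp.count + 1;   // called once per record in the group
    end;

    out :: finalize(temp, in) =
    begin
      out.city :: in.city;            // the grouping key
      out.count :: temp.count;        // the per-group count
    end;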

25. What is the ABLOCAL expression and where do you use it in Ab Initio?
Answer: ablocal_expr is a parameter of the Input Table component of Ab Initio. ABLOCAL() is replaced by the contents of ablocal_expr, which we can make use of in parallel unloads. There are two forms of the ABLOCAL() construct: one with no arguments and one with a single argument, a table name (the driving table).

The ABLOCAL() construct is used because some complex SQL statements contain grammar that is not recognized by the Ab Initio parser when unloading in parallel. You can use the construct in this case to prevent the Input Table component from parsing the SQL (it will get passed through to the database). It also specifies which table to use for the parallel clause.
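A hedged example of the single-argument form, with hypothetical table names; ABLOCAL(orders) marks orders as the driving table for the parallel unload and lets the rest of the statement pass through to the database unparsed:

    select o.order_id, c.cust_name
    from orders o, customers c
    where o.cust_id = c.cust_id
      and ABLOCAL(orders)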

26. How to get DML using Utilities in UNIX?
Answer: If your source is a COBOL copybook, then there is a Unix utility, cobol-to-dml, which generates the required DML for Ab Initio.
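A hedged invocation, where the file names are hypothetical and the exact options vary by Co>Operating System version, so treat this as a sketch and check your local documentation:

    cobol-to-dml my_copybook.cpy > my_record.dml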

27. Define ramp limit in Ab Initio?
Answer: Generally, the ramp is a percentage value ranging from 0 to 1, representing the rate of reject events per processed record. The limit parameter is an integer that gives the number of reject events allowed outright. The number of reject events tolerated is calculated by the following formula: number of bad records allowed = limit + (number of records processed × ramp).
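For example, with limit = 2 and ramp = 0.01, a run that processes 1,000 records tolerates 2 + 1,000 × 0.01 = 12 bad records before the component aborts.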

28. What are the different types of parallelism in Ab Initio?
Answer:
There are three types of parallelism:
Data parallelism,
Pipeline parallelism, and
Component parallelism.

Data parallelism

A graph with data divided into segments, operating on each segment at the same time, employs data parallelism.

Pipeline parallelism

A graph that consists of multiple components running at the same time on the same data makes use of pipeline parallelism.

Component parallelism

A graph with one or more processes running simultaneously on separate data uses component parallelism.

29. What are the benefits of data processing according to you?
Answer: Well, processing data provides a very large number of benefits. Users can separate out the many factors that matter to them. In addition to this, with the help of this approach, one can easily keep up the pace simply by deriving structured data from an unstructured format. The processing is also useful in eliminating various bugs that are often associated with the data and cause problems later. It is because of no other reason than this that data processing has a wide application in several tasks.

30. What exactly do you understand by the term data processing, and can businesses trust this approach?
Answer: Processing is a procedure that simply converts data from a useless form into a useful one without a great deal of effort. However, the approach may vary depending on factors such as the size of the data and its format. A sequence of operations is generally carried out to perform this task, and depending on the type of data, this sequence can be automatic or manual. In the present scenario, most of the devices that perform this task are PCs, so the automatic approach is more popular than ever before. Users are free to obtain data in forms such as tables, vectors, images, graphs, charts and so on. This is the best thing that business owners can simply enjoy.

31. What are the primary keys and foreign keys?
Answer: In an RDBMS, the relationship between two tables is represented as a primary key and foreign key relationship. The primary key table is the parent table and the foreign key table is the child table. The criterion for the two tables is that there should be a matching column.
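A minimal SQL sketch, with hypothetical table and column names:

    CREATE TABLE customers (
        cust_id INT PRIMARY KEY,      -- parent table: primary key
        name    VARCHAR(100)
    );

    CREATE TABLE orders (
        order_id INT PRIMARY KEY,
        cust_id  INT REFERENCES customers (cust_id)   -- child table: foreign key to the parent
    );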

32. When using multiple DML statements to perform a single unit of work, is it preferable to use implicit or explicit transactions, and why?
Answer: Explicit transactions are generally preferable, because they let all the statements in the unit of work be committed or rolled back together; with implicit (auto-commit) behavior, each statement commits on its own, so a failure partway through can leave the work half-applied.
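A minimal T-SQL sketch of an explicit transaction over a hypothetical accounts table:

    BEGIN TRANSACTION;
        UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
        UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;
    COMMIT TRANSACTION;   -- both updates take effect together; on error, ROLLBACK undoes both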

33. Describe how you would ensure that database object definition (Tables, Indices, Constraints, Triggers, Users, Logins, Connection Options, and Server Options, etc) are consistent and repeatable between multiple database instances (i.e.: a test and production copy of a database)?
Answer:
Take an entire database backup and restore it in a different instance.
Take statistics of all valid and invalid objects and match them.
Refresh periodically.

34. What is an outer join?
Answer: An outer join is used when one wants to select all the records from a port, whether or not they satisfy the join criteria.
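In SQL terms the same idea looks like the following, with hypothetical tables; in the Ab Initio Join component the analogous effect comes from the record-required parameters:

    SELECT c.cust_id, o.order_id
    FROM customers c
    LEFT OUTER JOIN orders o
      ON o.cust_id = c.cust_id;   -- customers with no matching orders still appear, with NULL order_id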

35. What are local and formal parameters?
Answer: Both are graph-level parameters, but a local parameter must be initialized with a value at the time of declaration, whereas a formal parameter does not need to be initialized; its value is supplied at the time of running the graph.

36. What is the relation between EME, GDE, and the Co>Operating System?
Answer: EME stands for Enterprise Meta>Environment, GDE for Graphical Development Environment, and the Co>Operating System can be described as the Ab Initio server. The relation between them is as follows: the Co>Operating System is the Ab Initio server, installed on a particular OS platform called the native OS. The EME is just like a repository in Informatica; it holds the metadata, transformations, db config files, and source and target information. The GDE is an end-user environment where we can develop graphs (like mappings in Informatica); the designer uses the GDE to design graphs and saves them to the EME or a sandbox on the user side, whereas the EME is on the server side.

37. What is a ramp limit?
Answer:
• Limit is an integer parameter which represents the number of reject events allowed
• Ramp is a parameter containing a real number representing the rate of reject events per processed record
• The formula is: number of bad records allowed = limit + (number of records × ramp)
• A ramp is a percentage value from 0 to 1.
• Together, these two provide the threshold value of bad records.

38. What is the MAX CORE of a component?
Answer: MAX CORE is the amount of memory a component consumes for its calculations.
Each component has a different MAX CORE.
A component's performance is influenced by its MAX CORE setting.
The process may slow down or speed up depending on whether the MAX CORE value is set well or badly.

39. How do you add default rules in the transformer?
Answer:
The following is the process to add default rules in the transformer:

Double-click on the transform parameter in the Parameters tab of the component properties.
Click on the Edit menu in the Transform Editor.
Select Add Default Rules from the dropdown list box.
It shows Match Names and Wildcard options; select either of them.

40. Explain what dependency analysis means in Ab Initio?
Answer: In Ab Initio, dependency analysis is a process through which the EME examines a project entirely and traces how data is transferred and transformed, from component to component, field by field, within and between graphs.

41. What kinds of layouts does Ab Initio support?
Answer: There are serial and parallel layouts supported by Ab Initio, and a graph can have both at the same time. The parallel layout depends on the degree of data parallelism: if the multifile system is 4-way parallel, then a component in the graph can run 4 ways parallel if its layout is defined to match that degree of parallelism.

42. How do you improve the performance of a graph?
Answer:
There are many ways the performance of the graph can be improved.
1) Use a limited number of components in a particular phase
2) Use an optimum value of max core values for sort and join components
3) Minimize the number of sort components
4) Minimize sorted join component and if possible replace them by in-memory join/hash join
5) Use only required fields in the sort, reformat, join components
6) Use phasing/flow buffers in case of merge or sorted joins
7) If the two inputs are huge then use sorted join, otherwise, use hash join with proper driving port
8) For large dataset don’t use broadcast as a partitioner
9) Minimize the use of regular expression functions like re_index in the transfer functions
10) Avoid repartitioning of data unnecessarily
Try to run the graph in MFS for as long as possible. For this, the input files should be partitioned, and if possible the output file should also be partitioned.

43. Why do you think data processing is important?
Answer: The fact is, data is generally collected from different sources, and thus it may vary largely in several respects. This data needs to pass through various analyses and other processes before it is stored, and this is not as easy as it seems in most cases. That is why processing matters. A lot of time can be saved by processing the data when accomplishing various tasks that largely matter, and the dependency on various factors for reliable operation can also be avoided to a good extent.
