Monday, 11 June 2012

SQL Interview preparation


What is RDBMS?


Relational Data Base Management Systems (RDBMS) are database management systems that maintain
data records and indices in tables. Relationships may be created and maintained across and among the
data and tables. In a relational database, relationships between data items are expressed by means of
tables. Interdependencies among these tables are expressed by data values rather than by pointers.
This allows a high degree of data independence. An RDBMS has the capability to recombine the data
items from different files, providing powerful tools for data usage.

What is normalization?


Database normalization is a data design and organization process applied to data structures based on
rules that help build relational databases. In relational database design, the process of organizing data
to minimize redundancy. Normalization usually involves dividing a database into two or more tables and
defining relationships between the tables. The objective is to isolate data so that additions, deletions,
and modifications of a field can be made in just one table and then propagated through the rest of the
database via the defined relationships.

What are different normalization forms?


1NF: Eliminate Repeating Groups
Make a separate table for each set of related attributes, and give each table a primary key. Each field
contains at most one value from its attribute domain.

2NF: Eliminate Redundant Data
If an attribute depends on only part of a multi-valued key, remove it to a separate table.

3NF: Eliminate Columns Not Dependent On Key 
If attributes do not contribute to a description of the key, remove them to a separate table. All
attributes must be directly dependent on the primary key

BCNF: Boyce-Codd Normal Form
If there are non-trivial dependencies between candidate key attributes, separate them out into distinct
tables.

4NF: Isolate Independent Multiple Relationships
No table may contain two or more 1:n or n:m relationships that are not directly related.

5NF: Isolate Semantically Related Multiple Relationships 
There may be practical constrains on information that justify separating logically related many-to-many
relationships.

ONF: Optimal Normal Form
A model limited to only simple (elemental) facts, as expressed in Object Role Model notation.

DKNF: Domain-Key Normal Form
A model free from all modification anomalies.

Remember, these normalization guidelines are cumulative. For a database to be in 3NF, it must first
fulfill all the criteria of a 2NF and 1NF database.


What is Stored Procedure? 


A stored procedure is a named group of SQL statements that have been previously created and stored
in the server database. Stored procedures accept input parameters so that a single procedure can be
used over the network by several clients using different input data. And when the procedure is
modified, all clients automatically get the new version. Stored procedures reduce network traffic and 
improve performance. Stored procedures can be used to help ensure the integrity of the database.
e.g. sp_helpdb, sp_renamedb, sp_depends etc. 



What is Trigger?

A trigger is a SQL procedure that initiates an action when an event (INSERT, DELETE or UPDATE)
occurs. Triggers are stored in and managed by the DBMS.Triggers are used to maintain the referential

integrity of data by changing the data in a systematic fashion.  A trigger cannot be called or executed; 
the DBMS automatically fires the trigger as a result of a data modification to the associated table.

Triggers can be viewed as similar to stored procedures in that both consist of procedural logic that is
stored at the database level. Stored procedures, however, are not event-driven and are not attached to a
specific table as triggers are. Stored procedures are explicitly executed by invoking a CALL to the 
procedure while triggers are implicitly executed. In addition, triggers can also execute stored
procedures.

Nested Trigger: A trigger can also contain INSERT, UPDATE and DELETE logic within itself, so when the trigger is fired because of data modification it can also cause another data modification, thereby firing
another trigger. A trigger that contains data modification logic within itself is called a nested trigger.


What is View?


A simple view can be thought of as a subset of a table. It can be used for retrieving data, as well as
updating or deleting rows. Rows updated or deleted in the view are updated or deleted in the table the
view was created with. It should also be noted that as data in the original table changes, so does data
in the view, as views are the way to look at part of the original table. The results of using a view are 
not permanently stored in the database. The data accessed through a view is actually constructed using
standard T-SQL select command and can come from one to many different base tables or even other
views.


What is Index?


An index is a physical structure containing pointers to the data. Indices are created in an existing table 
to locate rows more quickly and efficiently. It is possible to create an index on one or more columns of
a table, and each index is given a name. The users cannot see the indexes, they are just used to speed
up queries. Effective indexes are one of the best ways to improve performance in a database
application. A table scan happens when there is no index available to help a query. In a table scan SQL
Server examines every row in the table to satisfy the query results. Table scans are sometimes
unavoidable, but on large tables, scans have a terrific impact on performance.

Clustered indexes define the physical sorting of a database table’s rows in the storage media. For this
reason, each database table may have only one clustered index.

Non-clustered indexes are created outside of the database table and contain a sorted list of references
to the table itself.Max of 249 non-clustered indices can be added to a table.


What is the difference between clustered and a non-clustered index?


A clustered index is a special type of index that reorders the way records in the table are physically
stored. Therefore table can have only one clustered index. The leaf nodes of a clustered index contain
the data pages. 


A nonclustered index is a special type of index in which the logical order of the index does not match
the physical stored order of the rows on disk. The leaf node of a nonclustered index does not consist of
the data pages. Instead, the leaf nodes contain index rows.


What are the different index configurations a table can have? 


A table can have one of the following index configurations:

  • No indexes 
  • A clustered index 
  • A clustered index and many nonclustered indexes 
  • A nonclustered index 
  • Many nonclustered indexes
What is cursors?

Cursor is a database object used by applications to manipulate data in a set on a row-by-row basis, 
instead of the typical SQL commands that operate on all the rows in the set at one time.

In order to work with a cursor we need to perform some steps in the following order: 
  • Declare  cursor 
  • Open cursor 
  • Fetch row from the cursor 
  • Process fetched row 
  • Close cursor 
  • Deallocate cursor
What is the use of DBCC commands?

DBCC stands for database consistency checker. We use these commands to check the consistency of 
the databases, i.e., maintenance, validation task and status checks. 

E.g. DBCC CHECKDB - Ensures that tables in the db and the indexes are correctly linked. 
DBCC CHECKALLOC - To check that all pages in a db are correctly allocated. 
DBCC CHECKFILEGROUP - Checks all tables file group for any damage. 

What is a Linked Server?

Linked Servers is a concept in SQL Server by which we can add other SQL Server to a Group and query 
both the SQL Server dbs using T-SQL Statements. With a linked server, you can create very clean, easy 
to follow, SQL statements that allow remote data to be retrieved, joined and combined with local data.
Storped Procedure sp_addlinkedserver, sp_addlinkedsrvlogin will be used add new Linked Server. 

What is Collation? 

Collation refers to a set of rules that determine how data is sorted and compared. Character data is 
sorted using rules that define the correct character sequence, with options for specifying casesensitivity, accent marks, kana character types and character width.

What are different type of Collation Sensitivity? 

Case sensitivity
A and a, B and b, etc. 

Accent sensitivity
a and á, o and ó, etc. 

Kana Sensitivity
When Japanese kana characters Hiragana and Katakana are treated differently, it is called Kana 
sensitive. 

Width sensitivity
When a single-byte character (half-width) and the same character when represented as a double-byte 
character (full-width) are treated differently then it is width sensitive.

What's the difference between a primary key and a unique key?

Both primary key and unique enforce uniqueness of the column on which they are defined. But by 
default primary key creates a clustered index on the column, where are unique creates a nonclustered 
index by default. Another major difference is that, primary key doesn't allow NULLs, but unique key 
allows one NULL only. 

How to implement one-to-one, one-to-many and many-to-many relationships while 
designing tables?
One-to-One relationship can be implemented as a single table and rarely as two tables with primary 
and foreign key relationships. 
One-to-Many relationships are implemented by splitting the data into two tables with primary key and 
foreign key relationships. 
Many-to-Many relationships are implemented using a junction table with the keys from both the tables 
forming the composite primary key of the junction table.

What is difference between DELETE & TRUNCATE commands? 

Delete command removes the rows from a table based on the condition that we provide with a WHERE 
clause. Truncate will actually remove all the rows from a table and there will be no data in the table
after we run the truncate command.  

TRUNCATE
TRUNCATE is faster and uses fewer system and transaction log resources than DELETE. 
TRUNCATE removes the data by deallocating the data pages used to store the table’s data, and only the 
page deallocations are recorded in the transaction log.     
TRUNCATE removes all rows from a table, but the table structure and its columns, constraints, indexes 
and so on remain. The counter used by an identity for new rows is reset to the seed for the column. 
You cannot use TRUNCATE TABLE on a table referenced by a FOREIGN KEY constraint. 
Because TRUNCATE TABLE is not logged, it cannot activate a trigger. 
TRUNCATE can not be Rolled back. 
TRUNCATE is DDL Command.  
TRUNCATE Resets identity of the table. 

DELETE
DELETE removes  rows one at a time and records an entry in the transaction log for each deleted row. 
If you want to retain the identity counter, use DELETE instead. If you want to remove table definition
and its data, use the DROP TABLE statement. 
DELETE Can be used with or without a WHERE clause 
DELETE Activates Triggers. 
DELETE Can be Rolled back. 
DELETE is DML Command. 
DELETE does not reset identity of the table.  

Difference between Function and Stored Procedure?

UDF can be used in the SQL statements anywhere in the WHERE/HAVING/SELECT section where as 
Stored procedures cannot be. 
UDFs that return tables can be treated as another rowset. This can be used in JOINs with other tables.
Inline UDF's can be though of as views that take parameters and can be used in JOINs and other 
Rowset operations. 

When is the use of UPDATE_STATISTICS command?

This command is basically used when a large processing of data has occurred. If a large amount of 
deletions any modification or Bulk Copy into the tables has occurred, it has to update the indexes to 
take these changes into account. UPDATE_STATISTICS updates the indexes on these tables 
accordingly. 

What types of Joins are possible with Sql Server?

Joins are used in queries to explain how different tables are related. Joins also let you select data from 
a table depending upon data from another table. 
Types of joins: INNER JOINs, OUTER JOINs, CROSS JOINs. OUTER JOINs are further classified as LEFT 
OUTER JOINS, RIGHT OUTER JOINS and FULL OUTER JOINS. 

What is the difference between a HAVING CLAUSE and a WHERE CLAUSE?

Specifies a search condition for a group or an aggregate. HAVING can be used only with the SELECT 
statement. HAVING is typically used in a GROUP BY clause. When GROUP BY is not used, HAVING 
behaves like a WHERE clause. Having Clause is basically used only with the GROUP BY function in a 
query. WHERE Clause is applied to each row before they are part of the GROUP BY function in a query

What is "normalization"? "Denormalization"? Why do you sometimes want to denormalize? 

Normalizing data means eliminating redundant information from a table and organizing the data so that future changes to the table are easier. Denormalization means allowing redundancy in a table. The main benefit of denormalization is improved performance with simplified data retrieval and manipulation. This is done by reduction in the number of joins needed for data processing.

What is a "constraint"? 

 A constraint allows you to apply simple referential integrity checks to a table. There are four primary types of constraints that are currently supported by SQL Server.
  • PRIMARY/UNIQUE - enforces uniqueness of a particular table column. 
  • DEFAULT - specifies a default value for a column in case an insert operation does not provide one. 
  • FOREIGN KEY - validates that every value in a column exists in a column of another table. 
  • CHECK - checks that every value stored in a column is in some specified list. 

Each type of constraint performs a specific type of action. Default is not a constraint. NOT NULL is one more constraint which does not allow values in the specific column to be null. And also it the only constraint which is not a table level constraint.

Why can a "group by" or "order by" clause be expensive to process? -

Processing of "group by" or "order by" clause often requires creation of Temporary tables to process the results of the query. Which depending of the result set can be very expensive.

What is a SQL view

 An output of a query can be stored as a view. View acts like small table which meets our criterion. View is a precomplied SQL query which is used to select data from one or more tables. A view is like a table but it doesn’t physically take any space. View is a good way to present data in a particular format if you use that query quite often. View can also be used to restrict users from accessing the tables directly.


how to delete duplicate rows from a table in sql server ?

CREATE TABLE [dbo].[ATTENDANCE](
 [EMPLOYEE_ID] [varchar](50) NOT NULL,
 [ATTENDANCE_DATE] [date] NOT NULL
) ON [PRIMARY] 
INSERT INTO dbo.ATTENDANCE (EMPLOYEE_ID,ATTENDANCE_DATE)VALUES
   ('A001',CONVERT(DATETIME,'01-01-11',5)) 
INSERT INTO dbo.ATTENDANCE (EMPLOYEE_ID,ATTENDANCE_DATE)VALUES
   ('A001',CONVERT(DATETIME,'01-01-11',5)) 
INSERT INTO dbo.ATTENDANCE (EMPLOYEE_ID,ATTENDANCE_DATE)VALUES
   ('A002',CONVERT(DATETIME,'01-01-11',5)) 
INSERT INTO dbo.ATTENDANCE (EMPLOYEE_ID,ATTENDANCE_DATE)VALUES
   ('A002',CONVERT(DATETIME,'01-01-11',5)) 
INSERT INTO dbo.ATTENDANCE (EMPLOYEE_ID,ATTENDANCE_DATE)VALUES
   ('A002',CONVERT(DATETIME,'01-01-11',5)) 
INSERT INTO dbo.ATTENDANCE (EMPLOYEE_ID,ATTENDANCE_DATE)VALUES
   ('A003',CONVERT(DATETIME,'01-01-11',5)) 
After inserting the data, check the data of the below table. If we grouped the employee_id andattendance_date, then A001 and A002 become duplicates.
EMPLOYEE_IDATTENDANCE_DATE
A0012011-01-01
A0012011-01-01
A0022011-01-01
A0022011-01-01
A0022011-01-01
A0032011-01-01
First, insert an identity column in that table by using the following code:
ALTER TABLE dbo.ATTENDANCE ADD AUTOID INT IDENTITY(1,1)  
Now the table data will be like the following table:
EMPLOYEE_IDATTENDANCE_DATEAUTOID
A0012011-01-011
A0012011-01-012
A0022011-01-013
A0022011-01-014
A0022011-01-015
A0032011-01-016
Check the AUTOID column. Now we will start playing the game with this column.
Now use the following code to find out the duplicate rows that exist in the table.
SELECT * FROM dbo.ATTENDANCE WHERE AUTOID NOT IN (SELECT MIN(AUTOID)
 FROM dbo.ATTENDANCE GROUP BY EMPLOYEE_ID,ATTENDANCE_DATE) 
The above code will give us the following result:
EMPLOYEE_IDATTENDANCE_DATEAUTOID
A0012011-01-012
A0022011-01-014
A0022011-01-015
Ultimately, these are the duplicate rows which we want to delete to resolve the issue. Use the following code to resolve it.
DELETE FROM dbo.ATTENDANCE WHERE AUTOID NOT IN (SELECT MIN(AUTOID)
 FROM dbo.ATTENDANCE GROUP BY EMPLOYEE_ID,ATTENDANCE_DATE) 








No comments:

Post a Comment