Introduction: One of the most obvious and useful set of window
functions available in SQL Server are ranking functions where rows from result set are
ranked according to a certain scheme.
In previous articles i have explained How to Update table data using inner join in sql server and CTE recursive query to get employee manager hierarchy with level and Query to search any text in all stored procedures,views and functions and Query to get upcoming birthdays within week and Without primary key column update first or last n records in table
Ranking functions return a numeric ranking value for each row in a partition. Some rows might receive the same value as other rows depending on the ranking function used. So,
Ranking functions return a numeric ranking value for each row in a partition. Some rows might receive the same value as other rows depending on the ranking function used. So,
Ranking functions are non-deterministic.
There are four ranking functions available in Sql:
1) ROW_NUMBER()
2) RANK()
3) DENSE_RANK()
4) NTILE()
These functions are having some similarities and
significant difference.
Similarity:
1) Should have OVER clause and ORDER BY clause inside
the OVER clause. PARTITION BY clause is optional.
2) Can have PARTITION BY clause inside the OVER clause.
When combined with a PARTITION BY clause, all of these functions reset the
returned integer value to 1.
3) All of them return an increasing integer with a base value of 1.
3) All of them return an increasing integer with a base value of 1.
4) ROW_NUMBER(), RANK() and DENSE_RANK() functions return
the same output If there are no duplicate values in the column used by the
ORDER BY clause.
Differences
Differences will
be described with practical examples mentioned in this article.
ROW_NUMBER: Assigns unique number starting at 1 for every
row, even if one or more rows have the same value.
ROW_NUMBER can be used with or without PARTITION BY clause:
ROW_NUMBER() Function without PARTITION BY clause: If we want to add serial number/row number to each row of the result set regardless they are same or different.
ROW_NUMBER() Function without PARTITION BY clause: If we want to add serial number/row number to each row of the result set regardless they are same or different.
ROW_NUMBER()
Function with PARTITION BY clause: If we want to add serial
number/row number to each group, and reset it for every group.
Similarly RANK(), DENSE_RANK() and NTILE() ranking
functions can also be used with or without PARTITION BY clause.
RANK:
It ranks each row in the result set based on the mentioned column in Over
clause of query. If values in the ranking column are the same, they receive the
same rank. However, the next number in the ranking sequence is skipped creating
a gap in between the groups.
Example: If same record is repeated 3 times and has
rank "3", the next rank will be 3+3=6 i.e. there will be a gap in
between the records sequence.
DENSE_RANK:
This function is similar to Rank with only 1 difference; this will not leave
gaps between groups if there is a tie between the ranks of the preceding
records. The next number in the ranking sequence is then used to rank the row
or rows that follow.
Example: If same record is repeated 3 times and has rank "3", the next rank will be 4 i.e. no gaps in between the record sequence.
Example: If same record is repeated 3 times and has rank "3", the next rank will be 4 i.e. no gaps in between the record sequence.
NTILE:
It divides the result set into the number of groups specified as an argument to
the function. If we pass 3 as an argument to this function then it will divide
the result set into 3 groups.
That division is based on the total number of rows in
the result set divided by the number in the argument. NTILE is
non-deterministic.
If the total number of rows in result set is divisible
by the number of groups, the rows will be evenly distributed among the groups.
For example, if the total number of rows is 12, and there are 3 groups, each
group will contain 4 rows.
But as in our example below if there are 11 rows in
result set and we pass 3 as an argument to this function then it will break the
result set into 3 groups. First and second group will have 4 records each and
last group will have remaining 3 records. First group records will have rank 1,
second group records will have rank 2 and third group records will have rank 3.
Though all these are ranking functions, the difference
between rank(), dense_rank(),
row_number() can be understood when we have ties on ranking i.e.
duplicate records. For example, if we are ranking students by their marks then
what would be the rank of two students scoring same marks?
It depends upon which ranking function we use.
Let's create a temporary table and insert some dummy
data into it for demonstration purpose using the following script.
IF OBJECT_ID( 'tempdb..#tbStudent' )
IS NOT NULL
DROP TABLE #tbStudent;
GO
CREATE TABLE #tbStudent
(
StudentId INT IDENTITY(1,1) PRIMARY KEY NOT NULL ,
StudentName VARCHAR(50),
Section CHAR(1),
Marks INT,
);
INSERT INTO #tbStudent
VALUES
('Rohan', 'A',90),
('Priya', 'A',45),
('Aasif', 'A',90),
('Vivek', 'B',58),
('Arjun', 'B',40),
('Sidharth', 'A',100),
('Aalia', 'A',95),
('Kabir', 'B',90),
('Robin', 'B',40),
('Rahul', 'A',30),
('Anju', 'B',32);
--View
table records
SELECT * FROM #tbStudent
Output will be as:
StudentId
|
StudentName
|
Section
|
Marks
|
1
|
Rohan
|
A
|
90
|
2
|
Priya
|
A
|
45
|
3
|
Aasif
|
A
|
90
|
4
|
Vivek
|
B
|
58
|
5
|
Arjun
|
B
|
40
|
6
|
Sidharth
|
A
|
100
|
7
|
Aalia
|
A
|
95
|
8
|
Kabir
|
B
|
90
|
9
|
Robin
|
B
|
40
|
10
|
Rahul
|
A
|
30
|
11
|
Anju
|
B
|
32
|
As can see three students (Rohan, Aasif and Kabir)
scored same marks i.e. 90 and (Arjun and Robin) scored same 40 marks.
Now let’s rank them using all ranking functions one by
one and see the difference.
ROW_NUMBER() without Partition By clause
It always generates a unique ranking even with
duplicate records. That's why it is useful to solve problems like second or nth
highest marks/salary among students/employees etc.
In the following example, we have three students
(Rohan, Aasif and Kabir) and two student (Arjun and Robin) with the same marks
and even though we have generated row numbers over marks column it produces
different row number for these students with the same marks i.e. 11 unique
numbers in sequence for 11 records.
SELECT StudentId, StudentName, Section, Marks, ROW_NUMBER() OVER (ORDER BY Marks DESC) AS RowNumber FROM #tbStudent
Output will be as:
StudentId
|
StudentName
|
Section
|
Marks
|
RowNumber
|
6
|
Sidharth
|
A
|
100
|
1
|
7
|
Aalia
|
A
|
95
|
2
|
8
|
Kabir
|
B
|
90
|
3
|
1
|
Rohan
|
A
|
90
|
4
|
3
|
Aasif
|
A
|
90
|
5
|
4
|
Vivek
|
B
|
58
|
6
|
2
|
Priya
|
A
|
45
|
7
|
5
|
Arjun
|
B
|
40
|
8
|
9
|
Robin
|
B
|
40
|
9
|
11
|
Anju
|
B
|
32
|
10
|
10
|
Rahul
|
A
|
30
|
11
|
As we can see Rohan, Aasif and Kabir have same marks
i.e. 90 but they got the different 3rd, 4th and 5th rank and Arjun and Robin
scored same 40 marks and they got 8th and 9th rank. In case of a tie, ranks are
assigned on a random basis (any of the students scoring same marks can appear
before or after the other if we execute this query multiple times.)
ROW_NUMBER() with PARTITION BY clause:
SELECT StudentId, StudentName, Section, Marks, ROW_NUMBER() OVER (PARTITION BY SECTION ORDER BY Marks DESC) AS RowNumber FROM #tbStudent
Output will be as:
StudentId
|
StudentName
|
Section
|
Marks
|
RowNumber
|
6
|
Sidharth
|
A
|
100
|
1
|
7
|
Aalia
|
A
|
95
|
2
|
1
|
Rohan
|
A
|
90
|
3
|
3
|
Aasif
|
A
|
90
|
4
|
2
|
Priya
|
A
|
45
|
5
|
10
|
Rahul
|
A
|
30
|
6
|
8
|
Kabir
|
B
|
90
|
1
|
4
|
Vivek
|
B
|
58
|
2
|
5
|
Arjun
|
B
|
40
|
3
|
9
|
Robin
|
B
|
40
|
4
|
11
|
Anju
|
B
|
32
|
5
|
Here we have partitioned the records based on student
Sections. So it generated row number 1 to 6 for first group i.e. for Section A,
and reset it for next group and generated row number 1 to 5 for Section B.
Similarly RANK(), DENSE_RANK() and NTILE() ranking
functions can also be used with PARTITION BY clause. You can give them a try.
RANK()
The rank() function assigns same rank to same values
which are not distinguishable by ORDER BY. Also, the next different rank does
not start from immediate next number but there is a gap i.e. if 3rd, 4th and
5th student have the same marks then they will have same rank 3 as in our
example, and the next student Vivek which has different marks will have new
rank 6.
Here is the example to clarify the point.
SELECT StudentId, StudentName, Section, Marks, RANK() OVER (ORDER BY Marks DESC) AS [Rank] FROM #tbStudent
Output will be as:
StudentId
|
StudentName
|
Section
|
Marks
|
Rank
|
6
|
Sidharth
|
A
|
100
|
1
|
7
|
Aalia
|
A
|
95
|
2
|
8
|
Kabir
|
B
|
90
|
3
|
1
|
Rohan
|
A
|
90
|
3
|
3
|
Aasif
|
A
|
90
|
3
|
4
|
Vivek
|
B
|
58
|
6
|
2
|
Priya
|
A
|
45
|
7
|
5
|
Arjun
|
B
|
40
|
8
|
9
|
Robin
|
B
|
40
|
8
|
11
|
Anju
|
B
|
32
|
10
|
10
|
Rahul
|
A
|
30
|
11
|
Rohan, Aasif and Kabir got the same rank 3, but the
next student Vivek got the rank 6 , instead of 4 because it keeps original
ordering similar to ROW_NUMBER function. Similarly Arjun and Robin got the same
rank 8 and next student Anju got rank 10 instead of 9.
DENSE_RANK()
The dense_rank() function is similar to rank() window
function i.e. same values will be assigned the same rank, but the next
different value will have rank which is just one more than the previous rank
i.e. if 3rd, 4th and 5th students (Rohan, Aasif and Kabir) has the same marks
then they will have same rank 3 but 6th student Vivek, which has different
marks will have rank 4, unlike rank 6 as in the case with rank() function. Similarly 8th and 9th students Arjun and
Robin got the rank 6 and next student Anju got rank 7 instead of 10 as in the
case with rank() function.
There will be no gap on ranking in case of
dense_rank() as shown in the following example:
SELECT StudentId, StudentName, Section, Marks, DENSE_RANK() OVER (ORDER BY Marks DESC) AS DenseRank FROM #tbStudent
Output will be as:
StudentId
|
StudentName
|
Section
|
Marks
|
DenseRank
|
6
|
Sidharth
|
A
|
100
|
1
|
7
|
Aalia
|
A
|
95
|
2
|
8
|
Kabir
|
B
|
90
|
3
|
1
|
Rohan
|
A
|
90
|
3
|
3
|
Aasif
|
A
|
90
|
3
|
4
|
Vivek
|
B
|
58
|
4
|
2
|
Priya
|
A
|
45
|
5
|
5
|
Arjun
|
B
|
40
|
6
|
9
|
Robin
|
B
|
40
|
6
|
11
|
Anju
|
B
|
32
|
7
|
10
|
Rahul
|
A
|
30
|
8
|
NTILE()
It divides the result set into the number of groups
specified as an argument to the function. If we pass 3 as an argument to this
function then it will divide the result set into 3 groups.
That division is based on the total number of rows in
the result set divided by the number in the argument.
Here is the example to clarify the point.
SELECT StudentId, StudentName, Section, Marks, NTILE(3) OVER(ORDER BY Marks DESC) AS [NTILE] FROM
#tbStudent
Output will be as:
StudentId
|
StudentName
|
Section
|
Marks
|
NTILE
|
6
|
Sidharth
|
A
|
100
|
1
|
7
|
Aalia
|
A
|
95
|
1
|
8
|
Kabir
|
B
|
90
|
1
|
1
|
Rohan
|
A
|
90
|
1
|
3
|
Aasif
|
A
|
90
|
2
|
4
|
Vivek
|
B
|
58
|
2
|
2
|
Priya
|
A
|
45
|
2
|
5
|
Arjun
|
B
|
40
|
2
|
9
|
Robin
|
B
|
40
|
3
|
11
|
Anju
|
B
|
32
|
3
|
10
|
Rahul
|
A
|
30
|
3
|
There are 11 rows in result set and we passed 3 as an
argument to this function, it divided the result set into 3 groups. First and second
group have 4 records and third group have remaining 3 records.
Difference
between ROW_NUMBER() vs RANK() vs DENSE_RANK() and NTILE()
As demonstrated and explained, the difference between
rank, row_number, and dense_rank can be noticed when there are duplicate
records. Since in all of the above examples we are ranking records based on
marks, if more than one student will have the same marks then the difference
between these three ranking functions can be noticed.
The row_number gives continuous numbers, while rank
and dense_rank gives the same rank for duplicates, but the next number in rank
is as per continuous order so you will see a jump. but dense_rank doesn't have
any gap in rankings. ntile divides the result into specified groups and a group
number is then assigned to each row identifying which group the row belongs to.
And here is the query which clearly shows the
difference in the ranking functions:
SELECT
StudentId, StudentName,
Section,Marks,
ROW_NUMBER() OVER (ORDER BY
Marks DESC) AS RowNumber,
RANK() OVER (ORDER BY
Marks DESC) AS [Rank],
DENSE_RANK() OVER (ORDER BY
Marks DESC) AS DenseRank,
NTILE(3) OVER(ORDER BY Marks DESC) AS [NTILE]
FROM #tbStudent
Output will be as:
StudentId
|
StudentName
|
Section
|
Marks
|
RowNumber
|
Rank
|
DenseRank
|
NTILE
|
6
|
Sidharth
|
A
|
100
|
1
|
1
|
1
|
1
|
7
|
Aalia
|
A
|
95
|
2
|
2
|
2
|
1
|
8
|
Kabir
|
B
|
90
|
3
|
3
|
3
|
1
|
1
|
Rohan
|
A
|
90
|
4
|
3
|
3
|
1
|
3
|
Aasif
|
A
|
90
|
5
|
3
|
3
|
2
|
4
|
Vivek
|
B
|
58
|
6
|
6
|
4
|
2
|
2
|
Priya
|
A
|
45
|
7
|
7
|
5
|
2
|
5
|
Arjun
|
B
|
40
|
8
|
8
|
6
|
2
|
9
|
Robin
|
B
|
40
|
9
|
8
|
6
|
3
|
11
|
Anju
|
B
|
32
|
10
|
10
|
7
|
3
|
10
|
Rahul
|
A
|
30
|
11
|
11
|
8
|
3
|
As we can see In case of tie :
ROW_NUMBER() functions
assigned unique row numbers to each row even for records with same marks i.e.
11 unique numbers in sequence for 11 records.
RANK() function assigned same rank for same marks, but
the next different rank is not in sequence, there is a gap. Here in above
example Rohan, Aasif and Kabir got rank 3 and Vivek got rank 6 since rank 3 repeated
3 times. Similarly Arjun and Robin got the same rank 8 and next student Anju
got rank 10 instead of 9.
DENSE_RANK() also assigned same rank for same marks
but there is no gap in between the sequence. The next different is in sequence.
Here in above example Rohan, Aasif and Kabir got rank 3 and next student Vivek
got rank 4. Similarly Arjun and Robin got the rank 6 and next student Anju got
rank 7 instead of 10.
NTILE(3) Divided the result set into 3 groups.
If we use PARTITION BY with all ranking functions then
the output will be as:
SELECT
StudentId, StudentName,Section, Marks,
ROW_NUMBER() OVER (PARTITION BY Section ORDER BY Marks DESC) AS [row number with
partition],
RANK() OVER (PARTITION BY Section ORDER BY Marks DESC) AS [rank with
partiton],
DENSE_RANK() OVER (PARTITION BY Section ORDER BY Marks DESC) AS [dense rank with
partition] ,
NTILE(3) OVER(PARTITION BY Section ORDER BY Marks DESC) AS [ntile with
partiton]
FROM #tbStudent
Output will be as:
StudentId
|
StudentName
|
Section
|
Marks
|
RowNumber
|
Rank
|
DenseRank
|
NTILE
|
6
|
Sidharth
|
A
|
100
|
1
|
1
|
1
|
1
|
7
|
Aalia
|
A
|
95
|
2
|
2
|
2
|
1
|
1
|
Rohan
|
A
|
90
|
3
|
3
|
3
|
2
|
3
|
Aasif
|
A
|
90
|
4
|
3
|
3
|
2
|
2
|
Priya
|
A
|
45
|
5
|
5
|
4
|
3
|
10
|
Rahul
|
A
|
30
|
6
|
6
|
5
|
3
|
8
|
Kabir
|
B
|
90
|
1
|
1
|
1
|
1
|
4
|
Vivek
|
B
|
58
|
2
|
2
|
2
|
1
|
5
|
Arjun
|
B
|
40
|
3
|
3
|
3
|
2
|
9
|
Robin
|
B
|
40
|
4
|
3
|
3
|
2
|
11
|
Anju
|
B
|
32
|
5
|
5
|
4
|
3
|
It works similar but for each partition i.e. for each
group.(Section)
That's all about the difference between ROW_NUMBER(),
RANK(), DENSE_RANK() and NTILE() function in SQL SERVER.
Now over to you:
A blog is nothing without reader's feedback and comments. So please provide your valuable feedback so that i can make this blog better and If you like my work; you can appreciate by leaving your comments, hitting Facebook like button, following on Google+, Twitter, Linkedin and Pinterest, stumbling my posts on stumble upon and subscribing for receiving free updates directly to your inbox . Stay tuned and stay connected for more technical updates.
If you have any question about any post, Feel free to ask.You can simply drop a comment below post or contact via Contact Us form. Your feedback and suggestions will be highly appreciated. Also try to leave comments from your account not from the anonymous account so that i can respond to you easily..