TOP 100 ordering of Data Question

Hi,

I was trying to use the UNION construct to combine two tables. The second table was to be sorted before effecting the UNION. The first attempt was the following:

SELECT emp_id, assigned_branch_id
FROM employee
WHERE title = 'Teller'
UNION
SELECT
open_emp_id, open_branch_id
FROM account
ORDER BY open_emp_id

The above was generating an error.

I then tried

SELECT emp_id, assigned_branch_id
FROM employee
WHERE title = 'Teller'
UNION
SELECT
e.open_emp_id, e.open_branch_id
FROM (
SELECT TOP (100) PERCENT open_emp_id, open_branch_id
FROM account
ORDER BY open_emp_id
) e

While the above worked the answer was not correct. I then decided to investiage further and below is a script that should create a Database call TopDemo with one table called employee and populate employee with some data.

The last two instructions will extract

-- Create a Test Database
USE MASTER
GO

IF
NOT EXISTS (SELECT name FROM sys.databases WHERE name = N'TopDemo')
BEGIN
CREATE DATABASE [TopDemo]
END
GO

-- create supporting table
USE [TopDemo]
GO

CREATE TABLE [dbo].[employee](
[account_id] [smallint] IDENTITY(1,1) NOT NULL,
[open_branch_id] [smallint] NULL,
[open_emp_id] [smallint] NULL,
CONSTRAINT [pk_account] PRIMARY KEY CLUSTERED
(
[account_id] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]

-- populate the table
INSERT INTO [employee] ([open_branch_id], [open_emp_id])
VALUES (2, 10)
GO

INSERT INTO [employee] ([open_branch_id], [open_emp_id])
VALUES (2, 10)
GO

INSERT INTO [employee] ([open_branch_id], [open_emp_id])
VALUES (2, 10)
GO

INSERT INTO [employee] ([open_branch_id], [open_emp_id])
VALUES (2, 10)
GO

INSERT INTO [employee] ([open_branch_id], [open_emp_id])
VALUES (2, 10)
GO

INSERT INTO [employee] ([open_branch_id], [open_emp_id])
VALUES (3, 13)
GO

INSERT INTO [employee] ([open_branch_id], [open_emp_id])
VALUES (3, 13)
GO

INSERT INTO [employee] ([open_branch_id], [open_emp_id])
VALUES (1, 1)
GO

INSERT INTO [employee] ([open_branch_id], [open_emp_id])
VALUES (1, 1)
GO

INSERT INTO [employee] ([open_branch_id], [open_emp_id])
VALUES (1, 1)
GO

INSERT INTO [employee] ([open_branch_id], [open_emp_id])
VALUES (4, 16)
GO

INSERT INTO [employee] ([open_branch_id], [open_emp_id])
VALUES (1, 1)
GO

INSERT INTO [employee] ([open_branch_id], [open_emp_id])
VALUES (1, 1)
GO

INSERT INTO [employee] ([open_branch_id], [open_emp_id])
VALUES (2, 10)
GO

INSERT INTO [employee] ([open_branch_id], [open_emp_id])
VALUES (4, 16)
GO

INSERT INTO [employee] ([open_branch_id], [open_emp_id])
VALUES (4, 16)
GO

INSERT INTO [employee] ([open_branch_id], [open_emp_id])
VALUES (1, 1)
GO

INSERT INTO [employee] ([open_branch_id], [open_emp_id])
VALUES (1, 1)
GO

INSERT INTO [employee] ([open_branch_id], [open_emp_id])
VALUES (1, 1)
GO

INSERT INTO [employee] ([open_branch_id], [open_emp_id])
VALUES (4, 16)
GO

INSERT INTO [employee] ([open_branch_id], [open_emp_id])
VALUES (4, 16)
GO

INSERT
INTO [employee] ([open_branch_id], [open_emp_id])
VALUES (2, 10)
GO

INSERT INTO [employee] ([open_branch_id], [open_emp_id])
VALUES (4, 16)
GO

INSERT INTO [employee] ([open_branch_id], [open_emp_id])
VALUES (3, 13)
GO

-- Why do these two queries return a different ordering of fields
SELECT TOP (100) PERCENT open_emp_id emp_id, open_branch_id assigned_branch_id
FROM [employee]
ORDER BY open_emp_id
GO

SELECT e.emp_id, e.assigned_branch_id
FROM (
SELECT TOP (100) PERCENT open_emp_id emp_id, open_branch_id assigned_branch_id
FROM [employee]
ORDER BY open_emp_id
) e
GO

If it can be of any help, I managed to get 'proper' ordering from MySQL 5.0.18 by using the following:

SELECT emp_id, assigned_branch_id
FROM employee
WHERE title = 'Teller'
UNION
SELECT emp_id, assigned_branch_id
FROM (
SELECT open_emp_id emp_id, open_branch_id assigned_branch_id
FROM account
ORDER BY open_emp_id
) e
GO

Regards,

Al



Answer this question

TOP 100 ordering of Data Question

  • Tharkysis

    then

    select emp_id, assigned_branch_id from (
    SELECT emp_id, assigned_branch_id, 1 IsTeller
    FROM employee
    WHERE title = 'Teller'
    UNION
    SELECT
    open_emp_id, open_branch_id, 0
    FROM account
    ) temp
    ORDER
    BY IsTeller desc



  • Ypsilon

    this will do it for you

    select emp_id, assigned_branch_id from (
    SELECT emp_id, assigned_branch_id, 1 IsTeller
    FROM employee
    WHERE title = 'Teller'
    UNION
    SELECT
    open_emp_id, open_branch_id, 0
    FROM account
    ) temp
    ORDER
    BY IsTeller desc, emp_id



  • SuperJon

    what happens when you remove the IsTeller field from the order by clause (in the MSSQL version)-

    select value from (
    SELECT value, 1 IsTeller
    FROM un1
    UNION ALL
    SELECT value, 0
    FROM un2
    ) temp
    ORDER BY IsTeller desc

    also. . . in the MySQL what would you get from a simple

    SELECT value, 1 IsTeller
    FROM un1

    I would expect 1,5,10 as the primary key is the default index

    so you want the top part of the union to come in a random order



  • URLmon

    I agree that ideally you should specify an ORDER BY clause as in the case with MySQL and Oracle 10g, but if you do that with MS SQL you get an error.

    /*------------------------
    SELECT value
    FROM un1
    UNION ALL
    SELECT value
    FROM (
    SELECT value
    FROM un2
    ORDER BY value
    ) e
    GO
    ------------------------*/
    Msg 1033, Level 15, State 1, Line 9

    The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP or FOR XML is also specified.

    So my query on whether MS SQL can build a UNION between two selects with the second select only being sorted still remains open in my opinion.

  • GREESH KUMAR MAHESHWARI

    oops. . . I misread. . . hold on

  • sholliday

    by the way what is the default index field(s) and their types on employee



  • Jan Buskens

    Thank you for your input. I appreciate the point you are making, but appreciate that this example is a rigged one.

    Imagine that you want to combine data from two tables (with primary keys, etc). You want to join data using a union operator and want that the second set be sorted according to a particular field. How would you go about doing it

    I admit that the title is misleading, but when I started off I was under the impression that TOP was the culprit.

    Put differently, what would be the equivalent MS SQL statement to the following MySQL/Oracle construct

    SELECT value
    FROM un1
    UNION ALL
    SELECT value
    FROM (
    SELECT value
    FROM un2
    ORDER BY value
    ) e
    GO


  • Pittsburgh

    Order of rows returned by the SELECT statement is determine ONLY by the last ORDER BY clause. So if you need rows in a particular order you need to specify an ORDER BY clause in your SELECT statement using the appropriate columns. This behavior is in line with the ANSI SQL standard. However, the order in which rows are returned in case where there is no explicit ORDER BY clause is plan dependent. Each database engine will handle it differently and even for same query under different conditions. The query optimizer in MSSQL/Oracle/IBM is much more sophisticated than MySQL so you can get different execution plans for same query which will return rows in different order. See the post http://blogs.msdn.com/sqltips/archive/2005/07/20/441053.aspx for more details on ordering guarantees in SQL Server.


  • Ajit Sheth

    If you compare it to the MySQL version, you will notice that the first set is not sorted, only the second set it. Your solution will keep the two parts of the UNION separate but will sort group each by their respective emp_id and open_emp_id.


  • Witold Jaworski

    I apologies that I forgot to answer the first part of your comment.

    The answer to the first script is 5, 1, 10, 5, 1, 10.

    My question is what need one do to get 5, 1, 10, 1, 5, 10 (second group only sorted)


  • Zoiner Tejada

    In reply to your query:

    select value from (
    SELECT value, 1 IsTeller
    FROM un1
    UNION ALL
    SELECT value, 0
    FROM un2
    ) temp
    ORDER BY IsTeller desc

    -- output
    -- 5, 1, 10, 1, 10, 5

  • Dudley McFadden

    I removed the sort by value and only sort by IsTeller.

    as "Uma" pointed out the query engine will apply plans it has available to it, unless elimnated by other "order by" elements. the engine implies a sort by the primary key index, then the sort by IsTeller is explicitly applied.

    ok, lets say we had these tables -

    create table un1 (id int primary key, value int, sub varchar(255));
    create table un2 (id int primary key, value int, sub varchar(255));

    with

    un1            un2                 
    1   5    z     1   5    z
    2   1    m     2   1    m
    3   10   a     3   10   a
                       
    and I wanted

    un1 first with un2 ordered by z

    select value, unionsub sub
    from(
        select id, value, sub unionsub, null dummy, 1 isTeller
        from un1 
      union all                   
        select id, value, sub unionsub, sub dummy, 0 isTeller
        from un2 
    ) order desc by IsTeller, dummy

    should yield:

    5    z
    1    m
    10   a
    10   a
    1    m
    5    z

    keep in mind, in all three cases, if there is any other index on un1, it will be implicitly applied prior to the union. in this case, its the primary key.

    Also, in all three cases, if there is no other index in un1, there is no guarantee of the order of the top part of the union will be the same on subsequent runs of the query even if the contents of un1 are the same except that a record has been deleted and then reinserted. The result will be affected by record position and this is not such a good thing.



  • Saurav Kr. Basu

    I bet you dont have exactly the same schema in the MSSQL, that is, Primary Key on main_id (remember a primary key is an index)

    run this script in MsSql


    create table un1
     (main_id int identity(1,1) not null primary key,
      value int not null
     )
    GO
    INSERT INTO un1(value)VALUES (5);
    GO
    INSERT INTO un1(value)VALUES (1);
    GO
    INSERT INTO un1(value)VALUES (10);
    GO
    SELECT value from un1
    GO
    create table un2
     (main_id int identity(1,1) not null primary key,
      value int not null
    )
    GO
    INSERT INTO un2(value)VALUES (5);
    GO
    INSERT INTO un2(value)VALUES (1);
    GO
    INSERT INTO un2(value)VALUES (10);
    GO
    SELECT value from un1
    GO
    select value from (
     SELECT value, 1 IsTeller
      FROM un1
     UNION ALL
     SELECT value, 0
      FROM un2
    ) temp
    ORDER BY IsTeller desc
    go
    drop table un1
    go
    drop table un2
    go
    You get the desired output, correct

    now look at this, do you really want to depend only on insert order for the top part of the union your three row tables are trivial. but watch what happens if you do fairly large table with no indexes (MSSQL follows)
    create table un1
     (main_id int,
      value int
     )
    GO
    declare @i int
    set @i = 0
    while @i < 1000
    begin
     INSERT INTO un1 VALUES (@i, rand() * 1000);
     set @i = @i + 1
    end
    go
    create table un2
     (main_id int,
      value int
     )
    go
    insert into un2 select * from un1
    go
    select value into ##temp1 from (
     SELECT value, 1 IsTeller
      FROM un1
     UNION ALL
     SELECT value, 0
      FROM un2
    ) temp
    ORDER BY IsTeller desc
    go
    declare @n int
    select @n = value from un1 where main_id = 0
    select  @n n
    delete from un1 where main_id = 0
    insert into un1 values (0,@n)
    go
    select value into ##temp2 from (
     SELECT value, 1 IsTeller
      FROM un1
     UNION ALL
     SELECT value, 0
      FROM un2
    ) temp
    ORDER BY IsTeller desc
    go
    select * from ##temp1
    go
    select * from ##temp2
    go
    drop table ##temp1
    go
    drop table ##temp2
    go
    select count (*) from un1  inner join un2 on un1.main_id = un2.main_id where un1.value <> un2.value
    go
    drop table un1
    go
    drop table un2
    go

    change this line:
            while @i < 5
    to:
            while @i < 10       
    and run again.
        
    then try
            while @i < 100        
        
    and finally
            while @i < 1000
           
    note how on the last run, row 1 of ##temp1 and ##temp2 are different

    But wait - the relational data contained is exactly the same - the value related for to a given main_id is the same in both tables -

    select count (*) from un1  inner join un2 on un1.main_id = un2.main_id where un1.value <> un2.value

    yields: 0

    I guarantee the same thing happens in Oracle and MySQL. If it doesn't, this is an indication that MSSQL's query optimization is superior.

    In a database, rarely if ever are business rules dictated by physical position in a table. The anomoly here is that the result set is determined by size of the table and that is totally inappropriate.



  • Jerry Eldridge

    My original posting was taken from a larger scenario, so I decided to simplify so that the difference between MySQL and MS SQL 2005 can be highlighted (In my opinion, MySQL is the correct answer).

    The code that follows was run on MySQL 5.0.18:

    create table un1
    (main_id smallint unsigned not null auto_increment,
    value smallint unsigned not null,
    constraint pk_main_id primary key (main_id)
    )
    GO

    insert into un1
    values (null, 5)
    GO

    insert into un1
    values (null, 1)
    GO

    insert into un1
    values (null, 10)
    GO

    SELECT value from un1
    GO
    -- output
    -- 5, 1, 10

    --

    create table un2
    (main_id smallint unsigned not null auto_increment,
    value smallint unsigned not null,
    constraint pk_main_id primary key (main_id)
    )
    GO

    insert into un2
    values (null, 5)
    GO

    insert into un2
    values (null, 1)
    GO

    insert into un2
    values (null, 10)
    GO

    SELECT value from un2
    GO
    -- output
    -- 5, 1, 10

    ---

    SELECT value
    FROM un1
    UNION ALL
    SELECT value
    FROM un2
    -- output
    -- 5, 1, 10, 5, 1, 10


    SELECT value
    FROM un1
    UNION ALL
    SELECT value
    FROM (
    SELECT value
    FROM un2
    ORDER BY value
    ) e
    GO
    -- output
    -- 5, 1, 10, 1, 5, 10

  • TOP 100 ordering of Data Question