Why is this slow?

The query is just... select name, max(date) from myTable group by name

myTable had a clustered index PK, also non-clustered indexes on name and date.

There are approx 1,000,000 rows and there are only 3 distinct names.

This takes 10-15 secs to execute, even running directly on the server.

Server trace says duration = 100, reads = 100

Profiler says CPU=20000, reads = 20000, duration = 10000

What on earth is going on !



Answer this question

Why is this slow?

  • Mike Price

    [Post recalled - I didn't read the question properly!]


  • zyunique

    Alternatively, you could make the dual-column index (as suggested above) a clustered index such that the maximum date may be found immediately for each name. Using a non-clustered index will still require a scan of all date values in the index, since a comparison of dates for each name is necessary.


  • Mike H B

    Is it on SQL 2000 or 2005 How is the Clustered index defined For this case, it will be useless if it is on name column only as it contains only 3 distinct values.

    -- runt the following in Query Analyzer (for 2000) or management Studio (for 2005), it will show IO + CPU time for compile and execution
    SET statistics IO ON
    go
    your query here
    go
    SET statistics IO OFF
    go

    -Shus

     

     


  • drpavan

    Makes sense. I stand corrected.


  • Elina

    Actually, it will not require a scan of every date value, if both columns are included in the index -- only every name value must be scanned.  The query engine should be able to scan each name and then use the fact that the dates are already sorted in the index to pull only the max date for each name, without having to read every value.  This is true for either a clustered or non-clustered index.
     

    --
    Adam Machanic
    Pro SQL Server 2005, available now
    http://www..apress.com/book/bookDisplay.html bID=457
    --
     
     
    Alternatively, you could make the dual-column index (as suggested above) a clustered index such that the maximum date may be found immediately for each name. Using a non-clustered index will still require a scan of all date values in the index, since a comparison of dates for each name is necessary.

  • Altunin Yuri

    Have you looked at the execution plan   I'm betting that your nonclustered indexes aren't even being used -- neither of them provide access to both of the columns you need, so a lookup would be necessary.  And because every name is needed to satisfy this query, a scan of the cluster is going to be a lot cheaper than scanning the name index then doing a lookup into the cluster for every row.
     
    So the most likely way to easily fix this problem: Add the Date column to your nonclustered index on Name.
     

    --
    Adam Machanic
    Pro SQL Server 2005, available now
    http://www..apress.com/book/bookDisplay.html bID=457
    --
     
     

    The query is just... select name, max(date) from myTable group by name

    myTable had a clustered index PK, also non-clustered indexes on name and date.

    There are approx 1,000,000 rows and there are only 3 distinct names.

    This takes 10-15 secs to execute, even running directly on the server.

    Server trace says duration = 100, reads = 100

    Profiler says CPU=20000, reads = 20000, duration = 10000

    What on earth is going on !


  • Why is this slow?