none
SQL Query about Selecting a Top10 Record of a lists of data? RRS feed

  • 問題

  • Hi

    I need to create and function to select the TOP 10 records of each of the user from a table. Could you give me some idea whether I can do it smartly.
    E.G.
    The table contains the mission records of all the users (e.g. 8000 users). I need to generate a table which only contains the TOP 10 results of these users.
    The tables contains more than 5 Millions data per months.
    I know I can do that by writing a program to run a stored procedure to do the summary and insert the data into another table. But it seems takes few minutes to finish a user.

    I would be grateful if there any better way to do it.

    Many thanks

    Chi

    2008年4月16日 下午 03:55

解答

  •  

    One of the major overheads associated with the use of non-clustered indexes is the cost of bookmark lookups. Bookmark lookups are a mechanism to navigate from a non-clustered index row to the actual data row in the base table (clustered index) and can be very expensive when dealing with large number of rows.

     

    So you may consider to create clustered index on Spend and RecordID

    2008年4月30日 上午 09:28

所有回覆

  • Please give us the schema of the table. Hard to help to think the query for you without schema.

    2008年4月23日 上午 08:07
  • Hi ken

    Here is the example scheme.

    CREATE TABLE [MissionResults]
    (
     [CustomerCode] Varchar(8) NULL,
     [Mission_Code] Varchar(10) NULL,
     [Duration] Int NULL, -- how long have they spend on it
     [DateOfAction] Datetime NULL --Time stamp when they start to do the mission
     [Spends] money NULL -- Money spend on the mission
    )
    go


    What I need to summarise is to get the Top 10 missions that each of the user spend the most time/money/ played the most on it.

    Thanks very much for your help.

    Many thanks

    Chi
    2008年4月23日 上午 09:12
  • Let's talk about getting Top 10 missions that each customer spend the most time. For example, let's use the Order table in Northwind Database.

     

    Code Snippet
    select orderid, customerid, freight from orders
    order by customerid

     

    This Code will show all [OrderID], [CustomerID], [Freight] from the [Orders] table. Similar to your case by replacing as [Mission_Code], [CustomerCode], [Duration].

     

    You will see that for

    CustomerID=ALFKI, there are 6 OrderID={10643,10692,10702,10835,10952, 11011} with 6 Freight={'29.46', '61.02', '23.94', '69.53', '40.42', '1.21'}

    CustomerID=ANTON, there are 7 OrderID={10365, 10507, 10535, 10573, 10677, 10682, 10856} with 7 Freight={'22.00', '47.45', '15.64', '84.84', '4.03', '36.13', '58.43'}


    Code Snippet
    select orderid, customerid, freight from orders as a
    where freight in (select top 5 freight from orders as b where a.customerid = b.customerid order by freight desc)
    order by customerid

     

     

    With using this query, you will only see top 5 Freight for each customer

    CustomerID=ALFKI, there are 6 OrderID={10643, 10692, 10702, 10835, 10952, 11011} with 5 Freight={'29.46', '61.02', '23.94', '69.53', '40.42'}

    CustomerID=ANTON, there are 7 OrderID={10365, 10507, 10573, 10682, 10856} with 5 Freight={'22.00', '47.45', '84.84', '36.13', '58.43'}

     

    SO for your table, it should be

     

    Top 10 missions that each customer spend the most time

    Code Snippet

    select * from MissionResults as a
    where Duration in (select top 5 Duration from MissionResults as b where a.CustomerCode= b.CustomerCode order by Duration desc)
    order by CustomerCode

     

     

    Top 10 missions that each customer spend the most money

    Code Snippet

    select * from MissionResults as a
    where Spends in (select top 5 Spends from MissionResults as b where a.CustomerCode= b.CustomerCode order by Spends desc)
    order by CustomerCode

     

     

    Hope this solve your problem. Please tell me if this is correct solution which you are looking for.
    2008年4月25日 上午 03:43
  • Hi Ken

    Thanks for your information, the query work quite well, unfortunately, I forgotten to let you know that for all the top 10 result it need to display the date/time when the mission start or something. so in that case if the top 10 records (E.G. spend, if they spend the same amount, the query will select more than 10 record the the customer, because the spend value matched.

    I have another field which is like the unique ID (auto increment number) when I modify the query slightly instead of doing

    select TOP 200 * from MissionResults as a
    where Spends in (select top 5 Spends from MissionResults as b where a.CustomerCode= b.CustomerCode order by Spends desc)
    order by CustomerCode

    I do
    select TOP 200 * from MissionResults as a
    where RECORDID in (select top 5 RECORDID from MissionResults as b where a.CustomerCode= b.CustomerCode order by Spends desc)
    order by CustomerCode



    it takes me over an hour to pull out 200 records
    (for testing purpose)
    On the execution plan it said it spend 100% cost on Bookmark lookup on the Spend and RECORDID

    Any advice?

    Thanks very much for your help.

    Chi
    2008年4月29日 下午 02:58
  •  

    One of the major overheads associated with the use of non-clustered indexes is the cost of bookmark lookups. Bookmark lookups are a mechanism to navigate from a non-clustered index row to the actual data row in the base table (clustered index) and can be very expensive when dealing with large number of rows.

     

    So you may consider to create clustered index on Spend and RecordID

    2008年4月30日 上午 09:28
  • Hi Ken

    Thanks for your information. I will try again and let you know the result later.

    Thanks a lot
    2008年5月7日 下午 09:20
  •  

    Please mark "這篇文章有用嗎?" to change the status once the problem is solved. Thanks.
    2008年5月8日 上午 03:03
  • Hi Ken

    Sorry to reply late, I have tested it out by adding Cluster Index It works quite well.
    Is there any rule that I should create a Clustered Index?
    If I do not tick the check box for Clustered Index when I create an Index, Does it mean it is a non-Clustered Index?

    Thanks very much for your help. I have marked the replies are useful and marked the answered reply.
    2008年5月12日 上午 08:43
  • Yes, you need to check clustered. With clustered index, you can defrag table by rebuild it.

    2008年5月12日 下午 06:36
  • You can create *one* clustered index on each table. If you do not tick/check the Clustered Index checkbox, non-clustered index is created.

     

    Generally, you should evaluate all queries that are executed against the table, and create the clustered index to cover the columns that are most commonly used in your 'WHERE' clauses. You should then check whether your modifications (about clustered / non-clustered indexes) enhances the performance of your database. (Note that index affects the performance of INSERT / UPDATE / DELETE statements.)

     

    ===== For more information =====

     

    ~ Clustered index design guidelines

    http://msdn.microsoft.com/en-us/library/ms190639.aspx

    2008年5月13日 上午 09:40