joins in linq

Consider this query which implements a join.

var q = from c in Customers, o in Orders
where c.CustomerId == o.CustomerId
group o by new { o.CustomerId, c.CompanyName } into g
select new { g.Key.CustomerId, g.Key.CompanyName, NumOrders = g.Group.Count() }

As far as I can tell, this implements a nested-loops join, which has O(N^2) performance.

Is there a recommended way to do a more efficient join in LINQ




Answer this question

joins in linq

  • phanf

    A buddy and I were asking the same thing.  The only thing we could come up with was that LINQ does't inherently know about indexes (to our knowledge).  You'd need to create some sort of indexable data store, and probably override the Where() clause to rewrite its expression to make use of the indexer when it made sense. 
    My sense is that this would be a non-trivial task, but I think a valuable one.

  • Jason Cheung

    We are considering more explicit join operators like the one damien describes.

  • venkatesh24837

    You can use it directly in a from.where.select expression:


    from co in customers.class
    ="txt4">Join(orders, c => c.CustomerId, o => o.CustomerId, EqualityComparer<IdType>.Default)
    select new { co.CustomerId }

     

    But yeah, ideally youd operate directly on expressions, re-arranging them as necessary, sorting and creating indices as required. One could imagine quite a sophisticated little in-memory database.

  • aprag

    Can we consider more general Join - e.g.

    from co in customers.Join(orders, (c,o) = > c.CustomerId==o.CustomerId)
    select new {co.CustomerId}


  • Kur Lan

    Can you run in 32bit mode

    I dont think its too much of a problem to inject a Farmer.Sequence class somewhere. You have the source code for System.Sequence, just copy and paste, modify to suit, and bango - there you are.

    Not sure how name resolution happens when two namespaces are imported that offer the same Where() function, or what happens if one takes a Func<> and the other takes an Expression<>.

    I imagine the precedence comes from the order in which they are imported.

  • SweptSquash

    Right.  My point is that if you wrote a proxy to the build-in Where, you could include the indexed Join behavior in the query syntax (non-dotted).

    Actually, that probably wouldn't be difficult to pull off at all; I suspect everything's on the Sequence class.

    Have you been able to test relative performance


  • Mattias Asplund

    Okay.. looking at Sequence.Where.. It takes a Func, not an Expression.  So it may be more difficult.

    Unfortunately for me, LINQ doesn't work (yet) on 64-bit.



  • Tumnus123

    Aside from speeding up joins, having operators than can make use of pre-existing indexes would be useful for Where operators.



    class House
    {
        public string Address;

        public string City;

        public Color HouseColor;

        public int OccupantCount;
    }

    List<House> houses = new List<House>();

    // Create some indices.
    // This can be expensive, so make it an explicit user operation.
    // I can see "index on <property list>" operator for IEnumerable
    // and for projection.
    houses.CreateIndex("City");
    houses.CreateIndex("OccupantCount");

    // Adding a house will also maintain indices.
    houses.Add(GetHouses());

    // I don't how to deal with modifications to house
    // without an interceptor or compiler magic.

    var nonRedHouses =
        from house in houses
        where house.City == "El Dorado" // uses index
        and house.Color != Color.Red // could use index, if index were created
        and
        (
            house.OccupantCount == 2  // uses index
            or house.OccupantCount > 5 // uses index
        )
        select house index on Color; // automatically populate a Color index on result

    var blueHouses =
        from house in nonRedHouses index on OccupantCount // create index
        where house.OccupantCount <= 10 // use index
        and house.Color == Color.Blue // use index
        select house; // no indices created on result

     



  • Swamy Kanakala

    64-bit Framework .. I don't think WOW would affect it.


  • wfillis

    Well, the GroupBy function knows how to index incoming data, so its not impossible. Its just not directly supported by the LINQ syntax.

    Heres a stab at a Join() function:



    public static IEnumerable<Pair<T1,T2>> Join<T1,T2>(this IEnumerable<T1> source1, IEnumerable<T2> source2, Func<T1,K> selector1, Func<T2,K> selector2, IEqualityComparer<K> comparer)
    {
        Dictionary<K, List<T1>> dict = new Dictionary<K, List<T1>>(comparer);   
        foreach (T1 element in source1)
        {
            K key = selector1(element);
            List<T1> list;
            if (!dict.TryGetValue(key, out list))
                dict[key] = list = new List<T1>();
            list.Add(element);
        }
        foreach (T2 element2 in source2)
        {
            List<T1> list = dict[selector2(element2)];
            if (list != null)
                foreach (T1 element1 in list)
                    yield return new Pair<T1,T2>(element1, element2);
        }
    }

     


  • tixall

    called as:

    IEnumerable<Customer> custsomers = GetCustomers();
    IEnumerable<Orders> orders = GetOrders();

    var join = customers.Join(orders, c => c.CustomerId, o => o.CustomerId, EqualityComparer<IdType>.Default);

    .. right

    I like it.  Now a Where() could inspect the expression trees and replace the == operation with Join().  Similar tricks would work for < and >, etc.  The only overhead is in the Dictionary, which is the price to pay for indexing.


  • joins in linq