Consider this query which implements a join.
var q = from c in Customers, o in Orders
where c.CustomerId == o.CustomerId
group o by new { o.CustomerId, c.CompanyName } into g
select new { g.Key.CustomerId, g.Key.CompanyName, NumOrders = g.Group.Count() }
As far as I can tell, this implements a nested-loops join, which has O(N^2) performance.
Is there a recommended way to do a more efficient join in LINQ

joins in linq
phanf
My sense is that this would be a non-trivial task, but I think a valuable one.
Jason Cheung
venkatesh24837
from co in customers.class="txt4">Join(orders, c => c.CustomerId, o => o.CustomerId, EqualityComparer<IdType>.Default)
select new { co.CustomerId }
But yeah, ideally youd operate directly on expressions, re-arranging them as necessary, sorting and creating indices as required. One could imagine quite a sophisticated little in-memory database.
aprag
Can we consider more general Join - e.g.
from co in customers.Join(orders, (c,o) = > c.CustomerId==o.CustomerId)
select new {co.CustomerId}
Kur Lan
I dont think its too much of a problem to inject a Farmer.Sequence class somewhere. You have the source code for System.Sequence, just copy and paste, modify to suit, and bango - there you are.
Not sure how name resolution happens when two namespaces are imported that offer the same Where() function, or what happens if one takes a Func<> and the other takes an Expression<>.
I imagine the precedence comes from the order in which they are imported.
SweptSquash
Actually, that probably wouldn't be difficult to pull off at all; I suspect everything's on the Sequence class.
Have you been able to test relative performance
Mattias Asplund
Okay.. looking at Sequence.Where.. It takes a Func, not an Expression. So it may be more difficult.
Unfortunately for me, LINQ doesn't work (yet) on 64-bit.
Tumnus123
Aside from speeding up joins, having operators than can make use of pre-existing indexes would be useful for Where operators.
class House
{
public string Address;
public string City;
public Color HouseColor;
public int OccupantCount;
}
List<House> houses = new List<House>();
// Create some indices.
// This can be expensive, so make it an explicit user operation.
// I can see "index on <property list>" operator for IEnumerable
// and for projection.
houses.CreateIndex("City");
houses.CreateIndex("OccupantCount");
// Adding a house will also maintain indices.
houses.Add(GetHouses());
// I don't how to deal with modifications to house
// without an interceptor or compiler magic.
var nonRedHouses =
from house in houses
where house.City == "El Dorado" // uses index
and house.Color != Color.Red // could use index, if index were created
and
(
house.OccupantCount == 2 // uses index
or house.OccupantCount > 5 // uses index
)
select house index on Color; // automatically populate a Color index on result
var blueHouses =
from house in nonRedHouses index on OccupantCount // create index
where house.OccupantCount <= 10 // use index
and house.Color == Color.Blue // use index
select house; // no indices created on result
Swamy Kanakala
wfillis
Heres a stab at a Join() function:
public static IEnumerable<Pair<T1,T2>> Join<T1,T2>(this IEnumerable<T1> source1, IEnumerable<T2> source2, Func<T1,K> selector1, Func<T2,K> selector2, IEqualityComparer<K> comparer)
{
Dictionary<K, List<T1>> dict = new Dictionary<K, List<T1>>(comparer);
foreach (T1 element in source1)
{
K key = selector1(element);
List<T1> list;
if (!dict.TryGetValue(key, out list))
dict[key] = list = new List<T1>();
list.Add(element);
}
foreach (T2 element2 in source2)
{
List<T1> list = dict[selector2(element2)];
if (list != null)
foreach (T1 element1 in list)
yield return new Pair<T1,T2>(element1, element2);
}
}
tixall
IEnumerable<Customer> custsomers = GetCustomers();
IEnumerable<Orders> orders = GetOrders();
var join = customers.Join(orders, c => c.CustomerId, o => o.CustomerId, EqualityComparer<IdType>.Default);
.. right
I like it. Now a Where() could inspect the expression trees and replace the == operation with Join(). Similar tricks would work for < and >, etc. The only overhead is in the Dictionary, which is the price to pay for indexing.