PeerChannel message loss and more test data

Here's the latest in my attempt to find a reliable PeerChannel configuration for high-volume enterprise messaging, along with some new test results.

When I first started playing with PeerChannel, it seemed to lose messages very easily. Then I found that adding an Application.DoEvents call or a short sleep inside my tight sending loop made things better. A lot better: a couple of days ago I posted test results in which I never lost a single message. If I really pushed things and tried to send a million messages, the receivers eventually ran out of memory, but the network and its buffering were superb. Not a single lost message. That testing was performed on Neudesic's network. Now it was time to try it at a customer.
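For reference, the pacing technique amounts to a fixed pause after every send. Below is a minimal Python sketch of such a loop, not the .NET code used in these tests: `send` is a hypothetical callback standing in for the PeerChannel send call, the fixed sleep plays the role of the Application.DoEvents/sleep combination, and the 4-byte sequence-number prefix is my own assumption, added so a receiver could tell exactly which messages went missing rather than only how many.

```python
import time
from typing import Callable, List


def make_message(seq: int, size: int) -> bytes:
    """Build a fixed-size payload whose first 4 bytes carry a big-endian sequence number.

    The sequence-number prefix is an illustrative assumption, not part of the original tests.
    """
    if size < 4:
        raise ValueError("size must leave room for the 4-byte sequence number")
    return seq.to_bytes(4, "big") + b"\x00" * (size - 4)


def send_paced(send: Callable[[bytes], None], count: int, size: int, pause_s: float) -> float:
    """Send `count` messages of `size` bytes, sleeping `pause_s` seconds after each one.

    `send` is a hypothetical stand-in for the real PeerChannel send call.
    Returns the elapsed wall-clock time in seconds.
    """
    start = time.perf_counter()
    for seq in range(count):
        send(make_message(seq, size))
        if pause_s > 0:
            time.sleep(pause_s)  # the fixed pause is the only throttle in this loop
    return time.perf_counter() - start


if __name__ == "__main__":
    sent: List[bytes] = []
    elapsed = send_paced(sent.append, count=20, size=100, pause_s=0.035)
    print(f"sent {len(sent)} messages in {elapsed:.2f} s")
```

A fixed pause slows the sender by the same amount whether the network is idle or busy; it does not react to congestion at all.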

This week, I've been testing at a large enterprise customer in the financial industry with a very busy network. The results are quite different: we're experiencing a lot of message loss. In the test results below, increasing the sleep time in the sender loop eventually reaches a stable point where none of the receivers lose any messages, but at a dramatic cost in throughput (the sender's rate falls from 25.87 msgs/sec with DoEvents alone to 16.52 msgs/sec with a 35 ms sleep). For 100-byte messages, a 35 ms sleep between sends seems to be the magic number. That's not much consolation, though, because changing any one of several variables puts you back into a message-loss situation. For 1000-byte messages (see the last two results below), I was back in a message-loss condition even with a 50 ms wait between sends.
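To put a number on that cost, the Python sketch below (for illustration only) recomputes the sender's rate from the elapsed times in the sender rows of the tables below. A 35 ms sleep by itself caps the sender at 1000 / 35 ≈ 28.6 msgs/sec; the measured 10.12 minutes for 10,000 messages works out to about 60.7 ms per message, so roughly 25.7 ms of each iteration is spent outside the sleep. The rates are recomputed from elapsed minutes, so they can differ slightly from the Msgs/sec column (25.84 vs. 25.87 for the DoEvents-only run).

```python
from typing import List, Tuple

# Sender rows copied from the test tables: (configuration, elapsed minutes, messages sent)
SENDER_RUNS: List[Tuple[str, float, int]] = [
    ("DoEvents only", 6.45, 10000),
    ("DoEvents+sleep(10)", 9.25, 10000),
    ("DoEvents+sleep(25)", 8.20, 10000),
    ("DoEvents+sleep(35)", 10.12, 10000),
    ("DoEvents+sleep(40)", 10.97, 10000),
]


def msgs_per_sec(elapsed_min: float, messages: int) -> float:
    """Average send rate over the whole run."""
    return messages / (elapsed_min * 60.0)


def ms_per_message(elapsed_min: float, messages: int) -> float:
    """Average wall-clock time per send, in milliseconds (sleep included)."""
    return elapsed_min * 60_000.0 / messages


def sleep_ceiling(sleep_ms: float) -> float:
    """Upper bound on msgs/sec imposed by the sleep alone, ignoring the cost of the send itself."""
    return 1000.0 / sleep_ms


if __name__ == "__main__":
    base = msgs_per_sec(6.45, 10000)  # DoEvents-only sender as the baseline
    for label, minutes, sent in SENDER_RUNS:
        rate = msgs_per_sec(minutes, sent)
        print(f"{label:<20} {rate:6.2f} msg/s  {ms_per_message(minutes, sent):6.2f} ms/msg  "
              f"{100 * (1 - rate / base):5.1f}% below DoEvents only")
```

Run as a script, it prints one line per configuration; the 35 ms run comes out about 36% slower than DoEvents alone.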

My questions:

(1) Is this kind of message loss normal or abnormal? Prior dialog on this forum seemed to indicate that the team expected no message loss in general, but I'm seeing plenty of it. What might explain this?

(2) How can I detect how busy a PeerChannel network is, and/or what throttling controls are available to keep senders from flooding the network? Being able to monitor and regulate the throughput rate seems paramount.

(3) Are there best practices for ensuring good network throughput that you can point us to, so we can make sure we're following them?

Thanks,
David

PeerChannel Test Results [Feb CTP], 03/13/06
Network: Large Enterprise Customer
Configuration: 10 PubSubClient nodes (5 nodes x 2 machines), PeerChannel messaging, 100-byte message size, DoEvents between sends
Volatile         Machine  Start     Finish    Messages  Loss   Elapsed (min)  Msgs/min  Msgs/sec  Notes
sales1 (sender)  M1       12:17:10  12:23:37  10000     0      6.45           1550      25.87     Packet loss
sales2           M2       12:17:10  12:23:55  9073      -927   6.75           1344      21.69
sales3           M1       12:17:10  12:23:38  10000     0      6.47           1546      25.8
sales4           M2       12:17:10  12:23:47  7428      -2572  6.62           1123      18.13
sales5           M1       12:17:10  12:23:43  10000     0      6.55           1527      25.48
sales6           M2       12:17:10  12:23:56  6907      -3093  6.77           1021      16.48
sales7           M1       12:17:10  12:23:37  8409      -1591  6.45           1304      21.76
sales8           M2       12:17:10  12:24:00  7619      -2381  6.83           1115      18.02
sales9           M1       12:17:10  12:22:44  3783      -6217  5.57           680       10.91
sales10          M2       12:17:10  12:23:43  10000     0      6.55           1527      25.49
Configuration: 10 PubSubClient nodes (5 nodes x 2 machines), PeerChannel messaging, 100-byte message size, DoEvents+sleep(10) between sends
Volatile         Machine  Start     Finish    Messages  Loss   Elapsed (min)  Msgs/min  Msgs/sec  Notes
sales1 (sender)  M1       13:23:15  13:32:30  10000     0      9.25           1081      17.99     Packet loss
sales2           M1       13:23:15  13:32:30  10000     0      9.25           1081      18.00
sales3           M1       13:23:15  13:32:45  10000     0      9.50           1053      17.53
sales4           M1       13:23:15  13:32:45  10000     0      9.50           1053      17.53
sales5           M1       13:23:15  13:32:30  10000     0      9.25           1081      18.00
sales6           M2       13:23:15  13:32:32  9701      -299   9.28           1045      17.00
sales7           M2       13:23:15  13:32:29  9610      -390   9.23           1041      16.93
sales8           M2       13:23:15  13:32:21  9752      -248   9.10           1072      17.11
sales9           M2       13:23:15  13:32:01  8162      -1838  8.77           931       15.11
sales10          M2       13:23:15  13:32:32  9555      -445   9.28           1029      16.75
Configuration: 10 PubSubClient nodes (5 nodes x 2 machines), PeerChannel messaging, 100-byte message size, DoEvents+sleep(25) between sends
Volatile         Machine  Start     Finish    Messages  Loss   Elapsed (min)  Msgs/min  Msgs/sec  Notes
sales1 (sender)  M1       13:38:30  13:46:42  10000     0      8.20           1220      20.36     Packet loss
sales2           M1       13:38:30  13:46:42  10000     0      8.20           1220      20.36
sales3           M1       13:38:30  13:46:32  10000     0      8.03           1245      20.36
sales4           M1       13:38:30  13:46:32  10000     0      8.03           1245      20.36
sales5           M1       13:38:30  13:46:32  10000     0      8.03           1245      20.36
sales6           M2       13:38:30  13:46:33  9573      -427   8.05           1189      19.33
sales7           M2       13:38:30  13:47:39  10000     0      9.15           1093      17.81
sales8           M2       13:38:30  13:46:37  8696      -1304  8.12           1071      17.41
sales9           M2       13:38:30  13:46:29  8956      -1044  7.98           1122      18.24
sales10          M2       13:38:30  13:46:40  8728      -1272  8.17           1069      17.36
Configuration: 10 PubSubClient nodes (5 nodes x 2 machines), PeerChannel messaging, 100-byte message size, DoEvents+sleep(35) between sends
Volatile         Machine  Start     Finish    Messages  Loss   Elapsed (min)  Msgs/min  Msgs/sec  Notes
sales1 (sender)  M1       14:23:02  14:33:09  10000     0      10.12          988       16.52     Artificially slow, but every node receives 100% of messages in the same time frame
sales2           M1       14:23:02  14:33:09  10000     0      10.12          988       16.52
sales3           M1       14:23:02  14:33:09  10000     0      10.12          988       16.52
sales4           M1       14:23:02  14:33:09  10000     0      10.12          988       16.52
sales5           M1       14:23:02  14:33:09  10000     0      10.12          988       16.52
sales6           M2       14:23:02  14:32:55  10000     0      9.88           1012      16.51
sales7           M2       14:23:02  14:32:55  10000     0      9.88           1012      16.51
sales8           M2       14:23:02  14:32:55  10000     0      9.88           1012      16.51
sales9           M2       14:23:02  14:32:55  10000     0      9.88           1012      16.51
sales10          M2       14:23:02  14:32:55  10000     0      9.88           1012      16.51
Configuration: 10 PubSubClient nodes (5 nodes x 2 machines), PeerChannel messaging, 100-byte message size, DoEvents+sleep(40) between sends
Volatile         Machine  Start     Finish    Messages  Loss   Elapsed (min)  Msgs/min  Msgs/sec  Notes
sales1 (sender)  M1       14:08:30  14:19:28  10000     0      10.97          912       15.31     Artificially slow, but every node receives 100% of messages in the same time frame
sales2           M1       14:08:30  14:19:28  10000     0      10.97          912       15.31
sales3           M1       14:08:30  14:19:28  10000     0      10.97          912       15.31
sales4           M1       14:08:30  14:19:28  10000     0      10.97          912       15.31
sales5           M1       14:08:30  14:19:28  10000     0      10.97          912       15.31
sales6           M2       14:08:30  14:19:13  10000     0      10.72          933       15.31
sales7           M2       14:08:30  14:19:13  10000     0      10.72          933       15.31
sales8           M2       14:08:30  14:19:13  10000     0      10.72          933       15.31
sales9           M2       14:08:30  14:19:13  10000     0      10.72          933       15.31
sales10          M2       14:08:30  14:19:13  10000     0      10.72          933       15.31
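The derived columns follow from Start, Finish and Messages: Elapsed is Finish minus Start in minutes (6:27 becomes 6.45), Msgs/min is Messages divided by that, and Loss is Messages minus the 10,000 sent. Below is a small Python sketch of that arithmetic. It assumes HH:MM:SS wall-clock times and 10,000 expected messages per run, and `row_metrics` is a hypothetical helper, not part of any test harness described here.

```python
from datetime import datetime, timedelta

TIME_FMT = "%H:%M:%S"


def row_metrics(start: str, finish: str, received: int, expected: int = 10000) -> dict:
    """Derive Elapsed (min), Msgs/min, Msgs/sec and Loss from one table row.

    Assumes Start and Finish are wall-clock times; if Finish reads as earlier
    than Start, the run is assumed to have crossed midnight.
    """
    delta = datetime.strptime(finish, TIME_FMT) - datetime.strptime(start, TIME_FMT)
    if delta < timedelta(0):
        delta += timedelta(days=1)  # run crossed midnight
    elapsed_s = delta.total_seconds()
    return {
        "elapsed_min": round(elapsed_s / 60, 2),
        "msgs_per_min": round(received / (elapsed_s / 60)),
        "msgs_per_sec": round(received / elapsed_s, 2),
        "loss": received - expected,  # negative = messages never delivered
    }


if __name__ == "__main__":
    print(row_metrics("12:17:10", "12:23:37", 10000))  # sales1 (sender), DoEvents only
    print(row_metrics("12:17:10", "12:23:55", 9073))   # sales2, DoEvents only
```

Recomputing rows this way is a quick consistency check: the elapsed times and Msgs/min values match the tables, but some Msgs/sec values don't (sales2 in the DoEvents-only run comes out at 22.40 rather than the 21.69 shown), so that column is best treated as approximate.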