Here's the latest in my attempt to find a reliable configuration for PeerChannel to handle high-volume enterprise messaging, along with some new test results.
When I first started playing with PeerChannel, it seemed to lose messages very easily. Then I found that adding an Application.DoEvents call or a small sleep in my tight sending loop made things better. A lot better: a couple of days ago I posted test results where I never lost a single message. If I really pushed things and tried to send a million messages, receivers eventually ran out of memory, but the network and its buffering were superb. Not a single lost message. That testing was performed on Neudesic's network. Now it was time to try it at a customer.
This week, I've been testing at a large enterprise customer in the financial industry with a very busy network. The results are quite different: we're experiencing a lot of message loss. In the test results below, increasing the sleep time in the sender loop eventually reaches a stable point where no receiver loses any messages, but at a dramatic cost in message throughput (the sender drops from about 26 msgs/sec with DoEvents alone to about 16.5 msgs/sec). For 100-byte messages, a 35 ms sleep between sends seems to be the magic number. That's not much consolation, though, because changing any of several variables puts you back into a message-loss situation. For 1000-byte messages (see last two results below), I went back into a message-loss condition even with a 50 ms wait between sends.
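To put the cost of the sleep in perspective, here is a minimal Python sketch of a paced sender loop. The actual test client is a .NET application; `paced_send`, `send`, `max_rate_per_sec`, and `implied_overhead_ms` are hypothetical names used only for illustration, not PeerChannel APIs. The point is simply that a fixed sleep caps the send rate at 1000 / sleep_ms messages per second before any per-send cost is counted.

```python
import time


def max_rate_per_sec(sleep_ms, send_cost_ms=0.0):
    """Upper bound on sends/sec when every send is followed by a fixed sleep."""
    return 1000.0 / (sleep_ms + send_cost_ms)


def implied_overhead_ms(observed_rate_per_sec, sleep_ms):
    """Per-message time not explained by the sleep, given an observed rate."""
    return 1000.0 / observed_rate_per_sec - sleep_ms


def paced_send(messages, send, sleep_ms, sleep=time.sleep):
    """Call send(msg) for each message, pausing sleep_ms after each one.

    `send` stands in for whatever actually publishes the message
    (hypothetical here); `sleep` is injectable so the loop can be tested.
    """
    sent = 0
    for msg in messages:
        send(msg)
        sent += 1
        sleep(sleep_ms / 1000.0)
    return sent
```

With a 35 ms sleep the ceiling is about 28.6 msgs/sec. The 16.52 msgs/sec measured in the sleep(35) run would then imply roughly 25 ms of additional per-message cost, assuming the real loop is as simple as this sketch.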
My questions:
(1) Is this kind of message loss normal or abnormal? Prior dialog on this forum seemed to indicate that the team generally expected no message loss, but I'm seeing plenty of it. What might explain this phenomenon?
(2) How can I detect how busy a PeerChannel network is, and what throttling controls are available to prevent senders from flooding the network? Being able to monitor and regulate the throughput rate seems paramount.
(3) Are there best practices for ensuring good network throughput that you can point out, so we can make sure we're following them?
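On question (2), absent a built-in control, one application-level stopgap I could imagine is a token bucket in the sender instead of a fixed sleep. The Python sketch below only illustrates that general technique. `TokenBucket` and its parameters are hypothetical and not part of PeerChannel, and it limits only the local send rate, with no knowledge of how busy the mesh actually is.

```python
import time


class TokenBucket:
    """Application-level rate limiter (illustrative, not a PeerChannel feature).

    Tokens refill at `rate_per_sec` up to `capacity`; each send consumes one.
    """

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = float(rate_per_sec)
        self.capacity = float(capacity)
        self.tokens = float(capacity)  # start full so a short burst is allowed
        self.clock = clock             # injectable clock for testing
        self.last = clock()

    def try_acquire(self):
        """Return True and consume a token if a send is allowed right now."""
        now = self.clock()
        # Refill according to the time elapsed since the last call.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A sender would call `try_acquire()` before each send and wait briefly when it returns False. Unlike a fixed sleep, this allows short bursts up to `capacity` while holding the long-run rate at `rate_per_sec`.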
Thanks,
David

PeerChannel Test Results [Feb CTP], 03/13/06. Network: Large Enterprise Customer.

Configuration: 10 PubSubClient nodes (5 nodes x 2 machines), PeerChannel messaging, 100-byte message size, DoEvents between sends

| Volatile | Machine | Start | Finish | Messages | Loss | Elapsed time (min) | Msgs/min | Msgs/sec | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| sales1 (sender) | M1 | 12:17:10 | 12:23:37 | 10000 | 0 | 6.45 | 1550 | 25.87 | Packet loss |
| sales2 | M2 | 12:17:10 | 12:23:55 | 9073 | -927 | 6.75 | 1344 | 21.69 | |
| sales3 | M1 | 12:17:10 | 12:23:38 | 10000 | 0 | 6.47 | 1546 | 25.8 | |
| sales4 | M2 | 12:17:10 | 12:23:47 | 7428 | -2572 | 6.62 | 1123 | 18.13 | |
| sales5 | M1 | 12:17:10 | 12:23:43 | 10000 | 0 | 6.55 | 1527 | 25.48 | |
| sales6 | M2 | 12:17:10 | 12:23:56 | 6907 | -3093 | 6.77 | 1021 | 16.48 | |
| sales7 | M1 | 12:17:10 | 12:23:37 | 8409 | -1591 | 6.45 | 1304 | 21.76 | |
| sales8 | M2 | 12:17:10 | 12:24:00 | 7619 | -2381 | 6.83 | 1115 | 18.02 | |
| sales9 | M1 | 12:17:10 | 12:22:44 | 3783 | -6217 | 5.57 | 680 | 10.91 | |
| sales10 | M2 | 12:17:10 | 12:23:43 | 10000 | 0 | 6.55 | 1527 | 25.49 | |

Configuration: 10 PubSubClient nodes (5 nodes x 2 machines), PeerChannel messaging, 100-byte message size, DoEvents+sleep(10) between sends

| Volatile | Machine | Start | Finish | Messages | Loss | Elapsed time (min) | Msgs/min | Msgs/sec | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| sales1 (sender) | M1 | 13:23:15 | 13:32:30 | 10000 | 0 | 9.25 | 1081 | 17.99 | Packet loss |
| sales2 | M1 | 13:23:15 | 13:32:30 | 10000 | 0 | 9.25 | 1081 | 18.00 | |
| sales3 | M1 | 13:23:15 | 13:32:45 | 10000 | 0 | 9.50 | 1053 | 17.53 | |
| sales4 | M1 | 13:23:15 | 13:32:45 | 10000 | 0 | 9.50 | 1053 | 17.53 | |
| sales5 | M1 | 13:23:15 | 13:32:30 | 10000 | 0 | 9.25 | 1081 | 18.00 | |
| sales6 | M2 | 13:23:15 | 13:32:32 | 9701 | -299 | 9.28 | 1045 | 17.00 | |
| sales7 | M2 | 13:23:15 | 13:32:29 | 9610 | -390 | 9.23 | 1041 | 16.93 | |
| sales8 | M2 | 13:23:15 | 13:32:21 | 9752 | -248 | 9.10 | 1072 | 17.11 | |
| sales9 | M2 | 13:23:15 | 13:32:01 | 8162 | -1838 | 8.77 | 931 | 15.11 | |
| sales10 | M2 | 13:23:15 | 13:32:32 | 9555 | -445 | 9.28 | 1029 | 16.75 | |

Configuration: 10 PubSubClient nodes (5 nodes x 2 machines), PeerChannel messaging, 100-byte message size, DoEvents+sleep(25) between sends

| Volatile | Machine | Start | Finish | Messages | Loss | Elapsed time (min) | Msgs/min | Msgs/sec | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| sales1 (sender) | M1 | 13:38:30 | 13:46:42 | 10000 | 0 | 8.20 | 1220 | 20.36 | Packet loss |
| sales2 | M1 | 13:38:30 | 13:46:42 | 10000 | 0 | 8.20 | 1220 | 20.36 | |
| sales3 | M1 | 13:38:30 | 13:46:32 | 10000 | 0 | 8.03 | 1245 | 20.36 | |
| sales4 | M1 | 13:38:30 | 13:46:32 | 10000 | 0 | 8.03 | 1245 | 20.36 | |
| sales5 | M1 | 13:38:30 | 13:46:32 | 10000 | 0 | 8.03 | 1245 | 20.36 | |
| sales6 | M2 | 13:38:30 | 13:46:33 | 9573 | -427 | 8.05 | 1189 | 19.33 | |
| sales7 | M2 | 13:38:30 | 13:47:39 | 10000 | 0 | 9.15 | 1093 | 17.81 | |
| sales8 | M2 | 13:38:30 | 13:46:37 | 8696 | -1304 | 8.12 | 1071 | 17.41 | |
| sales9 | M2 | 13:38:30 | 13:46:29 | 8956 | -1044 | 7.98 | 1122 | 18.24 | |
| sales10 | M2 | 13:38:30 | 13:46:40 | 8728 | -1272 | 8.17 | 1069 | 17.36 | |

Configuration: 10 PubSubClient nodes (5 nodes x 2 machines), PeerChannel messaging, 100-byte message size, DoEvents+sleep(35) between sends

| Volatile | Machine | Start | Finish | Messages | Loss | Elapsed time (min) | Msgs/min | Msgs/sec | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| sales1 (sender) | M1 | 14:23:02 | 14:33:09 | 10000 | 0 | 10.12 | 988 | 16.52 | Artificially slow, but every node receives 100% of messages in same time frame |
| sales2 | M1 | 14:23:02 | 14:33:09 | 10000 | 0 | 10.12 | 988 | 16.52 | |
| sales3 | M1 | 14:23:02 | 14:33:09 | 10000 | 0 | 10.12 | 988 | 16.52 | |
| sales4 | M1 | 14:23:02 | 14:33:09 | 10000 | 0 | 10.12 | 988 | 16.52 | |
| sales5 | M1 | 14:23:02 | 14:33:09 | 10000 | 0 | 10.12 | 988 | 16.52 | |
| sales6 | M2 | 14:23:02 | 14:32:55 | 10000 | 0 | 9.88 | 1012 | 16.51 | |
| sales7 | M2 | 14:23:02 | 14:32:55 | 10000 | 0 | 9.88 | 1012 | 16.51 | |
| sales8 | M2 | 14:23:02 | 14:32:55 | 10000 | 0 | 9.88 | 1012 | 16.51 | |
| sales9 | M2 | 14:23:02 | 14:32:55 | 10000 | 0 | 9.88 | 1012 | 16.51 | |
| sales10 | M2 | 14:23:02 | 14:32:55 | 10000 | 0 | 9.88 | 1012 | 16.51 | |

Configuration: 10 PubSubClient nodes (5 nodes x 2 machines), PeerChannel messaging, 100-byte message size, DoEvents+sleep(40) between sends

| Volatile | Machine | Start | Finish | Messages | Loss | Elapsed time (min) | Msgs/min | Msgs/sec | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| sales1 (sender) | M1 | 14:08:30 | 14:19:28 | 10000 | 0 | 10.97 | 912 | 15.31 | Artificially slow, but every node receives 100% of messages in same time frame |
| sales2 | M1 | 14:08:30 | 14:19:28 | 10000 | 0 | 10.97 | 912 | 15.31 | |
| sales3 | M1 | 14:08:30 | 14:19:28 | 10000 | 0 | 10.97 | 912 | 15.31 | |
| sales4 | M1 | 14:08:30 | 14:19:28 | 10000 | 0 | 10.97 | 912 | 15.31 | |
| sales5 | M1 | 14:08:30 | 14:19:28 | 10000 | 0 | 10.97 | 912 | 15.31 | |
| sales6 | M2 | 14:08:30 | 14:19:13 | 10000 | 0 | 10.72 | 933 | 15.31 | |
| sales7 | M2 | 14:08:30 | 14:19:13 | 10000 | 0 | 10.72 | 933 | 15.31 | |
| sales8 | M2 | 14:08:30 | 14:19:13 | 10000 | 0 | 10.72 | 933 | 15.31 | |
| sales9 | M2 | 14:08:30 | 14:19:13 | 10000 | 0 | 10.72 | 933 | 15.31 | |
| sales10 | M2 | 14:08:30 | 14:19:13 | 10000 | 0 | 10.72 | 933 | 15.31 | |
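
For anyone checking the tables, the Loss, Elapsed time, and Msgs/min columns can be derived from the Start, Finish, and Messages columns roughly as follows. This is a minimal Python sketch, not the tool used to produce the tables; `summarize_row` is a hypothetical helper, and it assumes 10,000 messages sent per run and that start and finish fall on the same day.

```python
from datetime import datetime


def summarize_row(start, finish, received, sent=10000):
    """Derive loss and throughput for one receiver from HH:MM:SS timestamps.

    Assumes `sent` messages were published in the run (10,000 in these tests)
    and that start and finish are on the same day.
    """
    fmt = "%H:%M:%S"
    elapsed_s = (datetime.strptime(finish, fmt)
                 - datetime.strptime(start, fmt)).total_seconds()
    elapsed_min = elapsed_s / 60.0
    return {
        "loss": received - sent,                       # negative = messages lost
        "loss_pct": 100.0 * (sent - received) / sent,  # share of messages lost
        "elapsed_min": elapsed_min,
        "msgs_per_min": received / elapsed_min,
    }


# Example: sales2 in the DoEvents-only run.
row = summarize_row("12:17:10", "12:23:55", 9073)
print(row["loss"], round(row["elapsed_min"], 2), round(row["msgs_per_min"]))
```

Expressing loss as a percentage of messages sent makes it easier to compare runs with different sleep settings.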
