![]() |
![]() |
![]() |
Perhaps one of the most critical decisions application designers face when creating a distributed system is how data should be exchanged between interested parties. Typically, this involves selecting one or more communication protocols and determining the most efficient means of dispatching data to each endpoint. Implementing lower-level communications software can be time consuming, expensive, and prone to error. Too often, after investing a significant amount of time and effort, many designers find that last minute changes in requirements or flaws in the initial design lay waste to this crucial layer of software.
Middleware solutions are designed to aid (if not solve) these types of problems. Rather than have each application implement a custom communications suite, designers may instead rely upon the features present in a given Middleware product to act as a de-facto compatibility layer. The Data Distribution Service (DDS) specification is one such solution proposed by the Object Management Group. As described in a prior article, OpenDDS is an open source implementation of the DDS specification that provides a number of pluggable transport options to meet a wide variety of communication needs. Combined with expressive Quality of Service (QoS) facilities, OpenDDS is an ideal solution for those projects which demand a configurable and efficient means of distributing data with little to no risk for the designer.
Starting with version 2.1, OpenDDS provides a new IP multicast transport with improved reliability and better determinism on lossy networks. This article provides a detailed explanation of the multicast transport implementation, its interactions over an IP-based network, and a comparative analysis of other IP-based transports supported by OpenDDS.
Starting with version 2.1, support for IP multicast is provided by the multicast transport.
The multicast transport was designed as a replacement for the older ReliableMulticast
and SimpleMcast transports. Unlike other OpenDDS transport implementations, multicast
provides unified support for best-effort and reliable delivery based on transport configuration.
Best-effort delivery imposes the least amount of overhead as data is exchanged between peers, however it does not provide any guarantee of delivery. Data may be lost due to unresponsive or unreachable peers or received in duplicate.
Reliable delivery provides for guaranteed delivery of data to associated peers with no duplication at the cost of additional processing and bandwidth. Reliable delivery is achieved through two primary mechanisms: 2-way peer handshaking and negative acknowledgment of missing data. Each of these mechanisms is bounded to ensure deterministic behavior and is configurable to ensure the broadest applicability possible for user environments.
Reliable delivery is achieved through two primary mechanisms:
2-way peer handshaking (SYN/SYN+ACK)
Negative acknowledgment of missing data (NAK/NAK+ACK)
For the purpose of discussing the interactions between reliable peers, it helps to narrow the view of the multicast group to two specific peers: active and passive. The active peer is the publishing side of an association. The passive peer is always the subscribing peer. It is important to remember that control data is exchanged among all peers within the multicast group; however disinterested peers discard payloads which are not addressed to them directly.
The first mechanism used to ensure reliability is a 2-way handshake between associated peers. The handshaking mechanism provides a deterministic bound to how long a given peer will wait to form an association. This feature is also used to prevent the publishing peer from sending data too early. This mechanism is detailed below:

The active (i.e. publisher) peer initiates a handshake by sending a MULTICAST_SYN control sample to the passive (i.e. subscriber) peer at
periodic intervals bounded by a timeout to ensure delivery. An exponential backoff is observed based on the number
of retries.
Upon receiving the MULTICAST_SYN control sample, the passive peer records the current transport sequence
value for the active peer and responds with a MULTICAST_SYNACK control sample. This sequence number establishes a value
from which reception gaps may be identified in the future.
If the active peer receives a MULTICAST_SYNACK control sample, it then notifies the application that the
association is complete. In the event a MULTICAST_SYNACK control sample is not received, an error is logged.
Negative acknowledgment is the primary mechanism for ensuring data reliability. This mechanism places the responsibility of detecting data loss on each subscriber which allows the publisher to dispatch data as quickly and efficiently as possible. This mechanism is detailed below:

As each datagram is delivered to a passive peer, the transport sequence value is scored against a special container
which identifies reception gaps. Periodically, a watchdog executes and ages off the oldest unfulfilled repair requests by resetting the sequence low-water mark and logging an error. Once unfulfilled requests have been
purged, the watchdog then checks for reception gaps. In the event a new (or existing) reception gap is identified, the
peer records the current interval and the high-water mark for the transport sequence. It then issues MULTICAST_NAK
control samples to active peers from which it is missing data. Passive peers perform NAK suppression based on all repair
requests destined for the same active peer. If previous MULTICAST_NAK requests have been sent by other passive peers, the passive peer temporarily omits previously
requested ranges for the current interval. This interval is randomized to prevent potential collisions among
similarly associated peers.
If the active peer receives a MULTICAST_NAK control sample it examines its send buffer for the missing datagram.
If the datagram is no longer available, it sends a MULTICAST_NAKACK control sample to the entire multicast group to
suppress further repair requests. If the datagram is still available, it resends the missing data to the entire multicast group.
If the passive peer receives a MULTICAST_NAKACK control sample, it behaves just as if the repair request has aged
off, suppressing further repair requests and logging an error.
The multicast transport supports a number of configuration options:
The default_to_ipv6 and port_offset options affect how default
multicast group addresses are selected. If default_to_ipv6 is set to 1
(enabled) then the default IPv6 address is used. The port_offset option determines
the default port used; this value is added to the transport ID to determine the actual port number.
The default values are 0 (disabled) and 49400 respectively.
The group_address option may be used to manually define a multicast group to join
to exchange data. Both IPv4 and IPv6 addresses are supported. As with SimpleTcp, OpenDDS IPv6
support requires that the underlying ACE/TAO components be built with IPv6 support enabled. The
default value is 224.0.0.128:<port> for IPv4 and [FF01::80]:<port>
for IPv6.
If reliable delivery is desired, the reliable option may be specified. The
remainder of configuration options affect the reliability mechanisms used by the multicast transport.
The default value is 1 (enabled).
The syn_backoff, syn_interval, and syn_timeout
configuration options affect the handshaking mechanism. syn_backoff is the exponential
base used when calculating the backoff delay between retries. The syn_interval option
defines the minimum number of milliseconds to wait before retrying a handshake. The syn_timeout
defines the maximum number of millseconds to wait before giving up on the handshake. The default
values are 2.0, 250, and 30000 (30 seconds) respectively.
Given the values of syn_backoff and syn_interval, it is possible to calculate the delays between
handshake attempts (bounded by syn_timeout):
delay = syn_interval * syn_backoff ^ <# retries>
For example, if the default configuration options are assumed, the delays between handshake attempts would be: 0, 250, 1000, 2000, 4000, 8000, and 16000 milliseconds.
The nak_depth, nak_interval, and nak_timeout configuration
options affect the negative acknowledgment mechanism. nak_depth determines the maximum number
of datagrams retained by the transport to service incoming repair requests. The nak_interval
configuration option defines the minimum number of milliseconds to wait between repair requests. This
interval is randomized to prevent potential collisions between similarly associated peers. The maximum delay
between repair requests is bounded to double the minimum value. The nak_timeout configuration option defines
the maximum amount of time to wait on a repair request before giving up. The default values are
32, 500, and 30000 (30 seconds) respectively.
For convenience, an example .ini file is provided in the OpenDDS distribution in the following location:
$DDS_ROOT/dds/DCPS/transport/multicast/multicast.ini-dist
Currently, there are a couple of restrictions which must be observed when using the transport:
At most, each unique multicast group may be used by only one DDS domain.
A given DomainParticipant object may have only one multicast transport attached for a particular multicast group. For example, an application may not use the same multicast group in transports for both a Publisher and a Subscriber created from the same DomainParticipant object. If an application wishes to both send and receive samples on the same multicast group in the same process, then the application must create two different DomainParticipant objects for the same domain -- one for the Publisher, and another for the Subscriber.
As with other transports, dynamic configuration is enabled by the ACE Service Configurator Framework.
A svc.conf file should be created which contains the following:
dynamic OpenDDS_DCPS_Multicast_Service Service_Object * OpenDDS_Multicast:_make_MulticastLoader()
The transport is loaded at runtime by passing the following arguments to the application's command-line:
-ORBSvcConf svc.conf
Alternately, the transport may be statically loaded by including the
dds/DCPS/transport/multicast/Multicast.h header (implies static linking):
#ifdef ACE_AS_STATIC_LIBS # include <dds/DCPS/transport/multicast/Multicast.h> #endif
Finally, The multicast transport is configured and attached similarly to other transport implementations:
OpenDDS::DCPS::TransportImpl_rch transport_impl =
TheTransportFactory->create_transport_impl(
OpenDDS::DCPS::DEFAULT_MULTICAST_ID,
OpenDDS::DCPS::AUTO_CONFIG);
OpenDDS::DCPS::AttachStatus status = transport_impl->attach(...);
if (status != OpenDDS::DCPS::ATTACH_OK) {
exit(EXIT_FAILURE);
}
When discussing a wire protocol, it is often helpful to provide a graphical representation of protocol interactions. As described above, there are four transport control sample types used by the protcol state machine:
MULTICAST_SYN
MULTICAST_SYNACK
MULTICAST_NAK
MULTICAST_NAKACK
Examples of these PDUs are provided below. These packets were captured from a SunOS (SPARC)
workstation using the odds_dissector Wireshark plugin, available in OpenDDS version 2.1.1:

MULTICAST_SYN control samples are sent from active (i.e. publisher) peers. These samples declare
an intent to form an association with a specific passive peer. The passive peer is identified
by the last four octets (highlighted in blue). This value corresponds to the passive peer's
Source ID.

MULTICAST_SYNACK control samples are sent from passive (i.e. subscriber) peers. These samples are
sent to acknowledge a handshake attempt by an active peer. When received, the active peer
identified by the last four octets (highlighted in blue) completes the association with the
passive peer. This value corresponds to the active peer's Source ID.

MULTICAST_NAK control samples are sent from passive peers. These samples are
sent when a reception gap has been identified. The octets highlighted in blue indicate
the Source ID of the active peer from which it is missing data. The following four octets
highlighted in purple indicate the range of missing data. The first two octets indicate the
lower bound of the missing range, while the last two octets indicate the upper bound. These
values correspond to the sequence numbers of the datagrams.

MULTICAST_NAKACK control samples are sent from active peers. These samples are
sent when a repair request is unable to be fulfilled. The last two octets (highlighted in
purple) indicate the sequence number of the first datagram which can be recovered.
An informal test was constructed to showcase the scalability aspect of IP multicast communication.
The scaling test (available as a part of the OpenDDS-Bench benchmarking framework, $DDS_ROOT/performance-tests/Bench) collects latency metrics for dissemination of samples to one or more endpoints. The multicast transport was tested with both
reliable and best-effort delivery modes against the udp (UDP/IP) and SimpleTcp
(TCP/IP) transports. The following graph represents results collected when running the scaling test
between eight similarly configured hosts over a switched ethernet network:

It is important to note that the absolute numbers collected across these test runs should not be used as a basis of comparison. Instead, the relative differences between each data point should be used.
Compared against the unicast IP transports, udp is the only transport which performs
better than the multicast transport with respect to latency (notice that
latency increases linearly with the number of subscribing endpoints for unicast). The general overhead
of dispatching multicast data factors out once the number of subscribing processes are greater
than two. This is an indication that unicast transports should still be considered for fan-outs
of two or less on this particular hardware. An important point to note is the relative differences
between the reliable and best-effort delivery modes. Reliability does impose a small amount of overhead,
however said overhead does not increase linearly with respect to the number of subscribers as it does with unicast-based
transports.
The above results are a good indication of when IP multicast communication should be considered viable. Given a scenario that a publisher may potentially have several (i.e. greater than two) subscribing endpoints, IP multicast provides the lowest-latency when dispatching data. Conversely, IP multicast is unlikely to scale well in scenarios where a publisher only maintains one or two subscribers.
This article has provided a detailed look at the new IP multicast transport available in
OpenDDS version 2.1. The multicast transport provides an desirable solution for
those designers who wish to dispatch low-latency data reliably to one or more subscribers.
While IP multicast communication has a number of constraints to manage for reliability, we have shown
that reliable communication is still possible with a minimal amount of overhead - especially
as the number of subscribers scales. OpenDDS is professionally developed and commercially supported
by OCI, a full-service software engineering, open source product and training company.
multicast Design Documentation - $DDS_ROOT/docs/html/multicast/multicast.html$DDS_ROOT/performance-tests/Bench/doc/userguide.htmlObject Computing, Inc. (OCI) is the leading provider of object-oriented technology training in the Midwest. Thousands of students participate in our training program every year. Targeted toward software engineers and the development community, our extensive program of over 50 hands-on workshops is delivered to corporations and individuals throughout the U.S. and internationally. OCI's Education Services include Private Training, Public Training, and Lab Rentals. Visit www.ociweb.com/training or contact us at training@ociweb.com.
OCI offers downloads and commercial support for a variety of middleware technologies.
Copyright
©2010
Object Computing, Inc. All rights reserved.
OMG, CORBA, IIOP, and all OMG marks and logs are trademarks or registered
trademarks of Object Management Group, Inc. in the United States and/or
other countries.
Java and all
Java-based marks are trademarks or registered trademarks of Sun
Microsystems, Inc. in the United States and/or other countries.
.NET, C#, and .NET-based marks are trademarks or registered trademarks of Microsoft
Corporation in the United States and/or other countries.