What Are STUN, TURN, and ICE?
We techies love our acronyms, but ICE, STUN, TURN, TURNS ... it can be a bit much. What are these things really? Why do they exist, and how are they used by LiveSwitch? The goal of this article is to demystify these technologies and their usage.
Ultimately, the goal of a real-time application is high-throughput, low-latency communication between clients that may (or may not) be behind restrictive firewall rules. This implies the following order of preference for network communication protocols:
- UDP - Direct between source and target of media flow.
- UDP - Indirect (relayed) between source and target of media flow.
- TCP - Indirect (relayed) between source and target of media flow.
- TCP/TLS - Indirect (relayed) between source and target of media flow, with an extra layer of encryption.
Now that we know what we want, how does this relate to STUN, TURN, and ICE?
UDP - Direct Between Source and Target of Media Flow
To establish a direct UDP connection between two clients, we need to get through whatever NAT or firewall sits in front of each one. To do so, each client must be able to target the other client's public IP address and negotiated port. However, most clients do not know their own public IP address. This is where STUN comes in. STUN is a protocol that lets a client behind a NAT discover its own public-facing IP address and port by asking a STUN server on the internet. STUN typically runs over UDP, and STUN servers typically listen for UDP requests on port 3478. For a client to use STUN this way, its network must allow UDP traffic.
STUN tells a client its public IP so that client can in turn communicate its IP to the other participating client. Assuming both clients can route to the other's discovered IP address and port directly, communication is established with direct UDP sockets. This routing capability is a big assumption because some firewall rules and/or network devices block direct connections. When this direct connection fails, we fall back to indirect (relayed) UDP.
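To make the STUN exchange concrete, here is a minimal sketch in Python of building a Binding Request and reading the public address out of the response, following the message layout in RFC 5389. It handles IPv4 only and skips retransmission and authentication. The function names and the server hostname in the usage note are hypothetical, and this is not how the LiveSwitch SDK implements STUN internally.

```python
import ipaddress
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442      # fixed value defined by RFC 5389
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(transaction_id=None):
    """Return (packet, transaction_id) for a Binding Request with no attributes."""
    if transaction_id is None:
        transaction_id = os.urandom(12)
    # Header: message type, attribute length (0 here), magic cookie.
    header = struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE)
    return header + transaction_id, transaction_id


def parse_binding_response(packet, transaction_id):
    """Return (ip, port) from the XOR-MAPPED-ADDRESS attribute (IPv4 only)."""
    msg_type, msg_len, cookie = struct.unpack("!HHI", packet[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        raise ValueError("not a STUN Binding success response")
    if packet[8:20] != transaction_id:
        raise ValueError("transaction ID mismatch")
    offset, end = 20, 20 + msg_len
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", packet[offset:offset + 4])
        value = packet[offset + 4:offset + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS and len(value) >= 8 and value[1] == 0x01:
            xport, xaddr = struct.unpack("!HI", value[2:8])
            # Port and address are XORed with the magic cookie on the wire.
            port = xport ^ (MAGIC_COOKIE >> 16)
            ip = str(ipaddress.IPv4Address(xaddr ^ MAGIC_COOKIE))
            return ip, port
        # Attributes are padded to a 4-byte boundary.
        offset += 4 + attr_len + (-attr_len % 4)
    raise ValueError("no IPv4 XOR-MAPPED-ADDRESS attribute found")


def discover_public_address(server_host, server_port=3478, timeout=2.0):
    """Send one Binding Request over UDP and return the reflected (ip, port)."""
    request, transaction_id = build_binding_request()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(request, (server_host, server_port))
        response, _ = sock.recvfrom(2048)
    return parse_binding_response(response, transaction_id)
```

Calling `discover_public_address("stun.example.com")` against a reachable STUN server (the hostname here is a placeholder) would return the address and port that the server saw the request arrive from, which is what the client then signals to its peer.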
UDP - Indirect (relayed) Between Source and Target of Media Flow
When attempting a direct connection using STUN-discovered addresses, many firewalls do not allow incoming traffic on the negotiated port. In that situation we have to introduce an IP address and port that we know ahead of time is reachable. This is a TURN server. TURN is an extension of STUN, so TURN servers also typically listen on port 3478 and provide STUN capability and more. TURN is a protocol for relaying media traffic through a server when a direct connection between two endpoints is not possible. TURN typically authorizes access to the server via username/password.
TURN's preferred mode of operation is to use UDP sockets. However, this assumes the target network is not actively blocking UDP sockets completely. This is another big assumption. When UDP TURN fails we fall back to TCP.
TCP - Indirect (relayed) Between Source and Target of Media Flow
At this point, UDP has failed. We're stuck using TCP, and to give our connection the best chance of succeeding, we need our data to look like standard web traffic. Therefore, we send it over a TCP connection and relay it through our TURN server's TCP listener. This is typically done on port 80 since this is the standard port for web traffic.
TURN over TCP works in many restrictive environments, but we are making one more assumption. We are assuming that the firewall itself does not inspect packets to ensure that the data is actually web traffic. This is our third big assumption. When this fails we fall back to TCP/TLS.
TCP/TLS - Indirect (relayed) Between Source and Target of Media Flow
If we have reached this point, the network is very restrictive. This is typical only for large corporate networks, banks, or hospitals. In this scenario we wrap the TCP data in a TLS-secured socket, initiating the connection with the same TLS handshake that HTTPS uses, so the firewall is unable to distinguish this traffic from any other secure web traffic other than by using heuristics (such as data volume) or by acting as a man-in-the-middle via a proxy. We have never seen this fail in practice.
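The four-step fallback described in the sections above can be sketched as a simple ordered search. This is an illustration of the preference order only: the transport labels and the `can_connect` callback are hypothetical names, and a real ICE implementation checks many candidates concurrently rather than strictly one after another.

```python
from typing import Callable, Optional, Sequence

# Preference order from the discussion above, most desirable first.
TRANSPORT_PREFERENCE = (
    "udp-direct",   # direct UDP, addresses discovered via STUN
    "udp-relay",    # relayed through TURN over UDP
    "tcp-relay",    # relayed through TURN over TCP
    "tls-relay",    # relayed through TURN over TCP/TLS (TURNS)
)


def pick_transport(
    can_connect: Callable[[str], bool],
    order: Sequence[str] = TRANSPORT_PREFERENCE,
) -> Optional[str]:
    """Return the most preferred transport for which can_connect() succeeds."""
    for transport in order:
        if can_connect(transport):
            return transport
    return None  # nothing worked: no media path is available


# Example: a simulated network that blocks all UDP traffic.
blocked = {"udp-direct", "udp-relay"}
chosen = pick_transport(lambda transport: transport not in blocked)
print(chosen)
```

With both UDP options blocked in the simulation, the search settles on the TCP relay, which mirrors the third step of the fallback.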
The STUN, TURN, and TURNS Protocols
The protocols used to realize connections over the network communication protocols we have outlined above are STUN, TURN, and TURNS. We have already hinted at this, but let's define these explicitly:
- Session Traversal Utilities for NAT (STUN) - Used to establish a direct UDP connection between two clients.
- Traversal Using Relays around NAT (TURN) - Used to establish a relayed UDP or TCP connection between two clients. Here, the traffic must be relayed through the TURN server to bypass restrictive firewall rules, and the preference is UDP over TCP because TCP's guaranteed ordered delivery of packets implies overhead that is undesirable for real-time communications.
- Secure Traversal Using Relays around NAT (TURNS) - TURN carried over TLS. Used to establish a relayed TCP/TLS connection between two clients. Here, the traffic must be relayed through the TURN server and through a TLS socket to bypass extremely restrictive firewall rules.
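In client configuration, these three protocols usually show up as `stun:`, `turn:`, and `turns:` URIs. The sketch below builds a server list shaped like the `iceServers` setting of the browser WebRTC API. The hostname, username, and credential are placeholders; ports 3478 and 80 come from the discussion above, while port 443 for TURNS is our assumption based on it being the standard HTTPS port. LiveSwitch manages its embedded servers for you, so this is for illustration only.

```python
from typing import Dict, List


def ice_server_config(host: str, username: str, credential: str) -> List[Dict[str, object]]:
    """Build a WebRTC-style list of STUN/TURN/TURNS server entries."""
    return [
        # STUN needs no credentials: it only reports the client's public address.
        {"urls": [f"stun:{host}:3478"]},
        # TURN and TURNS relay traffic, so they require credentials.
        {
            "urls": [
                f"turn:{host}:3478?transport=udp",   # relayed UDP
                f"turn:{host}:80?transport=tcp",     # relayed TCP
                f"turns:{host}:443?transport=tcp",   # relayed TCP/TLS
            ],
            "username": username,
            "credential": credential,
        },
    ]


config = ice_server_config("turn.example.com", "example-user", "example-pass")
for entry in config:
    print(entry["urls"])
```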
At this point, you understand the preferred methods and technologies for routing media traffic between clients.
So how does all this apply to my app? Inside LiveSwitch, we utilize the ICE protocol to manage STUN, TURN, and TURNS. Read on to learn how ICE accomplishes this.
ICE in a Nutshell
Interactive Connectivity Establishment (ICE) is a standard for using STUN and TURN to establish connectivity between two endpoints. ICE takes all of the complexity implied in the discussion above, and coordinates the management of STUN, TURN, and TURNS to a) maximize the likelihood of connection establishment, and b) ensure that precedence is given to preferred network communication protocols.
To understand ICE you must understand "candidates," how they are gathered, and how they are used to establish connectivity between two peers. A candidate is an IP address and a port. These candidates are "gathered" by an implementation of the ICE protocol, and iterated over to find candidates that are "routable" - that is, candidates between which clients can route media packets.
There are four types of candidates:
- "host" candidates - Gathered directly from the local network adapter, host candidates are the internal IP address and port of a computer on a LAN. They can only route between peers on the same subnet.
- "srflx" candidates - Gathered via STUN, srflx (server reflexive) candidates consist of the public IP address and negotiated port of the local peer. When signaled to a remote peer they can be used to route traffic over the internet, provided that traffic is not blocked by firewall rules.
- "prflx" candidates - Peer reflexive candidates are a variation on server reflexive candidates. Rather than being gathered from a STUN server, they are discovered by the peers themselves when a connectivity check arrives from an address that was not previously signaled. A peer reflexive candidate could be used after connectivity has been established if ongoing connectivity checks determine the candidate is routable, and connectivity on the currently active candidate is failing.
- "relay" candidates - Gathered via TURN, relay candidates consist of the public IP address and a negotiated port for the relay server. This is useful when a firewall does not allow direct routing via srflx candidates. In this case, the remote peer has signaled a candidate it can use to route traffic to the relay server, which can then in turn relay that traffic to the peer behind the restrictive firewall. The firewall allows the TURN server to route traffic to the peer because the peer made the initial request to the server. In this way, the TURN server acts as a "man-in-the-middle" that the initiating peer uses to circumvent its firewall. From the discussion above, you know that communication over relay candidates is the least desirable option.
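When candidates are signaled between peers, each one is commonly written as a single `candidate:` line in the session description. As a rough sketch, the parser below pulls out the fields discussed above (transport, address, port, and type) plus the optional related address. The class and function names are our own, the sample line uses documentation-range addresses, and extension fields beyond `raddr` and `rport` are ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    foundation: str
    component: int
    transport: str                         # "udp" or "tcp"
    priority: int
    address: str
    port: int
    cand_type: str                         # host, srflx, prflx, or relay
    related_address: Optional[str] = None  # base address behind srflx/relay
    related_port: Optional[int] = None


def parse_candidate(line: str) -> Candidate:
    """Parse a 'candidate:...' line, with or without the leading 'a='."""
    if line.startswith("a="):
        line = line[2:]
    if not line.startswith("candidate:"):
        raise ValueError("not an ICE candidate line")
    fields = line[len("candidate:"):].split()
    if len(fields) < 8 or fields[6] != "typ":
        raise ValueError("malformed candidate line")
    # Everything after the type is a sequence of key/value pairs.
    extras = dict(zip(fields[8::2], fields[9::2]))
    return Candidate(
        foundation=fields[0],
        component=int(fields[1]),
        transport=fields[2].lower(),
        priority=int(fields[3]),
        address=fields[4],
        port=int(fields[5]),
        cand_type=fields[7],
        related_address=extras.get("raddr"),
        related_port=int(extras["rport"]) if "rport" in extras else None,
    )


sample = ("candidate:842163049 1 udp 1694498815 203.0.113.7 54321 "
          "typ srflx raddr 192.168.1.20 rport 54321")
parsed = parse_candidate(sample)
print(parsed.cand_type, parsed.address, parsed.port)
```

For a server reflexive candidate like the sample, the main address is the public one learned via STUN, and the related address is the host address it was derived from.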
Now that we know what candidates are, and understand the different types, it follows that for real-time communication the candidate preference is host > srflx/prflx > relay. Let's look at how ICE uses candidates to ensure the best possible routes for media flow while maximizing the chances that connectivity is possible at all. Here is a simplified version of how it works:
- The ICE layer of the LiveSwitch Client SDK gathers all types of candidates that it can. If STUN, TURN, and TURNS services are available, it gathers every candidate type, for both UDP and TCP, whenever possible.
- Candidates are ranked with host > srflx/prflx > relay and UDP > TCP.
- Candidates are tested in order of rank. The first to establish connectivity is the winner and becomes the "active" candidate.
- Ongoing connectivity checks are performed on candidates. If the active candidate fails connectivity checks, then a different candidate is used. Likewise, if connectivity checks succeed on a higher-ranking candidate, then it becomes the active candidate.
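The ranking and selection steps above can be sketched with the candidate priority formula and recommended type preferences from the ICE specification (RFC 8445). Two things here are our assumptions rather than anything LiveSwitch documents: giving TCP a lower local preference so that UDP wins within a type, and the `is_reachable` callback that stands in for real connectivity checks. Note also that the specification's recommended values place prflx slightly above srflx, whereas the simplified description above treats them as equal.

```python
# Recommended type preferences from RFC 8445.
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def candidate_priority(cand_type, local_preference=65535, component_id=1):
    """priority = 2^24 * type pref + 2^8 * local pref + (256 - component ID)."""
    return (TYPE_PREFERENCE[cand_type] << 24) + (local_preference << 8) + (256 - component_id)


def rank(candidates):
    """Sort (cand_type, transport) pairs from most to least preferred."""
    def key(candidate):
        cand_type, transport = candidate
        # Assumption for this sketch: halve the local preference for TCP.
        local_preference = 65535 if transport == "udp" else 32767
        return candidate_priority(cand_type, local_preference)
    return sorted(candidates, key=key, reverse=True)


def select_active(ranked, is_reachable):
    """Return the highest-ranked candidate that passes a connectivity check."""
    for candidate in ranked:
        if is_reachable(candidate):
            return candidate
    return None


gathered = [("relay", "tcp"), ("srflx", "udp"), ("host", "udp"), ("relay", "udp")]
ranked = rank(gathered)

# Simulate a firewall that only permits relayed traffic.
active = select_active(ranked, lambda candidate: candidate[0] == "relay")
print(ranked)
print(active)
```

Re-running `select_active` whenever connectivity checks change state gives the failover and upgrade behavior described in the last step: the active candidate is always the best-ranked one that currently passes its checks.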
Understanding how clients use ICE to establish and maintain connectivity is pretty cool, and the best part is that by offering embedded STUN/TURN capability directly in the Media Server, LiveSwitch manages all of this for you.
Provide Credentials for Embedded TURN
As discussed above, TURN allows for authenticated access to the server, and you don't want just anyone relaying traffic through your relay service. LiveSwitch's embedded TURN uses username/password authentication, but again, LiveSwitch manages this for you using usernames and passwords that are encrypted and temporary. You do not need to configure anything, and you do not need to worry about vending out TURN credentials to authorized clients.
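To give a feel for what temporary TURN credentials can look like, here is a sketch of one widely used scheme (sometimes called the TURN REST API convention) in which the username carries an expiry timestamp and the password is an HMAC of that username under a secret shared with the TURN server. We are not claiming this is the mechanism LiveSwitch uses internally; the function names, secret, and user ID are all hypothetical.

```python
import base64
import hashlib
import hmac
import time


def make_turn_credentials(shared_secret, user_id, ttl_seconds=3600, now=None):
    """Return (username, password) that expire ttl_seconds from now."""
    issued_at = int(now if now is not None else time.time())
    username = f"{issued_at + ttl_seconds}:{user_id}"
    digest = hmac.new(shared_secret, username.encode(), hashlib.sha1).digest()
    return username, base64.b64encode(digest).decode()


def verify_turn_credentials(shared_secret, username, password, now=None):
    """Server-side check: the password must match and the expiry must not have passed."""
    expiry_text, _, _ = username.partition(":")
    if not expiry_text.isdigit():
        return False
    current = int(now if now is not None else time.time())
    if int(expiry_text) < current:
        return False  # credentials have expired
    digest = hmac.new(shared_secret, username.encode(), hashlib.sha1).digest()
    expected = base64.b64encode(digest).decode()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, password)


secret = b"example-shared-secret"   # placeholder value
username, password = make_turn_credentials(secret, "alice", ttl_seconds=60, now=1000)
print(username)
```

Because the expiry is part of the signed username, a client cannot extend its own credentials, and the server can validate them without storing anything per user.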