RFC 2914 - Congestion Control Principles |
|
 |
|
|
RFC 2914: Congestion Control Principles was written by
Sally Floyd in September 2000.
Here is a brief summary of this document: |
|
|
Dear reader: my notes are in italics; I have also underlined some
interesting points. L.B. |
|
This document is a general discussion of the principles of congestion
control. One of the keys to the success of the Internet has been
the congestion avoidance mechanisms of TCP. While TCP is still
the dominant transport protocol in the Internet, it is not
ubiquitous, and there are an increasing number of applications that,
for one reason or another, choose not to use TCP. Such traffic
includes not only multicast traffic, but unicast traffic such as streaming
multimedia that does not require reliability; and traffic such as DNS
or routing messages that consist of short transfers deemed critical to the
operation of the network. Much of this traffic does not use any form of
either bandwidth reservations or end-to-end congestion control. The
continued use of end-to-end congestion control by best-effort traffic
is critical for maintaining the stability of the Internet. |
| Preventing congestion collapse |
| The Internet protocol architecture is
based on a connectionless end-to-end packet service using the IP
protocol. The advantages of its connectionless design, flexibility and
robustness, have been amply demonstrated. However, these advantages are not
without cost: careful design is required to provide good service under heavy
load. In fact, lack of attention to the dynamics of packet forwarding can
result in severe service degradation or "Internet meltdown". This
phenomenon was first observed during the early growth phase of the Internet
of the mid 1980s [RFC896], and is technically called "congestion
collapse". |
| The original fix for Internet meltdown
was provided by Van Jacobson. Beginning in 1986, Jacobson developed the congestion
avoidance mechanisms that are now required in TCP implementations
[RFC2581]. These mechanisms operate in the hosts to cause TCP
connections to "back off" during congestion. We say that TCP
flows are "responsive" to congestion signals (i.e., dropped packets)
from the network. It is these TCP congestion avoidance algorithms
that prevent the congestion collapse of today's Internet. |
|
|
|
People who are only interested in having
their applications run AS FAST AS POSSIBLE across the Internet,
especially those designing real-time applications, should always keep the
underlined information above in mind. It is impossible to have a healthy
Internet without everyone's collaborative effort. Applications that use
UDP as their transport protocol pose a serious problem for the
good behavior of the Internet. Their designers should keep working on
making these applications use a responsive transport, such as
DCCP. Otherwise, some admission control mechanism for these
applications will have to be adopted in the near future. L.B. |
|
| However, that is not the end of the story.
Considerable research has been done on Internet dynamics since 1988,
and the Internet has grown. It has become clear that the TCP
congestion avoidance mechanisms, while necessary and powerful, are
not sufficient to provide good service in all circumstances. In addition
to the development of new congestion control mechanisms [RFC2357],
router-based mechanisms are in development (have a
look at RFC 2309; L.B.) that complement the endpoint congestion
avoidance mechanisms. |
| Fairness |
| In addition to a concern about congestion
collapse, there is a concern about `fairness' for best-effort
traffic. Because TCP "backs off" during congestion, a large
number of TCP connections can share a single, congested link in such
a way that bandwidth is shared reasonably equitably among similarly situated
flows. The equitable sharing of bandwidth among flows depends on the
fact that all flows are running compatible congestion control algorithms.
For TCP, this
means congestion control algorithms conformant with the current TCP
specification [RFC793, RFC1122, RFC2581]. |
| The popularity of the Internet has
caused a proliferation in the number of TCP implementations. Some of
these may fail to implement the TCP congestion avoidance mechanisms
correctly because of poor implementation [RFC2525]. Others may
deliberately be implemented with congestion avoidance algorithms that are
more aggressive in their use of bandwidth than other TCP
implementations; this would allow a vendor to claim to have a "faster TCP".
The logical consequence of such implementations would be a spiral of
increasingly aggressive TCP implementations, or increasingly
aggressive transport protocols, leading back to the point where there is
effectively no congestion avoidance and the Internet is chronically
congested. |
| There is a well-known way to achieve more
aggressive performance without even changing the transport protocol, by
changing the level of granularity: open multiple connections to the same
place, as has been done in the past by some Web browsers. Thus, instead
of a spiral of increasingly aggressive transport protocols, we would
have a spiral of increasingly aggressive web browsers, or increasingly
aggressive applications. |
| This raises the issue of the appropriate
granularity of a "flow", where we define a `flow' as the level
of granularity appropriate for the application of both fairness and
congestion control. From RFC 2309: "There are a few `natural'
answers: 1) a TCP or UDP connection (source address/port,
destination address/port); 2) a source/destination host pair;
3) a given source host or a given destination host. We would guess
that the source/destination host pair gives the most appropriate
granularity in many circumstances. The granularity of flows for
congestion management is, at least in part, a policy question that needs to
be addressed in the wider IETF community." |
| When Sally Floyd wrote this RFC in 2000, perhaps
she did not anticipate something even worse than web browsers opening
multiple connections. Now we have so-called P2P applications, such as Kazaa, eMule,
and similar implementations. These applications are rapidly worsening
the Internet congestion problem with a new kind of congestion
caused by "excessive" multiple connections. In the traditional
client-server scheme you can have, for example, 100 servers and 100,000
clients. This leads to up to 100 x 100,000 = 10,000,000 possible connections when
there is just one connection between each client-server pair. Assuming 2
possible connections between each client-server pair (as RFC 2616 states), you could have up to
20,000,000 connections (twenty million). But P2P applications,
in defiance of intellectual property laws, opened direct
connections between clients, so that anyone can browse a shared
directory on anyone else's PC, looking for music, books,
software, and games. The number of possible connections, originally a
linear function of N (where N is the number of hosts connected to the
Internet), has now become a quadratic function of N. That is a
very large number of new connections, which exhausts the connection-state
resources of stateful routers. When a router's capacity to track new
connections is exhausted, it simply stops accepting
connections, creating a new and really problematic form of congestion that resembles denial of service (DoS).
L.B. |
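The arithmetic in the note above can be checked with a short sketch. The function names are hypothetical, and the peer-to-peer count assumes at most one connection per pair of hosts, so it is an upper bound rather than a measurement.

```python
def client_server_connections(servers: int, clients: int, per_pair: int = 1) -> int:
    """Upper bound on connections when clients talk only to servers."""
    return servers * clients * per_pair


def peer_to_peer_connections(hosts: int) -> int:
    """Upper bound when any two hosts may connect directly (assumed: one per pair)."""
    return hosts * (hosts - 1) // 2


# Client-server: grows linearly in the number of clients for a fixed set of servers.
print(client_server_connections(100, 100_000))
print(client_server_connections(100, 100_000, per_pair=2))

# The same 100,100 hosts acting as peers: grows quadratically in the host count.
print(peer_to_peer_connections(100 + 100_000))
```

With these assumptions, the peer-to-peer bound for the same population is several hundred times larger than the client-server bound.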
It is convenient to divide flows into three
classes:
| |
- TCP-compatible flows, i.e., flows that behave under
congestion like a flow produced by a conformant TCP. A
TCP-compatible flow is responsive to congestion notification, and
in steady-state uses no more bandwidth than a conformant TCP
running under comparable conditions (drop rate, RTT, MTU, etc.).
- Unresponsive flows, i.e., flows that do not slow down when
congestion occurs.
- Flows
that are responsive but are not TCP-compatible.
|
|
|
| The last two classes
contain more aggressive flows that pose significant threats to Internet
performance, as we discuss below. |
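One way to make "no more bandwidth than a conformant TCP running under comparable conditions" concrete is the commonly cited simplified steady-state model, in which the rate is roughly 1.22 * MTU / (RTT * sqrt(p)) for a drop rate p. This formula is not part of the summary above; it is brought in here as an assumption, and the function names are hypothetical. The constant 1.22 (about the square root of 3/2) comes from a deterministic-loss model and ignores timeouts and delayed acknowledgements.

```python
import math


def tcp_compatible_rate(mtu_bytes: float, rtt_s: float, drop_rate: float) -> float:
    """Approximate steady-state TCP rate in bytes/second (simplified model)."""
    if not 0.0 < drop_rate <= 1.0:
        raise ValueError("drop_rate must be in (0, 1]")
    if rtt_s <= 0.0 or mtu_bytes <= 0.0:
        raise ValueError("rtt_s and mtu_bytes must be positive")
    return 1.22 * mtu_bytes / (rtt_s * math.sqrt(drop_rate))


def is_tcp_compatible(flow_rate_bps: float, mtu_bytes: float,
                      rtt_s: float, drop_rate: float) -> bool:
    """True if the flow uses no more than the modeled TCP rate (bytes/second)."""
    return flow_rate_bps <= tcp_compatible_rate(mtu_bytes, rtt_s, drop_rate)


# Example: 1500-byte packets, 100 ms RTT, 1% drop rate.
print(tcp_compatible_rate(1500, 0.1, 0.01))
```

A flow sending well above this rate under the same drop rate, RTT, and MTU would fall into one of the two more aggressive classes.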
| In addition to steady-state fairness, the
fairness of the initial slow-start is also a concern. One concern is
the transient effect on other flows of a flow with an overly-aggressive
slow-start procedure. Slow-start performance is particularly
important for the many flows that are short-lived, and only have a small
amount of data to transfer. |
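| Another reason for a flow to use end-to-end
congestion control can be to optimize its own performance regarding
throughput, delay, and loss. In some circumstances,
for example in environments of high statistical multiplexing, the delay
and loss rate experienced by a flow are largely independent of its own
sending rate. However, in environments with lower levels of statistical
multiplexing or with per-flow scheduling, the delay and loss rate
experienced by a flow are in part a function of the flow's own sending
rate. Thus, a flow can use end-to-end congestion control to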
limit the delay or loss experienced by its own packets. We would note,
however, that in an environment like the current best-effort Internet,
concerns regarding congestion collapse and fairness with competing flows
limit the range of congestion control behaviors available to a flow. |
| The role of the standards process |
| The standardization of a transport protocol
includes not only standardization of aspects of the protocol that could
affect interoperability, but also standardization of mechanisms deemed
critical to performance. At the same time, implementation-specific details
and other aspects of the transport protocol that do not affect
interoperability and do not significantly interfere with performance do not
require standardization. |
| In addition to addressing the danger of
congestion collapse, the standardization process for new transport protocols
takes care to avoid a congestion control `arms race' among competing
protocols. As an example, from RFC2357 we have: "A particular concern
for the IETF is the impact of reliable multicast traffic on
other traffic in the Internet in times of congestion, in particular the
effect of reliable multicast traffic on competing TCP traffic....
The challenge to the IETF is to encourage research and
implementations of reliable multicast, and to enable the needs of
applications for reliable multicast to be met as expeditiously as
possible, while at the same time protecting the Internet from the
congestion disaster or collapse that could result from the widespread
use of applications with inappropriate reliable multicast
mechanisms." |
| It is reasonable to expect that these concerns
about the effect of new transport protocols on competing traffic will apply
not only to reliable multicast protocols, but to unreliable
unicast, reliable unicast, and unreliable multicast traffic
as well. |
| The specific issue of a browser opening
multiple connections to the same destination has been addressed by
RFC 2616, which states that "Clients that use persistent connections
SHOULD limit the number of simultaneous connections that they maintain
to a given server. A single-user client SHOULD NOT maintain more than 2
connections with any server or proxy." |
| Observe that this recommendation does not
address the problem of a single-user client maintaining one
connection with each of many servers or proxies, as the FastTrack protocol
(used by Kazaa and related applications) does. L.B. |
| New developments in the standards process |
| The most obvious developments in the IETF
that could affect the evolution of congestion control are the development of
integrated and differentiated services [RFC2212, RFC2475]
and of Explicit Congestion Notification (ECN) [RFC2481].
However, other less dramatic developments are likely to affect congestion
control as well. |
| On this site you can find a very complete
document about Differentiated Services. L.B. |
| One such effort is that to construct
Endpoint Congestion Management, to enable multiple concurrent flows from
a sender to the same receiver to share congestion control state. By allowing
multiple connections to the same destination to act as one flow in terms of
end-to-end congestion control, a Congestion Manager could
allow individual connections slow-starting to take advantage of
previous information about the congestion state of the end-to-end
path. Further, the use of a Congestion Manager could remove the
congestion control dangers of multiple flows being opened between the same
source/destination pair, and could perhaps be used to allow a browser to
open many simultaneous connections to the same destination. |
| A description of congestion collapse |
| Informally, congestion collapse occurs
when an increase in the network load results in a decrease in the useful
work done by the network. Congestion collapse was first reported in
the mid 1980s and was largely due to TCP connections unnecessarily
retransmitting packets that were either in transit or had already been
received at the receiver. We call the congestion collapse that results
from the unnecessary retransmission of packets classical congestion
collapse. Classical congestion collapse is a stable condition
that can result in throughput that is a small fraction of normal. Problems
with classical congestion collapse have generally been corrected by the
timer improvements and congestion control mechanisms in modern
implementations of TCP. |
| A second form of potential congestion
collapse occurs due to undelivered packets. Congestion
collapse from undelivered packets arises when bandwidth is wasted by
delivering packets through the network that are dropped before reaching
their ultimate destination. This is probably the largest unresolved
danger with respect to congestion collapse in the Internet today.
Different scenarios can result in different degrees of congestion collapse,
in terms of the fraction of the congested links' bandwidth used for
productive work. The danger of congestion collapse from undelivered packets
is due primarily to the increasing deployment of open-loop applications
not using end-to-end congestion control. Even more destructive
would be best-effort applications that *increase* their
sending rate in
response to an increased packet drop rate. |
| Table 1 (below) gives the results from a
scenario with congestion collapse from undelivered packets, where
scarce bandwidth is wasted by packets that never reach their destination.
The simulation uses a scenario with three TCP flows and one UDP
flow competing over a congested 1.5 Mbps link. The access links for
all nodes are 10 Mbps, except that the access link to the receiver of
the UDP flow is 128 Kbps, only 9% of the bandwidth of the
shared link. When the UDP source rate exceeds 128 Kbps, most
of the UDP packets will be dropped at the output port to that final
link. |
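| [Table 1: UDP arrival rate, UDP goodput, TCP goodput, and aggregate goodput, each as a fraction of the bandwidth of the congested link.] |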
| Table 1 shows the UDP arrival rate
from the sender, the UDP goodput (defined as the bandwidth delivered
to the receiver), the TCP goodput (as delivered to the TCP
receivers), and the aggregate goodput on the congested 1.5 Mbps
link. Each rate is given as a fraction of the bandwidth of the congested
link. As the UDP source rate increases, the TCP goodput decreases roughly linearly, and the UDP goodput
is nearly constant. Thus, as the UDP flow increases its offered load,
its only effect is to hurt the TCP and aggregate goodput. On
the congested link, the UDP flow ultimately `wastes' the
bandwidth that could have been used by the TCP flow, and reduces the
goodput in the
network as a whole down to a small fraction of the bandwidth of the
congested link. This simulation illustrates both unfairness and congestion
collapse. |
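The shape of this result can be reproduced with a deliberately idealized model. It does not reproduce the simulation's numbers: it assumes the unresponsive UDP flow occupies the shared link at its full arrival rate, that the TCP flows back off into whatever capacity is left, and that the only other bottleneck is the 128 Kbps final link. The function and constant names are hypothetical.

```python
SHARED_LINK_KBPS = 1500.0    # congested 1.5 Mbps link
UDP_LAST_HOP_KBPS = 128.0    # access link to the UDP receiver


def idealized_goodput(udp_arrival_kbps: float) -> tuple[float, float, float]:
    """Return (udp_goodput, tcp_goodput, aggregate_goodput) in Kbps."""
    # Assumption: UDP does not back off, so it takes what it sends, up to capacity.
    udp_on_shared = min(udp_arrival_kbps, SHARED_LINK_KBPS)
    # Assumption: the responsive TCP flows share only the remaining capacity.
    tcp_goodput = max(0.0, SHARED_LINK_KBPS - udp_on_shared)
    # UDP packets beyond the last-hop capacity are dropped after crossing the shared link.
    udp_goodput = min(udp_on_shared, UDP_LAST_HOP_KBPS)
    return udp_goodput, tcp_goodput, udp_goodput + tcp_goodput


for rate in (100.0, 500.0, 1000.0, 1500.0):
    udp, tcp, total = idealized_goodput(rate)
    print(f"UDP arrival {rate:6.0f}  UDP goodput {udp:5.0f}  "
          f"TCP goodput {tcp:6.0f}  aggregate {total:6.0f}")
```

Even in this crude model, UDP goodput flattens at the last-hop capacity while TCP and aggregate goodput fall as the UDP arrival rate grows.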
| There are only two alternatives
for eliminating the danger of congestion collapse from undelivered
packets. The first alternative is the use of effective end-to-end
congestion control by the end nodes. More specifically, the requirement
would be that a flow avoid a pattern of significant losses at links
downstream from the first congested link on the path. Given that an
end-node is generally unable to distinguish between a path with one
congested link or multiple congested links, the most reliable way to do this
is for the flow to use end-to-end congestion control, and reduce
its sending rate in the presence of loss. |
| A second alternative would be a
guarantee by the network that packets accepted at a congested link in the
network will be delivered all the way to the receiver [RFC2212, RFC2475].
We note that the choice between the first and the second alternative does
not have to be an either/or decision; congestion collapse can be
prevented by the use of effective end-to-end congestion control by some of
the traffic, and the use of end-to-end bandwidth guarantees from the
network for the rest of the traffic. |
| I was reading on the IETF's
DCCP working group mailing list (DCCP is a new protocol being
designed by the IETF as a congestion-controlled alternative to UDP) a very interesting discussion about
whether real-time traffic can really be carried by a protocol with
end-to-end congestion control. Some people say definitely not. They
complain about the Additive-Increase Multiplicative-Decrease (AIMD)
behavior of DCCP (similar to TCP's). Their applications need a
protocol with constant-rate behavior to work well (as UDP provides). To achieve
constant-rate behavior, the protocol has to ignore congestion in
the network and maintain its sending rate regardless of whether its
packets are dropped later. This is, in fact, the seed of
congestion collapse from undelivered packets. More research is needed to
allow real-time traffic to be carried successfully over protocols with
end-to-end congestion control. This is the only way to guarantee the
health of the Internet. If this proves impossible, the last alternative is to apply
admission control to traffic sent over protocols without
end-to-end congestion control. This decision has been postponed because
real-time traffic is still negligible compared with TCP traffic in the
Internet. But this situation is changing very fast. L.B. |
|
| End-to-end congestion control
for avoiding congestion collapse |
| The avoidance of congestion collapse from undelivered
packets requires that flows avoid a scenario of a high sending
rate, multiple congested links, and a persistent high packet drop rate
at the downstream link. Because this congestion consists of packets that
waste valuable bandwidth only to be dropped downstream, it is not
possible in an environment where each flow traverses only one congested
link, or where only a small number of packets are dropped at links
downstream of the first congested link. Thus, any form of congestion
control that successfully avoids a high sending rate in the presence
of a high packet drop rate should be sufficient to avoid
congestion collapse from undelivered packets. |
|
|
|
| We would note that the addition of
Explicit Congestion Notification (ECN) to the IP
architecture would not, in and of itself, remove the danger of congestion
collapse for best-effort traffic. ECN allows routers to set a
bit in packet headers as an indication of congestion to the end-nodes,
rather than being forced to rely on packet drops to indicate congestion.
However, with ECN, packet-marking would replace
packet-dropping only in times of moderate congestion. In particular,
when congestion is heavy, and a router's buffers overflow, the router has no
choice but to drop arriving packets. |
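The distinction between marking and dropping can be sketched as a per-packet decision at a router queue. This is a simplified threshold model with hypothetical names; real active queue management schemes such as RED typically mark or drop probabilistically based on an average queue length, which is not modeled here.

```python
def handle_arrival(queue_len: int, capacity: int,
                   mark_threshold: int, ecn_capable: bool) -> str:
    """Decide what to do with an arriving packet: 'forward', 'mark', or 'drop'."""
    if queue_len >= capacity:
        # Heavy congestion: the buffer is full, so the packet is dropped
        # whether or not the flow supports ECN.
        return "drop"
    if queue_len >= mark_threshold:
        # Moderate congestion: signal it by marking if the flow understands ECN,
        # otherwise fall back to dropping as the congestion signal.
        return "mark" if ecn_capable else "drop"
    return "forward"


print(handle_arrival(queue_len=2, capacity=10, mark_threshold=5, ecn_capable=True))
print(handle_arrival(queue_len=6, capacity=10, mark_threshold=5, ecn_capable=True))
print(handle_arrival(queue_len=10, capacity=10, mark_threshold=5, ecn_capable=True))
```

The last call shows the point made above: once the buffer overflows, ECN capability no longer prevents the drop.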
| End-to-end congestion control
for fairness with TCP |
| An environment with per-flow
scheduling at all congested links would isolate flows from each other,
and eliminate the need for congestion control mechanisms to be
TCP-compatible. An environment with differentiated services,
where flows marked as belonging to a certain diff-serv class would be
scheduled in isolation from best-effort traffic, could allow the
emergence of an entire diff-serv class of traffic where congestion
control was not required to be TCP-compatible. Similarly, a
pricing-controlled environment, or a diff-serv class with its own
pricing paradigm, could supersede the concern about fairness with
TCP. However, for the current Internet environment, where other
best-effort traffic could compete in a FIFO queue with TCP
traffic, the absence of fairness with TCP could lead to one flow `starving
out' another flow in a time of high congestion, as was illustrated in
Table 1 above. |
| However, the list of
TCP-compatible congestion control procedures is not limited to AIMD
with the same increase/decrease parameters as TCP. Other
TCP-compatible congestion control procedures include rate-based
variants of AIMD; AIMD with different sets of
increase/decrease parameters that give the same steady-state
behavior; equation-based congestion control where the sender adjusts
its sending rate in response to information about the long-term packet
drop rate; layered multicast where receivers subscribe and
unsubscribe from layered multicast groups; and possibly other forms
that we have not yet begun to consider. |
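To illustrate how AIMD with different increase/decrease parameters can give the same steady-state behavior, here is a sketch based on a deterministic sawtooth model, which is an assumption and not something stated in the text above. In that model the window climbs by `a` packets per roundtrip time and is cut by a fraction `b` once every 1/p packets, giving an average window of sqrt(a * (2 - b) / (2 * b * p)). Matching TCP (a = 1, b = 1/2) then requires a = 3b / (2 - b). Other analyses use different loss models and arrive at somewhat different pairings.

```python
import math


def aimd_average_window(a: float, b: float, drop_rate: float) -> float:
    """Average window (packets per RTT) of AIMD(a, b) in a deterministic sawtooth model."""
    if not (a > 0 and 0 < b < 1 and 0 < drop_rate <= 1):
        raise ValueError("require a > 0, 0 < b < 1, 0 < drop_rate <= 1")
    return math.sqrt(a * (2.0 - b) / (2.0 * b * drop_rate))


def matching_increase(b: float) -> float:
    """Increase parameter that matches TCP's average window for decrease fraction b."""
    return 3.0 * b / (2.0 - b)


p = 0.01
tcp_window = aimd_average_window(1.0, 0.5, p)
gentle_window = aimd_average_window(matching_increase(0.125), 0.125, p)
print(tcp_window, gentle_window)
```

A flow that halves its window less abruptly (small `b`) must also probe for bandwidth more slowly (small `a`) to stay TCP-compatible under this model.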
| Slow-start |
| The TCP sender cannot
open a new connection by sending a large burst of data all at
once. The TCP sender is limited by a small initial value for
the congestion window. During slow-start, the TCP
sender can increase its sending rate by at most a factor of two in
one roundtrip time. Slow-start ends when congestion is
detected, or when the sender's congestion window is greater than the
slow-start threshold ssthresh. |
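The slow-start behavior described above can be traced with a small sketch. It assumes the window is counted in packets, that the sender doubles it every roundtrip time (the maximum allowed), and that no congestion is detected along the way. The function name is hypothetical.

```python
def slow_start_windows(initial_cwnd: int, ssthresh: int) -> list[int]:
    """Congestion window per roundtrip time until it exceeds ssthresh."""
    if initial_cwnd < 1:
        raise ValueError("initial_cwnd must be at least 1 packet")
    cwnd = initial_cwnd
    trace = [cwnd]
    # Slow-start continues until the window is greater than ssthresh.
    while cwnd <= ssthresh:
        cwnd *= 2  # at most a factor of two per roundtrip time
        trace.append(cwnd)
    return trace


print(slow_start_windows(initial_cwnd=1, ssthresh=16))
```

Starting from one packet, the window passes a threshold of 16 packets after five roundtrip times, which is why the initial window matters so much for short transfers.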
| One issue that potentially affects
global congestion control, and therefore has been explicitly addressed
in the standards process, is an increase in the value of the
initial window [RFC2414, RFC2581].
Issues that have not been addressed in the standards process, and are
generally considered not to require standardization, include the use (or
non-use) of rate-based pacing, and mechanisms for ending
slow-start early, before the congestion window reaches ssthresh.
Such mechanisms result in slow-start behavior that is as
conservative or more conservative than standard TCP. |
| Additive Increase,
Multiplicative Decrease (AIMD) |
| In the absence of congestion, the
TCP sender increases its congestion window by at most one
packet per roundtrip time. In response to a congestion indication,
the TCP sender decreases its congestion window by half. (More
precisely, the new congestion window is half of the minimum of the
congestion window and the receiver's advertised window). |
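A minimal sketch of these two rules, with the window counted in whole packets. The function names are hypothetical, and the floor of one packet after a decrease is an assumption added here so the window never reaches zero.

```python
def additive_increase(cwnd: int) -> int:
    """One roundtrip time without congestion: grow by at most one packet."""
    return cwnd + 1


def multiplicative_decrease(cwnd: int, rwnd: int) -> int:
    """Congestion indication: half of min(cwnd, receiver's advertised window)."""
    return max(min(cwnd, rwnd) // 2, 1)  # floor of 1 packet is an assumption


cwnd = 10
for _ in range(3):           # three roundtrip times without congestion
    cwnd = additive_increase(cwnd)
print(cwnd)
cwnd = multiplicative_decrease(cwnd, rwnd=64)
print(cwnd)
```

The asymmetry is the point: it takes many roundtrip times to regain the bandwidth given up in a single decrease.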
| An issue that potentially affects
global congestion control, and therefore would be likely to be
explicitly addressed in the standards process, would include a proposed
addition of congestion control for the return stream of `pure acks'.
An issue that is generally not considered to require standardization would
be a change to the congestion window to apply as an upper bound on
the number of bytes presumed to be in the pipe, instead of applying as a
sliding window starting from the cumulative acknowledgement.
(Clearly, the receiver's advertised window applies as a sliding
window starting from the cumulative acknowledgement field,
because packets received above the cumulative acknowledgement field
are held in TCP's receive buffer, and have not been delivered to
the application. However, the congestion window applies to the
number of packets outstanding in the pipe, and does not necessarily have
to include packets that have been received out-of-order by the TCP
receiver). |
| Retransmit timers |
| The TCP sender sets a
retransmit timer to infer that a packet has been dropped in the
network. When the retransmit timer expires, the sender infers that a
packet has been lost, sets ssthresh to half of the current window,
and goes into slow-start, retransmitting the lost packet. If the
retransmit timer expires because no acknowledgement has been
received for a retransmitted packet, the retransmit timer is also "backed-off",
doubling the value of the next retransmit timeout interval. |
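The reaction to a retransmit timeout can be sketched as follows. The names are hypothetical. Restarting slow-start from a window of one packet and capping the backed-off timer at 64 seconds are assumptions made for this sketch; the text above only says that the sender goes into slow-start and that the timeout interval doubles.

```python
def on_retransmit_timeout(cwnd: int) -> tuple[int, int]:
    """Return (ssthresh, new_cwnd) after the retransmit timer expires."""
    ssthresh = max(cwnd // 2, 1)   # half of the current window
    return ssthresh, 1             # assumed: slow-start restarts from one packet


def backed_off_timeouts(initial_rto_s: float, expirations: int,
                        max_rto_s: float = 64.0) -> list[float]:
    """Timeout intervals used for successive retransmissions of the same packet."""
    rto = initial_rto_s
    intervals = []
    for _ in range(expirations):
        intervals.append(rto)
        rto = min(rto * 2.0, max_rto_s)  # double after each expiry, up to the cap
    return intervals


print(on_retransmit_timeout(20))
print(backed_off_timeouts(1.0, 5))
```

Exponential backoff ensures that a sender facing persistent loss retransmits less and less often instead of adding load to an already congested path.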
| An issue that potentially affects
global congestion control, and therefore would be likely to be
explicitly addressed in the standards process, might include a modified
mechanism for setting the retransmit timer that could
significantly increase the number of retransmit timers that expire
prematurely, when the acknowledgement has not yet arrived at the
sender, but in fact no packets have been dropped. This could be of
concern to the Internet standards process because retransmit
timers that expire prematurely could lead to an increase in the
number of packets unnecessarily transmitted on a congested link. |
| Fast Retransmit and Fast
Recovery |
| After seeing three duplicate
acknowledgements, the TCP sender infers a packet loss. The TCP
sender sets ssthresh to half of the current window, reduces the
congestion window to at most half of the previous window, and
retransmits the lost packet. |
| An issue that potentially affects
global congestion control, and therefore would be likely to be
explicitly addressed in the standards process, might include a proposal for
inferring a lost packet after only one or two duplicate
acknowledgements. If poorly designed, such a proposal could lead to
an increase in the number of packets unnecessarily transmitted on a
congested path. |
| An issue that would not be expected
to require standardization would be a proposal to send a "new" or
presumed-lost packet in response to a duplicate or partial
acknowledgement, if allowed by the congestion window. An example of
this would be sending a new packet in response to a single duplicate
acknowledgement, to keep the `ack clock' going in case no further
acknowledgements would have arrived. Such a proposal is an example of
a beneficial change that does not involve interoperability and does
not affect global congestion control, and that therefore could be
implemented by vendors without requiring the intervention of the IETF
standards process. |
| Other aspects of TCP congestion
control |
| Other aspects of TCP congestion
control that have not been discussed in any of the sections above
include TCP's recovery from an idle or application-limited
period. |
| Security Considerations |
| Because this document does not
propose any specific congestion control mechanisms, it is also not
necessary to present specific security measures associated with congestion
control. However, we would note that there are a range of security
considerations associated with congestion control that should be considered
in IETF documents. |
| For example, individual
congestion control mechanisms should be as robust as possible to
the attempts of individual end-nodes to subvert end-to-end
congestion control. This is a particular concern in multicast
congestion control, because of the far-reaching distribution of
the traffic and the greater opportunities for individual receivers to fail
to report congestion. |
|
RFC 2309 also discusses the potential
dangers to the Internet of unresponsive flows, that is, flows
that don't reduce their sending rate in the presence of congestion,
and describes the need for mechanisms in the network to deal with flows that
are unresponsive to congestion notification. We would note that there is
still a need for research, engineering, measurement, and deployment in these
areas. |
|
Because the Internet aggregates very large numbers of flows, the risk
to the whole infrastructure of subverting the congestion control of a few
individual flows is limited. Rather, the risk to the infrastructure would
come from the widespread deployment of many end-nodes subverting
end-to-end congestion control. |