|
Previous
|
Content |
Next
|
|
|
2.6.- RED queuing
discipline |
|
 |
|
| All queuing disciplines we have seen so far
(FIFO, PRIO, TBF, SFQ) and also HTB that we will see later, base their
queuing algorithm in the well-known FIFO queue. As you remember, in this
queuing discipline packets enter the queue at the tail and leave it from the
head. This, that could be considered something very simple and natural, pose
a serious problem to those flows based on the, de-facto standard, TCP
transport protocol. |
| Problems arise when queues overflow due to
bursty of packets. At this moment more packets can not be admitted and they
have to be dropped where they are entering, this means, at the tail of the
queue. That's the reason why all queuing discipline based on the FIFO queue
are known as DropTail queues. |
| |
| Mark Anthony Parris in his excellent work
"Class-Based Thresholds-Lightweight Active Router-Queue Management for
Multimedia Networking" [12], approaches this theme in a magisterial way.
Let's read from his work: |
| |
| Firstinfirstout, droptailwhenfull was
the original queue management scheme used in Internet routers. With this
scheme packets are enqueued at the tail of a queue as they arrive and
dequeued from the head of queue when there is capacity on the link.
Droptail is the policy of dropping the arriving packet when the queue is
full. (Other alternatives include dropping the packet at the head of the
queue.) Droptail and FIFO are used interchangeably in this dissertation.
|
| |
| Braden, et al., point out several problems
with simple droptailonfull and recommends that Internet routers employ
more sophisticated techniques for managing their queues [Braden98]. The two
major problems they identify are the problems of lockout and fullqueues.
|
| |
| Lockout refers to a phenomenon in which the
shared resource, link bandwidth, is unfairly consumed exclusively by a
small number of flows. The remaining flows are lockedout of (i.e., denied
access to) the queue and, consequently, locked out of the outbound link. In
this phenomenon the queue is occupied only by the packets from a small
number of flows while the packets associated with most flows are
consistently discarded. As a result, most flows receive none of the link
bandwidth, and starve. |
| |
| This phenomenon occurs because of timing
effects which result in some flows' packets always arriving to find the
queue full. For example, consider a situation where many sources are
periodically sending bursts of packets that in aggregate exceeds the queue's
capacity. If these sources become synchronized, all sending nearly
simultaneously, the first packets to arrive (e.g. from the source closer to
the bottleneck link) will find a queue with some available capacity while
the subsequent packets will be discarded. If the same relative order is
maintained between the sources, those sources that send first will
consistently make progress while the other flows will consistently have all
packets discarded and, thus, starve. |
|
|
|
|
| Fullqueues are queues that are usually
occupied to capacity. If bursts of packets arrive to a full queue, many
packets are dropped simultaneously. This can lead to large oscillations in
the network utilization. If the dropped packets are from different flows
there may be synchronized responses (backoff) among multiple flows.
Synchronized backoff is a phenomenon in which many sources simultaneously
receive congestion notification and reduce their generated load. As a
result, the overall load on the network may drop below the capacity of the
link and then rise back to exceed the link's capacity resulting in a full
queue and once again leading to simultaneous drops. This oscillating
behavior is exactly counter to the buffer's intended function, acting as a
smoothing filter. |
| Many authors have studied these problems. In
1993, Sally Floyd and Van Jacobson presented their paper "Random Early
Detection Gateways for Congestion Avoidance" [13] where the RED gateway was
outlined. The idea behind RED is to provide, as soon as is possible, a
feedback to responsive flows (like TCP) before the queue overflows in an
effort to indicate that congestion is inminent, instead of waiting until the
congestion has become excessive. Also, packet drops are distributed more
fairly across all flows. Let's see how Floyd & Jacobson [13] explain
guidelines for RED gateway design in their paper. Sorry, but this part is a
little long; also if it is necessary I will insert some minor comments: |
|
| This section summarizes some of the design
goals and guidelines for RED gateways. The main goal is to provide
congestion avoidance by controlling the average queue size. Additional goals
include the avoidance of global synchronization and of a bias against bursty
traffic and the ability to maintain an upper bound on the average queue size
even in the absence of cooperation from transport-layer protocols. |
| Okay, let's explain this a little. They try to
avoid congestion by controlling the average queue size at the gateway or
router. First problem when network get congested is that router's queue
overflows and begins to drop packets. Avoiding packet drops (being as
conservative as is possible) the global synchronization problem is minimized
(where several TCP connections reduce their throughput to a minimum at the
same time due, mainly, to a packet massacre at router queue). Also RED
gateway pretends to be as condescendent as possible with bursty traffic,
trying again to protect packet survival. Finally, by controlling the maximum
average queue size an indirect control is exercised over those 'bad citizen'
unresponsive flows, like UDP. |
| The first job of a congestion avoidance
mechanism at the gateway is to detect incipient congestion. As defined in
[18], a congestion avoidance scheme maintains the network in a region of low
delay and high throughput. The average queue size should be kept low, while
fluctuations in the actual queue size should be allowed to accommodate
bursty traffic and transient congestion. Because the gateway can monitor the
size of the queue over time, the gateway is the appropriate agent to detect
incipient congestion. Because the gateway has a unified view of the various
sources contributing to this congestion, the gateway is also the appropriate
agent to decide which sources to notify of this congestion. |
| In a network with connections with a range
of roundtrip times, throughput requirements, and delay sensitivities, the
gateway is the most appropriate agent to determine the size and duration of
short-lived bursts in queue size to be accommodated by the gateway. The
gateway can do this by controlling the time constants used by the low-pass
filter for computing the average queue size. The goal of the gateway is to
detect incipient congestion that has persisted for a "long time" (several
roundtrip times). |
| Again a little explanation. A low-pass filter
is a mechanism to soft or smooth the current aleatory behavior of the queue
length throughout the time. Have a look to figure 2.6.2 below where two
curves representing current queue size and average queue size (this obtained
applying a low-pass filter to the current queue size) are depicted. They use
what is called an Exponential Weighted Moving Average (EWMA); something
simple but great invented by a genious mathematician (I don't know who)
sometime ago. |
| The second job of a congestion avoidance
gateway is to decide which connections to notify of congestion at the
gateway. If congestion is detected before the gateway buffer is full, it is
not necessary for the gateway to drop packets to notify sources of
congestion. In this paper, we say that the gateway marks a packet, and
notifies the source to reduce the window for that connection. This marking
and notification can consist of dropping a packet, setting a bit in a packet
header, or some other method understood by the transport protocol. The
current feedback mechanism in TCP/IP networks is for the gateway to drop
packets, and the simulations of RED gateways in this paper use this
approach. |
| One goal is to avoid a bias against bursty
traffic. Networks contain connections with a range of burstiness, and
gateways such as Drop Tail and Random Drop gateways have a bias against
bursty traffic. With Drop Tail gateways, the more bursty the traffic from a
particular connection, the more likely it is that the gateway queue will
overflow when packets from that connection arrive at the gateway [7]. |
| Another goal in deciding which connections
to notify of congestion is to avoid the global synchronization that results
from notifying all connections to reduce their windows at the same time.
Global synchronization has been studied in networks with Drop Tail gateways
[37], and results in loss of throughput in the network. Synchronization as a
general network phenomena has been explored in [8]. |
| In order to avoid problems such as biases
against bursty traffic and global synchronization, congestion avoidance
gateways can use distinct algorithms for congestion detection and for
deciding which connections to notify of this congestion. The RED gateway
uses randomization in choosing which arriving packets to mark; with this
method, the probability of marking a packet from a particular connection is
roughly proportional to that connection's share of the bandwidth through the
gateway. This method can be efficiently implemented without maintaining
per-connection state at the gateway. |
| One goal for a congestion avoidance gateway
is the ability to control the average queue size even in the absence of
cooperating sources. This can be done if the gateway drops arriving packets
when the average queue size exceeds some maximum threshold (rather than
setting a bit in the packet header). This method could be used to control
the average queue size even if most connections last less than a roundtrip
time (as could occur with modified transport protocols in increasingly
high-speed networks), and even if connections fail to reduce their
throughput in response to marked or dropped packets. |
| Okay. These technological monsters, Floyd &
Jacobson, explain the problem and some outlines of how to approach it to be
resolved. The solution they propose is overwhelming. Reading again from
Floyd & Jacobson [13]: |
| This section describes the algorithm for RED
gateways. The RED gateway calculates the average queue size, using a
low-pass filter with an exponential weighted moving average. The average
queue size is compared to two thresholds, a minimum threshold and a maximum
threshold. When the average queue size is less than the minimum threshold,
no packets are marked. When the average queue size is greater than the
maximum threshold, every arriving packet is marked. If marked packets are in
fact dropped, or if all source nodes are cooperative, this ensures that the
average queue size does not significantly exceed the maximum threshold. |
| When the average queue size is between the
minimum and the maximum threshold, each arriving packet is marked with
probability pa, where pa is a function of the average queue size avg. Each
time that a packet is marked, the probability that a packet is marked from a
particular connection is roughly proportional to that connection's share of
the bandwidth at the gateway. The general RED gateway algorithm is given in
Figure 2.6.1. |
|

|
| Thus the RED gateway has two separate
algorithms. The algorithm for computing the average queue size determines
the degree of burstiness that will be allowed in the gateway queue. The
algorithm for calculating the packet-marking probability determines how
frequently the gateway marks packets, given the current level of congestion.
The goal is for the gateway to mark packets at fairly evenly-spaced
intervals, in order to avoid biases and to avoid global synchronization, and
to mark packets sufficiently frequently to control the average queue size. |
| These guys are not joking. RED queuing
discipline is a very ingenious algorithm. To understand this even better we
are going to present a figure created by using the ns-2 network simulator,
taken from NS by Example [14] and then retouched by me; the simulation has
to deal with a a link where a RED, instead of a DropTail, queuing discipline
is used. The figure represents the current queue length and the average
queue length against time of the RED gateway used in the link. It can help
us a lot. Have a look to figure 2.6.2 below. |
|

|
| The red curve is the current queue size of the
router; the green curve is the average queue size calculated using the EWMA
low-pass filter. The minimum threshold is 5 packets and the maximum
threshold is 12 packets. Observe that current queue has a very severe burst
of packets in the interval between 3 and 4.2 seconds, more or less. In fact,
the maximum threshold of 12 packets is violated. But the smoothed average
queue size calculated using the EWMA low-pass filter is a lot slower. It
follows the current queue length but like a turtle follows a rabbit. The
'drop probability' is selected based on the average queue size, not the
current queue size. But let's Parris [12] to explain this better for us; I
did some minor changes to adapt his explanation to our figure: |
|
| RED's packet dropping decisions are
mode-based. Figure 2.6.2 illustrates the ideas behind the RED algorithm.
This figure shows the instantaneous (red color) and weighted average (green
color) queue size (in packets) over time. |
| |
| The current mode, indicated on the right
hand side, is determined by the relation between the average queue size and
the minimum and maximum thresholds. When the average queue size is less than
the minimum threshold this indicates no congestion, so no action is taken.
This is no drop mode (yellow band on the figure) and the probability that an
arriving packet will be dropped is zero. In this mode arriving packets are
always enqueued. |
| |
| At the other extreme, when the average queue
size is greater than the maximum threshold, or if the queue is full, the
algorithm infers persistent and severe congestion. All arriving packets are
dropped. The probability an arriving packet will be dropped is one. This
mode is referred to as forced drop mode (red band on the figure). |
| |
| Finally, when the average queue size is
between the two thresholds the algorithm operates in a probabilistic (i.e.,
random) drop mode. In this mode, arriving packets are dropped at random. The
probability that a given packet will be dropped ranges between zero and one
as a function of three parameters: maxp, the current average queue size avg,
and count. The input parameter, maxp, determines the maximum probability
that two consecutive packets will be dropped while in probabilistic drop
mode. The variable, count, tracks how many packets have been enqueued since
the last packet was dropped. |
| |
| Very nice. In figure 2.6.1 above the RED
algorithm shows how the three parameters that Parris talked about are used
to implement the queuing discipline behavior. Observe also that even when
current queue size breaks the 12 packets maximum threshold when the burst of
packets arrive between seconds 3 and 4.2 aproximately, no packet is dropped
because the average queue size is below the maximum threshold. This way, the
RED gateway manages intelligently bursty traffic while controlling the
average queue size when the congestion becomes permanent. |
| |
| In their paper, Floyd & Jacobson [13] showed
that using RED gateways the network power is improved. Reading again from
them: |
| |
| Because RED gateways can control the average
queue size while accommodating transient congestion, RED gateways are
well-suited to provide high throughput and low average delay in high-speed
networks with TCP connections that have large windows. The RED gateway can
accommodate the short burst in the queue required by TCP's slow-start phase;
thus RED gateways control the average queue size while still allowing TCP
connections to smoothly open their windows. Figure 2.6.3 shows the results
of simulations of the network below with two TCP connections, each with a
maximum window of 240 packets, roughly equal to the delay-bandwidth product.
The two connections are started at slightly different times. The simulations
compare the performance of Drop Tail and of RED gateways. |
|
|
|
|
|

|
| In the figure the x-axis shows the total
throughput as a fraction of the maximum possible throughput on the congested
link. The y-axis shows the average queue size in packets (as seen by
arriving packets). Our simulator does not use the 4.3-Tahoe TCP code
directly but we believe it is functionally identical. |
| Five 5-second simulations were run for each
of 11 set of parameters for Drop Tail gateways, and for 11 sets of
parameters for RED gateways; each mark in the figure shows the result of one
of these five-second simulations. The simulations with Drop Tail gateways
were run with the buffer size ranging from 15 to 140 packets; as the buffer
size is increased, the throughput and the average queue size increase
correspondingly. |
| In order to avoid phase effects in the
simulations with Drop Tail gateways, the source node takes a random time
drawn from the uniform distribution on [0, t] seconds to prepare an FTP
packet for transmission, where t is the bottleneck service time of 0.17 ms.
[7]. |
| The simulations with RED gateways were all
run with a buffer size of 100 packets, with minth ranging from 3 to 50
packets. For the RED gateways, maxth is set to 3minth, with wq = 0.002 and
maxp = 1/50. |
| The dashed lines show the average delay (as
a function of throughput) approximated by 1.73/(1 - x) for the simulations
with RED gateways, and approximated by 0.1/(1 - x)^3 for the simulations
with Drop Tail gateways. For this simple network with TCP connections with
large windows, the network power (the ratio of throughput to delay) is
higher with RED gateways than with Drop Tail gateways. There are several
reasons for this difference. With Drop Tail gateways with a small maximum
queue, the queue drops packets while the TCP connection is in the slow-start
phase of rapidly increasing its window, reducing throughput. |
| On the other hand, with Drop Tail gateways
with a large maximum queue the average delay is unacceptably large. In
addition, Drop Tail gateways are more likely to drop packets from both
connections at the same time, resulting in global synchronization and a
further loss of throughput. |
| Okay, having the theoretical aspect of RED more
or less controlled let's see now how this queuing discipline is implemented
on Linux. Reading from the RED man page, we have: |
| Random Early Detection is a classless qdisc
which limits its queue size smartly. Regular queues simply drop packets from
the tail when they are full, which may not be the optimal behaviour. RED
also performs tail drop, but does so in a more gradual way. |
| Once the queue hits a certain average
length, packets enqueued have a configurable chance of being marked (which
may mean dropped). This chance increases linearly up to a point called the
max average queue length, although the queue might get bigger. |
| This has a host of benefits over simple
taildrop, while not being processor intensive. It prevents synchronous
retransmits after a burst in traffic, which cause further retransmits, etc. |
| The goal is to have a small queue size,
which is good for interactivity while not disturbing TCP/IP traffic with too
many sudden drops after a burst of traffic. |
| Depending on whether ECN is configured,
marking either means dropping or purely marking a packet as overlimit. |
| The average queue size is used for
determining the marking probability. This is calculated using an Exponential
Weighted Moving Average, which can be more or less sensitive to bursts. |
| When the average queue size is below min
bytes, no packet wil ever be marked. When it exceeds min, the probability of
doing so climbs linearly up to probability, until the average queue size
hits max bytes. Because probability is normally not set to 100%, the queue
size might conceivably rise above max bytes, so the limit parameter is
provided to set a hard maximum for the size of the queue. |
| Okay. Bert Hubert explanation just round and
confirm our understanding of this discipline. Next the parameters are
explained: |
min
| |
Average queue size at which marking becomes a
possibility. |
|
max
| |
At this average queue size, the marking probability
is maximal. Should be at least twice min
to prevent synchronous retransmits, higher for low min. |
|
probability
| |
Maximum probability for marking, specified as a
floating point number from 0.0 to 1.0.
Suggested values are 0.01 or 0.02 (1 or 2%, respectively). |
|
limit
| |
Hard limit on the real (not average) queue size in
bytes. Further packets are dropped.
Should be set higher than max+burst. It is advised to set this a few
times higher than max. |
|
burst
| |
Used for determining how fast the average queue size
is influenced by the real queue size.
Larger values make the calculation more sluggish, allowing longer bursts
of traffic
before marking starts. Real life experiments support the following
guideline:
(min+min+max)/(3*avpkt). |
|
avpkt
| |
Specified in bytes. Used with burst to determine the
time constant for average queue size
calculations. 1000 is a good value. |
|
bandwidth
| |
This rate is used for calculating the average queue
size after some idle time. Should be set
to the bandwidth of your interface. Does not mean that RED will shape
for you! Optional. |
|
ecn
| |
As mentioned before, RED can either 'mark' or 'drop'.
Explicit Congestion Notification
allows RED to notify remote hosts that their rate exceeds theamount of
bandwidth
available. Non-ECN capable hosts can only be notified by dropping a
packet. If this
parameter is specified, packets which indicate that their hosts honor
ECN will only be
marked and not dropped, unless the queue size hits limit bytes. Needs a
tc binary with RED
support compiled in. Recommended. |
|
| And finally, to finish this long part we have
to learn how to configure one of this RED beast using tc. First, a general
command: |
|

|
| Yellows indicate values we have to supply.
Let's see how: |
| <max> is maximum threshold. To set this value
we use link bandwidth and maximum desired latency. Assuming a bandwidth of
512 kbps and a maximum desired latency of 500 ms we have: |
| |
512 kbps ~ 512000 bps = 64000
bytes / sec
<max> = 64000 bytes / sec * 0.5 sec = 32000 bytes |
|
| Above <max> threshold we will have a packet
massacre and latency doesn't matter. |
| <min> is minimum threshold. Must be set half of
<max> threshold. Following Floyd & Jacobson, let's set <min> threshold such
that <max> ~ 3 * <min>. Then we will set <min> to 12000 bytes. |
| <limit> is hard limit on the real queue size.
This limit should never be reached. Following Lartc [7] recommendation let's
set this as 8 * <max>; then <limit> will be 256000 bytes. |
| <avpkt> is average packet size. We set this to
1000 bytes. |
| <burst> allows longer burst of traffic before
marking (dropping) starts; just to accommodate bursty flows. Following RED
man page recommendations we have: |
| |
<burst> = (2 * <min> + <max>) / (3 * <avpkt>)
<burst> = (2 * 12000 + 32000) / (3 * 1000) = 18.67 ~ 20. |
|
| <probability> is maximum probability of
marking; using same value used by Floyd & Jacobson we set this to 0.02. |
| <bandwidth> is link bandwidth for calculating
queue length when it's idle; then, bandwidth will be 512. |
| [ecn] is an optional parameter. If our end TCP
systems are configured to respond to 'early
congestion notification' you can use this flag to avoid packet dropping when
average queue size is above <min> threshold and
below <max> threshold. Above <max> threshold all packets are dropped; this
perhaps occurs when dealing with unresponsive flows. |
| |
| Well, we have all parameters completed; then
using tc we order: |
| |
|

|
| |
| or, |
| |
 |
| |
| if we have ecn responsive end-systems. |
| |
| A final word about RED. When implementing it to
make the work it was conceived to, this means, to feedback senders as soon
as possible about incipient congestion, configure it as a root queue. It
should be the queue faced to the sender. Don't configure it behind a real
killer, as CBQ for instance, because you are spoiling RED nature. |
| |
| And RED nature is very interesting for
implementing some other behaviors instead of incipient congestion
notification. Because it can drop packets at random based on a configurable
probability parameter, it is tailor-made for implementing Assure Forwarding
behavior. Do you remember? Same class, different drop precedences. Drop
precedences can be implemented using RED. It was Werner Almesberger idea
when they (Almesberger, Kuznetsov and Salim) created GRED, our next queuing
discipline, of course. |
|
|
|
|
|
|
|
|
|
Previous
|
Content |
Next
|