|
Previous
|
Content
|
Next
|
|
|
3.0.-
New Congestion Avoidance algorithm |
|
 |
|
|
In april 1990, through an e-mail, Van Jacobson
refined the congestion avoidance algorithm. Next is a summary of its
document: |
|
Congestion avoidance algorithm is intended to do two things: |
- Prevent the pipe from going empty after a loss (if the pipe doesn't go
empty, you won't have to waste round-trip times refilling it).
- Correctly account for the amount of data actually in the pipe (since
that's what congestion avoidance is supposed to be estimating and adapting
to).
|
|
For (1), remember that a packet loss is a signal that the pipe is congested
and a packet loss can be detected of one of two different ways:
|
- via a retransmit timeout.
- when some small number (3-4) of consecutive duplicate acks has been
received (also called duplicate ack threshold).
|
|
In case a.-, the pipe is guarantee to be empty, so we want slow-start.
|
|
In case b.-, if the duplicate ack threshold is small compared to the
bandwidth-delay product (bdp), we have detected the loss with the
pipe almost full, i.e., given a dup-ack threshold of 3 packets (3 dup-ack
infering a packet loss) and a bdp of 16 packets (24KB assuming 1500
bytes MTU), the pipe is 75% full when loss is detected. As soon as the
bdp is higher, the pipe will be almost full when loss is detected
(maintaining the same dup-ack threshold of 3 packets). Since the pipe is
(almost) full, there's no need to slow-start after the fast-retransmit. |
|
For (2), a duplicate ack means that the receiver got an out-of-order packet,
been the usual case of this a missing packet. i.e., if there are W
packets in transit (W=window size) and one is dropped, the receiver
will get W-1 out-of-order packets and will generate W-1
duplicate acks. If the dup-ack threshold is set high enough (i.e. 3
packets), we can reasonably assume that reaching the dup-ack threshold means
a dropped packet.
|
|
But there's more information in the ack: receiver can only generate an ack
in response to a packet arrival, i.e., a duplicate ack means that a packet
has left the network; if the sender is limitted by the congestion window, a
packet can now be sent (the congestion window is a count of how many
packets will fit in the pipe; the ack says a packet has left the pipe so a
new one can be added to take its place). So, conceptually, the new
algorithm will be: |
- When the dupack threshold is reached, set ssthresh to ˝ the
current congestion window cwnd and retransmit the missing packet.
- Set now cwnd to ssthresh (original
*Tahoe* specification used to set cwnd to one segment to force a
slow-start).
- Change now the sender's output routine to use an effective window of:
| |
min(snd_wnd, cwnd + dupacks * mss) |
|
|
Van Jacobson tried to explain congestion avoidance algorithm behavior
using what is called some algebra in these terms: |
|
When packet U drop is detected we have: |
- U = first un-acked packet.
- W = window size (in packets).
- Then packets U..U+W are in transit.
- When loss is detected we send packet U and pull the window back
to ˝ W.
- In the round-trip time it takes the U retransmit to fill the
receiver's hole and an ack to get back, W-1 dupacks will arrive
(one for each packet in transit).
|
|
|
|
|
The window is effectively inflated by one packet for each of these acks,
so packets [U..U+W/2+W-1] are sent.
|
|
But, we don't re-send packets when we know they've been lost, so the amount
actually sent between the loss detection and the recovery ack is U+W/2+W-1 -
(U+W) = W/2-1, which is exactly the amount congestion avoidance allows us to
send (if we add in the retransmit of U). The recovery ack is for packet
U+W
so when the effective window is pulled back from W/2+W-1 to W/2 (when
happens because the recovery ack is "new" and sets dupack to zero), we are
allowed to send up to packet U+W+W/2, which is exactly the first packet we
haven't yet sent (i.e., there is no sudden burst of packets as the "hole" is
filled). |
|
Also, when sending packets between the loss detection and the recovery ack,
we do nothing for the first W/2 dup acks (because they only allow us to send
packets we've already sent) and the bottleneck gateway is given W/2 packet
times to clean out its backlog. Thus, when we start sending our W/2-1 new
packets, the bottleneck queue is as empty as it can be. |
|
The explanation is not easy to understand and how himself says: "I don't
know if you can get the flavor of what happens from the description - it's
hard to see without a picture. But I was delighted by how beautiful it
worked…" |
|
I prepared a table for trying to explain what Van Jacobson was
talking about; let's comment this table (LB): |
|

|
|
Cwnd in the first column is the congestion window. Initially it
is set to 1. Outs is the number of outstanding packets traveling through the network.
Initially they are 0. Allows is the number of packets that can be sent by
TCP. Because the initial congestion window is 1 and there are no packets in
the network, we can send one packet. The first packet sent is packet number
0; column Sent tells us the number of the packets that are sent. Initially
we send the packet number 0. |
|
Column #-sent tells us how many packets are sent for this time. Because
only packet number 0 is sent we write here 1 packet. Acked is the number of
the packets acknowledged by the receiver. The packet number 0 reaches
successfully the receiver end and it is acknowledged as well. #-acked
indicates the number of packets acknowledged; this case 1 packet has been
acknowledged. |
|
Because one packet is successfully acknowledged the congestion window is
increased by the number of packets being acknowledged; the new congestion
window is 2 and it is represented in the table below the second column
titled Cwnd. |
|
Ssthresh is the slow start threshold; initially it is set to 64 packets.
The last column, Algo, represents the TCP algorithm being used this time.
For this moment TCP is using the Slow Start Algorithm represented by the
acronym SS. |
|
The second line starts with a congestion window of 2 packets; there are
no packets in the network, then, we can send two new packets. Packets 1 and
2 are sent. They reach again successfully the other end and are acknowledged
back. The congestion window is increased by the 2 packets recently
acknowledged and the new congestion windows is 4 packets. Ssthresh and Algo
remain the same. |
|
This process is repeated for the line number 3 but in line number 4 we
begin to have problems. For this line the congestion window is 8 packets and
we don't have packets traveling through the network; then we can send 8
packets and we do this. Packets 7, 8, 9, .... 14 are sent (8 packets);
unfortunately packet 14 is lost in the path; some router have a lunch with
it. Only packets from number 7 to 13 can be acknowledged (7 packets); the
new congestion window is now 15 packets (8+7). |
|
Now we are in line 5. Congestion window is 15 packets but there is one
packet that has not been acknowledged yet (packet number 14); then, for TCP,
there is one packet traveling through the network. TCP really doesn't know
yet that the packet is being digested by one bad router at this moment, and
it supposes that it is somewhere in the network having a great time. Because
the congestion window is 15 packets and there is one packet outstanding, then we can
send 14 packets and we exactly do this. Packets 15, 16, 17, ... 28 (14
packets) are sent. |
|
When the packet number 15 reaches the receiver end, this, having received
a new packet will have to send one acknowledgement. But, my god, tells it,
packet 14 has not been received yet, then it has to acknowledge only its
last packet successfully received, and this is packet number 13 (bad lucky
number, isn't it?). For acknowledging packet number 13 for the second time
it answers to the sender asking again for the packet number 14. Okay, packet
13 is again acknowledged (second time), but it is its first duplicate
acknowledgment; we represent this fact under the column Acked indicating
that 0 (new) packets have been acknowledged and that this is the first
duplicate acknowledgement of the packet number 13 (dp=1). |
|
Because no new packets are acknowledged the congestion window cannot be
increased; it maintains its previous value of 15 packets. Ssthresh and Algo
do not change either. Now we are in the line number 6. Congestion window is
15 packets but, for TCP, 15 packets are yet traveling somewhere in the
network because no new packets have been acknowledged. Really the packet
number 15 was already received and the pipe (network) has now 14 packets outstanding
(packet 14 that is lost and packets 16 to 28). Observe that the pipe is
being emptied in this circumstance. |
|
Following with line number 6, the congestion window is 15 packets but 15
packets are yet outstanding then we can't send any new packet. But, then, packet 16
is received and the receiver end sends again a new acknowledgement for
packet number 13; this is the second duplicate acknowledgement for this
packet. Congestion window must remain in 15 packets and Ssthresh and Algo
maintain its initial values. Now we are in line number 7; the same situation
is repeated here again, but beware. Something different has occurred this
time. |
|
When packet 17 reaches the receiver end, this sends back the third
duplicate acknowledgement for packet number 13. When this acknowledgement is
received by the sender side, it indicates to TCP that something bad is going
on. TCP takes this third duplicate as an indication that packet number 14 is
lost. Immediately, TCP reduces the congestion window to half the actual
value (integer value of 15 divided by 2, which is just 7). But it also
inflates the congestion window for the number of packets recently
acknowledged. Now it knows that the three duplicate acknowledgement
correspond to three packets that have been received by the other side
(packets 15, 16, and 17 for our case). Then, it inflates the value of 7
packets of the congestion window with 3 new packets, because if 3
acknowledgement have been received it is because 3 packets have left the
pipe. The new congestion window is now 10 packets. |
|
But TCP also enters in a new algorithm. Now, Fast Retransmit (FR) is
activated and the lost packet is immediately retransmitted. Ssthresh is
reduced to 7 packets. In line number 8, packet number 14 is sent again.
Packet 18 reaches now the receiver end, and the fourth duplicate
acknowledgement is sent back. When the new algorithm (FR) is running, the
congestion window will be incremented by one packet for each duplicate
acknowledgement received. The new congestion window is now 11 packets. The
pipe continues being emptied; only packets number 19 to 28 + the recently
retransmitted packet number 14 are now running (11 packets). |
|
In lines number 9 to13 the previous process is repeated. Packets 19, 20,
21, 22, and 23 leave the pipe. The pipe continues being emptied and now it
has only 6 packets (packets number 24 to 28 + packet number 14). Duplicate
acknowledgement 5, 6, 7, 8, and 9 for packet number 13 are sent back
(retransmitted packet number 14 has not been received yet, of course,
because it was sent just after packet number 28 was sent), and the
congestion window is increased by 1 packet each time, until reaching 16
packets. |
|
In line number 14, the congestion window is 16 packets, but, for TCP, 15
packets are yet in the pipe because it only has received the acknowledgement
for packet number 13. Then, it has to suppose that packets 15, 16, 17, ...,
28, and 14 (15 packets) are somewhere enjoying their trip. But now, because
congestion window is 16 packets and 15 packets are outstanding, we can send a new
packet to try to refill the pipe. Packet number 29 is sent. The pipe has now
7 packets (packets number 24 to 28 + packet number 14 + packet number 29). |
|
|
|
|
When packet number 24 reaches the receiver end, the duplicate
acknowledgement number 10 for packet number 13 is sent back and the
congestion window is increased to 17 packets. Now we are at the beginning of
the line number 15. For this line and for lines number 16, 17, and 18, the
process is very similar. Congestion window is always one packet more than
packets TCP thinks are going in the network. Packets 25, 26, 27, and 28
leave the pipe generating duplicate aknowledgement numbers 11, 12, 13 and 14
for packet number 13. But packets 30, 31, 32, and 33 enters the pipe at the
same time to replace leaving packets. The pipe continues having 7 packets
(packets number 14, 29, 20, 31, 32, and 33). It is very important because
TCP doesn't allow the pipe from having less than 7 packets. |
|
At line number 19, packet number 34 is sent but something very
interesting occurs. Packet number 14 reaches the receiver end breaking the
bottleneck we have (this means, packet number 13 being repeatedly
acknowledged). When packet 14 is acknowledged, all packets from packet
number 15 to packet number 28 are acknowledged simultaneously too, because
retransmitted packet number 14 was sent exactly behind packet number 28;
after having acknowledged all these packets, TCP sender adjusts the
congestion window to the value of the slow start threshold (7 packets) and
enters in congestion avoidance algorithm (CA). This is the moment that makes
Van Jacobson to say: "But I was delighted by how beautiful it worked…". The
congestion window is set to 7 packets that is exactly one packet more that
the number of packets we have outstanding in the pipe (packets number 29,
30, 31, 32, 33, and 34). Then, we can send another packet (packet number 35)
and, as Van Jacobson said, "there is no sudden burst as the hole is
filled...". |
|
From here, TCP applies congestion avoidance algorithm (CA). For each
roundtrip time, the congestion window is increased by one packet. When
packets number 29 to number 34 are acknowledged, packets number 36 to number
41 are sent and the congestion window is set to 8 packets. When packets
number 35 to number 41 are acknowledged, 8 new packets can be sent and
packets number 42 to number 49 are sent. Congestion avoidance is the TCP's
steady state algorithm that represent TCP behavior by the TCP response
function studied by Sally Floyd, Matthew Mathis, Jeffrey Semke and Jamshid
Mahdavi. Have a look here for
more information. |
|
Note: these algorithms were included in what
was called Reno TCP implementation. |
|
LB. |
|
|
|
|
|
Previous
|
Content
|
Next
|