|
Previous
|
Content |
Next
|
|
|
2.1.- Linux Traffic
Control |
|
|
|
| Each network card (NIC = Network Interface
Card) on a Linux box is drived by a Network Driver which control the
hardware. The Network Driver acts as an exchange mechanism of packets
between the Linux Networking Code and the physical network. This first
approach can be viewed this way: |
|

|
| In this diagram by Christian [9] the Linux
Networking Code can request the Network Driver to send a packet to the
physical network (packet is leaving the box) or the Network Driver can give
a packet it has received from the physical network to the Linux Networking
Code (packet is entering the box). |
| |
| Because our interest is to use Linux as a pure
router to forward packets from one interface to another in the same box,
it will not generate or consume any packet and our diagram can be extended
to include two NICs: |
| |
|

|
| |
| This way packets can enter from the physical
network A to the NIC number 0; the NIC 0's Network Driver gives the received
packets to the Linux Networking Code where they are forwarded; finally the
Linux Networking Code request the NIC 1's Network Driver to send the packets
to the physical network B. |
| |
| Let's now put our eyes in the Linux Networking
Code. Reading from [8] Saravanan wrote: |
| |
| The basic principle involved in the
implementation of QoS in linux is shown in figure below. This figure shows
how the kernel processes incoming packets, and how it generates packets to
be sent to the network. The input de-multiplexer examines the incoming
packets to determine if the packets are destined for the local node. If so,
they are sent to the higher layer for further processing. If not, it sends
the packets to the forwarding block. The forwarding block, which may also
received locally generated packets from the higher layer, looks up the
routing table and determines the next hop for the packet. After this, it
queues the packets to be transmitted on the output interface. It is at this
point that the linux traffic control comes into play. Linux traffic control
can be used to build a complex combination of queuing disciplines, classes
and filters that control the packets that are sent on the output interface. |
|
|
|
|
|

|
| Very well. But we are going to refine things a
little. First, we are not interested in local process, this means, we will
not accept packets for local and we will not generate packets from local.
Our Linux box will be a pure router: no packets are generated, no packets
are consumed, packets are just forwarded or perhaps dropped. This way your
box is a transfer machine and it is a lot better if you keep it as this. Do not
lend any service except forwarding with a router. Don't do that. |
| Observe also that after packets are forwarded by the
forwarding block they are enqueued in an output queue previously to be
transmitted on the output interface. On this output queue something
called Linux Traffic Control takes care of packet management. Reading
from Werner [8] we can have a better and detailed information including
a very nice figure: |
|
| Figure below shows roughly how the kernel
processes data received from the network, and how it generates new data to
be sent on the network: packets arrive on via an input interface, where they
may be policed. Policing discards undesirable packets, e.g. if traffic is
arriving too fast. After policing, packets are either directly forwarded to
the network (e.g. on a different interface, if the machine is acting as a
router or a bridge), or they are passed up to higher layers in the protocol
stack (e.g. to a transport protocol like UDP or TCP) for further processing.
Those higher layers may also generate data on their own and hand it to the
lower layers for tasks like encapsulation, routing, and eventually
transmission. |
| "Forwarding" includes the selection of the
output interface, the selection of the next hop, encapsulation, etc. Once
all this is done, packets are queued on the respective output interface.
This is the second point where traffic control comes into play. Traffic
control can, among other things, decide if packets are queued or if they are
dropped (e.g. if the queue has reached some length limit, or if the traffic
exceeds some rate limit), it can decide in which order packets are sent
(e.g. to give priority to certain flows), it can delay the sending of
packets (e.g. to limit the rate of outbound traffic), etc. Once traffic
control has released a packet for sending, the device driver picks it up and
emits it on the network. |
|

|
| Werner explanation helps us a lot. Again to
keep our box as a pure router no packets will be sent to the upper layers
and no packets will be received from the upper layers. Our packet route will
be then: Input Interface → Ingress
Policing → Input Demultiplexing (just to
say that packets are not for local use) →
Forwarding → Output Queuing
→ Output Interface. |
| Where are we going to exercise control over our
packets? Just into the blue blocks: Ingress Policing and Output Queuing.
These blocks are called the "Traffic Control Code of the Linux kernel".
Ingress policing will be our first point of control; here policing discards
undesirable packets to enforce an upper limit on the entering throughput.
The second point will be the Output (Egress) Queue; here packets are queued
and then dropped, delayed or prioritized according to the policy we are
interested to implement. |
| Drilling down let's see how the Linux Traffic Control
Code is implemented. As is name says it is just C Language code consisting
of four major conceptual components: |
- Queuing disciplines
- Classes
- Filters
- Policers
|
|
|
|
| Queuing disciplines are algorithms which
control how packets enqueued are treated. A very simple queuing discipline
could be a FIFO (First In-First Out) queue of 16 packets. In this queue
packets enter at the tail and leave the queue from the head. Something as
simple like picture in figure 2.1.5 shown below. |
|

|
| In this the queue is moving 20 packets.
Packets 1 and 2 already entered and left the queue, packets 3 to 18 are just
in the queue and packets 19 and 20 are in the process of entering the queue.
Of course, before a new packet can be accepted another has to be dispatched.
If packets arrive very fast, faster than they can be dispatched, the router
doesn't have any other choice that dropping them. But we don't want this; we
want to conserve our packets to protect TCP protocol behavior and
applications than send/receive these packets. Or better yet, we want to
exercise a true control over them: which are going to be forwarded and how
fast; which are going to be delayed; and which, lamentably, dropped. The
queue is an area of memory on our router. If we are expecting packets of
say, 1000-byte size, we have to reserve an area of 16KB on memory to
implement this simple 16-packets FIFO queue. |
| To step ahead we would like to extend now a
little our knowledge about the Linux Traffic Control code. At this time
Christian [9] explanation of the 'Class' concept come to us as a ring to the
finger: |
| The basic building block of the Traffic
Control system is qdiscs as described above. The queuing disciplines
mentioned in the previous section are comparatively simple to deal with.
That is, they are setup and maybe given some parameters. Afterwards packets
can be enqueued to them and dequeued from them as described. |
| But many queuing are of a different nature.
These qdiscs do not store packets themselves. Instead, they contain other
qdiscs, which they give packets to and take packets from. Such qdiscs are
known as qdiscs with classes. |
| For example, one could imagine a
priority-based queuing discipline with the following properties: |
- Each packet enqueued to the queuing discipline is assigned a
priority. For example the priority could be deduced from the source or
destination IP address of the packet. Let us say that the priority is a
number between 1 and 5.
- When a packet is dequeued it will always select a packet it
contains with the lowest priority number.
|
| A way to implement such a queuing discipline
is to make the priority-based queuing discipline contain 5 other queuing
disciplines numbered from 1 to 5. The priority-based queuing discipline will
then do the following: |
- When a packet is enqueued, it calculates the priority number, i.e.
a number between 1 and 5. It then enqueues the packet to the queuing
discipline indicated by this number
- When a packet is dequeued it always dequeues from the non-empty
queuing discipline with the lowest number.
|
| What is interesting about this, is that the
5 contained queuing disciplines could be arbitrary queuing disciplines. For
example sfq queues or any other queue. |
| In Linux this concept is handled by classes.
That is, a queuing discipline might contain classes. In this example, the
priority queuing discipline has 5 classes. Each class can be viewed as a
socket to which you can plug in any other queuing discipline. When a qdisc
with classes is created, it will typically assign simple FIFO queues to the
classes it contains. But these can be replaced with other qdiscs by the tc
program. |
| Bravo!! Some clearing: qdisc means queuing
discipline; the qdisc he mentioned in a previous section was just a FIFO
queue. But what he is magisterialy describing here is a qdisc called PRIO
queue; a queue with classes, certainly. SFQ is another of these qdiscs. And
tc is a program written by Alexey Kuznetsov to manage qdiscs on Linux. |
| |
| But Werner in [6] has something to say about
this stuff; let's see: |
| |
| Queuing disciplines and classes are
intimately tied together: the presence of classes and their semantics are
fundamental properties of the queuing discipline. But ability doesn't end
yet - classes normally don't store their packets themselves, but they use
another queuing discipline to take care of that. That queuing discipline can
be arbitrarily chosen from the set of available queuing disciplines, and it
may well have classes, which in turn use queuing disciplines, etc. |
| |
| Nice, really nice. If we don't misunderstood
the explanation we can have a tree with hierarchies; queing disciplines have
classes but these classes normally don't store their packets themselve,
instead they use another queuing discipline to take care of that. And these
queuing disciplines can have classes again which in turn use queuing
disciplines and so on. |
| |
| But let's review Christian [9] explanation
above to extend even more our knowledge about Linux Traffic Control code. He
wrote: "For example the priority could be deduced from the source or
destination IP address of the packet". It sounds known for us. Search
back this document and have a look to the "classifier" concept. Christian is
trying to tell us that the PRIO qdisc he described has five classes numbered
1, 2, 3, 4 and 5. When a packet arrives to our router we snoop its IP header
to see its source or destination address and based on either of these, or
both, we select a priority from 1 to 5 for the packet; and with this
priority on hand we put the packet in one of the classes of the qdisc
according. |
| |
| What are we doing? Just classifying the packet.
We are telling: okay, you are coming from network, say, 203.16/16, then your
priority will be 3; ... take the door number 3 please, where you will be
attended. |
| |
| To do this we need some kind of filter for
selecting packets. Someone like a usher that when you arrive to the theater
ask you: what your chair number is? Okay. Go there and search for row number
26, etc., etc. If you search back above a little, a filter is the third
conceptual component of Werner [6] explanation. Let's read from him again: |
|
|
|
|
| Each network device has a queuing discipline
associated with it, which controls how packets enqueued on that device are
treated. Figure 2.1.6 shows the symbol we use for a queuing discipline
without externally visible internal structure. For example, sch_fifo is such
a simple queuing discipline, which just consists of a single queue, where
all packets are stored in the order in which they have been enqueued, and
which is emptied as fast as the respective device can send. More elaborate
queuing disciplines may use filters to distinguish among different classes
of packets and process each class in a specic way, e.g. by giving one class
priority over other classes. Figure 2.1.7 shows an example of such a queuing
discipline. Note that multiple filters may map to the same class. |
|

|
|

|
| Eureka!! Now we have some pictures. It helps
more than a hundred words. In fact this kind of pictures are going to be our
everyday bread when dealing with queuing disciplines. It's a schematic and
easy form of representing them. Looking at the pictures we can have a fast
mental scheme of how is the qdisc behavior. |
| For example, looking at figure 2.1.7, packets
enter the main qdisc by the left; immediately the classifier takes care of
them and using filters select the class where each packet has to be put.
When already in the class the queuing discipline on it takes care now. If
this last qdisc doesn't have classes, as in the figure is shown, the packet
is delivered to be sent to the physical network. Observe also something
interesting: two (or more) filters can point out to the same class. Let's
see now another of these diagrams and its explanation taken from the Werner
[6] document: |
| Figure 2.1.8 shows an example of such a
stack: first, there is a queuing discipline with two delay priorities.
Packets which are selected by the filter go to the high-priority class,
while all other packets go to the low-priority class. Whenever there are
packets in the high-priority queue, they are sent before packets in the
low-priority queue (e.g. the sch_prio queuing discipline works this way). In
order to prevent high-priority traffic from starving low-priority traffic,
we use the token bucket filter (TBF) queuing discipline, which enforces a
rate of at most 1 Mbps. Finally, the queuing of low-priority packets is done
by a FIFO queuing discipline. Note that there are better ways to accomplish
what we've done here, e.g. by using class-based queuing (CBQ). |
|

|
| Great!! Werner talked about four Linux queuing
disciplines (FIFO, PRIO, TBF and CBQ) and clears a lot our understanding
about this stuff. The diagram implements three of these qdiscs. Now we know
we can implement a default class where packets are sent when they do not
match any of our filters. Seems like something well-known for us; something
like "leave a default codepoint and assign it to the best-effort PHB". Also
Werner's figure can be associated with those taken from RFC 2475
specification; see figure 1.4.1 somewhere above. |
| Let's see now with a little detail how the
diagram represents our queuing discipline. The main qdisc is the PRIO queue
which receives the packets. It applies the filter and select those of them
marked as "high priority". How the mark is implemented? We don't know
where the packets were marked but we know that marking could be based on MF (multi-field selection)
or DS-codepoint. |
| Either being the case the classifier put them
in the "High" class above. Rest of the packets (not marked with our high
priority identifier) go to the "Low" class below. In the "High" class a TBF
qdisc is implemented. As Werner explained before: Typically, each class
"owns" one queue… This time the queue assigned to the "High" class is a
TBF queue. What this queue is trying to do? Just controlling the maximum
throughput traversing through it to 1 Mbps (have a look to the diagram). To
the right of the TBF queue representation there's a little queue and a
little clock shown. They represent the queue behavior and that some sort of
metering is been made on it to know how much is the flow flowing. |
| The "Low" class queue is a FIFO queue. We saw
something about it somewhere above. A simple FirstIn-FirstOut queue. It
doesn't try to exercise any control just to enqueue and dequeue packets as
they are arriving. Finally qdiscs for both classes deliver the packets on
the right side of the main queue to be dispatched to the physical network. |
| A very interesting observation to be done here
is that the "High" priority class is controlled by an upper level of
throughput (implemented by TBF). We talked something about this before. High
priority class traffic has to be controlled to avoid "starving" of lower
priority classes. Of course, with this PRIO qdisc implementation this will
work just if the transport protocol is responsive, like TCP. Why? |
| To step ahead let's now read from Saravanan [8]
document: |
| This section discusses queuing disciplines,
which form a basic building block for the support of QoS in linux. It also
discusses the various queuing disciplines that are supported in linux. Each
network device has a queue associated with it. There are 11 types of queuing
disciplines that are currently supported in linux, which includes: |
- Class Based Queue (CBQ)
- Token Bucket Flow (TBF)
- Clark-Shenker-Zhang (CSZ)
- First In First Out (FIFO)
- Priority Traffic Equalizer (TEQL)
- Stochastic Fair Queuing (SFQ)
- Asynchronous Transfer Mode (ATM)
- Random Early Detection (RED)
- Generalized RED (GRED)
- Diff-Serv Marker (DS_MARK)
|
| Queues are identified by a handle <major
number:minor number>, where the minor number is zero for queues. Handles are
used to associate classes to queuing disciplines. Classes are discussed in
the next subsection. |
| |
| Gosh!! A lot of queues. Actually there are some
more just developed but not officially implemented in the Linux kernel. Like
HTB (Hierarchical Token Bucket) written by Martin Devera; WRR (Weighted
Round Robin) written by Christian Worm Mortensen and IMQ (Intermediate
Queuing Discipline) written by Patrick McHardy. Additionally some other
exist not implemented yet on Linux, like CBT (Class-Based Threshold); FRED
(Flow-Based RED); DRR (Differential Round Robin) and D-CBQ (Decoupled-CBQ).
Cisco also has some polished queues like WFQ, D-WFQ, WRED, CBWFQ, CQ, PQ
and stop you from count; there are a lot. |
| |
| But don't be afraid. For our work we will
concentrate on these: FIFO, PRIO, TBF, SFQ, RED, GRED, CBQ and DSMARK.
(Note: really, after this work was started I decided to use HTB instead
of CBQ to implement DS; then CBQ will be replaced with HTB along this
document) It's going to be a very nice trip. Al least I will do my best effort. By
the way, for dealing with all these beasts we will need some mechanism to
talk with them, this means, a tool which permit us to be the conductor of
this orchestra. This tool was developed some years ago by Alexey Kuznetsov
and it is called tc (traffic control). tc is not as friendly as most of us
would like it should be, but, it's what we have. Recently, Werner
Almesberger, really impressed for the lovely care that everybody feel for
tc, wrote tcng (traffic control new generation). People still screaming. I'm
just joking, of course. |
| |
| This time our strategic will be very simple.
First we will present a brief approach to the behavior of each queue; some
theoretical but no so deeper support to understand how the queue was
conceived. Next, how the queue is configurated using tc, and within this
battle, an explanation about parameters and recommended settings. |
| |
| For this part of our study we will have the
little help of our friends: Bert Hubert and Lartc people [7]; Bert Hubert
and his excellent work writing a manpage for these stuffs; Werner
Almesberger, Jamal Hadi Salim and Alexey Kuznetsov [10] mainly for
understanding GRED and DSMARK; some shallow diving into Alexey Kuznetsov
code and documentation to understand how tc works; and last but not least:
for illustrating as better as possible because Linux people are not
distinguished for being good pedagoge teachers, I'm going to use some
figures and concepts taken from "Supporting Differentiated Service Classes:
Queue Scheduling Disciplines" by Chuck Semeria of Juniper Networks [11].
Hoping to not having rights problem with these people (in case of having
please make a collect to help me on my defense). I'm going to dare to use
their excellent documentation to support our explanation. Without losing
more time let's start with the simplest one, the FIFO queuing discipline. |
|
|
|
|
|
|
|
|
|
Previous
|
Content |
Next
|