All about MTU
In simple terms, when you send information from one place on the internet (e.g. a web server) to another (e.g. your computer) the data is broken up into packets. The sender breaks the overall data into small chunks, and sends them over the internet. The internet only handles packets. The packets arrive at the other end, and they are put back together to make the whole of the original data. This could be a web page, email, image, or whatever. If a packet gets dropped, the sender works this out and resends it. That way you don't get gaps in what you receive. The process is managed by a protocol called TCP (Transmission Control Protocol). The packets get to the other end by a protocol called IP (Internet Protocol). There are other protocols that work over the internet but they all come down to sending packets.
How big is a packet?
The simple answer is 1,500 bytes. But it is more complex. The internet is actually a combination of routers and links. Each router has links to other routers. Your packets go from one place to another by being passed along from one router to another over these links.
One of the main types of link is Ethernet which is used for Local Area Networks (LAN). You have probably encountered a LAN as they are used in offices and homes, and connect things together. You probably use Ethernet to connect your broadband router to your computers in your house. Ethernet allows 1,500 byte packets to be carried. Internet providers use much faster links such as gigabit and 10 gigabit and these often allow bigger packets up to around 9,000 bytes. Some links on the internet are set up specially for certain traffic and support packet sizes like 1,548 bytes.
The links, like Ethernet, limit the size of packets that can be sent. The size of packet that can get from one place to another without any difficulties depends on the link with the smallest limit in the chain. On the internet as a whole this is generally 1,500 bytes. But this depends where the packet goes and what links are used. The smallest packet size that every IPv4 host must be able to accept is 576 bytes, which used to be a common MTU for modems on dialup internet.
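The "smallest link in the chain" rule can be sketched in a few lines of Python. This is only an illustration: the function name `path_mtu` and the example path are made up for this sketch, not taken from any real tool.

```python
def path_mtu(link_mtus):
    """Return the largest packet size that fits every link on the path."""
    if not link_mtus:
        raise ValueError("a path needs at least one link")
    return min(link_mtus)


# Hypothetical path: home Ethernet, a PPPoE access link, a jumbo-frame core link.
example_path = [1500, 1492, 9000]
print(path_mtu(example_path))  # prints 1492
```

However large the other links are, the 1,492 byte link decides what gets through without trouble.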
The maximum size of packet you can send on a link is called the MTU (Maximum Transmission Unit). The maximum size you can receive is the MRU (Maximum Receive Unit). The terms MRU and MTU often get used interchangeably for obvious reasons.
What happens if a packet is too big?
Consider a packet being sent that is 1,500 bytes. It is passed from router to router until it reaches one whose next link has an MTU of less than 1,500 bytes. This creates a problem as the router cannot send the packet on to the next router via that link. There are two options:-
- (A) Don't send the packet and send an error message back saying it could not be sent
- (B) Break the packet up into smaller bits (called fragments) which will fit, and send these on to the next router.
The choice depends on the packet. For IPv6 packets you have to take option (A) and send an error. For IPv4 packets the packet has a flag called DF (Don't Fragment). If that is set you have to take option (A) and send an error. If not, then you take option (B) and fragment the packet.
If an error is sent back then the sending computer can try again, sending smaller packets this time. If the packet is broken into fragments then they still arrive at the destination and can be put back together by the receiving end. Either way the data gets through.
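The choice between options (A) and (B) can be written out as a small decision function. This is a minimal sketch of the rules described above, not code from any real router; the name `handle_packet` and its return strings are invented for illustration.

```python
def handle_packet(size, next_link_mtu, ip_version, df=False):
    """Decide what a router does with a packet bound for the next link.

    Returns "forward", "error" (option A) or "fragment" (option B).
    """
    if size <= next_link_mtu:
        return "forward"          # packet fits, nothing special to do
    if ip_version == 6:
        return "error"            # IPv6 routers never fragment
    if ip_version == 4:
        # IPv4: the DF (Don't Fragment) flag decides.
        return "error" if df else "fragment"
    raise ValueError("unknown IP version")


print(handle_packet(1500, 1492, ip_version=4, df=True))   # prints error
print(handle_packet(1500, 1492, ip_version=4, df=False))  # prints fragment
print(handle_packet(1500, 1492, ip_version=6))            # prints error
```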
Is there another option?
In most cases links support 1,500 bytes, so there is no problem. However, where we know there is a specific issue there is another option, an option (C). It is a bit of a bodge, but works. We have added this as a user-settable feature on the line so you can control whether you want us to work around the issue or not. This is mainly for where the link from us to you is restricted, typically to 1,492 bytes, for PPPoE.
It works by exploiting the fact that at the start of a TCP session each end tells the other the MTU it can handle (actually the MSS, which is the TCP payload size, but one is derived from the other). When we have the fix enabled we check the initial TCP handshake packets and we change them if the MSS specified would be too big. This means each end may say they handle 1,500 bytes, but the other hears that they handle, say, 1,492 bytes.
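The arithmetic behind this fix can be sketched as follows. The sketch assumes IPv4 and TCP headers of 20 bytes each with no options, which is the usual minimum; the function names are hypothetical and this is not the code we actually run.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options


def mss_for_mtu(mtu):
    """Largest TCP payload that fits in one packet of the given MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


def clamp_mss(advertised_mss, link_mtu):
    """Return the MSS to pass on in the handshake, lowered if it would not fit."""
    return min(advertised_mss, mss_for_mtu(link_mtu))


print(mss_for_mtu(1500))      # prints 1460
print(clamp_mss(1460, 1492))  # prints 1452
print(clamp_mss(1400, 1492))  # prints 1400 (already small enough, left alone)
```

So an end that offers an MSS of 1,460 (for a 1,500 byte MTU) is heard by the other end as offering 1,452, which fits a 1,492 byte link.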
When do you get small links?
The internet as a whole generally works with 1,500 byte packets with no problem, but there are cases where smaller MTU links are used.
One is a VPN (Virtual Private Network). These create links that connect computers (routers) virtually. The link is not real but involves taking packets and wrapping them up in another packet which is sent over the internet to the other end where it is unwrapped. Much like putting a letter in an envelope and then putting it in another envelope - you need a bigger envelope on the outside. This means that if the outside envelope has to fit in 1,500 bytes, the inner one can't be as big, and that makes a link with a small MTU.
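The envelope arithmetic is simple subtraction. In the sketch below the 60 byte overhead is a made-up figure purely for illustration; the real overhead depends on the VPN protocol and its settings.

```python
def inner_mtu(outer_mtu, tunnel_overhead):
    """MTU left for the wrapped packet once the outer wrapping is added."""
    if tunnel_overhead >= outer_mtu:
        raise ValueError("overhead leaves no room for an inner packet")
    return outer_mtu - tunnel_overhead


# Hypothetical VPN adding 60 bytes of outer headers to every packet.
print(inner_mtu(1500, 60))  # prints 1440
```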
The other main example is dialup and broadband links. In practice a broadband link works much like a dialup line as it uses a protocol called PPP (Point to Point Protocol). Part of this is each end telling the other its MRU, i.e. how big a packet it can receive. There are a number of reasons a router might decide to say it cannot handle the full 1,500 bytes:-
- The router could be using PPPoE (PPP over Ethernet) bridging which requires an extra 8 byte header for PPP and then sends the packet over 1,500 byte Ethernet, so the packets sent in the PPP layer can be at most 1,492 bytes.
- Some part of the link could require the use of a smaller MTU, typically because PPPoE is being used within a back-haul network (e.g. Be lines), so the router has to be set to use 1,492.
- The router could simply have a default of 1,492 as PPPoE is common in many countries as the standard way to connect broadband lines. In the UK we usually use PPPoA (PPP over ATM) which allows the full 1,500 bytes.
- The router could be stupid and not even allow you to change the default to the normal 1,500 bytes even though using PPPoA.
- The router could have been deliberately set to a lower MTU by the owner for reasons of their own.
- The router could be fine and BT could have messed up (see below)
Having a lower MTU is not necessarily a problem - as we said, either way the data gets through. But as soon as you don't have the standard 1,500 byte MTU you can run into issues.
Why do things go wrong?
If everything worked as it should a smaller MTU would not be any problem, but there are reasons why it does not:-
- Some people running web servers (notably some banks) set up their network so that they block the error message that is sent back when a packet is too big. This would not be too bad if they did not also try to send 1,500 byte packets with the DF bit set. The result is the packet gets dropped when it hits a sub-1,500 MTU link and has to be retried. Eventually the sender may try a smaller packet size but this could be 20 seconds later. This is a stupid network setup on the part of the person running the web server.
- Some people set up their firewalls to block any fragmented packets. This is because it is hard to tell what a fragmented packet is as you don't have all of the data. However, it means that fragments don't work. Not everyone sends packets with DF set to start with (the basis of a process called Path MTU Discovery), so fragmentation happens. If you have a broadband link with less than 1,500 MTU and a firewall (or even the router) blocking fragments you are likely to have a problem with some places (notably MSN Messenger).
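The first failure above can be shown with a toy simulation of a sender using DF packets. This is a simplified sketch with invented names: it assumes the error message reports the MTU of the link that was too small, and it does not model the slow fallback after timeouts mentioned above.

```python
def discover_path_mtu(link_mtus, start_size=1500, errors_blocked=False):
    """Simulate a sender using DF packets to find a size that fits the path.

    Returns the working packet size, or None if the error messages are
    blocked and the sender never learns why its packets vanish.
    """
    size = start_size
    while True:
        # Find the first link on the path that is too small for this packet.
        too_small = next((mtu for mtu in link_mtus if mtu < size), None)
        if too_small is None:
            return size   # the packet fits every link
        if errors_blocked:
            return None   # packet dropped, error never reaches the sender
        size = too_small  # error arrives; retry at the reported size


path = [1500, 1492, 1500]
print(discover_path_mtu(path))                       # prints 1492
print(discover_path_mtu(path, errors_blocked=True))  # prints None
```

With the errors getting through, the sender settles on 1,492 after one retry. With them blocked, it just keeps losing packets.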
Fragments are bad
Fragments are bad anyway, which is why IPv6 insists you always take option (A) and send an error message. Fragments create extra overhead as each packet has headers that have to be copied into each fragment. They also work badly when a link is congested as dropping any fragment in a packet means the whole packet is lost. They also take up CPU time creating the fragments and putting them back together. All in all it is better if the sending end creates the smaller packets in the first place. This means Path MTU Discovery being used and the error message not being blocked! Fortunately IPv6 mandates this, so the next generation of internet protocol should not have the same issues.
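To put a number on the header overhead, here is a rough sketch of how an IPv4 packet splits into fragments. It assumes a 20 byte IPv4 header with no options, and uses the rule that every fragment except the last carries a multiple of 8 bytes of data; the function name is made up for this example.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options


def fragment_sizes(packet_size, mtu):
    """Return the on-the-wire size of each IPv4 fragment of a packet."""
    if packet_size <= mtu:
        return [packet_size]  # fits as it is, no fragmentation needed
    payload = packet_size - IPV4_HEADER
    # Every fragment except the last must carry a multiple of 8 data bytes.
    per_fragment = (mtu - IPV4_HEADER) // 8 * 8
    if per_fragment <= 0:
        raise ValueError("MTU too small to carry any data")
    sizes = []
    while payload > 0:
        chunk = min(per_fragment, payload)
        sizes.append(chunk + IPV4_HEADER)  # each fragment repeats the header
        payload -= chunk
    return sizes


sizes = fragment_sizes(1500, 1492)
print(sizes)       # prints [1492, 28]
print(sum(sizes))  # prints 1520
```

A 1,500 byte packet crossing a 1,492 byte link becomes one nearly full packet plus a tiny 28 byte one, 1,520 bytes in total, and losing either fragment loses the lot.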
PPPoE problems (technical)
One of the main causes of a reduced MTU is PPPoE (PPP over Ethernet). This is because Ethernet allows 1,500 byte payloads, ideal for an IP packet, but PPPoE has a header which takes a total of 8 bytes. This would make 1,508 bytes with a full 1,500 byte IP packet.
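The two sums involved are worth spelling out. This is just the arithmetic from the paragraph above with illustrative names; the 8 bytes are the 6 byte PPPoE header plus the 2 byte PPP protocol field.

```python
ETHERNET_PAYLOAD = 1500  # bytes, standard Ethernet payload limit
PPPOE_OVERHEAD = 8       # bytes: 6 byte PPPoE header + 2 byte PPP protocol field


def ppp_mtu_over_ethernet(ethernet_payload=ETHERNET_PAYLOAD):
    """Largest IP packet that fits inside PPPoE on the given Ethernet."""
    return ethernet_payload - PPPOE_OVERHEAD


def ethernet_payload_needed(ip_mtu=1500):
    """Ethernet payload size needed to carry a full IP packet over PPPoE."""
    return ip_mtu + PPPOE_OVERHEAD


print(ppp_mtu_over_ethernet())    # prints 1492
print(ethernet_payload_needed())  # prints 1508
```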
Unfortunately the specifications are not that helpful here. RFC1661 defines PPP and states: "If smaller packets are requested, an implementation MUST still be able to receive the full 1,500 octet information field." RFC2516 defines PPPoE and states: "The Maximum-Receive-Unit (MRU) option MUST NOT be negotiated to a larger size than 1,492." But they are not incompatible statements - negotiating 1,492 does not mean you don't have to accept 1,500 byte packets (as per RFC1661), but you cannot send them on if you are bridging PPPoE, for example, so logically you would have to fragment the IP packet. Most routers do not do that! This is one of the reasons we get problems with MTU being smaller than 1,500 bytes.
There are also ways to do oversized PPPoE using baby jumbo frames on Ethernet. The Ethernet specification still says 1,500 bytes maximum even for gigabit speeds, but it is common for gigabit equipment to support jumbo frames - i.e. larger Ethernet packets typically up to around 9,000 bytes. This is more than you need to do PPPoE with 1,500 bytes - you only need 1,508. However there are other wrapping and tunnelling cases where just a bit more is useful. Baby jumbo support normally means a bit more than the usual 1,500.
As there is no real way to tell if baby jumbo frames are supported on an Ethernet, RFC4638 defines an extra option to negotiate this at the Ethernet level. Of course two ends could simply agree to handle slightly larger Ethernet frames by configuration as well. Sadly this is not always the same level of operation or the same equipment that does the MRU negotiation at the PPP level, and if that knows PPPoE is involved it will not negotiate more than 1,492 MRU as per RFC2516. So typically some configuration is needed.
The upshot of all this? It is possible to get BT FTTC (Fibre to the Cabinet) circuits (which use PPPoE) working on full 1,500 byte PPP by using a modified pppd on the customer end and a suitable network card that will handle 1,508 byte frames at 10/100Mb/s. We have done this! (Thanks to TonyHoyle and Simon, customers on irc, for tweaking pppd and testing this for us). The new FireBrick does, of course, support PPPoE with baby jumbo frames to handle 1,500 byte MTU and even bonds multiple lines. Using the right modem (and the DLINK 320B in bridge mode does this) you can negotiate and use 1,508 bytes over ADSL as well.
What is/was broken in BT?
Just to add to the fun there is brokenness in BT. The new 21CN BRASs (Broadband Remote Access Servers) that BT use have a flaw which means that they actually lie about the MRU your equipment requested. It is not all of them, but some will claim you asked for 1,492. Some even say you asked for 1892 which is not an issue as we limit to 1,500 anyway. But this means your network may be set up right, yet still have a small MTU. If you also have a firewall blocking fragments then you are stuffed.
BT are rolling out a fix for this (as of June 2009), so the issue will go away. It is not an issue on 20CN lines.
What has changed at AAISP in 2009?
This is all very interesting, but why are people with a perfectly working service now seeing issues all of a sudden in 2009? What changed?
We have changed our LNS (L2TP Network Server) which connects to BT and routes customer traffic for 21CN lines. We have been using one we designed and built over 4 years ago, and now we are using a brand new one that is completely redesigned.
We have explained options (A), (B) and (C) - well, this older router used option (D) which was to send the packet anyway even if too big. It would still send the error message if DF was set, but if not then it did not fragment the packet - it just sent it. This is technically correct (the best kind of correct) as the PPP specification says the receiving end must always accept 1,500 bytes. When the issue was a misconfiguration (or even duff data from BT) the receiving end can handle 1,500 bytes so it just works. The problem comes when the receiving end really is using PPPoE and cannot send the packet on.
The new LNS works differently, and more conventionally. It will send an error when DF is set, in the same way, but otherwise large packets are correctly fragmented. This means PPPoE customers now get the packets (as fragments) and work correctly. So this is more correct.
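The difference between the two can be summed up side by side. This is only a sketch of the behaviour described above for IPv4 packets, with made-up function names; it is not the LNS code itself.

```python
def old_lns(size, mru, df):
    """Older LNS: option (D), send oversized packets anyway unless DF is set."""
    if size <= mru:
        return "send"
    return "error" if df else "send anyway"


def new_lns(size, mru, df):
    """New LNS: conventional behaviour, fragment unless DF is set."""
    if size <= mru:
        return "send"
    return "error" if df else "fragment"


# A 1,500 byte packet for a line that negotiated a 1,492 byte MRU, DF not set.
print(old_lns(1500, 1492, df=False))  # prints send anyway
print(new_lns(1500, 1492, df=False))  # prints fragment
```

The two only differ for oversized packets without DF, which is exactly the case that catches out a firewall blocking fragments.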
This means anyone with the firewall/fragment issues explained above would have had no issue previously and now does.
Note 21st June 2009: We have set lines that appear to have a lower MRU to send up to 1,500 bytes anyway for now, and support staff can change the settings. However, it seems a number of popular sites like microsoft.com have issues where they appear to send 1,500 byte packets with DF set and ignore ICMP errors, so customers are seeing issues in some cases.