Endian mismatch

“Byte ordering” refers to the order in which a sequence of bytes is stored in computer memory. The two major ordering methods used in modern computers are described below. It is called “byte ordering” rather than “bit ordering” because the smallest unit of memory that is normally assigned its own address is 8 bits, or 1 byte.

Big-endian and Little-endian byte ordering

“Big-endian byte ordering” refers to storing a sequence of bytes starting from the most significant byte (“Big End In”). “Little-endian byte ordering” refers to storing a sequence of bytes starting from the least significant byte (“Little End In”).

An example is worth hundreds of words, so let's proceed with a simple one.

Take 1201 as an example, which is the course code of the “Computer Organization” course.

1201 in base 10 = 0000 0100 1011 0001 in base 2

Dividing the base 2 value into bytes gives:

00000100 10110001

Most significant byte of this number is – “00000100”
Least significant byte of this number is – “10110001”

So big-endian and little-endian systems will store this number as shown below.

Big-endian                Little-endian
00000100 10110001         10110001 00000100
Address+0 – 00000100      Address+0 – 10110001
Address+1 – 10110001      Address+1 – 00000100

Base 16 is the most popular number format for representing data stored in computer memory, so here is the same scenario in hex.

1201 in base 10 = 04B1 in base 16

Most significant byte of this number is – “04”
Least significant byte of this number is – “B1”
Big-endian    Little-endian
04 B1         B1 04
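The two layouts in the table above can be reproduced with a few shifts. This is only a small sketch: the helper names `store_be16` and `store_le16` are made up for illustration, and the 2-byte array stands in for Address+0 and Address+1.

```c
#include <stdint.h>

/* Hypothetical helper: lay out a 16-bit value the way a big-endian
   system would, most significant byte at the lowest address. */
void store_be16(uint16_t value, uint8_t out[2])
{
    out[0] = (uint8_t)(value >> 8);   /* Address+0: most significant byte  */
    out[1] = (uint8_t)(value & 0xFF); /* Address+1: least significant byte */
}

/* Hypothetical helper: lay out a 16-bit value the way a little-endian
   system would, least significant byte at the lowest address. */
void store_le16(uint16_t value, uint8_t out[2])
{
    out[0] = (uint8_t)(value & 0xFF); /* Address+0: least significant byte */
    out[1] = (uint8_t)(value >> 8);   /* Address+1: most significant byte  */
}
```

Calling `store_be16(1201, buf)` leaves 04 B1 in `buf`, while `store_le16(1201, buf)` leaves B1 04. Because the helpers use shifts instead of looking at memory directly, they give the same result on any host.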

Endian Mismatch

So what will happen if we move the above number directly from a big-endian system to a little-endian system, without any reordering or conversion?

Imagine a big-endian system is sending “00000100 10110001”, stored in its memory, to a little-endian system. The little-endian system will store the bytes as shown below:

Address+0 – 00000100
Address+1 – 10110001
Now the little-endian computer's CPU will read this value as 1011 0001 0000 0100 in base 2, which is 45316 in decimal. The expected value is 1201, but the value actually read is wrong. So is it possible to overcome this problem?
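The mismatch is easy to reproduce in C. The sketch below assumes the two bytes arrive in the order shown above (00000100 first); `load_be16` and `load_le16` are hypothetical helper names that interpret the same two bytes under each convention.

```c
#include <stdint.h>

/* Hypothetical helper: interpret two bytes as big-endian,
   so the first byte is the most significant one. */
uint16_t load_be16(const uint8_t in[2])
{
    return (uint16_t)((in[0] << 8) | in[1]);
}

/* Hypothetical helper: interpret two bytes as little-endian,
   so the first byte is the least significant one. */
uint16_t load_le16(const uint8_t in[2])
{
    return (uint16_t)(in[0] | (in[1] << 8));
}
```

With the bytes 04 B1, `load_be16` gives back 1201, while `load_le16` gives 45316, which is exactly the wrong value the little-endian CPU ends up with.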

Solutions for Endian mismatch

  • Use of converters
The big-endian versus little-endian problem affects most programming languages. A binary produced by a compiler can only be executed by processors of the architecture it was built for, and byte order is part of that architecture. To deal with this, compilers are available for different target architectures, so the same source can be built into binaries for each of them. Java programs are compiled to byte code and .NET programs are compiled to MSIL, so in these cases explicit conversion is not necessary: runtime environments are available for different architectures, and each one implicitly translates the byte code or MSIL into an executable form that suits the architecture it runs on.
A popular character encoding standard in modern computing is Unicode. A BOM (Byte Order Mark) is used to identify the endianness of a Unicode stream. The BOM is the special Unicode value “U+FEFF”, and it appears at the beginning of a Unicode stream when UTF-16 is used. A computer receiving the stream sees the bytes FE FF if the stream is big-endian and FF FE if it is little-endian, so it can easily identify the endianness and perform the required conversions to process the data correctly.

These are only some of the available conversions; a number of others are used to avoid this issue.

  • Use of common protocols
This is another type of conversion, but it is discussed separately because of its importance. Protocols can be standardized to use a common byte order. For example, TCP/IP is a big-endian protocol, so developers do not have to worry about the byte order of the machine at the other end as long as they use standard protocols like TCP/IP. A little-endian system knows that TCP/IP is big-endian, so it converts its output to big-endian before sending data and converts incoming data from big-endian to little-endian.
  • Both machines use same byte order
Some processors can work in both big-endian and little-endian mode (e.g. PowerPC). If both systems agree to work in the same byte order, this problem will not arise.
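Two of the solutions listed above can be sketched in C: checking a UTF-16 byte order mark, and converting a 16-bit value to and from the big-endian order used by TCP/IP. This is a rough sketch, not a complete implementation. `htons` and `ntohs` are the standard POSIX calls from `<arpa/inet.h>`, so a POSIX system is assumed; the names `bom_kind`, `detect_utf16_bom`, `put_net16` and `get_net16` are made up for illustration, and the BOM check assumes the stream is already known to be UTF-16.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h> /* htons, ntohs (POSIX) */

typedef enum { BOM_NONE, BOM_UTF16_BE, BOM_UTF16_LE } bom_kind;

/* Hypothetical helper: look at the first two bytes of a UTF-16 stream.
   U+FEFF arrives as FE FF in big-endian and FF FE in little-endian. */
bom_kind detect_utf16_bom(const uint8_t *buf, size_t len)
{
    if (len >= 2) {
        if (buf[0] == 0xFE && buf[1] == 0xFF) return BOM_UTF16_BE;
        if (buf[0] == 0xFF && buf[1] == 0xFE) return BOM_UTF16_LE;
    }
    return BOM_NONE; /* no BOM: byte order must be known some other way */
}

/* Hypothetical helper: place a host-order value on the wire in
   network (big-endian) byte order. */
void put_net16(uint16_t host_value, uint8_t wire[2])
{
    uint16_t net = htons(host_value); /* no-op on big-endian hosts */
    memcpy(wire, &net, sizeof net);
}

/* Hypothetical helper: read a network-order value from the wire
   back into host order. */
uint16_t get_net16(const uint8_t wire[2])
{
    uint16_t net;
    memcpy(&net, wire, sizeof net);
    return ntohs(net);
}
```

On either kind of host, `put_net16(1201, wire)` leaves 04 B1 on the wire and `get_net16(wire)` returns 1201 again, which is why both ends can talk without knowing each other's byte order.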

Identifying Big-endian over Little-endian

Thought of posting this here for those who are interested in programming. This is C code that can be used to identify the byte order of the architecture you are running on. It creates an int variable with the hex value 12345678 and reads its first byte through a char pointer, char being narrower than int. Finally, it checks whether that char is equal to the hex value “78”: if so, the least significant byte is stored first and the machine is little-endian.
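A minimal sketch of the check described above could look like this. It is not necessarily the original listing: the function name `is_little_endian` is hypothetical, and `uint32_t` is used in place of a plain int so that the variable is exactly 4 bytes wide.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 on a little-endian host, 0 otherwise. */
int is_little_endian(void)
{
    uint32_t value = 0x12345678;

    /* Point a byte-sized pointer at the lowest address of the variable. */
    const unsigned char *first_byte = (const unsigned char *)&value;

    /* Little-endian hosts store the least significant byte (0x78) first;
       big-endian hosts store the most significant byte (0x12) first. */
    return *first_byte == 0x78;
}
```

A caller can print “Little-endian” when the function returns 1 and “Big-endian” when it returns 0.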


