Data Formats And Protocols
From the user’s point of view, data can represent a wide variety of information, e.g. temperatures, switching states, weights, times, positioning details and much more.
But whatever the content, in electronic data processing and computer technology data is always simply a sequence of bytes of arbitrary length.
One byte corresponds to a numerical value between 0 and 255.
Data exchange means transferring bytes from A to B
Controlled data exchange by protocols
For the receiver to understand the data it receives from the sender, the form in which the data is transmitted must be defined.
In systems where several components are networked with each other, it must also be defined for whom a transmission is intended. In addition to the actual user data, address information must therefore be attached to each transmission.
If user data and address information follow a defined frame structure, this is referred to as a protocol.
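To illustrate the idea, here is a minimal sketch of a purely hypothetical frame structure (not taken from any real fieldbus or industrial protocol): one address byte, one length byte, then the user data. The layout and the function names `build_frame` and `parse_frame` are assumptions made for illustration only.

```python
# Hypothetical frame layout, for illustration only:
#   byte 0     : receiver address (0..255)
#   byte 1     : number of user data bytes that follow (0..255)
#   bytes 2... : user data
def build_frame(address: int, payload: bytes) -> bytes:
    if not 0 <= address <= 255:
        raise ValueError("address must fit into one byte")
    if len(payload) > 255:
        raise ValueError("payload too long for a one-byte length field")
    return bytes([address, len(payload)]) + payload


def parse_frame(frame: bytes) -> tuple[int, bytes]:
    address, length = frame[0], frame[1]
    payload = frame[2:2 + length]
    if len(payload) != length:
        raise ValueError("frame is shorter than its length field says")
    return address, payload


frame = build_frame(7, b"23.9")
print(frame.hex(" "))      # 07 04 32 33 2e 39
print(parse_frame(frame))  # (7, b'23.9')
```

A receiver in such a scheme would compare the address byte with its own address and ignore frames meant for other components.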
In the past, fieldbus systems were often used for data transfer between industrial components. Fieldbuses are serial connections between the components involved. Several standards became established side by side; they differ not only in protocol and transmission speed but also in the physical transmission, right down to the mechanical connectors used.
Newer industrial protocols differ at the protocol level in how the data is encoded, but most of them use TCP/IP over Ethernet for transport.
This provides a common standard with many advantages:
- Existing infrastructure can be used
- Different industrial protocols can run side by side in the same network
- Uniform transmission technology and connectors
- Cross-site communication is possible
- Freely expandable
Data formats
For protocols that use TCP/IP Ethernet as a common standard, addressing is, with a few exceptions, already handled by the IP address.
The industrial protocol itself therefore mainly defines the form in which the transported data is transmitted.
There are two basic data formats:
- Plain text
- Binary data
Which of the two is used depends on many factors.
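As a rough illustration of the difference, the sketch below encodes an air pressure reading of 991.8 once as text and once as binary. Representing the binary form as a 32-bit IEEE 754 float in big-endian byte order is an assumption chosen for illustration; real protocols define their own binary representation.

```python
import struct

value = 991.8

# As text: one byte per character of the human-readable string
as_text = str(value).encode("ascii")
# As binary: a 32-bit IEEE 754 float, big-endian ("network byte order")
as_binary = struct.pack(">f", value)

print(as_text, len(as_text))  # b'991.8' 5
print(as_binary.hex(" "), len(as_binary))

# The binary form can only be interpreted if the receiver knows the layout;
# float32 has limited precision, so the result is only approximately 991.8
decoded = struct.unpack(">f", as_binary)[0]
print(decoded)
```

The text form is readable at a glance, while the binary form is more compact but meaningless without an agreed layout.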
Data as text
Especially in web-based applications, data of all kinds is sent as text, i.e. as a human-readable string of characters. With classic single-byte encodings, each character occupies exactly one byte.

Traditionally, characters were encoded according to the ASCII standard (American Standard Code for Information Interchange). The ASCII table defines which numerical value corresponds to which character.
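In Python, for example, the ASCII mapping can be inspected directly with the built-in functions `ord()` and `chr()` and the `str.encode()` method; a short sketch:

```python
text = "Temp 23.9"

# Encode the string as ASCII: one byte per character
data = text.encode("ascii")
print(list(data))  # [84, 101, 109, 112, 32, 50, 51, 46, 57]

# ord() gives the code of a character, chr() the character for a code
print(ord("A"), chr(65))  # 65 A
```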

A peculiarity of ASCII is that it uses only 7 of the 8 bits available in a byte, which limits the character set to 128 characters, including non-printable control characters.
Newer encodings such as UTF-8 overcome this limitation: ASCII characters still take one byte, while special characters such as “°” or “ä” take two bytes (and other characters up to four).
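The difference can be checked with a few lines of Python, using the unit string “°C” as an example:

```python
unit = "°C"

# UTF-8 needs two bytes for "°" (U+00B0) and one byte for "C"
utf8 = unit.encode("utf-8")
print(utf8, len(utf8))  # b'\xc2\xb0C' 3

# "°" is not part of 7-bit ASCII, so ASCII encoding fails
try:
    unit.encode("ascii")
except UnicodeEncodeError as err:
    print("not ASCII:", err.reason)  # not ASCII: ordinal not in range(128)

# Pure ASCII text keeps one byte per character in UTF-8, all codes below 128
print(all(b < 128 for b in "Temperatur".encode("utf-8")))  # True
```

Because ASCII characters are encoded identically in UTF-8, existing ASCII text remains valid UTF-8.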
In addition to freely formulated text content, standardized text formats have established themselves in web and industrial protocols:
- XML
- JSON
Both formats are briefly explained below.
XML – Extensible Markup Language
XML is a so-called markup language: the actual user data is embedded in tags, which name the respective values or contents. Each tag begins with an opening angle bracket (“<”) and ends with a closing one (“>”).
An XML document usually begins with a declaration that specifies at least the XML version; further parameters such as the character encoding are optional, e.g. <?xml version="1.0" encoding="UTF-8"?>.
The declaration is followed by the actual content, embedded in tags. Every element has an opening tag and a closing tag of the same name; in the closing tag, the name is preceded by a slash (“/”).
Example:
<content>something</content>
XML also allows tags to be nested hierarchically. Here is an example with the sensor values of a W&T Web-Thermo-Hygrobarometer:
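<iostate>
  <sensor>
    <name>Temperatur</name>
    <number>0</number>
    <unit>°C</unit>
    <value>23.900000</value>
  </sensor>
  <sensor>
    <name>rel. Feuchte</name>
    <number>1</number>
    <unit>%</unit>
    <value>36</value>
  </sensor>
  <sensor>
    <name>Luftdruck</name>
    <number>2</number>
    <unit>hPa</unit>
    <value>992</value>
  </sensor>
</iostate>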
Indentation is not required in XML but is common, because it significantly improves readability.
The advantage of XML as a transmission format is that the content is easy to read both for humans and for evaluating programs.
The disadvantage is the high overhead: the tags produce a large gross data volume for relatively little content.
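To show how easily a program can evaluate such XML, here is a minimal sketch using Python's standard `xml.etree.ElementTree` module. The element names (`iostate`, `sensor`, `name`, `number`, `unit`, `value`) are an assumption that mirrors the JSON example of the same device; the device's actual XML output may use different names.

```python
import xml.etree.ElementTree as ET

# Assumed XML layout (element names mirror the JSON keys of the same device)
xml_text = """<iostate>
  <sensor><name>Temperatur</name><number>0</number><unit>°C</unit><value>23.900000</value></sensor>
  <sensor><name>rel. Feuchte</name><number>1</number><unit>%</unit><value>36</value></sensor>
  <sensor><name>Luftdruck</name><number>2</number><unit>hPa</unit><value>992</value></sensor>
</iostate>"""

root = ET.fromstring(xml_text)
readings = {}
for sensor in root.findall("sensor"):
    # Element contents are always text; numbers must be converted explicitly
    value = float(sensor.findtext("value"))
    readings[sensor.findtext("name")] = (value, sensor.findtext("unit"))

print(readings)
```

Note that XML itself carries no data types: the program has to know that the content of `value` is a number.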
JSON – JavaScript Object Notation
The syntax of JSON is based on a subset of JavaScript syntax.
JSON encodes data as pairs of a name and a value.
Example: "content": "something"
JSON also allows hierarchically nested structures. Here is an example with the sensor values of a W&T Web-Thermo-Hygrobarometer:
{
  "iostate":
  {
    "sensor":
    [
      {
        "name": "Temperatur",
        "number": 0,
        "unit": "°C",
        "value": 24.1
      },
      {
        "name": "rel. Feuchte",
        "number": 1,
        "unit": "%",
        "value": 35.9
      },
      {
        "name": "Luftdruck",
        "number": 2,
        "unit": "hPa",
        "value": 991.8
      }
    ]
  }
}
As a rule, both names and values are enclosed in double quotation marks. Numerical values are an exception: they are written without quotation marks.
Name/value pairs are separated by commas.
Name/value pairs that belong together are grouped into an object with curly braces.
Several such groups can form an array; they are enclosed in square brackets and separated by commas.
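Because JSON objects and arrays map directly onto the dictionaries and lists of most programming languages, evaluating such data takes only a few lines. A minimal sketch with Python's standard `json` module, using the sensor data from the example above:

```python
import json

payload = """{"iostate": {"sensor": [
  {"name": "Temperatur", "number": 0, "unit": "°C", "value": 24.1},
  {"name": "rel. Feuchte", "number": 1, "unit": "%", "value": 35.9},
  {"name": "Luftdruck", "number": 2, "unit": "hPa", "value": 991.8}
]}}"""

data = json.loads(payload)
# Objects become dicts, arrays become lists, unquoted numbers become int/float
for sensor in data["iostate"]["sensor"]:
    print(f'{sensor["name"]}: {sensor["value"]} {sensor["unit"]}')

# A quoted number is a string, an unquoted one is a number
print(json.loads('{"n": 1, "s": "1"}'))  # {'n': 1, 's': '1'}
```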
A detailed description of the JSON format can be found under https://www.json.org.
JSON is much more compact in terms of data volume than XML and yet easy to read by humans and machines.
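The difference in data volume can be checked quickly. The sketch below serializes the same three readings once as compact JSON and once as XML built with Python's `xml.etree.ElementTree`; the XML element names are an assumption that simply mirrors the JSON keys.

```python
import json
import xml.etree.ElementTree as ET

sensors = [
    {"name": "Temperatur", "number": 0, "unit": "°C", "value": 24.1},
    {"name": "rel. Feuchte", "number": 1, "unit": "%", "value": 35.9},
    {"name": "Luftdruck", "number": 2, "unit": "hPa", "value": 991.8},
]

# Compact JSON: no indentation, no spaces after separators
json_bytes = json.dumps({"iostate": {"sensor": sensors}},
                        separators=(",", ":"), ensure_ascii=False).encode("utf-8")

# The same data as XML (element names mirror the JSON keys, an assumption)
root = ET.Element("iostate")
for s in sensors:
    sensor_el = ET.SubElement(root, "sensor")
    for key, val in s.items():
        ET.SubElement(sensor_el, key).text = str(val)
xml_bytes = ET.tostring(root, encoding="utf-8")

print("JSON:", len(json_bytes), "bytes")
print("XML: ", len(xml_bytes), "bytes")
```

The exact figures depend on formatting such as indentation and whitespace, so they are only an indication; the main difference comes from XML repeating every name in the closing tag.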
Base64 encoding
Base64 is a method that encodes binary data as a string of printable ASCII characters and decodes it back. This allows binary content to be transported in text-based transmission formats.
The procedure is quite simple: three bytes (24 bits) of binary data are split bit by bit into four 6-bit numbers, each with a value between 0 and 63.

Each of the four numbers is replaced by a character from the Base64 alphabet: the values 0 to 25 map to “A” to “Z”, 26 to 51 to “a” to “z”, 52 to 61 to “0” to “9”, 62 to “+” and 63 to “/”. Three binary bytes are thus replaced by four readable characters.

This process is repeated until all binary bytes are encoded. If one or two bytes remain at the end, the last group is completed with fill bytes of value 0.
So that the fill bytes can be discarded again during decoding, i.e. when the original binary bytes are recovered, the characters at the end that result only from fill bytes are replaced by “=”, one “=” per fill byte.
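The procedure can be sketched in a few lines of Python. The function `base64_encode` below is a hypothetical teaching implementation of the steps just described; in practice you would use the standard `base64` module, which the sketch uses only to cross-check its result.

```python
import base64

# Standard Base64 alphabet: index = 6-bit value (0..63)
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def base64_encode(data: bytes) -> str:
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        fill = 3 - len(chunk)             # number of fill bytes (0, 1 or 2)
        block = chunk + b"\x00" * fill    # fill bytes have the value 0
        n = int.from_bytes(block, "big")  # three bytes = 24 bits
        # Split the 24 bits into four 6-bit numbers and look up their characters
        chars = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        if fill:
            chars[-fill:] = ["="] * fill  # one "=" per fill byte
        out.extend(chars)
    return "".join(out)


sample = bytes([0x03, 0xE0, 0xFF, 0x10])
encoded = base64_encode(sample)
print(encoded)                                       # A+D/EA==
print(encoded == base64.b64encode(sample).decode())  # True
print(base64.b64decode(encoded) == sample)           # True
```

The four sample bytes form one complete group of three plus one remaining byte, so two fill bytes are needed and the result ends in “==”.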
The most common use cases for Base64 encoding are web-based applications and email.
Binary data
Data is always a certain number of bytes.
What each byte at each position means is defined either by a standardized protocol or by the application. One or more bytes can represent a value, an array of values, a string or even a function call.
A single value can be transmitted on its own. Often, however, data structures are used that define at which position in the transmitted byte sequence each value is stored.
Here is an example of data from a Modbus function call. In Modbus TCP, the function code is always located in the 8th byte, directly after the 7-byte MBAP header:
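A minimal sketch of how such a Modbus TCP request can be assembled with Python's `struct` module, assuming a “Read Holding Registers” request (function code 3). The function name and the values for transaction ID, unit ID, start address and register count are arbitrary examples chosen for illustration.

```python
import struct


def read_holding_registers_request(transaction_id: int, unit_id: int,
                                   start_address: int, count: int) -> bytes:
    function_code = 0x03  # Read Holding Registers
    # PDU: function code (1 byte), start address (2 bytes), count (2 bytes)
    pdu = struct.pack(">BHH", function_code, start_address, count)
    # MBAP header: transaction ID, protocol ID (always 0),
    # length of the remaining bytes (unit ID + PDU), unit ID
    mbap = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return mbap + pdu


request = read_holding_registers_request(transaction_id=1, unit_id=1,
                                         start_address=0, count=2)
print(request.hex(" "))  # 00 01 00 00 00 06 01 03 00 00 00 02
print(request[7])        # 3 -> the function code in the 8th byte
```

The fixed position of each field is what allows the receiver to interpret the byte sequence without any names or separators.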

Another common way to structure binary data is TLV, which stands for Type-Length-Value. It allows several contents of any size to be transmitted one after another in a single transmission.
Each content consists of the following sequence:
- Type – what kind of content is it? The meaning of the type codes is defined by the application.
- Length – how many bytes does the content contain?
- Value – the bytes of the value or content.
If further bytes follow such a sequence, they form the next sequence.
Here is a simple example:

The bytes transmitted contain two values: a 16-bit value (2 bytes) and a 32-bit value (4 bytes).
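A TLV stream like this can be built and parsed with a few lines of code. The sketch assumes one byte for the type, one byte for the length, big-endian values and the hypothetical type codes 0x01 and 0x02; in a real application all of these are defined by the protocol or the application.

```python
import struct


def tlv_encode(items: list[tuple[int, bytes]]) -> bytes:
    out = bytearray()
    for type_code, value in items:
        out += bytes([type_code, len(value)]) + value  # Type, Length, Value
    return bytes(out)


def tlv_decode(data: bytes) -> list[tuple[int, bytes]]:
    items, pos = [], 0
    while pos < len(data):  # further bytes -> next sequence
        if pos + 2 > len(data):
            raise ValueError("incomplete TLV header")
        type_code, length = data[pos], data[pos + 1]
        value = data[pos + 2:pos + 2 + length]
        if len(value) != length:
            raise ValueError("truncated TLV value")
        items.append((type_code, value))
        pos += 2 + length
    return items


# A 16-bit value (2 bytes) followed by a 32-bit value (4 bytes)
stream = tlv_encode([(0x01, struct.pack(">H", 1000)),
                     (0x02, struct.pack(">I", 100000))])
print(stream.hex(" "))  # 01 02 03 e8 02 04 00 01 86 a0
for type_code, value in tlv_decode(stream):
    print(type_code, int.from_bytes(value, "big"))
```

Because each value carries its own length, the receiver can skip types it does not know and still find the start of the next sequence.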
The advantage of binary data transmission is the very compact structure of the data.