#Scrambling.tex#

\chapter{Cryptography}\label{ch:Scrambling}
Cryptography is the science of rendering content incomprehensible for 
undesired readers. However, securing content is not only about 
cryptography. The main reason why cryptography is attacked, is because 
an attack has a very low chance of being detected. There will be no 
traces of the attack, since the attacker’s access will look just like 
an 
ordinary access. \citep{Schneier:2003}

This can be compared to a real-life break-in. The break-in will be 
noticed if the thief breaks in using a crowbar. On the other hand, you 
might never notice that the security had been breached, if the the 
thief were to pick the lock instead. \citep{Schneier:2003}

One of the more noteworthy cryptography rule is that you always are to 
assume that someone is out to get you. Because of this, 
\citet[pp. 12--14]{Schneier:2003} claims that it is always needed to 
look for possible ways to break systems, to ensure that the security 
can not be breached, and thereby provides security.

\section{Why cryptography is needed}
Cryptography is the science of rendering plaintexts into ciphertexts to 
protect contents from unauthorized viewing. It is used in electronic 
communication for protection of e-mail messages and credit card 
information among other things. If data is sent without being encrypted,
someone listening in to the transmission channel will access the data.

For most people this is not a problem, but in some instances sending 
secure messages can be extremely important. One example is 
communication during war, where a single piece of intelligence might 
turn the tide of the entire war. Moreover, you do not want people to 
read your account information or credit card number when you do online 
shopping. Both of these problems can be solved by cryptography and 
encryption.

\Warning[Source]{Typ Schneier har jag för mig? Låna om boken?}
%This might need a rework!

\section{Scrambling and Encrypting}
Scrambling can be seen as the distortion of a plain-text, using a 
control word. However, encryption can be seen as the entire process of 
protecting content. This includes everything from generating the 
control word to scrambling data. Scrambling can therefore be seen as a 
part of encryption.

%Jag vet inte om det här är relevant, eller stämmer

Yet another reason to scramble, is to reduce the number of adjacent 
data bits with the same value, like strings of zeroes or ones. It could 
also serve to balance the number of zeroes and ones in strings. This is 
done as to try to obtain DC balance. DC balance is desired since it 
avoids voltage imbalance during communication between connected systems.

\Warning[Rewrite]{And find a good source for this}
%Det är osannolikt att det här stämmer helt och hållet :)

\section{Data packets}\label{sec:Data}
The data processed by the DVB systems is sent in data packets. All of 
them are created from Elementary Stream packets (ES) which are 
generally the output from an audio or video encoder. The ES-packets are 
then packeted into Program Stream- (PS), Transport Stream- (TS) or 
Packatized Elementary Stream (PES) packets and then distributed. Among 
the three ways of packing data, only two are interresting from a DVB 
perspective. This is due to PS packets being used for storing data, 
while TS and PES are used for transmitting data. The interresting 
types, when working with DVB, are therefore the TS packets as well as 
the PES packets. PES packets are often packed into the payload of TS 
packets. The payload is the part of the packets which is the actual 
data, which is everything except the header and adaptation field.

\subsection{TS packets}
TS packets are used by the DVB society due to their fixed length, and 
the fact that TS packets are meant to be used for streaming services, 
while PS packets are used for storing packets of data. TS packets have 
got a length of 188 bytes with a 4 byte long header. This means that 
the payload consists of a maximum of 184 bytes. The layout of a TS 
packet can be viewed in figure \ref{img:Package}\citep{DVB:2013}.

\begin{figure}[h!]
  \includegraphics[width=\textwidth]{TSpacket}
  \caption{General layout of a data packet}
  \label{img:Package}
\end{figure}

The TS packet consists of 4 different kinds of building blocks where 
only the header is guaranteed to be present. Those blocks are:

\begin{itemize}
\item Header
\item Adaptation field
\item Encrypted payload
\item Clear payload
\end{itemize}

The byte-sizes of the building blocks of a TS packet are:

\begin{itemize}
\item header\_size = 4
\item adaptation\_field\_size = the size of the adaptation field
\item payload\_size = 188 - (header\_size + adaptation\_field\_size)
\item encrypted\_payload\_size = payload\_size - [payload\_size mod block\_size]
\item clear\_payload\_size = [payload\_size mod block\_size] 
  (or simply payload\_size - encrypted\_payload\_size)
\end{itemize}

The header is always 4 bytes, while the adaptation field can have any 
size between 0 and 183 bytes. This means that the clear payload can 
be of any size stretching from 0 bytes, to one byte smaller than the 
block size. The rest of the data consists of the payload.

\subsubsection{Header}
The header consists of information regarding the packet, and has a 
sync\_byte (with a hex-value of 0x47, or bit-value of 01000111) to 
announce the beginning of a packet. The value of the sync\_byte 
corresponds to the ASCII-value of the letter G, for go. 
The header also contains information as to whether there is an 
adaptation field and payload in the packet, what Packet ID (PID) the 
packet has, if it should be prioritized, whether the data is 
scrambled - and in that case if it was scrambled with an odd or even 
key, among others \citep[pp. 25--26]{etsiMPEG:2009}. The header should 
never be encrypted and is always found at the beginning of a packet 
\citep[pp. 10--11]{DVB:2013}.

The header contains the following:
\begin{longtable}{| l | l | l |}
  \hline
  Bits & Name & Description \\ \hline
  8 & Sync byte & Fixed byte value 0x47 \\ \hline
  1 & Transport Error Indicator & Uncorrectable bit errors exist. \\ 
  \hline
  1 & Payload Unit Start Indicator & TS packet contains PES packets or\\
  & & Program Specific Information (PSI data)\\ \hline
  2 & Transport Scrambling Control & 00 No scramling, 01 Reserved, \\
  & & 10 Even key, 11 Odd key \\ \hline
  1 & Transport Priority & 1 gives this packet higher priority \\ \hline
  13 & PID & Type and number of data \\
  & & stored in packet payload \\ \hline
  1 & Adaptation Field Control & 1 means that an adaptation field 
  exists\\ \hline
  1 & Contains Payload & 1 means that payload exists \\ \hline
  4 & Continuity Counter & Sequence number of payload packets\\ \hline
\end{longtable}

\subsubsection{Adaptation field}
The adaptation field is a sort of padding that is inserted when the 
end of the data does not align with the end of the TS packet. This is 
done to make sure that the TS packet is filled with known data. The 
only time that the adaptation field can exist is when the last string 
of data is processed, if the end of data does not align with the end 
of the TS packet. Adaptation fields are never encrypted. 
\cite[pp. 10--11]{DVB:2013}

\subsubsection{Encrypted and clear payload}
When working with block ciphers, clear bytes of data tend to turn up. 
This is since block ciphers only encrypt data blocks of fixed sizes. 
The clear data is always the data located at the end of the received TS 
packet. When receiving a TS packet, the first thing to be done is to 
find the start of the payload. While there is no adaptation field, this 
is the data right after the header. If there is an adaptation field, 
some searching for the data is needed to be done. After the start of 
the payload is found, blocks of a given size are sent to the scrambler.
The remainder of data, when all of the blocks of the right size have 
been scrambled, is to be left clear. The number of unscrambled bytes 
might be of sizes up to one byte smaller than the block size. This 
means that the AES-128, which works on block sizes of 16 bytes, can 
have a maximum of 15 clear bytes at the most. The encrypted payload is 
always located in front of the clear payload. 
\cite[pp. 10--11]{DVB:2013}

\subsection{PES packets}
The PES packets have varying lengths of up to 64 kilo bytes, and are 
often packed into TS packets when distributed, due to the strength of 
TS packets. The payload data in the TS packets, when carrying PES 
packets, consist of the entire PES packets, which is the header as well 
as the data. PES packets do not use adaptation fields, since they are 
of adaptable lengths, as long as the length of the packet does not 
exceed 64 kilo bytes.

Since Digital Video Broadcasting seldom uses itself of PES packets, an 
analyzation of the header will not be done in this report. The 
derivation of PES packets from TS packets can be seen in Figure 
\ref{img:PES} \citep[p. 9]{ETR:289}.

\begin{figure}[h!]
  \includegraphics[width=\textwidth]{PES.png}
  \caption{PES packet derived from TS packets}
  \label{img:PES}
\end{figure}

\section{Encryption and Decryption}
There are three things that you need when you encrypt and decrypt 
messages. Those are the algorithm, plaintext and the key. Even though 
there are plenty of ways to encrypt messages, there are mainly two ways 
of sharing the encryption-key. The first method is the symmetric-key 
encryption, and the second method is the public-key encryption. 
\Warning[Source]{Please! I think Schneier wrote something about this}

\subsection{Symmetric-key encryption}\label{ch:symmetric}
The symmetric-key encryption uses the same key to encode and decode 
messages. Distrubution of the key, when using the symmetric-key 
encryption is troublesome and the fact that both parties need access to 
the same secret key is a major drawback of the symmetric key 
encryption, as compared to the public-key encryption method. Sending 
the key in an email is a bad idea, since the persons who want to read 
our messages are most likely already listening. They will therefore 
obtain the key as well as the means to decode the messages. Both the 
CSA and the AES encryption methods are symmetric-key encryptions, using 
the same key for encryption and decryption.

\Warning[Source]{Schneier har jag för mig igen}

\subsection{Public-key encryption}\label{ch:public}
The public-key encryption uses a public key that anyone can look up, 
and a secret key that only one person knows 
\citep[pp. 25--32]{Simmons:1992}. For instance say that the two 
persons, Bob and Alice, want to communicate. Bob produces a keypair 
\(P_{Bob}\) (Bob’s public key) and \(S_{Bob}\) (Bob’s secret key) and 
publishes \(P_{Bob}\) for anyone to see. When Alice wants to send Bob a 
message, she looks up Bob’s public key \(P_{Bob}\), which she then uses 
to encode her message. When she sends Bob the message, Bob decodes the 
message using his secret key \(S_{Bob}\) \citep{Schneier:2003}. 
Since Alice now knows both the plaintext, and can find out what the 
corresponding ciphertext will be, she could potentially try to find 
Bob's secret key, as described in section \ref{sec:kpa}.

\subsection{Combination of encryption}
Since the public-key encryption seems secure and easy to manage, how 
come it is not the only used encryption method? The reason is that the 
public-key encryption is not as effective as the symmetric-key 
encryption. It is common to utilize a combination of those two since an 
easy, effective way to encrypt messages is desirable.

To combine the two encryption methods, the symmetric-key algorithm 
encodes the plaintext into a ciphertext. Then the public-key encryption 
encrypts the key used by the symmeteric-key encryption. The encoded key 
is then sent together with the ciphertext to the recipient, which 
decodes the symmetric key using the secret key. The plaintext is once 
again received through decrypting the ciphertext with that key.

\section{Ciphers}
A cipher is the same as an encryption algorithm, which operates on 
either plaintexts or ciphertexts to perform encryption or decryption. 
Figure \ref{img:ciphers} describes how the different kinds of ciphers 
can be split into sub-groups. The first branch splits into Classical-, 
Rotor Machine- and Modern ciphers. Substitution and Transposition are 
still used in modern algorithms. The Modern ciphers are the Private 
key and Public key (descibed in chapter \ref{ch:symmetric} and 
\ref{ch:public}). The CSA algorithm uses both the stream- and block 
ciphers, while the AES algorithm only uses a block cipher.

\begin{figure}
  \includegraphics[width=\textwidth]{cipher.png}
  \caption{Different kinds of ciphers \citep{CipherTax:2013}}
  \label{img:ciphers}
\end{figure}

There are mainly two kinds of ciphers that are used when designing 
modern cryptosystems. Those ciphers are called block ciphers and 
stream ciphers. Many systems use a combination of block ciphers and 
stream ciphers to provide security. 

\Warning[Source]{At least the CSA uses it. But you need a source}

\subsection{Block cipher}\label{sec:BlockCipher}
A block cipher operates on fixed sized sets of data. These sets are 
called blocks, hence the reason that they are called block ciphers. 
Them being fixed sizes might cause a need for padding the blocks, in 
case the plaintext contains a number of bytes that is not a denominator
of the blocksize. Block ciphers often use a combination of S-boxes 
(Substitution boxes) and P-boxes (Permutation boxes) in a so-called 
SP-network (S-box / P-box network) (Figure \ref{img:SPNetwork}).

There are many modes of block ciphers, but the two recommended by 
\citet{Schneier:2003} are the CBC-mode and the CTR-mode.

\emph{CBC} stands for \emph{cipher block chaining} and is performed by 
encrypting the result of an XOR (basic logic component) between 
an Initialization Vector (IV) and the plaintext. The resulting 
ciphertext is then fed back to the XOR, this time replacing the IV.
This means that the data input into the cipher will be the result of 
an XOR between the previous result, and the upcoming data. This is 
then put into an XOR with the next plaintext, which is then encrypted 
in the cipher. For reference, see image \ref{img:CBCmode} in 
appendix \ref{app:fig}. \citep[pp. 109--111]{Stinson:2006}

\emph{CTR} stands for \emph{counter}, and refers to the way the IV is 
generated. The counter outputs a value, which is encoded with the key. 
The encoded output is sent to an XOR together with the plaintext, 
producing the ciphertext. The counter is incremented and the procedure 
is iterated \citep[p. 111]{Stinson:2006}.

\subsection{Stream cipher} \label{sec:StreamCipher}
Stream ciphers work on streams of data. They usually consist of a 
keystream generator which performs an XOR with the data 
\cite[pp. 67]{Simmons:1992}. An effective implementation of the stream 
cipher is to use a linear feedback shift-register which uses the 
current internal state (key) to produce the next state by a simple 
XOR-addition between two or more of the bits in the state. This is 
mainly used because of how easy it is to construct in hardware 
\citep{LFSR:2008}.

\subsection{Decryption}
Decryption is often performed by reversing the encryption. You need to 
know the algorithm, preferably through a mathematical representation, 
to calculate how to obtain the plaintext from the ciphertext. A 
description of how this is done for the CBC-mode (described in 
\ref{sec:BlockCipher}) is described in \ref{sec:CBCcalc} in appendix 
\ref{app:examples}. It is here assumed that the decryption algorithm 
is known, for simplicity. 
\Warning[Source]{feels like a given, but still}

\section{Confusion and Diffusion}\label{ch:ConfDiff}
Two properties that are needed to ensure that a cipher provides 
security are confusion and diffusion \citep{Shannon:1949}. 
\emph{Confusion} refers to making the relationship between ciphertext 
and key as complex as possible. \emph{Diffusion} refers to replacing 
and shuffling the data, to make it impossible to analyze data 
statistically. This is usually done by performing substitutions and 
permutations in a simple pattern multiple times. This can easily be 
done by using an SP-network (S-box / P-box network) 
\citep[pp. 74--79]{Stinson:2006}. The very first, as well as the last 
step, of SP-Networks is usually an XOR between the subkey and the data. 
This is called \emph{whitening}, and is according to 
\citet[p. 75]{Stinson:2006} regarded as a very effective way to prevent 
encryption/decryption without a known key. The goal of this is to make 
it hard to find the key, even though one has access to multiple 
plaintext/ciphertext pairs produced with the same key 
\citep{Shannon:1949}.

However, a cipher is not guaranteed to be secure just because it 
provides these two properties.

\subsection{S-boxes and P-boxes}
The S-box is one of the basic components that is used when creating 
ciphers. An S-box takes a number of input bits and creates an equal 
number of output bits. The way they are generated is non-linear. 
Implementing an S-box can effectively be done using lookup tables. 
Each input has to correspond to a unique output, to make sure that the 
functionality of the S-Box can be uniquely reversed. If it can not be 
reversed, descrambling will be impossible. 
\citep[pp. 74--75]{Stinson:2006}

% CONFUSION OR DIFFUSION?
The second basic component used in cryptography is the P-box. A P-box 
shuffles and thereby rearranges the order of given bits. This can be 
viewed in the SP-network in figure \ref{img:SPNetwork}, where the P-box 
is represented by the dotted rectangle in the middle.

\begin{figure}[h!]
  \begin{center}
    \includegraphics[width=0.5\textwidth]{SPnetwork}
    \caption{SP-Network}
    \label{img:SPNetwork}
  \end{center}
\end{figure}

%CONFUSION OR DIFFUSION?

\section{Secrecy}
Although encryption is important, as well as the strength of the 
encryption, keeping the algorithm secret is never a good idea. A simple 
mistake when designing an algorithm might turn an encryption that would 
otherwise have been strong, incredibly weak. It is therefore a bad 
idea to use small scale algorithms (designed for the use of just a few 
persons for instance). If you instead use an open algorithm, faults 
will most likely be discovered and fixed by experienced cryptographers 
\citep[pp. 23]{Schneier:2003}. Keeping the key, which is used to 
encrypt the data, secret is what is important.