Data Integrity and Number Systems in Computing

Posted by Anonymous and classified in Computers


1. Error Detecting and Correcting Codes
Error control codes are essential for ensuring data integrity during transmission or storage by adding redundancy (extra bits) to the original data.
A. Error Detection Codes
These codes can only signal that an error has occurred but cannot determine the location of the error to fix it.
| Code | Principle | Capability |
|---|---|---|
| Parity Check (Simplest) | An extra bit (parity bit) is added to the data word to make the total number of '1's either even (Even Parity) or odd (Odd Parity). | Detects any single-bit error or any odd number of errors. Cannot detect an even number of errors. |
| Checksum | Data is divided into blocks, and the sum of all the blocks is calculated and sent along with the data. The receiver recalculates the sum and compares it. | Detects most errors; commonly used in Internet protocols (e.g., IP, TCP, UDP). |
| Cyclic Redundancy Check (CRC) | A mathematical remainder (or check value) is generated by dividing the data polynomial by a generator polynomial. The remainder is appended to the data. | Excellent at detecting burst errors (multiple consecutive bit errors). Widely used in storage devices and networking. |
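The parity and checksum rows above can be sketched in a few lines of Python. This is an illustrative toy, not any particular protocol's exact algorithm, and the function names are ours:

```python
# Sketch of two detection schemes: even parity and a block checksum.

def even_parity_bit(bits):
    """Return the parity bit that makes the total count of 1s even."""
    return sum(bits) % 2

def simple_checksum(blocks, modulus=256):
    """Sum data blocks modulo a fixed width (a simplified checksum)."""
    return sum(blocks) % modulus

data = [1, 0, 1, 1, 0, 1, 0]           # 7 data bits (four 1s)
codeword = data + [even_parity_bit(data)]

# A single-bit error flips the overall parity, so the receiver notices:
received = codeword.copy()
received[2] ^= 1                       # flip one bit "in transit"
error_detected = even_parity_bit(received[:-1]) != received[-1]
print(error_detected)                  # True
```

Note that flipping *two* bits would restore even parity and go undetected, which is exactly the limitation stated in the table.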
B. Error Correcting Codes
These codes add enough redundancy so that the receiver can not only detect an error but also pinpoint its location and correct it without requesting retransmission.


| Code | Principle | Capability |
|---|---|---|
| Hamming Code | Adds multiple parity-check bits strategically placed at positions that are powers of two (1, 2, 4, 8, etc.). Each parity bit checks overlapping groups of data bits. The pattern of failed checks (called the syndrome) uniquely identifies the error position. | Detects and corrects a single-bit error. Can detect a double-bit error (with an extra parity bit) but cannot correct it. |
| Reed-Solomon (RS) Code | A powerful code that treats blocks of data (symbols) rather than individual bits. | Excellent for correcting burst errors (common on CDs, DVDs, and in deep-space communication). |
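The Hamming-code row above can be made concrete with a minimal Hamming(7,4) sketch, assuming the classic layout with parity bits at positions 1, 2, and 4 (function names are illustrative):

```python
# Hamming(7,4): the syndrome (XOR of the positions of all 1-bits)
# directly names the position of a single-bit error.

def hamming74_encode(d):
    """d: 4 data bits -> 7-bit codeword; positions 3, 5, 6, 7 carry data."""
    c = [0] * 8                      # c[1..7]; c[0] unused
    c[3], c[5], c[6], c[7] = d
    c[1] = c[3] ^ c[5] ^ c[7]        # covers positions with bit 0 set
    c[2] = c[3] ^ c[6] ^ c[7]        # covers positions with bit 1 set
    c[4] = c[5] ^ c[6] ^ c[7]        # covers positions with bit 2 set
    return c[1:]

def hamming74_correct(word):
    """word: 7 received bits -> corrected 7 bits, via the syndrome."""
    c = [0] + list(word)
    syndrome = 0
    for pos in range(1, 8):
        if c[pos]:
            syndrome ^= pos
    if syndrome:                     # nonzero syndrome = error position
        c[syndrome] ^= 1
    return c[1:]

code = hamming74_encode([1, 0, 1, 1])
corrupted = code.copy()
corrupted[4] ^= 1                    # flip the bit at position 5
print(hamming74_correct(corrupted) == code)   # True: error fixed
```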
> Key Concept: Hamming Distance.
> The minimum Hamming distance (d_{\text{min}}) between any two valid codewords in a code determines its error capability:
>  * The code can detect up to d_{\text{min}}-1 errors.
>  * The code can correct up to \lfloor\frac{d_{\text{min}}-1}{2}\rfloor errors.
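Both rules can be checked on a toy code: the 3-bit repetition code {000, 111} has d_{\text{min}} = 3, so it detects up to 2 errors and corrects up to \lfloor(3-1)/2\rfloor = 1 error.

```python
# Compute d_min for a small code and apply the detect/correct bounds.
from itertools import combinations

def hamming_distance(a, b):
    return sum(x != y for x, y in zip(a, b))

codewords = ["000", "111"]
d_min = min(hamming_distance(a, b) for a, b in combinations(codewords, 2))
print(d_min, d_min - 1, (d_min - 1) // 2)   # 3 2 1
```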
2. Character Representations: ASCII, EBCDIC, and Unicode
These standards define how letters, numbers, and symbols are mapped to specific numeric codes (bit patterns).
| Feature | ASCII | EBCDIC | Unicode |
|---|---|---|---|
| Full Name | American Standard Code for Information Interchange | Extended Binary Coded Decimal Interchange Code | A universal character encoding standard |
| Bits/Character | 7 bits (128 characters) \rightarrow Often extended to 8 bits (256 characters) | 8 bits (256 characters) | Variable (1 to 4 bytes/character) |
| Main Use | PCs, Internet, Unix/Linux systems. The foundational encoding for modern computing. | Historically used by IBM mainframe and midrange systems. | Modern operating systems, the World Wide Web (via UTF-8). |
| Character Set | Basic English alphabet, digits, punctuation, and control characters. | Mainly Latin-based, but includes many specific control codes for IBM hardware. | Covers virtually all written scripts in the world (over 150,000 characters). |
| Superset/Subset | Subset of Unicode (the first 128 Unicode characters are the ASCII set). | Incompatible with ASCII/Unicode. Requires translation. | Superset of ASCII. |
| Example (A) | A = 65_{10} = 01000001_2 | A = 193_{10} = 11000001_2 | A = U+0041 (encoded as the single byte 0x41 in UTF-8) |
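The example row can be reproduced in Python: ASCII 'A' is code point 65, and UTF-8 encodes the first 128 code points in a single byte, which is why UTF-8 is backward compatible with ASCII. (The EBCDIC value 193 appears via Python's built-in `cp500` EBCDIC codec.)

```python
# 'A' across the three standards, using only the standard library.
print(ord("A"))                      # 65 (ASCII / Unicode code point)
print("A".encode("utf-8"))           # b'A' -> one byte, 0x41
print("A".encode("cp500")[0])        # 193 (EBCDIC, codec cp500)
print(len("€".encode("utf-8")))      # 3 bytes for a non-ASCII character
```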
Key Takeaways
 * ASCII was the original standard that revolutionized computing by providing a common language for English text.
 * EBCDIC is a proprietary historical standard used almost exclusively on IBM mainframes. Its collating sequence differs from ASCII and Unicode (in EBCDIC, letters sort before numerals).
 * Unicode is the universal solution, overcoming the limitations of 8-bit encodings (like Extended ASCII and EBCDIC) to support global languages, mathematics, and emojis.
   * UTF-8 is the most popular Unicode encoding, using 1 byte for ASCII characters (maintaining backward compatibility) and up to 4 bytes for others. This makes it efficient for English text and flexible for global text.


Computers must store and process both signed (positive and negative) integers and real (fractional) numbers; the standard representations for each are described below.
1. Integer Number Representations
To represent signed (positive or negative) integers, the most significant bit (MSB) is typically reserved as the sign bit (0 for positive, 1 for negative). The three primary methods are:
A. Sign-Magnitude (S-M)
 * Principle: The MSB is the sign, and the remaining bits represent the magnitude (absolute value) of the number in true binary form.
 * Example (8-bit):
   * +45_{10} is 00101101_2 (sign bit 0, magnitude 0101101_2).
   * -45_{10} is 10101101_2 (sign bit 1, same magnitude).
 * Drawbacks:
   * Double Zero: There are two representations for zero: +0 (00000000) and -0 (10000000).
   * Complex Arithmetic: Addition and subtraction require different hardware logic based on the signs of the numbers.
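The sign-magnitude scheme above can be sketched in Python; the helper name is ours, and the mask simply keeps Python's unbounded integers within 8 bits:

```python
# Sign-magnitude: MSB is the sign, remaining bits are the magnitude.

def sign_magnitude(n, bits=8):
    sign = 1 if n < 0 else 0
    return (sign << (bits - 1)) | abs(n)

print(format(sign_magnitude(45), "08b"))    # 00101101
print(format(sign_magnitude(-45), "08b"))   # 10101101
```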
B. 1's Complement Representation
 * Principle:
   * Positive Numbers: Same as Sign-Magnitude.
   * Negative Numbers: The negative of a number is found by taking the bitwise complement (inverting all 0s to 1s and 1s to 0s) of its positive representation.
 * Example (8-bit) for -45_{10}:
   * +45_{10} is 00101101_2.
   * Invert all bits \rightarrow 11010010_2.
   * Therefore, -45_{10} is 11010010_2.
 * Drawbacks:
   * Double Zero: Still has two representations for zero: +0 (00000000) and -0 (11111111).
   * End-Around Carry: Arithmetic operations are slightly complex due to the need for an "end-around carry" correction.
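The 1's-complement steps above can be checked in Python (an illustrative helper; the 8-bit mask bounds Python's unbounded integers):

```python
# 1's complement: negatives are the bitwise inverse of the positive pattern.

def ones_complement(n, bits=8):
    mask = (1 << bits) - 1
    return n & mask if n >= 0 else ~(-n) & mask

print(format(ones_complement(45), "08b"))    # 00101101
print(format(ones_complement(-45), "08b"))   # 11010010
```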


C. 2's Complement Representation
 * Principle: This is the standard method used by most modern computers.
   * Positive Numbers: Same as Sign-Magnitude.
   * Negative Numbers: The negative of a number is found by taking the 1's Complement and then adding 1 to the result.
 * Example (8-bit) for -45_{10}:
   * +45_{10} is 00101101_2.
   * Find 1's Complement: 11010010_2.
   * Add 1: 11010010_2 + 1 = 11010011_2.
   * Therefore, -45_{10} is 11010011_2.
 * Advantages:
   * Single Zero: Only one representation for zero (00000000).
   * Simplified Arithmetic: Subtraction can be performed by simply adding the 2's complement of the subtrahend, simplifying hardware design.
 * Range (n bits): Represents integers from -2^{n-1} to 2^{n-1} - 1.
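The 2's-complement procedure and its n-bit range can both be verified in Python; the decoding rule used below is that an n-bit pattern with the MSB set represents pattern - 2^n (helper names are ours):

```python
# 2's complement: encode via masking, decode via the MSB rule.

def twos_complement(n, bits=8):
    return n & ((1 << bits) - 1)          # Python's & already yields 2's complement

def from_twos_complement(pattern, bits=8):
    return pattern - (1 << bits) if pattern & (1 << (bits - 1)) else pattern

enc = twos_complement(-45)
print(format(enc, "08b"))                 # 11010011
print(from_twos_complement(enc))          # -45
print(from_twos_complement(0b10000000))   # -128, the 8-bit range minimum
```

Note the asymmetric range: -128 has a representation but +128 does not, matching the -2^{n-1} to 2^{n-1} - 1 formula above.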
2. Real Numbers: Normalized Floating-Point Representation (IEEE 754 Standard)
Real numbers (numbers with fractional parts, like 3.14 or 0.00012) are represented using Floating-Point notation, which is analogous to scientific notation. The widely accepted standard is IEEE 754.
A floating-point number is stored in three parts, and its value is reconstructed as (-1)^S \times 1.M \times 2^{E_{\text{stored}} - \text{Bias}}.
Where:
| Field | Description | Single-Precision (32-bit) | Double-Precision (64-bit) |
|---|---|---|---|
| S (Sign) | 1 bit. 0 for positive, 1 for negative. | 1 bit | 1 bit |
| E (Exponent) | The power of 2, stored in Biased form. | 8 bits (Bias = 127) | 11 bits (Bias = 1023) |
| M (Mantissa/Fraction) | The significant digits of the number. | 23 bits | 52 bits |
Normalized Representation
Normalization is the process of adjusting the binary number so that the most significant bit of the mantissa is a '1', which maximizes precision.
 * Binary Conversion: Convert the number into its binary scientific notation (e.g., N \times 2^E).
 * Normalization: The binary point is moved so that only a single '1' appears to the left of the binary point.
   * Example: 101.101_2 \times 2^0 is normalized to 1.01101_2 \times 2^2.
 * Implied Leading One: Since all normalized numbers in base 2 must start with 1. (the "1" before the binary point), the IEEE 754 standard does not store this leading '1' (the hidden bit). The stored Mantissa/Fraction (M) is just the bits after the binary point (01101\dots).
 * Biased Exponent: The actual exponent (E) is stored as a Biased Exponent (E_{\text{stored}} = E_{\text{actual}} + \text{Bias}). This allows both positive and negative exponents to be stored efficiently.
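The S / E / M layout described above can be inspected directly with the standard-library struct module, using the worked example 101.101_2 = 1.01101_2 \times 2^2 (i.e., 5.625):

```python
# Decompose a 32-bit IEEE 754 float into sign, biased exponent, fraction.
import struct

def decompose_float32(x):
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF          # biased exponent (bias = 127)
    mantissa = bits & 0x7FFFFF              # 23 stored fraction bits
    return sign, exponent, mantissa

s, e, m = decompose_float32(5.625)          # 101.101b = 1.01101b * 2^2
print(s, e, e - 127)                        # 0 129 2
print(format(m, "023b"))                    # 01101000000000000000000
```

The stored fraction starts with 01101, the bits after the binary point, confirming that the leading 1 (the hidden bit) is not stored.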
