Cryptography - Hash

1 - About

A hash function is a crypto algorithm which converts/encrypt a large, possibly variable-sized amount of data into a small datum, usually a single integer that may serve as an index to an array.

A function for summarizing or probabilistically identifying data. Typically this means one applies a mathematical formula to the data, producing a string which is probably more or less unique to that data. The string is much shorter than the original data, but can be used to uniquely identify it.

The hash value is shorter and of fixed-length.

The input data is often called the message, and the output (the hash value or hash) is often called the message digest or simply the digest.

The values returned by a hash function are called :

  • hash values,
  • hash codes,
  • hash sums,
  • hash key,
  • or simply hashes.

It's a one way function (you can't retrieve the input) but the output will change for a slight change in the input.

3 - Usage

Hash functions are used to:

In database application, Hash functions are mostly used to speed up:

such as :

4 - Algo

5 - Library

6 - Property

6.1 - Character Position

Position Independent: CRC16 and CRC32 (cyclical redundancy checksum, 16 bit and 32 bit). The hash of Nico or ociN will be the same. The position of the letter doesn't matter.

6.2 - Cryptographic strength

7 - Collision

An hash collision happens when two different inputs produce the same result. Hash collision are very similar to the Birthday_problem.

See Hash Collision

8 - Distributed System

In distributed system, you hash because you don't want to request always the same server. If you hash by time for instance, you can have only one server that serves the most request (ie for the current time)

Distributed hash table (DHT)

8.1 - Regular

Regular Hashing = Round Robin = Modular function

Assign M data keys to N servers assign each key to server = k mod N where:

  • k is the key
  • and N is the number of server

For instance:

  • key 0 → server 0
  • key 1 → server 1
  • key 2 → server 2
  • key 3 → server 0
  • key 4 → server 1
  • ….

If the number of servers increase from N to 2N, every existing key needs to be remapped, relocated.

8.2 - Consistent

In consistent hashing, only a portion of keys need to be transferred to new storage space (a server) when more storage allocated. All data keys do not need to be rehashed like in regular hashing.

When the node D is added, only the data on the orange segment need to be reallocate from the Node A to the Node D.

Each server memorize the locations of other servers in the ring and can forward any incoming request.

9 - Reference / Documentation

Data Science
Data Analysis
Data Science
Linear Algebra Mathematics

Powered by ComboStrap