Cryptography - Hash

About

A hash function is an encryption crypto algorithm that

  • takes as data as input (possibly large and of variable-sized)
  • and produces a short fixed-length integer value (generally printed as an hexadecimal string)
  • that identify uniquely the input data with a very high probability.

In this function:

  • the input data is called the message
  • the output (digest) is called:
    • the hash value
    • the hash code,
    • the hash sum,
    • the hash key,
    • or simply hash.

It's a one way function (you can't retrieve the input) but the output will change for a slight change in the input.

The integer is produced within a given range. A good hash function should generate integer uniformly on this range.

Example

With sha256:

Usage

Hash functions are used to:

In database application, Hash functions are mostly used to speed up:

such as :

Algo

Without secret

With Secret

Library

Property

Character Position

Position Independent: CRC16 and CRC32 (cyclical redundancy checksum, 16 bit and 32 bit). The hash of Foo or ooF will be the same. The position of the letter doesn't matter.

Cryptographic strength

see wiki/Strong_cryptography

Collision

An hash collision happens when two different inputs produce the same result. See Collisions of Hash or Identifier Generation

Distribution

In a distributed system or distributed data structure, you hash to distribute the request or the data.

  • In distributed system, you don't want to request always the same server. If you hash by time for instance, you can have only one server that serves the most request (ie for the current time)
  • In distributed data structure (ie hasmap), you don't want the data to go on the same place (ie in bucket)

Regular

Regular Hashing = Round Robin = Modular function

Assign M data keys to N servers assign each key to server = k mod N where:

  • k is the key
  • and N is the number of server

For instance:

  • key 0 → server 0
  • key 1 → server 1
  • key 2 → server 2
  • key 3 → server 0
  • key 4 → server 1
  • ….

If the number of servers increase from N to 2N, every existing key needs to be remapped, relocated.

Consistent

In consistent hashing, only a portion of keys need to be transferred to new storage space (a server) when more storage allocated. All data keys do not need to be rehashed like in regular hashing.

Consistent Hashing

When the node D is added, only the data on the orange segment need to be reallocate from the Node A to the Node D.

Each server memorize the locations of other servers in the ring and can forward any incoming request.

1) 2) 3) 4) 5)

Task Runner