Algorithm

[WIL]๐Ÿ™ˆPINTOS_KAIST : Project 3. VIRTUAL MEMORY (1) : HASH TABLE ๐Ÿ™‰

NOHCODING 2022. 12. 6. 05:46
๋ฐ˜์‘ํ˜•
๐Ÿšซ ํ˜„์žฌ ๊ธ€์€ ์ •ํ™•ํ•˜์ง€ ์•Š์€ ์ •๋ณด๊ฐ€ ์žˆ์„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์–ธ์ œ๋“ ์ง€ ํ‹€๋ฆฐ ๋ถ€๋ถ„์— ๋Œ€ํ•ด ๋Œ“๊ธ€์„ ๋‹ฌ์•„์ฃผ์„ธ์š”! ๐Ÿšซ

0. Hash Table์ด๋ž€ ๋ญ”๊ฐ€์š”?

Hash Table์ด์ „์— Hashing์— ๋Œ€ํ•ด ์ดํ•ดํ•˜๋ฉด Hash Table์— ๋Œ€ํ•œ ์ดํ•ด๊ฐ€ ๋น ๋ฅด๋‹ค.

 

(1) Hashing์ด๋ž€?

 '์ฝœ๋ผ', 'ํ™˜ํƒ€', '์‚ฌ์ด๋‹ค'๋ฅผ ์ž„์˜์˜ ์ž๋ฃŒ๊ตฌ์กฐ์— ์ €์žฅํ•œ๋‹ค๊ณ  ํ•˜์ž. ์˜ˆ์‹œ๊ฐ€ 3๊ฐœ์ผ๋•Œ๋Š” ์ฐพ๊ธฐ๊ฐ€ ๋น ๋ฅด์ง€๋งŒ ์„ธ์ƒ์— ์žˆ๋Š” ๋ชจ๋“  ์Œ๋ฃŒ์ˆ˜๋ฅผ ์ €์žฅํ•œ๋‹ค๊ณ  ํ• ๋•Œ, ํ•ด๋‹น ์Œ๋ฃŒ๋ฅผ ์ €์žฅํ•˜๊ณ   ์›ํ•˜๋Š” ์Œ๋ฃŒ๋ฅผ ์ž๋ฃŒ๊ตฌ์กฐ์—์„œ ๊ฒ€์ƒ‰ํ•˜๋Š” ์‹œ๊ฐ„์€ ๋น„ํšจ์œจ์ ์ผ ๊ฒƒ์ด๋‹ค. ๊ทธ๋ž˜์„œ hash funtion์„ ํ†ตํ•ด ์ €์žฅํ•˜๊ณ ์ž ํ•˜๋Š” ๊ฐ’์„ ์ž„์˜์˜ intํ˜•(์ˆซ์ž)๋กœ ๋ณ€ํ™˜ํ•˜์—ฌ ์ €์žฅํ•œ๋‹ค. ์ฆ‰, ๋ฌธ์ž๋ฅผ ๊ฐ€์ ธ์™€ ์ˆซ์ž๋กœ ๋ณ€ํ™˜ํ•˜๋Š” ๊ณผ์ •์„ ํ•ด์‹ฑ(Hashing)์ด๋ผ๊ณ  ํ•œ๋‹ค.๊ทธ๋ฆฌ๊ณ  ๊ธ€์ž๋ฅผ ํŠน์ • ์ˆซ์ž๋กœ ๋ณ€ํ™˜ํ•˜๋Š”๋ฐ ์‚ฌ์šฉํ•˜๋Š” ์ฝ”๋“œ๋ฅผ ํ•ด์‰ฌ ํ•จ์ˆ˜(hash funtion)์ด๋‹ค.

'์ฝœ๋ผ' -> 123
'ํ™˜ํƒ€' -> 160
'์‚ฌ์ด๋‹ค' -> 170

 

(2) Hash Table์ด๋ž€?

๊ทธ๋Ÿผ Hash Table์€? ์Œ์œผ๋กœ ์ด๋ฃจ์–ด์ง„ ๊ฐ’(ex: 'ํ‚ค': 168, '๋‚˜์ด': 20 )๋“ค์˜ ๋ฆฌ์ŠคํŠธ์ด๋‹ค.  ์ฒซ๋ฒˆ์งธ ํ•ญ๋ชฉ์€ ํ‚ค(key)๋ผ๊ณ  ๋ถ€๋ฅด๊ณ , ๋‘ ๋ฒˆ์งธ ํ•ญ๋ชฉ์€ ๊ฐ’์ด๋ผ๊ณ  ๋ถ€๋ฅธ๋‹ค. keys๊ฐ’๋“ค์€ hash function์„ ํ†ตํ•ด hashing์ด ์ง„ํ–‰๋˜๊ณ , ์ง„ํ–‰๋œ ๊ฐ’๋“ค์€ buckets์ด๋ผ๋Š” ๊ณณ์— ์ €์žฅํ•œ๋‹ค. Hash Table์ด ๊ฒ€์ƒ‰์†๋„๊ฐ€๋น ๋ฅธ์ด์œ ๋Š” ํ•˜๋‚˜ํ•˜๋‚˜ ์ฐพ์ง€ ์•Š๊ณ  hashing์ด ์ง„ํ–‰๋œ keys๊ฐ’๋“ค๋กœ ์ฐพ๊ธฐ ๋•Œ๋ฌธ์— O(1)์ด ์†Œ์š”๋œ๋‹ค.

 

(3) ์–ด์ฉœ ๊ทธ๋Ÿผ Hash Table์€ ์™„๋ฒฝํ•œ ์ž๋ฃŒ๊ตฌ์กฐ์ธ๊ฐ€์š”?

  Hash Table์€ ์ข‹์€ ์ž๋ฃŒ๊ตฌ์กฐ์ด์ง€๋งŒ, ์„ธ ์š”์ธ์— ๋”ฐ๋ผ์„œ ํšจ์œจ์„ฑ์ด ์ •ํ•ด์ง„๋‹ค. 

 

  •  ํ•ด์‹œ ํ…Œ์ด๋ธ”์— ์–ผ๋งˆ๋‚˜ ๋งŽ์€ ๋ฐ์ดํ„ฐ๋ฅผ ์ €์žฅํ•˜๋Š”๊ฐ€ 
  • ํ•ด์‹œ ํ…Œ์ด๋ธ”์—์„œ ์–ผ๋งˆ๋‚˜ ๋งŽ์€ ์…€์„ ์“ธ ์ˆ˜ ์žˆ๋Š”๊ฐ€
  • ์–ด๋–ค ํ•ด์‹œ ํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•˜๋Š”๊ฐ€. 

์ ์€ ์…€์— ๋งŽ์€ ๋ฐ์ดํ„ฐ๋ฅผ ์ €์žฅํ•˜๋ฉด ์ถฉ๋Œ์ด ๋งŽ์„ ํ…Œ๊ณ , ํ•ด์‹œ ํ…Œ์ด๋ธ”์˜ ํšจ์œจ์„ฑ์€ ๋–จ์–ด์งˆ ๊ฒƒ์ด๋‹ค. ๋งˆ์ง€๋ง‰ ํ•ด์‹œ ํ•จ์ˆ˜๊ฐ€ ํšจ์œจ์„ฑ์„ ์ขŒ์šฐ ํ•˜๋Š” ์ด์œ ๋Š” ํ•ด์‹œ ํ•จ์ˆ˜๋ฅผ ํ†ตํ•ด ํ•ด์‹ฑ์ด ์ง„ํ–‰๋œ ํŠน์ • key๊ฐ’๋“ค์ด ํŠน์ • ์ˆซ์ž์—๋งŒ ๋ชฐ๋ ค ์ถฉ๋Œ์ด ์ผ์–ด๋‚  ๊ฒฝ์šฐ์ด๋‹ค. ์ข‹์€ ํ•ด์‹œ ํ•จ์ˆ˜๋ž€ ์‚ฌ์šฉ ๊ฐ„๋Šฅํ•œ ๋ชจ๋“  (bucket์˜ )์…€์— ๋ฐ์ดํ„ฐ๋ฅผ ๋ถ„์‚ฐ์‹œํ‚ค๋Š” ํ•จ์ˆ˜์ด๋‹ค. ๋ฐ์ดํ„ฐ๋ฅผ  ๋„“๊ฒŒ ํผํŠธ๋ฆด ์ˆ˜๋ก ์ถฉ๋Œ์ด ์ ๊ธฐ ๋•Œ๋ฌธ์ด๋‹ค. 

 

์ถฉ๋Œ ํšŸ์ˆ˜๋ฅผ ์ค„์–ด๋“ค ์ˆ˜๋ก ํ•ด์‹œ ํ…Œ์ด๋ธ”์˜ ํšจ์œจ์„ฑ์€ ๋†’๋‹ค. ์ด๋ก ์ƒ ์ถฉ๋Œ์„ ํ”ผํ•˜๋Š” ์ตœ์„ ์˜ ๋ฐฉ๋ฒ•์€ ํ•ด์‹œ ํ…Œ์ด๋ธ”์— ๋งŽ์€ ์ €์žฅ๊ณต๊ฐ„์„ ํ™•๋ณดํ•˜๋Š” ๊ฒƒ์ด๋‹ค. ํ•˜์ง€๋งŒ 10๊ฐœ ์ €์žฅํ•˜๊ณ  ์‹ถ๋‹ค๊ณ  ํ•ด์‹œ ํ…Œ์ด๋ธ”์— ์…€์ด 100000๊ฐœ๋ฉด ์ถฉ๋Œ์ด ์ผ์–ด๋‚  ๊ฐ€๋Šฅ์„ฑ์€ ์ ์ง€๋งŒ ๋ฉ”๋ชจ๋ฆฌ๋ฅผ ๋„ˆ๋ฌด ๋งŽ์ด ์‚ฌ์šฉํ•˜๊ฒŒ ๋˜์–ด ๊ณต๊ฐ„๋ณต์žก๋„๊ฐ€ ๋Š˜์–ด๋‚˜๊ฒŒ๋œ๋‹ค. ํ•ด์‹œ ํ…Œ์ด๋ธ”์€ ๋ฐ˜๋“œ์‹œ ์ถฉ๋Œ ์กฐ์ •์„ ์ˆ˜ํ–‰ํ•ด์•ผํ•˜๋ฉฐ, ์ข‹์€ ํ•ด์‹œํ…Œ์ด๋ธ”์€ ๋งŽ์€ ๋ฉ”๋ชจ๋ฆฌ๋ฅผ ๋‚ญ๋น„ํ•˜์ง€ ์•Š์œผ๋ฉฐ ๊ท ํ˜•์„ ์œ ์ง€ํ•˜๋ฉฐ ์ถฉ๋Œ์„ ํ”ผํ•˜๋Š” ๊ฒƒ์ด๋‹ค. 

 

1. PINTOS์—์„œ์˜ Hash Table

pintos์—์„œ ํ•ด์‹œ ํ…Œ์ด๋ธ”์€ hash ๊ตฌ์กฐ์ฒด๋กœ ํ‘œํ˜„๋˜์–ด ์žˆ๋‹ค. hash ๊ตฌ์กฐ์ฒด์˜ ์‹ค์ œ์ ์ธ ๋ฉค๋ฒ„๋Š” ์ˆจ๊ฒจ์ ธ ์žˆ๋‹ค. ์ฆ‰ ํ•ด์‹œ ํ…Œ์ด๋ธ”์„ ์‚ฌ์šฉํ•˜๋Š” ์ฝ”๋“œ๋Š” ์ง์ ‘์ ์œผ๋กœ ์ ‘๊ทผํ•  ์ˆ˜ ์—†๋„๋ก ์ˆจ๊ฒจ์ ธ ์žˆ๋‹ค.  ํ•ด์‹œ ํ…Œ์ด๋ธ”์€ struct hash_elem ํƒ€์ž…์˜ ์›์†Œ๋กœ ์—ฐ์‚ฐํ•˜๊ฒŒ ๋œ๋‹ค.

 

 (1) struct hash

pintos์—์„œ bucket_cnt = 4 ๋กœ ์ดˆ๊ธฐํ™” ๋˜์–ด ์žˆ๊ณ , ๋ฆฌ์ŠคํŠธํ˜• buckets๊ฐ€ ์กด์žฌํ•œ๋‹ค. Hashing function์„ ํ†ตํ•ด hashing๋  ๊ฐ’๋“ค์ด list buckets์— ๋‹ด๊ธด๋‹ค.

/* Hash table. */
struct hash {
	size_t elem_cnt;            /* Number of elements in table. */
	size_t bucket_cnt;          /* Number of buckets, a power of 2. */
	struct list *buckets;       /* Array of `bucket_cnt' lists. */
	hash_hash_func *hash;       /* Hash function. */
	hash_less_func *less;       /* Comparison function. */
	void *aux;                  /* Auxiliary data for `hash' and `less'. */
};

 

 (2) hash_hash_func 

hash_hash_func์€ ์–ด๋–ป๊ฒŒ ๋“ค์–ด์˜ฌ ์ง€ ๋ชจ๋ฅด๋Š” ๊ฐ hash_elem์— ๊ฐ’์— ๋Œ€๋น„ํ•œ ํ•จ์ˆ˜์ด๋‹ค. hash_elem์ด ๋‹ค์–‘ํ•œ ์ž๋ฃŒํ˜•์œผ๋กœ ๋“ค์–ด์˜ฌ ์ˆ˜ ์žˆ์œผ๋‹ˆ, ํ•ด๋‹น ์ž๋ฃŒํ˜•์— ๋งž์ถฐ hash_bytes, hash_string, hash_int๋ฅผ ์•Œ๋งž๊ฒŒ ์‚ฌ์šฉํ•˜๋ฉด ๋œ๋‹ค. ์ด ๊ฐ๊ฐ์˜ ํ•จ์ˆ˜๊ฐ€ hash_funtion์ด๋‹ค. ๊ฐ ํ•จ์ˆ˜๋“ค์€ Fowler-Noll-Vo์˜ ๋น„ ์•”ํ˜ธ ํ•ด์‹œํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•˜๊ณ  ์žˆ์œผ๋ฉฐ, define์œผ๋กœ ์ •์˜๋œ ์†Œ์ˆ˜๋“ค์„ ๊ณฑํ•˜๊ณ  XOR ์—ฐ์‚ฐ์„ ํ†ตํ•ด Hash collision์„ ๋ง‰๋Š”๋‹ค.

/* include/lib/kernel/hash.h  */
typedef uint64_t hash_hash_func (const struct hash_elem *e, void *aux);
/* include/lib/kernel/hash.h  */
uint64_t hash_bytes (const void *, size_t);
uint64_t hash_string (const char *);
uint64_t hash_int (int);

   

    1) hash_byte

/* Fowler-Noll-Vo hash constants, for 32-bit word sizes. */
#define FNV_64_PRIME 0x00000100000001B3UL
#define FNV_64_BASIS 0xcbf29ce484222325UL

/* Returns a hash of the SIZE bytes in BUF. */
uint64_t
hash_bytes (const void *buf_, size_t size) {
	/* Fowler-Noll-Vo 32-bit hash, for bytes. */
    
	const unsigned char *buf = buf_;
	uint64_t hash;

	ASSERT (buf != NULL);

	hash = FNV_64_BASIS;
	while (size-- > 0)
		hash = (hash * FNV_64_PRIME) ^ *buf++;

	return hash;
}

 

 

 2) hash_string

/* Returns a hash of string S. */
uint64_t
hash_string (const char *s_) {
	const unsigned char *s = (const unsigned char *) s_;
	uint64_t hash;

	ASSERT (s != NULL);

	hash = FNV_64_BASIS;
	while (*s != '\0')
		hash = (hash * FNV_64_PRIME) ^ *s++;

	return hash;
}

 

 3) hash_int 

uint64_t
hash_int (int i) {
	return hash_bytes (&i, sizeof i);
}

 

(3) hash_init

/* Initializes hash table H to compute hash values using HASH and
   compare hash elements using LESS, given auxiliary data AUX. */
/* hash funtion๋“ค์„ ์‚ฌ์šฉํ•˜์—ฌ hash table์„ ์ดˆ๊ธฐํ™”ํ•œ๋‹ค. */
bool
hash_init (struct hash *h,
		hash_hash_func *hash, hash_less_func *less, void *aux) {
	h->elem_cnt = 0;
	h->bucket_cnt = 4;
	h->buckets = malloc (sizeof *h->buckets * h->bucket_cnt);
	h->hash = hash;
	h->less = less;
	h->aux = aux;

	if (h->buckets != NULL) {
		hash_clear (h, NULL);
		return true;
	} else
		return false;
}

 

(4) hasy_entry(elem, type, member)

hash_elem ๊ตฌ์กฐ์ฒด์˜ ํฌ์ธํ„ฐ์ธ elem ๊ตฌ์กฐ์ฒด๋ฅผ ํฌํ•จํ•œ ๊ตฌ์กฐ์ฒด์˜ ํฌ์ธํ„ฐ๋ฅผ ๋ฐ˜ํ™˜ํ•˜๋Š” ํ•จ์ˆ˜์ด๋‹ค. 

#define hash_entry (elem, type, member) { /* Omit details */ }

 

(5) rehash 

๋ฒ„ํ‚ท๋‹น ์š”์†Œ๊ฐ€ ์ ๋ฆฌ์ง€ ์•Š๋„๋ก ๊ด€๋ฆฌ

/* Element per bucket ratios. */
#define MIN_ELEMS_PER_BUCKET  1 /* Elems/bucket < 1: reduce # of buckets. */
#define BEST_ELEMS_PER_BUCKET 2 /* Ideal elems/bucket. */
#define MAX_ELEMS_PER_BUCKET  4 /* Elems/bucket > 4: increase # of buckets. */
/* Changes the number of buckets in hash table H to match the
   ideal.  This function can fail because of an out-of-memory
   condition, but that'll just make hash accesses less efficient;
   we can still continue. */
static void
rehash (struct hash *h) {
	size_t old_bucket_cnt, new_bucket_cnt;
	struct list *new_buckets, *old_buckets;
	size_t i;

	ASSERT (h != NULL);

	/* Save old bucket info for later use. */
	old_buckets = h->buckets;
	old_bucket_cnt = h->bucket_cnt;

	/* Calculate the number of buckets to use now.
	   We want one bucket for about every BEST_ELEMS_PER_BUCKET.
	   We must have at least four buckets, and the number of
	   buckets must be a power of 2. */
	new_bucket_cnt = h->elem_cnt / BEST_ELEMS_PER_BUCKET;
	if (new_bucket_cnt < 4)
		new_bucket_cnt = 4;
	while (!is_power_of_2 (new_bucket_cnt))
		new_bucket_cnt = turn_off_least_1bit (new_bucket_cnt);

	/* Don't do anything if the bucket count wouldn't change. */
	if (new_bucket_cnt == old_bucket_cnt)
		return;

	/* Allocate new buckets and initialize them as empty. */
	new_buckets = malloc (sizeof *new_buckets * new_bucket_cnt);
	if (new_buckets == NULL) {
		/* Allocation failed.  This means that use of the hash table will
		   be less efficient.  However, it is still usable, so
		   there's no reason for it to be an error. */
		return;
	}
	for (i = 0; i < new_bucket_cnt; i++)
		list_init (&new_buckets[i]);

	/* Install new bucket info. */
	h->buckets = new_buckets;
	h->bucket_cnt = new_bucket_cnt;

	/* Move each old element into the appropriate new bucket. */
	for (i = 0; i < old_bucket_cnt; i++) {
		struct list *old_bucket;
		struct list_elem *elem, *next;

		old_bucket = &old_buckets[i];
		for (elem = list_begin (old_bucket);
				elem != list_end (old_bucket); elem = next) {
			struct list *new_bucket
				= find_bucket (h, list_elem_to_hash_elem (elem));
			next = list_next (elem);
			list_remove (elem);
			list_push_front (new_bucket, elem);
		}
	}

	free (old_buckets);
}
๋ฐ˜์‘ํ˜•