알고리즘 문제풀이 정리 - Arrays & Hashing (2)

7/20/2025·임성혁

이전 게시글에서 이어서 작성되는 2부이다.

Medium

Given an array of strings strs, group all anagrams together into sublists. You may return the output in any order.

An anagram is a string that contains the exact same characters as another string, but the order of the characters can be different.

Example 1:

Input: strs = ["act","pots","tops","cat","stop","hat"]
 
Output: [["hat"],["act", "cat"],["stop", "pots", "tops"]]

Example 2:

Input: strs = ["x"]
 
Output: [["x"]]

Example 3:

Input: strs = [""]
 
Output: [[""]]

Constraints:

1 <= strs.length <= 1000.
0 <= strs[i].length <= 100
strs[i] is made up of lowercase English letters.

각 단어의 알바벳 구성요소가 어떻게 되는지 dictionary로 count한 다음 그것과 단어 정보를 리스트 A로 함께 저장하고자했다.

그러고 리스트 B에 겹치지 않는 리스트 A들을 저장한다.

그러기 위해 리스트 A를 리스트B에 추가하기 전에 dictionary가 서로 일치하는지 확인한다.

일치하면 같은 것, 즉 anagram으로 간주하여 일치한 리스트B 의 단어 리스트에 해당하는 부분에 for문의 단어를 연장해놓는다.

그리고 일치하지 않으면 리스트 A 를 리스트 B에 새로이 추가한다.

리스트 A를 리스트 B에 추가하여 dictionary가 서로 일치하는지 확인한다.

failure1

class Solution:
    def groupAnagrams(self, strs: List[str]) -> List[List[str]]:
        answer=[]
        wordboxes=[]
        group=[]
        for word in strs:
            wordbox=[0]*26
            for c in word:
                wordbox[ord(c)-ord('a')]+=1
            if not group:
                wordboxes.append(wordbox)
                wordboxes.extend(word)
                group.append(wordboxes)
                continue
            for i in group:
                if i[0]==wordbox:
                    i[1].extend(word)
                else:
                    wordboxes.append(wordbox)
                    wordboxes.extend(word)
                    group.append(wordboxes)
 
        return answer

failure2

class Solution:
    def groupAnagrams(self, strs: List[str]) -> List[List[str]]:
        answer=[]
        group=[]
        for word in strs:
            wordboxes=[]
            wordbox=[0]*26
            for c in word:
                wordbox[ord(c)-ord('a')]+=1
            if not group:
                wordboxes.append(wordbox)
                wordboxes.insert(1, word)
                group.append(wordboxes)
                continue
            for i in group:
                if i[0]==wordbox:
                    i[1].insert(0, word)
                else:
                    wordboxes.append(wordbox)
                    wordboxes.insert(1, word)
                    group.append(wordboxes)
        for i in group:
            answer.append(i[1])
 
        return answer

이런저런 시도를 하다가 결국 실패했다.

컴파일 자체가 안되고 에러가 계속 발생했다.

Traceback (most recent call last):
  File "/box/script.py", line 73, in process_input
    output = solution.groupAnagrams(input)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/box/script.py", line 29, in groupAnagrams
    i[1].insert(0, word)
    ^^^^^^^^^^^
AttributeError: 'str' object has no attribute 'insert'

Traceback (most recent call last):
  File "/box/script.py", line 79, in process_input
    output_copy = [l.copy() for l in output]
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/box/script.py", line 79, in <listcomp>
    output_copy = [l.copy() for l in output]
                   ^^^^^^
AttributeError: 'str' object has no attribute 'copy'

에러도 다양하게 발생해서 결국 포기하고 solution을 보기로했다. 아직 append insert extend 함수에 대한 이해가 부족한 것 같다.

trial3

solution을 보기전에 진짜 마지막으로 한번 더 해보기로 했다.

class Solution:
    def groupAnagrams(self, strs: List[str]) -> List[List[str]]:
        answer=[]
        used=[False]*len(strs)
        for i in range(len(strs)):
            group=[]
            if used[i]:
                continue
            used[i]=True
            chunki={}
            group.append(strs[i])
            for j in range(len(strs[i])):
                chunki[strs[i][j]]=1+chunki.get(strs[i][j],0)
            for m in range(i+1,len(strs)):
                chunkm={}
                if used[m]:
                    continue
                for n in range(len(strs[m])):
                    chunkm[strs[m][n]]=1+chunkm.get(strs[m][n],0)
                if chunki==chunkm:
                    used[m]=True
                    group.append(strs[m])
            answer.append(group)
        return answer

무수한 코드 수정 끝에 최종적으로 통과한 답안

output이 어떤 단어들은 잘 범주에 묶는데 어떤 단어들은 못 묶길래 처음에는 머리속에서 계속 시뮬레이션을 돌려봤는데 찾을 수가 없었다.

그래서 머리속에서만 해결하려는걸 포기하고 어디서 문제가 발생하는건지 알기위해 print문을 코드 중간 중간 끼워넣으면서 디버깅했다.

특히 chunki, chunkm print하고 주목해서 봤는데 chunkm이 for문 밖에서 초기화되어 다음 단어의 알바벳이 딕셔너리에 정리될 때 이전 단어가 정리된 딕셔너리에 누적되고 있던것이 문제였다. 간단한 문제였는데 이걸 못찾아서 너무 긴 시간을 돌아왔다..

교훈 : 나를 과신하지말고 print를 적극적으로 써서 디버깅하자

각 단어의 알바펫 빈도를 dictionary로 세고 그것을 for문을 통해 단어끼리 같은 dictionary인지 비교하는 과정을 반복했다. 같은 dictionary라면 anagram이므로 group이라는 list에 저장하고 모든 word에 대한 검토가 끝나면 answer에 추가했다.

bruteforce이지만 반복횟수를 줄이기 위해 boolean 배열을 하나 만들어서 answer에 들어간 단어는 기록해놓고 그 단어들의 dictionary를 만드는 차례가 오면 continue로 넘겼다.

나는 알파벳 빈도를 count하는 것만 hashtable을 구현했지 같은 anagram을 판별하기 위해 탐색하는 과정에서 hashtable을 쓰지 못했다.

그걸 구현하려면 딕셔너리에서 특정 key에 value가 있는지 여부를 확인하는 과정이 필요한데, dict은 value가 없는 경우 참조할 수 없는 문제가 있었고, 당시에 defaultdict의 존재를 몰랐다. 그래서 계속 에러가 나서 해당 구현을 시도하다 포기했다.

따라서 풀이의 시간복잡도는 $\mathcal{O}(m^2 \cdot n)$ 로 높은 편이다.

모든 답안이 공통적으로 defaultdict을 사용한다.

defaultdict이 뭔지 간략히 정리하고 가자.

아래는 claude에게 defaultdict이 뭔지 물어보고 얻는 답변이다.

defaultdict는 Python의 collections 모듈에서 제공하는 딕셔너리의 하위 클래스입니다. 일반 딕셔너리와 달리 존재하지 않는 키에 접근할 때 KeyError를 발생시키지 않고, 대신 미리 지정된 기본값을 자동으로 생성해주는 특별한 딕셔너리입니다.

defaultdict를 생성할 때는 기본값을 생성할 함수(callable)를 인자로 전달해야 합니다:

from collections import defaultdict
 
# 기본값이 0인 defaultdict
dd = defaultdict(int)
print(dd['존재하지않는키'])  # 0
 
# 기본값이 빈 리스트인 defaultdict
dd_list = defaultdict(list)
print(dd_list['존재하지않는키'])  # []
 
# 기본값이 빈 문자열인 defaultdict
dd_str = defaultdict(str)
print(dd_str['존재하지않는키'])  # ''

일반 딕셔너리는 존재하지 않는 키에 접근하면 KeyError를 발생시킵니다:

# 일반 딕셔너리
normal_dict = {}
# print(normal_dict['없는키'])  # KeyError 발생
 
# defaultdict
dd = defaultdict(int)
print(dd['없는키'])  # 0 (에러 없음)

from collections import defaultdict
 
text = "hello world"
counter = defaultdict(int)
 
for char in text:
    counter[char] += 1  # 키가 없어도 0부터 시작
 
print(dict(counter))  # {'h': 1, 'e': 1, 'l': 3, 'o': 2, ' ': 1, 'w': 1, 'r': 1, 'd': 1}

from collections import defaultdict
 
students = [
    ('김철수', '수학'),
    ('이영희', '영어'),
    ('박민수', '수학'),
    ('최지영', '영어'),
    ('김철수', '영어')
]
 
groups = defaultdict(list)
for name, subject in students:
    groups[subject].append(name)
 
print(dict(groups))
# {'수학': ['김철수', '박민수'], '영어': ['이영희', '최지영', '김철수']}

from collections import defaultdict
 
# 2차원 딕셔너리
nested = defaultdict(dict)
nested['사용자1']['점수'] = 100
nested['사용자1']['레벨'] = 5
 
# 또는 defaultdict를 중첩
nested2 = defaultdict(lambda: defaultdict(int))
nested2['사용자1']['점수'] = 100
nested2['사용자2']['점수'] += 50  # 기본값 0에서 시작

=> dict도 중첩 딕셔너리를 만들 수는 있지만 매번 키의 존재를 확인해야하는 불편함이 있다.

기본값을 생성하는 함수를 확인하거나 변경할 수 있습니다:

dd = defaultdict(list)
print(dd.default_factory)  # <class 'list'>
 
dd.default_factory = set  # 기본값 생성 함수 변경
print(dd['새키'])  # set()

dd = defaultdict(int)
dd['a'] = 1
dd['b'] = 2
 
# 일반 딕셔너리로 변환
normal_dict = dict(dd)
print(type(normal_dict))  # <class 'dict'>

키 존재 여부 확인: defaultdict에서 키에 접근하면 자동으로 해당 키가 생성되므로, 키 존재 여부를 확인할 때는 in 연산자를 사용해야 합니다:

dd = defaultdict(int)
print('key' in dd)  # False
print(dd['key'])    # 0 (키가 생성됨)
print('key' in dd)  # True

기본값 생성 함수: 인자 없이 호출 가능한 객체(callable)여야 합니다:

# 올바른 예
dd = defaultdict(list)
dd = defaultdict(lambda: "기본값")
 
# 잘못된 예
# dd = defaultdict("기본값")  # TypeError

defaultdict는 특히 데이터 집계, 그룹핑, 카운팅 작업에서 코드를 간결하게 만들어주는 매우 유용한 도구입니다.

class Solution:
    def groupAnagrams(self, strs: List[str]) -> List[List[str]]:
        res = defaultdict(list)
        for s in strs:
            sortedS = ''.join(sorted(s))
            res[sortedS].append(s)
        return list(res.values())

핵심아이디어

아나그램은 같은 문자들을 다른 순서로 배열한 단어들이므로, 각 문자열을 정렬했을 때 같은 결과가 나오면 아나그램이다.

과정

문자열 s의 문자들을 알파벳 순으로 정렬한다.

=> .join()은 Python의 문자열 메서드로, 리스트나 반복 가능한 객체의 요소들을 하나의 문자열로 합쳐주는 함수 / 구분자.join(반복가능한_객체)

정렬된 문자열을 키로 사용해 원본 문자열을 해당 그룹에 추가한다.

딕셔너리의 모든 값(values, 리스트들)을 리스트로 변환해 반환한다.

시간복잡도는 $\mathcal{O}(m \log n)$ 이다.

class Solution:
    def groupAnagrams(self, strs: List[str]) -> List[List[str]]:
        res = defaultdict(list)
        for s in strs:
            count = [0] * 26
            for c in s:
                count[ord(c) - ord('a')] += 1
            res[tuple(count)].append(s)
        return list(res.values())

dictionary 대신 알파벳이 26자이므로 배열로 알파벳 빈도를 기록하고 (아스키코드 이용),

그 배열 값들을 tuple로 묶어 defaultdict에 key로, 단어를 value로 저장한다.

그런데 이때 .append를 활용하여 value가 list인 defaultdict에 단어가 추가되며 저장될 수 있도록 하였다.

그러고 .values()로 그룹화된 단어 리스트들의 리스트를 제출했다.

단 .vaules()의 type은 <class 'dict_values'>이기에 list()를 활용해 변환한 것이다.

Medium

Given an integer array nums and an integer k, return the k most frequent elements within the array.

The test cases are generated such that the answer is always unique.

You may return the output in any order.

Example 1:

Input: nums = [1,2,2,3,3,3], k = 2
 
Output: [2,3]

Example 2:

Input: nums = [7,7], k = 1
 
Output: [7]

Constraints:

1 <= nums.length <= 10^4.
-1000 <= nums[i] <= 1000
1 <= k <= number of distinct elements in nums.

class Solution:
    def topKFrequent(self, nums: List[int], k: int) -> List[int]:
        answer=[]
        numbers={}
        for num in nums:
            numbers[num]=1+numbers.get(num,0)
        sorted_nums=sorted(numbers.items(), key=lambda x: x[1], reverse=True)
        for i in range(k):
            answer.append(sorted_nums[i][0])
        return answer

dictionary를 이용한 hash table로 nums에 있는 각 숫자의 빈도를 계산하고

그것을 value를 기준으로 내림차순 정렬한다음 상위 k개를 담아서 제출했다.

이후 문제 풀 때 정렬에 시간을 뺏기고 싶지 않아서 sorted 함수의 정의를 찾아보고 적용했다.

그런데 solutions를 보니 정렬을 직접 구현해보는게 이 문제의 출제의도인 것 같다.

시간복잡도는 $\mathcal{O}(n \log n)$ .

solutions를 보기전에 sorted() 함수가 어떻게 동작하는지 간단히 보고가자.

# 기본 사용법
numbers = [3, 1, 4, 1, 5, 9, 2, 6]
sorted_numbers = sorted(numbers)
print(sorted_numbers)  # [1, 1, 2, 3, 4, 5, 6, 9]
 
# 원본은 변경되지 않음
print(numbers)  # [3, 1, 4, 1, 5, 9, 2, 6]

# key 함수가 각 요소에 적용되어 비교 기준을 만듭니다
words = ['apple', 'pie', 'banana', 'book']
 
# 길이 기준 정렬
sorted_by_length = sorted(words, key=len)
print(sorted_by_length)  # ['pie', 'apple', 'book', 'banana']
 
# 실제로는 이런 과정을 거칩니다:
# 'apple' → len('apple') = 5
# 'pie' → len('pie') = 3
# 'banana' → len('banana') = 6
# 'book' → len('book') = 4
# 그래서 3, 4, 5, 6 순서로 정렬됩니다

data = {'apple': 3, 'banana': 1, 'orange': 2}
 
# 1단계: data.items() 실행
items = list(data.items())
print(items)  # [('apple', 3), ('banana', 1), ('orange', 2)]
 
# 2단계: key=lambda x: x[1] 적용
# 각 튜플에서 인덱스 1(값)을 추출
# ('apple', 3) → 3
# ('banana', 1) → 1  
# ('orange', 2) → 2
 
# 3단계: 추출된 값들로 정렬
# 1, 2, 3 순서로 정렬되어
# [('banana', 1), ('orange', 2), ('apple', 3)]

# sorted가 내부적으로 하는 일을 단계별로 보여주는 예시
def my_sorted_simulation(items, key_func):
    # 1단계: 각 항목에 key 함수 적용
    keyed_items = []
    for item in items:
        key_value = key_func(item)
        keyed_items.append((key_value, item))
    
    print("Key 함수 적용 후:", keyed_items)
    
    # 2단계: key 값을 기준으로 정렬
    keyed_items.sort(key=lambda x: x[0])
    
    # 3단계: 원본 항목만 추출
    result = [item for key_value, item in keyed_items]
    return result
 
# 테스트
data = [('apple', 3), ('banana', 1), ('orange', 2)]
result = my_sorted_simulation(data, lambda x: x[1])
print("최종 결과:", result)

sorted()는 key 함수를 각 요소에 적용해서 비교 기준을 만든 후 정렬한다.

class Solution:
    def topKFrequent(self, nums: List[int], k: int) -> List[int]:
        count = {}
        for num in nums:
            count[num] = 1 + count.get(num, 0)
 
        arr = []
        for num, cnt in count.items():
            arr.append([cnt, num])
        arr.sort()
 
        res = []
        while len(res) < k:
            res.append(arr.pop()[1])
        return res

내가 접근한 방식과 동일하다.

다만 dictionary의 sort를 하는 과정을 더 상세하게 구현하였고

(value가 이차원배열의 앞[0]에 오게 저장하고 해당 배열을 정렬하여 value 기준 정렬을 구현),

그렇게 정렬된 이차원배열에서 k개를 pop 시켜서 제출할 list에 넣었는데(append) 그걸 for문 대신 while로 구현한 코드가 흥미롭다. 코드가 더 깔끔해보이는 인상을 준다.

시간복잡도는 $$.

class Solution:
    def topKFrequent(self, nums: List[int], k: int) -> List[int]:
        count = {}
        for num in nums:
            count[num] = 1 + count.get(num, 0)
 
        heap = []
        for num in count.keys():
            heapq.heappush(heap, (count[num], num))
            if len(heap) > k:
                heapq.heappop(heap)
 
        res = []
        for i in range(k):
            res.append(heapq.heappop(heap)[1])
        return res

숫자들의 빈도를 hash table로 count하는 것까지는 동일하다.

다만 숫자 빈도의 내림차순 정렬을 heap을 이용해 구현했고 그를 위해 (value, key) 형태의 튜플로 만들어 밀어넣었다.

시간복잡도는 $\mathcal{O}(n \log k)$ .

class Solution:
    def topKFrequent(self, nums: List[int], k: int) -> List[int]:
        count = {}
        freq = [[] for i in range(len(nums) + 1)]
 
        for num in nums:
            count[num] = 1 + count.get(num, 0)
        for num, cnt in count.items():
            freq[cnt].append(num)
        
        res = []
        for i in range(len(freq) - 1, 0, -1):
            for num in freq[i]:
                res.append(num)
                if len(res) == k:
                    return res

bucket sort를 이용한 풀이법이다.

문제에서 순서 상관 없이 상위 k개를 반환하면 된다고 했기에 가능한 풀이이다.

코드 시작에 freq를 초기화 한 방식이 인상적이다.

배열안에 배열을 숫자 개수보다 하나 많게 초기화해서 이차원 배열을 구성한다.

이는 모든 수가 같은 숫자일 때 max인 개수가 전체 숫자 개수가 되는데 그 개수에 해당하는 Index에 숫자를 넣기 위함이다.

마지막으로 위에서 내려오면서 k개를 정답에 추가한다.

시간복잡도는 $\mathcal{O}(n)$ .

Medium

Design an algorithm to encode a list of strings to a single string. The encoded string is then decoded back to the original list of strings.

Please implement encode and decode

Example 1:

Input: ["neet","code","love","you"]
 
Output:["neet","code","love","you"]

Example 2:

Input: ["we","say",":","yes"]
 
Output: ["we","say",":","yes"]

Constraints:

0 <= strs.length < 100
0 <= strs[i].length < 200
strs[i] contains only UTF-8 characters.

class Solution:
 
    def encode(self, strs: List[str]) -> str:
        encoded=""
        for word in strs:
            for c in word:
                if c=='/':
                    encoded+="//"
                else:
                    encoded+=c
            encoded+="/s"
        return encoded
 
    def decode(self, s: str) -> List[str]:
        decoded=[]
        word=""
        getready=False
        for c in s:
            if getready:
                if c=='/':
                    word+=c
                    getready=False
                elif c=='s':
                    decoded.append(word)
                    word=""
                    getready=False
            else:
                if c=='/':
                    getready=True
                else:
                    word+=c
        return decoded

줄바꿈 개행이 '\n'으로 이루어지는 점에 착안했다.

/s로 단어가 달라지는 구간을 표시하였고 /가 문자열에 포함될 것을 고려하여 /가 있으면 //로 기록해 /가 문자열에 있음을 알 수 있도록 하였다.

/가 들어오면 스위치(bool -> True)를 켜고, 아닌 경우에는 단어에 포함되도록 기록했다.

스위치가 켜졌을 경우 /인지 s인지를 구분하는과정을 거친다. / 인경우는 /를 단어에 포함시키고, s라면 단어를 리스트에 초과하고 단어를 담는 리스트를 초기화시킨다.

class Solution:
    def encode(self, strs: List[str]) -> str:
        if not strs:
            return ""
        sizes, res = [], ""
        for s in strs:
            sizes.append(len(s))
        for sz in sizes:
            res += str(sz)
            res += ','
        res += '#'
        for s in strs:
            res += s
        return res
 
    def decode(self, s: str) -> List[str]:
        if not s:
            return []
        sizes, res, i = [], [], 0
        while s[i] != '#':
            cur = ""
            while s[i] != ',':
                cur += s[i]
                i += 1
            sizes.append(int(cur))
            i += 1
        i += 1
        for sz in sizes:
            res.append(s[i:i + sz])
            i += sz
        return res

각 단어들의 길이를 숫자와 쉼표(',')로 단어의 개수를 나누어 str의 시작에 기록해놓고 # 기호로 그것이 마무리됨을 표시한다. 그러고 모든 단어를 string으로 연결한다.

#가 나오기 전까지 쉼표와 숫자를 구분하며 배열에 각 단어의 길이를 미리 저장해 놓는다. 숫자가 1 figure가 아닌경우, 예를 들어 2 figure인 경우에는 char가 2개 쓰이는 셈이므로 string으로 일단 숫자를 잇고 쉼표가 나오면 int로 변환해 배열에 저장한다.

그런다음 각 단어의 길이를 토대로 해당 길이만큼 list에 포함시킨다.

Time complexity: O(m) for each encode() and decode() function calls.
Space complexity: O*(m+n) for each encode() and decode() function calls.

Where m is the sum of lengths of all the strings and n is the number of strings.

class Solution:
    
    def encode(self, strs: List[str]) -> str:
        res = ""
        for s in strs:
            res += str(len(s)) + "#" + s
        return res
 
    def decode(self, s: str) -> List[str]:
        res = []
        i = 0
        
        while i < len(s):
            j = i
            while s[j] != '#':
                j += 1
            length = int(s[i:j])
            i = j + 1
            j = i + length
            res.append(s[i:j])
            i = j
            
        return res

이번 경우는 encoding 과정에서 단어의 길이를 배열에 저장하지 않는다. 왜냐하면 string으로 합칠 때 바로바로 길이 + # + 단어 이런식으로 이어버리고 다음단어로 넘어가기 때문이다.

그래서 시간 복잡도는 solution1과 같지만 단어의 길이들을 저장할 필요가 없기에 공간복잡도가 줄어든다. (그래서 optimal)

근데 이러면 십의 자리거나 #가 단어에 들어가면 안될 것 같다는 생각이 드는데, 일단 decoding 과정을 이어서 보자.

숫자를 찾는과정을 투 포인터로 구현했다. 숫자의 시작점을 i, 끝점을 j로 두고 index j 에서 #가 나오면 멈추고 그 구간의 string을 숫자로 바꿔주었다.

#가 나올 때까지 str를 읽어들인다는 것은 숫자가 일의 자리이든 십의 자리이든 백의 자리이든 상관없다는 소리다.

그리고 그렇게 몇자리 숫자인지 알게되면 해당 숫자의 길이 만큼 string을 읽어들인다. 따라서 단어에 #가 나오더라도 전혀 상관이 없다.

물론 이론상으로는 그렇지만, 실제로는 네트워크 전송과정에서 일부 숫자 관련 비트가 뒤집히거나 하여 error가 발생한다면 이후 encoding에 문제가 생길 것이다.

이번 게시글에서는 Arrays & Hashing 관련 3문제 풀이를 적어보았다. 다음 게시글을 마지막으로 Arrays & Hashing 관련 문제 풀이를 마칠 예정이다.

알고리즘 문제풀이 정리 - Arrays & Hashing (2)

알고리즘 문제풀이 정리 - Arrays & Hashing (1)

알고리즘 문제풀이 정리 - Arrays & Hashing (3)