문자열 조작

알고리즘

문자열 조작

NowChan 2022. 4. 24. 16:18

Q1. : 주어진 문자열이 팰린드롬인지 확인하라. 대소문자를 구분하지 않으며, 영문자와 숫자만을 대상으로 한다.

https://leetcode.com/problems/valid-palindrome/

Example 1:

Input: s = "A man, a plan, a canal: Panama"
Output: true
Explanation: "amanaplanacanalpanama" is a palindrome.

Solutions :

import re

class Solution(object):
    def isPalindrome(self, s):

        s = s.lower()
        s = re.sub('[^a-z0-9]', '', s)
        
        return s == s[::-1]

문자열의 경우 슬라이싱 속도가 매우 빠르다. 슬라이싱 문법 중 s[:]은 사본을 리턴한다.

Q2 : 로그를 재정렬하라. 기준은 다음과 같다.

로그의 가장 앞 부분은 식별자다.
문자로 구성된 로그가 숫자 로그보다 앞에 온다.
식별자는 순서에 영향을 끼치지 않지만, 문자가 동일한 경우 식별자 순으로 한다.
숫자 로그는 입력 순서대로 한다.

https://leetcode.com/problems/reorder-data-in-log-files/

Example 1:

Input: logs = ["dig1 8 1 5 1","let1 art can","dig2 3 6","let2 own kit dig","let3 art zero"]
Output: ["let1 art can","let3 art zero","let2 own kit dig","dig1 8 1 5 1","dig2 3 6"]


Explanation:
The letter-log contents are all different, so their ordering is "art can", "art zero", "own kit dig".
The digit-logs have a relative order of "dig1 8 1 5 1", "dig2 3 6".

Solutions :

class Solution(object):
    def reorderLogFiles(self, logs):
        dig_log, let_log = [], []
        
        # split log by type
        for log in logs:
            if log.split()[1].isdigit():
                dig_log.append(log)
            else:
                let_log.append(log)
        
        let_log.sort(key = (lambda x: (x.split()[1:], x.split()[0])))
        
        return let_log + dig_log

lists.sort(key=lambda)로 정렬할 기준을 정할 수 있다.

# s의 요소 x를 공백을 기준으로 단어를 분리했을 때, 두 번째 단어를 기준으로 먼저 정렬한다.
# 두 번째 단어가 같을 경우 첫 번째 단어를 기준으로 정렬한다. 
s.sort(key=lambda x: (x.split()[1], x.split()[0]))

Q3. 금지된 단어를 제외한 가장 흔하게 등장하는 단어를 출력하라. 대소문자 구분을 하지 않으며, 구두점(마침표, 쉼표 등) 또한 무시한다.

https://leetcode.com/problems/most-common-word/

Example 1:

Input: paragraph = "Bob hit a ball, the hit BALL flew far after it was hit.", banned = ["hit"]
Output: "ball"

try:

import collections
import re

class Solution(object):
    def mostCommonWord(self, paragraph, banned):

        # split words except non-alphabet or number
        paragraph = re.sub("[^A-z0-9 ]", ' ', paragraph.lower())
        words = paragraph.split()
        
        # remove banned words
        for i in range(len(banned)):
            words = [word for word in words if word != banned[i]]
        
        # get target word
        target = collections.Counter(words).most_common(1)
        return target[0][0]

Solution :

# split words except non-alphabet or number
paragraph = re.sub(r"[^\w]", ' ', paragraph.lower())

정규식에서 ^는 not을 의미하고, \w는 단어 문자를 뜻한다. 즉, 위 정규식은 단어 문자가 아닌 모든 문자를 공백으로 바꾼다. 정규식에 r이 붙으면 문자열을 그대로 반환한다. (ex, print('a') → 'a\n')

정규식 작성법

# remove banned words
words = [word for word in words if word not in banned]

defaultdict와 max 함수를 사용해 최빈 단어를 구할 수도 있다.

# get target word
counts = collections.defaultdict(int)
for word in words:
	counts[word]+=1

# max(iterable, key=func)
return max(counts, key=counts.get)

Q4. 문자열 배열을 받아 애너그램 단위로 그룹핑하라.

https://leetcode.com/problems/group-anagrams/

Example 1:

Input: strs = ["eat","tea","tan","ate","nat","bat"]
Output: [["bat"],["nat","tan"],["ate","eat","tea"]]

Solution :

import collections

class Solution(object):
    def groupAnagrams(self, strs):
        anagrams = collections.defaultdict(list)

        for word in strs:
            anagrams[''.join(sorted(word))].append(word)

        return anagrams.values()

우선, str들 속 단어를 정렬한 후 ''문자열에 join()해서 dict의 key로 만든다. 그러면 같은 문자로 구성된 단어는 같은 key의 list에 append된다. 그 후 dict 속 값들을 출력하면 같은 문자로 구성된 단어끼리 묶인 list들이 출력된다.

anagrams를 default dict로 선언한 이유는 존재하지 않는 key를 삽입하려할 경우 KeyError가 발생하기 때문이다.

Q5. 가장 긴 팰린드롬 부분 문자열을 출력하라.

https://leetcode.com/problems/longest-palindromic-substring/

Example 1:

Input: s = "babad"
Output: "bab"

Solution :

class Solution(object):   
    def longestPalindrome(self, s):
        # 팰린드롬을 판별하면서 투 포인터 확장
        def expand(left, right):
            while left >=0 and right < len(s) and s[left] == s[right]:
                left -= 1
                right += 1
            return s[left+1 : right] # left-1 보정
        
        # 해당 사항이 없을 때 빠르게 리턴
        if len(s) < 2 or s == s[::-1]:
            return s
        
        result = '' # 문자열
        # 슬라이딩 윈도우 우측으로 이동
        for i in range(len(s)-1): # 짝수 window가 s의 끝까지 도달
            result = max(result,
                        expand(i, i+1), # 짝수(2칸) window
                        expand(i, i+2), # 홀수(3칸) window
                        key=len) # 길이 기준으로
        return result

2칸, 3칸 window를 만든다. ex) bb, aba를 모두 찾아내기 위해
window를 확장한 후 최대 크기의 팰린드롬을 반환한다.
확장한 문자열 중 len이 가장 큰 문자열을 리턴한다.

현재글문자열 조작

공부한 것을 기록하는 곳

scikit learn, SentimentIntetsityAnalyzer, WordNetLemmatizer, nlp, clustering, stackign ensemble, NMF, konlpy, kmeans, truncated svd, LinearRegression, LightGBM, Mercari Price Suggestion Challenge, 피마 인디언 당뇨병 예측, Kaggle, Silhouette Evaluation, 파이썬 머신러닝 완벽 가이드, okt, DBSCAN, 결정 트리,

my-github

Today :
Yesterday :

일	월	화	수	목	금	토
1	2	3	4	5	6	7
8	9	10	11	12	13	14
15	16	17	18	19	20	21
22	23	24	25	26	27	28
29	30

내 블로그 - 관리자 홈 전환	`Q` `Q`
새 글 쓰기	`W` `W`

글 수정 (권한 있는 경우)	`E` `E`
댓글 영역으로 이동	`C` `C`

이 페이지의 URL 복사	`S` `S`
맨 위로 이동	`T` `T`
티스토리 홈 이동	`H` `H`
단축키 안내	`Shift` + `/` `⇧` + `/`

공부 기록