Item-Based Nearest-Neighbor Collaborative Filtering Practice

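Nearest-neighbor collaborative filtering comes in two flavors: user-based and item-based. Here we implement the item-based variant, which generally gives better recommendation accuracy. We will use a user-movie rating data set, in which users have rated movies. It can be downloaded from https://grouplens.org/datasets/movielens/latest/.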

Data Processing and Transformation

import numpy as np
import pandas as pd

movies = pd.read_csv('/content/drive/MyDrive/military/grouplens/movies.csv')
ratings = pd.read_csv('/content/drive/MyDrive/military/grouplens/ratings.csv')
print(movies.shape)
print(ratings.shape)

(9742, 3)
(100836, 4)

movies.head(2)

movies holds the metadata for each movie: title and genres.

ratings.head(2)

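ratings contains userId, movieId, and rating; the timestamp column is of little use for now. Ratings can be given in 0.5-point steps up to 5. In item-based filtering, the movies play the role of items. The data is currently in long format (one row per rating), so it must be reshaped into a user-movie matrix.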

DataFrame.pivot_table() makes this reshaping easy. Calling DataFrame.pivot_table('rating', index='userId', columns='movieId') uses userId as the row index, turns the movieId values into columns, and fills each cell with the corresponding 'rating' value.

ratings = ratings[['userId', 'movieId', 'rating']]
ratings_matrix = ratings.pivot_table('rating', index='userId', columns='movieId')
ratings_matrix.head(3)
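To make the reshaping concrete, here is a minimal sketch on a tiny, made-up ratings table. The names toy_ratings and toy_matrix and all of the values are hypothetical and are not part of the MovieLens data.

```python
import pandas as pd

# Hypothetical toy ratings in long format (one row per rating)
toy_ratings = pd.DataFrame({
    'userId':  [1, 1, 2, 3],
    'movieId': [10, 20, 10, 30],
    'rating':  [4.0, 3.5, 5.0, 2.0],
})

# Rows become users, columns become movies, and each cell holds the rating.
# User/movie pairs without a rating show up as NaN.
toy_matrix = toy_ratings.pivot_table('rating', index='userId', columns='movieId')
print(toy_matrix)
```

Each user ends up with exactly one row, and a movie the user never rated appears as NaN in that row.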

Next, join movies and ratings so that each movieId is replaced by its 'title', and then replace the NaN values with 0.

# Join with movies to obtain the title column
rating_movies = pd.merge(ratings, movies, on='movieId')

# Pivot with columns='title' so that the movie titles become the columns
ratings_matrix = rating_movies.pivot_table('rating', index='userId', columns='title')

# Replace every NaN with 0
ratings_matrix = ratings_matrix.fillna(0)
ratings_matrix.head(3)
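The join-then-pivot step can be tried on a small scale first. The sketch below uses made-up frames (toy_ratings, toy_movies) with hypothetical titles. It only illustrates how pd.merge() attaches the title to each rating before the pivot.

```python
import pandas as pd

# Hypothetical toy data, not taken from MovieLens
toy_ratings = pd.DataFrame({'userId': [1, 2, 2],
                            'movieId': [10, 10, 20],
                            'rating': [4.0, 5.0, 3.0]})
toy_movies = pd.DataFrame({'movieId': [10, 20],
                           'title': ['Movie A (2000)', 'Movie B (2001)']})

# Inner join on movieId: every rating row gains the matching title
merged = pd.merge(toy_ratings, toy_movies, on='movieId')

# Pivot by title and fill the missing ratings with 0
toy_matrix = merged.pivot_table('rating', index='userId', columns='title').fillna(0)
print(toy_matrix)
```

Note that pivot_table() aggregates with the mean by default, so if two different movieId values shared the same title, their ratings for a given user would be averaged into one cell.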

Computing Movie-to-Movie Similarity

We use cosine similarity to measure how similar the movies are to one another. sklearn's cosine_similarity() compares rows against rows, so to compute similarity between movies, the transpose of ratings_matrix must be passed to cosine_similarity(). DataFrame.transpose() does this.

ratings_matrix_T = ratings_matrix.transpose()
ratings_matrix_T.head(3)

from sklearn.metrics.pairwise import cosine_similarity

item_sim = cosine_similarity(ratings_matrix_T, ratings_matrix_T)

# Wrap the NumPy array returned by cosine_similarity() in a DataFrame labeled with the movie titles
item_sim_df = pd.DataFrame(data=item_sim, index=ratings_matrix.columns,
                           columns=ratings_matrix.columns)
print(item_sim_df.shape)
item_sim_df.head(3)

(9719, 9719)
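The effect of the transpose is easiest to see on a tiny matrix. The following sketch uses a made-up 3-user by 2-item array named toy. The values are assumptions for illustration only.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical user-item matrix: 3 users (rows) x 2 items (columns)
toy = np.array([[5.0, 4.0],
                [3.0, 0.0],
                [1.0, 2.0]])

# Rows are compared with rows, so the untransposed matrix gives user-user similarity
user_sim = cosine_similarity(toy)      # shape (3, 3)

# After transposing, each row is an item, so we get item-item similarity
item_sim = cosine_similarity(toy.T)    # shape (2, 2)

print(user_sim.shape, item_sim.shape)
print(item_sim)
```

The diagonal of item_sim is 1 because every item is perfectly similar to itself, which is why the movie itself shows up first when the similarities are sorted.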

Using item_sim_df, let's extract the six movies with the highest similarity to 'Godfather, The (1972)'.

item_sim_df["Godfather, The (1972)"].sort_values(ascending=False)[:6]

title
Godfather, The (1972)                        1.000000
Godfather: Part II, The (1974)               0.821773
Goodfellas (1990)                            0.664841
One Flew Over the Cuckoo's Nest (1975)       0.620536
Star Wars: Episode IV - A New Hope (1977)    0.595317
Fargo (1996)                                 0.588614
Name: Godfather, The (1972), dtype: float64

The list also includes movies whose genre is completely different from The Godfather. Next, let's extract the five movies most similar to 'Inception (2010)', excluding the movie itself.

item_sim_df["Inception (2010)"].sort_values(ascending=False)[1:6]

title
Dark Knight, The (2008)          0.727263
Inglourious Basterds (2009)      0.646103
Shutter Island (2010)            0.617736
Dark Knight Rises, The (2012)    0.617504
Fight Club (1999)                0.615417
Name: Inception (2010), dtype: float64

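'The Dark Knight' has the highest similarity, and the rest are mostly movies with thriller and action elements. The item-based similarity data aggregates every user's ratings so that, for any given movie, similar movies can be recommended. Next, let's build a personalized movie recommendation algorithm.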
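The [1:6] slice used for 'Inception (2010)' relies on the movie itself sorting first. As a hedged alternative, the sketch below drops the movie by label before taking the largest values. top_similar and toy_sim_df are hypothetical names, and the similarity values are made up.

```python
import pandas as pd

# Hypothetical 3x3 similarity matrix with made-up values
titles = ['A', 'B', 'C']
toy_sim_df = pd.DataFrame([[1.0, 0.8, 0.1],
                           [0.8, 1.0, 0.3],
                           [0.1, 0.3, 1.0]], index=titles, columns=titles)

def top_similar(sim_df, title, k=5):
    # Remove the movie's own entry by label, then keep the k largest similarities
    return sim_df[title].drop(title).nlargest(k)

print(top_similar(toy_sim_df, 'A', k=2))
```

Dropping by label does not depend on sort order, so it still behaves as intended if another movie happens to tie with a similarity of exactly 1.0.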

Personalized Movie Recommendation with Item-Based Nearest-Neighbor Collaborative Filtering

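The item-based movie similarity data was built from the ratings of all users. Personalized recommendation takes the ratings a user has given to movies they have already seen, computes a predicted rating for every movie, and recommends the unseen movies with the highest predicted ratings.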

In item-based collaborative filtering, the personalized predicted rating is given by:

$$\hat{R}_{u, i} = \frac{\sum^{N}\left(S_{i, N} * R_{u, N}\right)}{\sum^{N}\left| S_{i, N} \right|}$$

$$\hat{R}_{u, i}: \text{predicted rating of user } u \text{ for item } i$$

$$S_{i, N}: \text{similarity vector of the Top-}N \text{ items most similar to item } i$$

$$R_{u, N}: \text{vector of user } u \text{'s actual ratings for the Top-}N \text{ items most similar to item } i$$

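These quantities can be obtained by converting ratings_matrix and item_sim_df to NumPy arrays. Let's write a function predict_rating() that implements the formula. This first version uses every item as a neighbor rather than only the Top-N.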

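def predict_rating(ratings_arr, item_sim_arr):
  # Numerator: dot product of each user's ratings with the item similarities.
  # Denominator: sum of absolute similarities per item (item_sim_arr is symmetric).
  ratings_pred = ratings_arr.dot(item_sim_arr) / np.array([np.abs(item_sim_arr).sum(axis=1)])
  return ratings_pred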

ratings_pred = predict_rating(ratings_matrix.values, item_sim_df.values)
ratings_pred_matrix = pd.DataFrame(data=ratings_pred, index=ratings_matrix.index, \
                                   columns = ratings_matrix.columns)
ratings_pred_matrix.head(3)
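A small worked example may help in reading the formula. The sketch below repeats predict_rating() so that it runs on its own and applies it to made-up arrays. toy_ratings and toy_sim are hypothetical, and 0 stands for "not rated".

```python
import numpy as np

def predict_rating(ratings_arr, item_sim_arr):
    # Weighted sum of ratings divided by the sum of absolute similarities per item
    return ratings_arr.dot(item_sim_arr) / np.array([np.abs(item_sim_arr).sum(axis=1)])

# Hypothetical data: 2 users x 3 items
toy_ratings = np.array([[5.0, 0.0, 1.0],
                        [0.0, 4.0, 0.0]])
# Hypothetical symmetric item-item similarity matrix
toy_sim = np.array([[1.0, 0.5, 0.0],
                    [0.5, 1.0, 0.5],
                    [0.0, 0.5, 1.0]])

toy_pred = predict_rating(toy_ratings, toy_sim)
print(toy_pred)
```

For user 0 and the unrated item 1, the prediction is (0.5*5 + 1.0*0 + 0.5*1) / (0.5 + 1.0 + 0.5) = 1.5. The unrated 0 entries take part in the weighted sum as if they were ratings, which pulls the predictions below the 0.5 to 5 rating scale.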

Most of the cells whose actual rating was 0 are now filled with predicted values. This is because the dot product of R[u, N] and S[i, N], summed over all of their elements, is usually nonzero.

Let's measure how far these predictions are from the actual ratings using MSE. Cells that are 0 in the actual data are excluded from the calculation.

from sklearn.metrics import mean_squared_error

# Evaluate the prediction MSE only on the movies the user actually rated.
def get_mse(pred, actual):
  # Keep only the entries that have an actual (nonzero) rating
  pred = pred[actual.nonzero()].flatten()
  actual = actual[actual.nonzero()].flatten()
  return mean_squared_error(actual, pred)

print('Item-based, all nearest neighbors MSE:', get_mse(ratings_pred, ratings_matrix.values))

Item-based, all nearest neighbors MSE: 9.895354759094706
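The masking step inside get_mse() can be checked by hand on a tiny case. This is a minimal sketch with made-up arrays that uses plain NumPy instead of mean_squared_error so each step is visible.

```python
import numpy as np

# Hypothetical actual ratings (0 = not rated) and predictions
actual = np.array([[4.0, 0.0],
                   [0.0, 2.0]])
pred = np.array([[3.0, 1.5],
                 [0.7, 2.5]])

# nonzero() returns the row and column indices of the rated cells
rows, cols = actual.nonzero()

# Indexing with those index arrays already yields 1-D arrays
masked_pred = pred[rows, cols]
masked_actual = actual[rows, cols]

mse = np.mean((masked_actual - masked_pred) ** 2)
print(masked_pred, masked_actual, mse)
```

Only the two rated cells contribute, so the predictions 1.5 and 0.7 for the unrated cells have no effect on the score.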

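The predicted values are on a different scale from the actual ratings, so the absolute size of this MSE matters less than its direction: treat it as a number that improvements should drive down.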

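To improve on this, instead of using the similarity vector between each movie and all other movies, we use only the similarities of the movies most similar to each movie when computing the prediction. The drawback is that obtaining each individual prediction requires for loops over the rows and columns to pick out the Top-N similarity vector, which becomes very slow as the data grows.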

def predict_rating_topsim(ratings_arr, item_sim_arr, n=20):
  # Initialize the prediction matrix with zeros, same shape as the user-item rating matrix
  pred = np.zeros(ratings_arr.shape)

  # Loop over the columns (items) of the user-item rating matrix.
  for col in range(ratings_arr.shape[1]):
    # Indices of the n items with the highest similarity to this item.
    # Kept as a plain index array (not wrapped in a list) so NumPy treats it as an integer index.
    top_n_items = np.argsort(item_sim_arr[:, col])[:-n-1:-1]
    # Compute the personalized predicted rating for every user
    for row in range(ratings_arr.shape[0]):
      pred[row, col] = item_sim_arr[col, :][top_n_items].dot(ratings_arr[row, :][top_n_items].T)
      pred[row, col] /= np.sum(np.abs(item_sim_arr[col, :][top_n_items]))

  return pred
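The slice [:-n-1:-1] applied to the np.argsort() result is the least obvious part of the function. The sketch below walks through it on a made-up similarity vector named sims.

```python
import numpy as np

# Hypothetical similarities of one item to five items
sims = np.array([0.2, 1.0, 0.5, 0.9, 0.1])
n = 3

# argsort returns the indices that would sort the array in ascending order
order = np.argsort(sims)
print(order)          # [4 0 2 3 1]

# Walk backwards from the end and stop after n elements:
# the indices of the n largest values, largest first
top_n = order[:-n-1:-1]
print(top_n)          # [1 3 2]
print(sims[top_n])    # [1.  0.9 0.5]
```

Because an item's similarity to itself is 1, the item's own index is normally among its top-n neighbors.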

Let's compute the predicted ratings with predict_rating_topsim() and measure the MSE against the actual ratings.

ratings_pred = predict_rating_topsim(ratings_matrix.values, item_sim_df.values, n=20)
print('Item-based Top-20 nearest neighbors MSE: ', get_mse(ratings_pred, ratings_matrix.values))

# Rebuild a DataFrame from the computed predicted ratings
ratings_pred_matrix = pd.DataFrame(data=ratings_pred, index=ratings_matrix.index,
                                   columns=ratings_matrix.columns)

Item-based Top-20 nearest neighbors MSE:  3.6949827608772314
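Since the nested loops are slow, one possible alternative is to zero out everything except the Top-N similarities in each column and then use a single matrix product. This is a sketch and not the book's method. predict_rating_topsim_vec is a hypothetical name, the toy arrays are made up, and the approach assumes a symmetric similarity matrix with no ties among the top values.

```python
import numpy as np

def predict_rating_topsim_vec(ratings_arr, item_sim_arr, n=20):
    n_items = item_sim_arr.shape[0]
    n = min(n, n_items)
    cols = np.arange(n_items)

    # For every column (item), the row indices of its n largest similarities
    top_idx = np.argsort(item_sim_arr, axis=0)[-n:, :]   # shape (n, n_items)

    # Keep only those top-n similarities and leave the rest at 0
    sim_topn = np.zeros(item_sim_arr.shape, dtype=float)
    sim_topn[top_idx, cols] = item_sim_arr[top_idx, cols]

    # Weighted sum of ratings divided by the sum of absolute top-n similarities
    denom = np.abs(sim_topn).sum(axis=0)
    return ratings_arr.dot(sim_topn) / denom

# Hypothetical data: 2 users x 3 items, 0 means "not rated"
toy_ratings = np.array([[5.0, 0.0, 1.0],
                        [0.0, 4.0, 0.0]])
toy_sim = np.array([[1.0, 0.6, 0.2],
                    [0.6, 1.0, 0.4],
                    [0.2, 0.4, 1.0]])

toy_pred = predict_rating_topsim_vec(toy_ratings, toy_sim, n=2)
print(toy_pred)
```

The trade-off is memory: the sketch allocates extra arrays of the same size as the similarity matrix, which is large for thousands of items.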

The MSE improved considerably, from 9.89 to 3.69. Now let's recommend movies to the user with userId=9. First, let's check which movies user 9 likes.

user_rating_id = ratings_matrix.loc[9, :]
user_rating_id[user_rating_id > 0].sort_values(ascending=False)[:10]

title
Adaptation (2002)                                                                 5.0
Austin Powers in Goldmember (2002)                                                5.0
Lord of the Rings: The Fellowship of the Ring, The (2001)                         5.0
Lord of the Rings: The Two Towers, The (2002)                                     5.0
Producers, The (1968)                                                             5.0
Citizen Kane (1941)                                                               5.0
Raiders of the Lost Ark (Indiana Jones and the Raiders of the Lost Ark) (1981)    5.0
Back to the Future (1985)                                                         5.0
Glengarry Glen Ross (1992)                                                        4.0
Sunset Blvd. (a.k.a. Sunset Boulevard) (1950)                                     4.0
Name: 9, dtype: float64

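This user gives high ratings to crowd-pleasing movies: blockbusters, adventures, and comedies such as 'Austin Powers' and 'The Lord of the Rings'. To recommend only movies the user has not already rated, we create get_unseen_movies(), a function that returns the unrated movies as a list.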

def get_unseen_movies(ratings_matrix, userId):
  # Extract all movie ratings of the user given by userId.
  # The resulting user_rating is a Series indexed by movie title.
  user_rating = ratings_matrix.loc[userId, :]

  # A rating greater than 0 means the movie was already seen; collect those titles in a list.
  already_seen = user_rating[user_rating > 0].index.tolist()
  # Put all movie titles in a list.
  movies_list = ratings_matrix.columns.tolist()

  # Use a list comprehension to drop the already_seen movies from movies_list.
  unseen_list = [movie for movie in movies_list if movie not in already_seen]

  return unseen_list

Using the list of unrated movies and the predictions from predict_rating_topsim(), we write recomm_movie_by_userid() to recommend movies to the user.

def recomm_movie_by_userid(pred_df, userId, unseen_list, top_n=10):
  # From the predicted-rating DataFrame, select the row for userId and the columns in unseen_list,
  # then sort by predicted rating in descending order.
  recomm_movies = pred_df.loc[userId, unseen_list].sort_values(ascending=False)[:top_n]
  return recomm_movies

# Get the titles of the movies the user has not seen
unseen_list = get_unseen_movies(ratings_matrix, 9)

# Recommend movies with item-based nearest-neighbor collaborative filtering
recomm_movies = recomm_movie_by_userid(ratings_pred_matrix, 9, unseen_list, top_n=10)

# Build a DataFrame from the predicted scores.
recomm_movies = pd.DataFrame(data=recomm_movies.values, index=recomm_movies.index,
                             columns=['pred_score'])
recomm_movies

A varied set of highly popular titles was recommended, including 'Shrek', 'Spider-Man', and 'Indiana Jones'.
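The filter-then-rank logic can also be written with a boolean mask instead of a list of titles. This is a minimal sketch on made-up data. recommend, toy_ratings, and toy_pred are hypothetical names, not part of the original notebook.

```python
import pandas as pd

titles = ['A', 'B', 'C', 'D']
# Hypothetical actual ratings (0 = not rated) and predicted ratings for user 9
toy_ratings = pd.DataFrame([[5.0, 0.0, 0.0, 3.0]], index=[9], columns=titles)
toy_pred = pd.DataFrame([[0.9, 0.4, 0.7, 0.8]], index=[9], columns=titles)

def recommend(pred_df, ratings_df, user_id, top_n=10):
    # Boolean mask over titles: True where the user has no rating yet
    unseen = ratings_df.loc[user_id] == 0
    # Keep only the unseen titles and return the top_n highest predictions
    return pred_df.loc[user_id][unseen].nlargest(top_n)

recs = recommend(toy_pred, toy_ratings, 9, top_n=2)
print(recs)
```

Movies A and D are skipped even though their predicted scores are the highest, because the user has already rated them.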

Questions

DataFrame.pivot_table()
pd.merge()
pred[actual.nonzero()].flatten()
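When computing item-based similarity, cosine_similarity derives the cosine value from how similar the values (ratings) in the row vectors are. It seems that the more similarly the same users rated two movies, the higher the similarity between those movies.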

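np.argsort(arr)[s:e:step] # returns an ndarray of the sorted indices from position s up to (but not including) e, in steps of step
A /= B # same result as A = A / B (for ndarrays the update is done in place)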

Source: 파이썬 머신러닝 완벽가이드 (Python Machine Learning Complete Guide), 권철민

Output of movies.head(2):

	movieId	title	genres
0	1	Toy Story (1995)	Adventure|Animation|Children|Comedy|Fantasy
1	2	Jumanji (1995)	Adventure|Children|Fantasy

Output of ratings_matrix.head(3) after pivoting by title and filling NaN with 0 (middle columns omitted):

title	'71 (2014)	'Hellboy': The Seeds of Creation (2004)	'Round Midnight (1986)	...	¡Three Amigos! (1986)	À nous la liberté (Freedom for Us) (1931)
userId
1	0.0	0.0	0.0	...	4.0	0.0
2	0.0	0.0	0.0	...	0.0	0.0
3	0.0	0.0	0.0	...	0.0	0.0

Output of ratings_matrix_T.head(3) (middle columns omitted):

userId	1	2	3	...	609	610
title
'71 (2014)	0.0	0.0	0.0	...	0.0	4.0
'Hellboy': The Seeds of Creation (2004)	0.0	0.0	0.0	...	0.0	0.0
'Round Midnight (1986)	0.0	0.0	0.0	...	0.0	0.0

Output of item_sim_df.head(3) (middle columns omitted):

title	'71 (2014)	'Hellboy': The Seeds of Creation (2004)	'Round Midnight (1986)	...	¡Three Amigos! (1986)	À nous la liberté (Freedom for Us) (1931)
title
'71 (2014)	1.0	0.000000	0.000000	...	0.0	0.0
'Hellboy': The Seeds of Creation (2004)	0.0	1.000000	0.707107	...	0.0	0.0
'Round Midnight (1986)	0.0	0.707107	1.000000	...	0.0	0.0

Output of ratings_pred_matrix.head(3) from predict_rating() (middle columns omitted):

title	'71 (2014)	'Hellboy': The Seeds of Creation (2004)	'Round Midnight (1986)	...	¡Three Amigos! (1986)	À nous la liberté (Freedom for Us) (1931)
userId
1	0.070345	0.577855	0.321696	...	0.292955	0.720347
2	0.018260	0.042744	0.018861	...	0.017563	0.000000
3	0.011884	0.030279	0.064437	...	0.010420	0.084501


Output of ratings.head(2):

	userId	movieId	rating	timestamp
0	1	1	4.0	964982703
1	1	3	4.0	964981247

Output of recomm_movies (top 10 predicted scores for userId=9):

	pred_score
title
Shrek (2001)	0.866202
Spider-Man (2002)	0.857854
Last Samurai, The (2003)	0.817473
Indiana Jones and the Temple of Doom (1984)	0.816626
Matrix Reloaded, The (2003)	0.800990
Harry Potter and the Sorcerer's Stone (a.k.a. Harry Potter and the Philosopher's Stone) (2001)	0.765159
Gladiator (2000)	0.740956
Matrix, The (1999)	0.732693
Pirates of the Caribbean: The Curse of the Black Pearl (2003)	0.689591
Lord of the Rings: The Return of the King, The (2003)	0.676711
