:: Sai Chaitanya Gaddam | An Explorer For Netflix Prize Dataset ::


Explorations

Exploring the Netflixprize dataset

This is also posted in the Netflixprize Community Forum. Follow the discussion there if you are interested in this kind of thing.

I put together a viewer to make the dataset more accessible (for lack of a better word).

Screenshot:


While tweaking my algorithms and looking at the recommendations they offered for certain users, I often felt it would help to know more about a movie and its neighbors in the assigned category. (*More about the neighborhood calculation towards the end.)

http://gflix.appspot.com allows you to do just that.

You can search for a movie in the dataset by throwing in keywords. For example, typing in the keyword unfaithful returns not only the movie with that title, but also every other movie that has the word in its synopsis.

Entering multiple keywords (e.g. football and drama) fetches all the movies with both keywords.
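
For the curious, here is a minimal sketch of how this kind of all-keywords matching could work. It is not the code behind the site: the sample records, `tokenize`, and `search` are made up for illustration, and it assumes a movie matches only when every keyword appears in its title or synopsis.

```python
import re

# Hypothetical sample records; the real titles and synopses are not reproduced here.
MOVIES = [
    {"title": "Unfaithful", "synopsis": "A suburban marriage unravels after an affair."},
    {"title": "Second Chances", "synopsis": "An unfaithful husband tries to win back his family."},
    {"title": "Friday Lights", "synopsis": "A football drama set in a small town."},
    {"title": "Goal Line", "synopsis": "A football comedy about a hapless coach."},
]


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search(movies, query):
    """Return titles of movies whose title or synopsis contains every keyword."""
    keywords = tokenize(query)
    if not keywords:
        return []
    return [
        m["title"]
        for m in movies
        if keywords <= tokenize(m["title"] + " " + m["synopsis"])
    ]


print(search(MOVIES, "unfaithful"))
print(search(MOVIES, "football drama"))
```

With the sample records above, the first query matches one movie by title and one by synopsis, and the second matches only the movie that mentions both words.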

You can go directly to the neighborhood of a movie in the Netflixprize dataset at gflix.appspot.com/netflix/movie_index
(e.g. http://gflix.appspot.com/netflix/3150 )

When looking at the neighborhood of a movie, you can access the synopsis of a particular movie by hovering your mouse over it for a bit.


Movie info can also be accessed directly at gflix.appspot.com/netflix/movie_index/info
(e.g. http://gflix.appspot.com/netflix/3150/info )
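
If you want to build these links from a script, the two URL patterns above can be sketched as a pair of small helpers. The function names are my own for illustration; only the URL layout comes from the examples above.

```python
BASE_URL = "http://gflix.appspot.com/netflix"


def neighborhood_url(movie_index):
    """URL of the neighborhood page for a movie index in the dataset."""
    return f"{BASE_URL}/{int(movie_index)}"


def info_url(movie_index):
    """URL of the movie info page for the same index."""
    return f"{neighborhood_url(movie_index)}/info"


print(neighborhood_url(3150))
print(info_url(3150))
```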


Oh, and you can add any movie to your Netflix queue by clicking on the plus sign :)
(Design caveat: It looks better in Firefox because I didn't spend time tweaking the CSS for IE.)

Now that Netflix has offered an API, a lot more things can be done. I'd like to hear your suggestions.


-CS

*Neighborhood calculations are not user-specific; they are based on the number of users in the intersection of two movie vectors (that is, users who rated both movies). These counts are normalized by user and movie rating frequencies to offset the overemphasis of widely watched movies.
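
A rough sketch of this kind of calculation follows. The exact normalization is not spelled out above, so the two choices here are assumptions: each shared user contributes 1 / (number of movies that user rated), and the total is divided by the square root of the product of the two movies' rater counts. The function name `neighborhoods` and its `top_k` parameter are hypothetical as well.

```python
from collections import defaultdict
from math import sqrt


def neighborhoods(ratings, top_k=3):
    """Rank neighbors for each movie from (user_id, movie_id) pairs.

    Rating values are ignored; only who rated what matters here.
    """
    movies_by_user = defaultdict(set)
    users_by_movie = defaultdict(set)
    for user, movie in ratings:
        movies_by_user[user].add(movie)
        users_by_movie[movie].add(user)

    result = {}
    for a, users_a in users_by_movie.items():
        scores = {}
        for b, users_b in users_by_movie.items():
            if a == b:
                continue
            shared = users_a & users_b  # users who rated both movies
            if not shared:
                continue
            # Assumed user normalization: heavy raters count for less.
            weight = sum(1.0 / len(movies_by_user[u]) for u in shared)
            # Assumed movie normalization: damp widely watched movies.
            scores[b] = weight / sqrt(len(users_a) * len(users_b))
        # Highest score first; ties broken by movie id for stable output.
        result[a] = sorted(scores, key=lambda m: (-scores[m], str(m)))[:top_k]
    return result


ratings = [
    ("u1", "A"), ("u1", "B"),
    ("u2", "A"), ("u2", "B"),
    ("u3", "A"), ("u3", "C"),
    ("u4", "C"),
]
print(neighborhoods(ratings))
```

This brute-force version compares every pair of movies, which is fine for a toy example but would need an inverted index or sparse matrices at the scale of the full dataset.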

Research Nuggets

CONFIGR:
is a first-generation neural model for contour interpolation.

Knowledge Discovery from Labeled Web Documents:
is an attempt to synthesize disparate semantic information from user tags.