For our Dataset Analysis project, we chose a movies dataset hosted on GitHub. We picked it because movies are a shared interest and a subject we both know well. Our analysis covers movies released between 2000 and 2015: we examine the genre and revenue variables, and we measure the correlation between budget and revenue.
The dataset comes from the Udacity--Project-Investigate-TMDB-Movies-Dataset project on GitHub. It is a sample of a much larger dataset from The Movie Database (TMDB), which contains over 830,000 entries.
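To make the planned analysis concrete, here is a minimal sketch of its two core steps: restricting the data to releases from 2000 to 2015, then computing the Pearson correlation between budget and revenue. The field names `release_year`, `budget`, and `revenue` are assumptions about the dataset's columns, the helper functions are hypothetical, and the rows are invented toy values used only to show the mechanics, not figures from the dataset.

```python
from math import sqrt

# Invented toy rows for illustration only; these are NOT values from the dataset.
# The field names (release_year, budget, revenue) are assumed column names.
SAMPLE_ROWS = [
    {"release_year": 1999, "budget": 50.0, "revenue": 400.0},
    {"release_year": 2001, "budget": 10.0, "revenue": 25.0},
    {"release_year": 2005, "budget": 20.0, "revenue": 45.0},
    {"release_year": 2010, "budget": 30.0, "revenue": 65.0},
    {"release_year": 2015, "budget": 40.0, "revenue": 85.0},
    {"release_year": 2016, "budget": 60.0, "revenue": 5.0},
]


def filter_by_year(rows, first_year=2000, last_year=2015):
    """Keep rows whose release year lies in [first_year, last_year], inclusive."""
    return [r for r in rows if first_year <= r["release_year"] <= last_year]


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)


selected = filter_by_year(SAMPLE_ROWS)
r = pearson_r(
    [row["budget"] for row in selected],
    [row["revenue"] for row in selected],
)
print(f"{len(selected)} movies in range, budget/revenue correlation r = {r:.3f}")
```

A value of r near 1 would indicate that revenue tends to rise with budget, a value near 0 would indicate little linear relationship, and a value near -1 would indicate that revenue tends to fall as budget rises. The toy rows inside the year range were deliberately made perfectly linear, so the printed value says nothing about the real data.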