Guided Project: Analyzing movie reviews
Posted on Wed 08 July 2015 in Projects
import pandas
movies = pandas.read_csv("fandango_score_comparison.csv")
movies
import matplotlib.pyplot as plt
%matplotlib inline
plt.hist(movies["Fandango_Stars"])
plt.title("Fandango_Stars")
plt.show()
plt.hist(movies["Metacritic_norm_round"])
plt.title("Metacritic_norm_round")
plt.show()
Fandango vs Metacritic Scores
There are no scores below 3.0 in the Fandango reviews. The Fandango reviews also tend to center around 4.5 and 4.0, whereas the Metacritic reviews seem to center around 3.0 and 3.5.
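The histogram reading can also be checked numerically by reporting the lowest score and the most common score values. This is a minimal sketch. The helper name `summarize_scores` is hypothetical, and the toy Series below holds made-up values, not the Fandango data.

```python
import pandas as pd


def summarize_scores(scores, top_n=2):
    """Return the minimum score and the top_n most frequent score values.

    `scores` is any pandas Series of ratings, e.g. movies["Fandango_Stars"].
    """
    counts = scores.value_counts()  # sorted by frequency, most common first
    return {
        "min": scores.min(),
        "most_common": list(counts.index[:top_n]),
    }


# Toy ratings for illustration only (not from the dataset).
toy = pd.Series([4.5, 4.0, 4.5, 3.5, 4.0, 4.5, 3.0])
print(summarize_scores(toy))
```

On the real data, the calls would be `summarize_scores(movies["Fandango_Stars"])` and `summarize_scores(movies["Metacritic_norm_round"])`.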
import numpy
f_mean = movies["Fandango_Stars"].mean()
m_mean = movies["Metacritic_norm_round"].mean()
f_std = movies["Fandango_Stars"].std()
m_std = movies["Metacritic_norm_round"].std()
f_median = movies["Fandango_Stars"].median()
m_median = movies["Metacritic_norm_round"].median()
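print("Fandango mean:", f_mean)
print("Metacritic mean:", m_mean)
print("Fandango std:", f_std)
print("Metacritic std:", m_std)
print("Fandango median:", f_median)
print("Metacritic median:", m_median)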
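The six separate variables can also be collected in one table with pandas' `DataFrame.agg`, which makes the side-by-side comparison easier to read. This is a minimal sketch. The toy frame reuses the two column names from the CSV, but its values are invented for illustration.

```python
import pandas as pd

# Invented values that only mimic the two score columns.
toy = pd.DataFrame({
    "Fandango_Stars": [3.0, 4.0, 4.5, 4.5, 5.0],
    "Metacritic_norm_round": [1.5, 3.0, 3.0, 3.5, 4.5],
})

# One row per statistic, one column per rating source.
# "std" here is the sample standard deviation, matching Series.std().
stats = toy[["Fandango_Stars", "Metacritic_norm_round"]].agg(["mean", "median", "std"])
print(stats)
```

On the real data, replace `toy` with `movies`.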
Fandango vs Metacritic Methodology
Fandango appears to inflate ratings and isn't transparent about how it calculates and aggregates them. Metacritic publishes each individual critic rating and is transparent about how it aggregates them into a final rating.
Fandango vs Metacritic Number Differences
The median Metacritic score is higher than the mean Metacritic score because a few very low reviews "drag down" the mean. The median Fandango score is lower than the mean Fandango score because a few very high ratings "drag up" the mean.
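A tiny example shows the direction of the effect. A single low outlier pulls the mean below the median, while the median barely moves, and a single high outlier does the opposite. The numbers are made up purely to illustrate the arithmetic.

```python
import statistics

# Mostly 4s with one very low score, like a left-skewed rating distribution.
low_outlier = [4.0, 4.0, 4.0, 4.0, 1.0]
# Mostly 4s with one very high score, like a right-skewed distribution.
high_outlier = [4.0, 4.0, 4.0, 4.0, 5.0]

print(statistics.mean(low_outlier), statistics.median(low_outlier))    # mean 3.4 < median 4.0
print(statistics.mean(high_outlier), statistics.median(high_outlier))  # mean 4.2 > median 4.0
```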
Fandango ratings appear clustered between 3 and 5 and have a much narrower range than Metacritic reviews, which run from 0 to 5.
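The difference in spread can be quantified by comparing each column's minimum, maximum, and range (max minus min). This is a sketch. `score_range` is a hypothetical helper, and the inputs are invented toy values.

```python
import pandas as pd


def score_range(scores):
    """Return (min, max, max - min) for a Series of ratings."""
    lo, hi = scores.min(), scores.max()
    return lo, hi, hi - lo


fandango_like = pd.Series([3.0, 3.5, 4.0, 4.5, 5.0])    # invented values
metacritic_like = pd.Series([0.5, 1.5, 3.0, 3.5, 4.5])  # invented values
print(score_range(fandango_like))
print(score_range(metacritic_like))
```

With the real data, this would be `score_range(movies["Fandango_Stars"])` and `score_range(movies["Metacritic_norm_round"])`.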
Fandango ratings in general appear to be higher than Metacritic ratings.
This may be due to movie studio influence on Fandango ratings and to the fact that Fandango does not disclose how it calculates its ratings.
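Both columns describe the same movies, so the claim that Fandango ratings are generally higher can be checked row by row. Compute each movie's Fandango-minus-Metacritic difference, then look at its mean and at the share of positive differences. This is a sketch. `paired_gap` is a hypothetical helper, and the toy rows are invented. On the real data, pass `movies`.

```python
import pandas as pd


def paired_gap(df, a="Fandango_Stars", b="Metacritic_norm_round"):
    """Summarize the signed per-movie difference df[a] - df[b]."""
    diff = df[a] - df[b]
    return {
        "mean_diff": diff.mean(),           # average gap, in stars
        "share_higher": (diff > 0).mean(),  # fraction of movies where a > b
    }


toy = pd.DataFrame({
    "Fandango_Stars": [4.5, 4.0, 3.5, 5.0],
    "Metacritic_norm_round": [3.0, 4.0, 2.5, 3.5],
})
print(paired_gap(toy))
```

Unlike an absolute difference, the signed difference keeps the direction of the gap, which is what the "generally higher" claim is about.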
plt.scatter(movies["Metacritic_norm_round"], movies["Fandango_Stars"])
plt.xlabel("Metacritic_norm_round")
plt.ylabel("Fandango_Stars")
plt.show()
movies["fm_diff"] = numpy.abs(movies["Metacritic_norm_round"] - movies["Fandango_Stars"])
movies.sort_values(by="fm_diff", ascending=False).head(5)
from scipy.stats import pearsonr
r_value, p_value = pearsonr(movies["Fandango_Stars"], movies["Metacritic_norm_round"])
r_value
Fandango and Metacritic Correlation
The low correlation between Fandango and Metacritic scores indicates that Fandango scores aren't just inflated; they are fundamentally different. For whatever reason, Fandango appears both to inflate scores overall and to inflate them by different amounts depending on the movie.
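For reference, Pearson's r is the covariance of the two columns divided by the product of their standard deviations. The sketch below computes it by hand with NumPy and checks it against `scipy.stats.pearsonr`. The helper `pearson_r` is hypothetical, and the input arrays are invented toy values.

```python
import numpy as np
from scipy.stats import pearsonr


def pearson_r(x, y):
    """Pearson correlation: sum of centered cross-products over the product of centered norms.

    The (n - 1) factors in the covariance and standard deviations cancel,
    so plain sums are enough.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return (xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum())


x = [1.0, 2.0, 3.0, 4.0, 5.0]  # invented Metacritic-like scores
y = [3.5, 4.5, 4.0, 4.5, 4.0]  # invented Fandango-like scores
manual = pearson_r(x, y)
reference, _ = pearsonr(x, y)
print(manual, reference)
```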
from scipy.stats import linregress
slope, intercept, r_value, p_value, stderr_slope = linregress(movies["Metacritic_norm_round"], movies["Fandango_Stars"])
pred = 3 * slope + intercept
pred
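The regression slope and the correlation are linked. For ordinary least squares, slope = r · (s_y / s_x), where s_x and s_y are the standard deviations of the x (Metacritic) and y (Fandango) columns. A weak correlation therefore pulls the fitted slope toward zero relative to s_y / s_x. The check below uses invented toy arrays. It reads the named fields `slope`, `intercept`, and `rvalue` from the result object that `linregress` returns.

```python
import numpy as np
from scipy.stats import linregress

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # invented Metacritic-like scores
y = np.array([3.5, 4.5, 4.0, 4.5, 4.0])  # invented Fandango-like scores

fit = linregress(x, y)
# Same ddof for both standard deviations, so the ratio does not depend on the choice.
slope_from_r = fit.rvalue * y.std(ddof=1) / x.std(ddof=1)
print(fit.slope, slope_from_r)

# Prediction at x = 3, computed the same way as pred above: slope * 3 + intercept.
print(fit.slope * 3 + fit.intercept)
```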
Finding Residuals
pred_1 = 1 * slope + intercept
pred_5 = 5 * slope + intercept
plt.scatter(movies["Metacritic_norm_round"], movies["Fandango_Stars"])
plt.plot([1,5],[pred_1,pred_5])
plt.xlim(1,5)
plt.show()
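The plot shows the fitted line. The residuals themselves are the observed Fandango scores minus the line's predictions. The sketch below computes them. `residuals` is a hypothetical helper, and the toy data are invented.

```python
import numpy as np
from scipy.stats import linregress


def residuals(x, y):
    """Return observed y minus the least-squares prediction slope * x + intercept."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    fit = linregress(x, y)
    return y - (fit.slope * x + fit.intercept)


x = [1.0, 2.0, 3.0, 4.0, 5.0]  # invented Metacritic-like scores
y = [3.5, 4.5, 4.0, 4.5, 4.0]  # invented Fandango-like scores
res = residuals(x, y)
print(res)
print(res.sum())  # close to 0: least-squares residuals with an intercept sum to zero
```

On the real data, something like `movies["residual"] = residuals(movies["Metacritic_norm_round"], movies["Fandango_Stars"])` would attach them to the frame (the column name `residual` is hypothetical). Large absolute residuals flag movies whose Fandango score sits far from what the Metacritic score alone would predict.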