5 Analysis

  • Answers to the research questions
  • Different methods considered
  • Competing approaches
  • Justifications

5.1 Research Questions

Do budgets positively correlate with profits?

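During the exploratory data analysis, a clear correlation between budget and income was observed. To model this relationship, four different models were fitted and compared by their adjusted R² values. We expected the linear model with both budget and income on a logarithmic scale to perform best. However, it had the lowest adjusted R² (0.389), and the second-degree polynomial model scored highest (0.415). Note that the R² of a model with a log-transformed response is computed on a different scale, so it is not directly comparable with the R² of the untransformed models.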

#> # A tibble: 4 x 3
#>   model                         r_squared adj_r_squared
#>   <chr>                             <dbl>         <dbl>
#> 1 Linear Model                      0.405         0.405
#> 2 Polynomial Model (2nd degree)     0.416         0.415
#> 3 Polynomial Model (3rd degree)     0.416         0.415
#> 4 Linear Model (Log)                0.389         0.389
#> 
#> Call:
#> lm(formula = profit ~ poly(budget, degree = 2), data = drop_na(movies_df, 
#>     budget))
#> 
#> Residuals:
#>       Min        1Q    Median        3Q       Max 
#> -6.96e+08 -4.11e+07 -2.19e+07  1.38e+07  1.89e+09 
#> 
#> Coefficients:
#>                           Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)               9.74e+07   2.12e+06   45.96   <2e-16 ***
#> poly(budget, degree = 2)1 6.53e+09   1.29e+08   50.67   <2e-16 ***
#> poly(budget, degree = 2)2 1.05e+09   1.29e+08    8.11    7e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 1.29e+08 on 3701 degrees of freedom
#> Multiple R-squared:  0.416,  Adjusted R-squared:  0.415 
#> F-statistic: 1.32e+03 on 2 and 3701 DF,  p-value: <2e-16
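A comparison like the one above could be set up along the following lines. This is a minimal R sketch on hypothetical synthetic data, not the report's own code: the data frame `movies` and the list `fits` are illustrative names. It fits the four candidate models and ranks them by `summary(fit)$adj.r.squared`. The log model is fitted on a log-transformed response, so its R² is on a different scale from the other three.

```r
# Minimal sketch: rank candidate budget-profit models by adjusted R^2.
# ASSUMPTION: `movies` is hypothetical synthetic data standing in for the
# report's data; the numbers printed are not the report's results.
set.seed(42)
n <- 500
movies <- data.frame(budget = runif(n, 1e6, 2e8))
movies$profit <- 2.5 * movies$budget + rnorm(n, sd = 5e7)

# The four candidate models from the comparison table
fits <- list(
  "Linear Model" = lm(profit ~ budget, data = movies),
  "Polynomial Model (2nd degree)" = lm(profit ~ poly(budget, degree = 2), data = movies),
  "Polynomial Model (3rd degree)" = lm(profit ~ poly(budget, degree = 3), data = movies),
  # Logs need positive values, so rows with non-positive profit are dropped
  "Linear Model (Log)" = lm(log10(profit) ~ log10(budget),
                            data = subset(movies, profit > 0))
)

# Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1), as reported by summary()
comparison <- data.frame(
  model = names(fits),
  r_squared = sapply(fits, function(f) summary(f)$r.squared),
  adj_r_squared = sapply(fits, function(f) summary(f)$adj.r.squared),
  row.names = NULL,
  stringsAsFactors = FALSE
)
print(comparison[order(-comparison$adj_r_squared), ])
```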

According to the best-fitting model (Polynomial Model (2nd degree)), the first-order budget term is positive and highly significant (p < 2e-16), indicating that profit tends to increase with budget. So, budgets positively correlate with profits. Although the polynomial model is slightly better by adjusted R² (it explains about 1 percentage point more of the outcome variability than the linear model: 0.415 vs 0.405), the linear model is used for the rest of the analysis because it is easier to interpret. The graph below also shows that both models follow a similar trend for the majority of the data.
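Because `poly()` builds orthogonal polynomials by default, the coefficients in the summary above cannot be read directly as "profit per extra unit of budget". The minimal sketch below, again on hypothetical synthetic data, shows that `raw = TRUE` gives the same fitted curve with coefficients on the original budget scale, from which the per-unit effect at a given budget follows as `b1 + 2 * b2 * budget`. The helper `marginal_effect()` is an illustrative name.

```r
# Minimal sketch: per-unit effect of budget in a 2nd-degree polynomial model.
# ASSUMPTION: hypothetical synthetic data with a mildly convex relation.
set.seed(1)
n <- 300
movies <- data.frame(budget = runif(n, 1e6, 2e8))
movies$profit <- 1e7 + 2 * movies$budget + 5e-9 * movies$budget^2 +
  rnorm(n, sd = 3e7)

# Default poly(): orthogonal terms, numerically stable but not on the budget scale
fit_orth <- lm(profit ~ poly(budget, 2), data = movies)
# raw = TRUE: same fitted curve, coefficients for budget and budget^2 directly
fit_raw <- lm(profit ~ poly(budget, 2, raw = TRUE), data = movies)

b <- unname(coef(fit_raw))  # intercept, budget, budget^2
# d(profit)/d(budget) = b1 + 2 * b2 * budget, so the effect depends on budget
marginal_effect <- function(x) b[2] + 2 * b[3] * x
me_median <- marginal_effect(median(movies$budget))
print(me_median)
```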

Does this prediction change depending on the movie genre?

To understand the relation between budget and profit in different genres, the same linear model was fitted separately within each genre. The table below shows the budget coefficient for each genre. Most categories have similar coefficients (between 2.329 and 2.655 among the ten genres shown); Documentary and Western are the exceptions. The prediction therefore does change with genre, but mainly for those two genres.

genres               estimate  std.error  statistic  p.value
Musical                 2.655      0.399      6.662  0
Adventure               2.599      0.134     19.437  0
Sci-Fi                  2.586      0.158     16.41   0
Fantasy                 2.539      0.161     15.724  0
IMAX                    2.527      0.446      5.669  0
Crime                   2.524      0.109     23.138  0
Action                  2.507      0.097     25.931  0
(no genres listed)      2.504      0.025    101.849  0.00625
Romance                 2.488      0.151     16.466  0
Film-Noir               2.329      0.622      3.745  0.00381
(first 10 of 20 rows shown)
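A per-genre fit like this could be sketched as follows. The sketch assumes, as in MovieLens-style data, that `genres` holds pipe-separated labels such as "Action|Adventure", so a movie contributes to every genre it lists; the synthetic data, the `budget_coef()` helper and the other names are hypothetical.

```r
# Minimal sketch: fit profit ~ budget separately within each genre.
# ASSUMPTIONS: hypothetical synthetic data; `genres` holds pipe-separated
# labels, so a movie counts towards every genre it lists.
set.seed(7)
n <- 400
movies <- data.frame(
  budget = runif(n, 1e6, 2e8),
  genres = sample(c("Action", "Drama", "Action|Adventure", "Drama|Romance"),
                  n, replace = TRUE),
  stringsAsFactors = FALSE
)
movies$profit <- 2.5 * movies$budget + rnorm(n, sd = 4e7)

# Expand to one row per (movie, genre) pair
genre_list <- strsplit(movies$genres, "|", fixed = TRUE)
long <- movies[rep(seq_len(n), lengths(genre_list)), c("budget", "profit")]
long$genre <- unlist(genre_list)

# Hypothetical helper: fit one model and keep the budget row of its coefficients
budget_coef <- function(d, label) {
  s <- summary(lm(profit ~ budget, data = d))$coefficients
  data.frame(group = label,
             estimate = s["budget", "Estimate"],
             std.error = s["budget", "Std. Error"],
             statistic = s["budget", "t value"],
             p.value = s["budget", "Pr(>|t|)"],
             stringsAsFactors = FALSE)
}

groups <- split(long, long$genre)
by_genre <- do.call(rbind, Map(budget_coef, groups, names(groups)))
by_genre <- by_genre[order(-by_genre$estimate), ]  # largest coefficient first
rownames(by_genre) <- NULL
print(by_genre, digits = 3)
```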

Does this prediction change depending on the gender of the main characters?

Similarly, to identify the budget-profit relation for different star gender combinations, a linear model was fitted separately for each star-cast category (Female-Female, Male-Male, Female-Male, and Other). The table below shows the budget coefficient on profit for each category. All four coefficients are positive and highly significant; the effect is strongest for Female-Female casts (2.858) and weakest for the Other category (2.094).

star_cast        estimate  std.error  statistic  p.value
Female-Female       2.858      0.192     14.864  1.08e-36
Male-Male           2.417      0.069     35.147  2.15e-199
Female-Male         2.311      0.073     31.602  2.80e-174
Other               2.094      0.215      9.732  1.37e-16
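Separate per-group fits show whether the slopes look different, but they do not formally test it. One possible complement, sketched here on hypothetical synthetic data (the group labels are copied from the table above, everything else is illustrative), compares a model with a common budget slope against one with a `budget:star_cast` interaction using `anova()`. The interaction model yields the same slope estimates as fitting each group separately, but pools the residual variance across groups.

```r
# Minimal sketch: test whether the budget slope differs between star-cast groups.
# ASSUMPTION: hypothetical synthetic data; labels copied from the table above.
set.seed(3)
n <- 600
movies <- data.frame(
  budget = runif(n, 1e6, 2e8),
  star_cast = factor(sample(c("Female-Female", "Male-Male", "Female-Male", "Other"),
                            n, replace = TRUE))
)
movies$profit <- 2.4 * movies$budget + rnorm(n, sd = 4e7)

# One shared budget slope (group intercepts may still differ) ...
common_slope <- lm(profit ~ budget + star_cast, data = movies)
# ... versus a separate budget slope per group
separate_slopes <- lm(profit ~ budget * star_cast, data = movies)

# F-test on the 3 interaction terms: a small p-value suggests the slopes differ
slope_test <- anova(common_slope, separate_slopes)
print(slope_test)
```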

Can we see differences in the effect depending on the movie production studio?

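Because most studios have two or fewer movies in the dataset, a linear model could not be meaningfully fitted for them: with two observations the fitted line passes exactly through both points, leaving no residual degrees of freedom for a standard error or p-value. The table below shows the budget coefficient and related statistics for each remaining studio. The coefficients and their significance vary considerably between studios; for most of the studios shown, the p-value is well above 0.05, likely because each studio contributes only a few movies.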

studio                                estimate  std.error  statistic  p.value
Palace Pictures                         62.504     52.106      1.2    0.442
Illumination Entertainment              54.946     14.561      3.773  0.00925
See-Saw Films                           26.178    118.661      0.221  0.862
Celador Films                           25.247     22.446      1.125  0.463
Produzioni Europee Associate (PEA)      25.072     13.338      1.88   0.311
Majestic Films International            21.863      4.972      4.397  0.142
Allied Filmmakers                       21.342     14.344      1.488  0.377
Alliance                                20.309     56.587      0.359  0.781
EMI Films                               19.649     13.008      1.511  0.270
Howard W. Koch Productions              19.092     19.519      0.978  0.507
(first 10 of 596 rows shown)
...
Figure: Effect of Studio on Budget-Income Relation (x-axis: estimate, y-axis: -log10(p.value))
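The filtering step and the values plotted in the studio chart could be sketched as below. The studio names and data are hypothetical; the minimum of three movies per studio mirrors the text, and `-log10(p.value)` is the quantity the chart's axis labels suggest is on the y-axis. `fit_one()` is an illustrative helper name.

```r
# Minimal sketch: per-studio budget coefficients, skipping small studios.
# ASSUMPTIONS: hypothetical studios and synthetic data; the minimum of three
# movies per studio follows the text (studios with two or fewer are skipped).
set.seed(11)
n <- 60
movies <- data.frame(
  studio = sample(paste("Studio", LETTERS[1:8]), n, replace = TRUE,
                  prob = c(10, 8, 6, 5, 1, 1, 0.5, 0.5)),
  budget = runif(n, 1e6, 1e8),
  stringsAsFactors = FALSE
)
movies$profit <- 2 * movies$budget + rnorm(n, sd = 3e7)

counts <- table(movies$studio)
eligible <- names(counts)[counts >= 3]

fit_one <- function(d) {
  s <- summary(lm(profit ~ budget, data = d))$coefficients
  c(estimate = s["budget", "Estimate"], p.value = s["budget", "Pr(>|t|)"])
}
# One row per eligible studio, columns estimate and p.value
stats <- t(sapply(split(movies, movies$studio)[eligible], fit_one))

by_studio <- data.frame(
  studio = rownames(stats),
  estimate = stats[, "estimate"],
  p.value = stats[, "p.value"],
  neg_log10_p = -log10(stats[, "p.value"]),  # y-axis value in the studio chart
  row.names = NULL,
  stringsAsFactors = FALSE
)
print(by_studio, digits = 3)
```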

Does this prediction change over the years?

As in the previous comparisons, to determine how the budget-profit relation changes over the years, a linear model was fitted separately for each year. The table below shows the budget coefficient for each year. Because there were too few movies before 1975, those years were omitted from the analysis.

#> Warning: All elements of `...` must be named.
#> Did you want `data = c(movieId, title, genres, budget, profit, studio, star_female, 
#>     star_male, star_unknown)`?

year  estimate  std.error  statistic  p.value
1985     4.725      0.875      5.402  1.83e-06
1991     3.994      0.613      6.513  3.47e-08
1978     3.797      1.633      2.325  0.0302
1993     3.72       0.842      4.416  3.42e-05
2018     3.671      0.634      5.794  4.19e-06
2015     3.558      0.341     10.428  2.11e-18
1997     3.259      0.547      5.962  5.67e-08
1984     3.15       1.082      2.911  0.00538
1989     3.049      0.8        3.809  0.000334
2009     2.794      0.347      8.044  5.50e-13
(first 10 of 43 rows shown)
Figure: The Changes on Budget - Profit Relation Between 1975 and 2016 (x-axis: year, y-axis: estimate)

The graph shows that from the early 2000s onward, the effect of budget on profit started to increase slightly.
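The per-year fits and the upward drift after 2000 could be checked roughly as follows. This sketch uses hypothetical synthetic data in which the slope is deliberately built to rise after 2000; it keeps only years from 1975 on, fits one model per year, and then fits a simple unweighted linear trend to the yearly estimates. A more careful version might weight each year by the inverse of its squared standard error, since years with few movies give noisier estimates.

```r
# Minimal sketch: yearly budget coefficients from 1975 on, then a linear
# trend in those coefficients from 2000 onwards.
# ASSUMPTION: hypothetical synthetic data in which the slope rises after 2000.
set.seed(5)
n <- 3000
movies <- data.frame(year = sample(1960:2018, n, replace = TRUE),
                     budget = runif(n, 1e6, 2e8))
slope <- 2 + 0.05 * pmax(movies$year - 2000, 0)
movies$profit <- slope * movies$budget + rnorm(n, sd = 4e7)

recent <- subset(movies, year >= 1975)  # earlier years dropped, as in the text
slope_for <- function(d) coef(lm(profit ~ budget, data = d))[["budget"]]
est <- sapply(split(recent, recent$year), slope_for)
by_year <- data.frame(year = as.integer(names(est)), estimate = unname(est))

# Unweighted trend of the yearly estimates since 2000
trend <- lm(estimate ~ year, data = subset(by_year, year >= 2000))
print(coef(trend)[["year"]])
```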