Chapter 2 Analysis of Food Data (Webscraping)
I was interested in whether a higher Difficulty translated to a higher Rating: why make a more difficult dish if it rates lower than an easier one? My first plot (Figure 1.0) visualises how the number of Ingredients, together with the Difficulty of making the dish, relates to the dish's Rating. As I expected, dishes that required more Ingredients tended to have a higher Difficulty, while the easiest dishes required the fewest. It also appears that higher-rated dishes become more common as the Difficulty increases.
2.1 A view of the data
knitr::kable(
  head(Taste, 5), caption = 'Brief View of the data',
  booktabs = TRUE)
ID | Title | URL | Keywords | Prep | Cook | Ingredients_N | Difficulty | Servings | Makes | Rating | Rating_N | Comments_N | Date | Energy | Fat_Total | Fat_Sat | Sugar | Carbs | Fibre | Protein | Chol | Sodium | Ingredient_1 | Ingredient_2 | Ingredient_3 | Ingredient_4 | Ingredient_5 | Ingredient_6 | Ingredient_7 | Ingredient_8 | Ingredient_9 | Ingredient_10 | Ingredient_11 | Ingredient_12 | Ingredient_13 | Ingredient_14 | Ingredient_15 | Ingredient_16 | Ingredient_17 | Ingredient_18 | Ingredient_19 | Ingredient_20 | Ingredient_21 | Ingredient_22 | Ingredient_23 | Ingredient_24 | Ingredient_25 | Ingredient_26 | Ingredient_27 | Ingredient_28 | Ingredient_29 | Ingredient_30 | Ingredient_31 | Ingredient_32 | Ingredient_33 | Ingredient_34 | Ingredient_35 | Ingredient_36 | Ingredient_37 | Ingredient_38 | Ingredient_39 | Ingredient_40 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Chicken And Leek Casserole Recipe | http://www.taste.com.au/recipes/1/chicken+and+leek+casserole | NA | 00:20 | 01:05 | 12 | EASY | 4 | NA | 4.5 | 198 | 206 | 01/06/2006 | 3088 | 12 | 34 | 2 | 43 | 5 | 59 | 225 | 1464.62 | 1/4 cup plain flour | 8 chicken thigh fillets, trimmed, halved crossways | 1 tablespoon butter | 1 tablespoon olive oil | 1 leek, halved, washed, sliced | 4 rashers bacon, rind removed, chopped | 2 garlic cloves, crushed | 3 cups Campbell’s Real Stock Chicken | 1/2 cup white wine | 100g button mushrooms, sliced | 100g green beans, trimmed, halved | 1 cup couscous | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
19 | Hot Chocolate With Marshmallows Recipe | http://www.taste.com.au/recipes/19/hot+chocolate+with+marshmallows | NA | 00:05 | 00:10 | 6 | EASY | 8 | NA | 5.0 | 1 | 1 | 01/06/2006 | 2094 | 18 | 29 | 44 | 48 | NA | 12 | NA | 125.52 | 1 cup thickened cream | 2 litres milk | 2 teaspoons vanilla extract | 1 tablespoon caster sugar | 250g good-quality dark chocolate, broken into squares | 24 marshmallows | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
21 | Breakfast Mushrooms On Toast Recipe | http://www.taste.com.au/recipes/21/breakfast+mushrooms+on+toast | NA | 00:10 | 00:10 | 8 | EASY | 4 | NA | 4.5 | 3 | 3 | 01/06/2006 | 976 | 4 | 18 | 1 | 13 | 1 | 4 | 9 | 160.02 | 1/4 cup olive oil | 1 garlic clove, crushed | 2 tablespoons flat-leaf parsley leaves, chopped | 1 teaspoon thyme leaves, roughly chopped | 1 lemon, rind finely grated | 4 large flat mushrooms, stalks trimmed | 4 thick slices sourdough bread | 40g Boursin herbs and garlic cheese | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
22 | Steamed Lemon Cake Recipe | http://www.taste.com.au/recipes/22/steamed+lemon+cake | NA | 00:20 | 00:30 | 9 | EASY | 8 | NA | 3.0 | 2 | 2 | 01/06/2006 | 1829 | 16 | 27 | 29 | 42 | 1 | 6 | 160 | 226.65 | 3 eggs | 3/4 cup caster sugar | 75g butter, melted, cooled | 2 tablespoons double thick cream | 2 lemons, rind finely grated, juiced | 1 cup plain flour | 1/2 teaspoon baking powder | 1/2 cup lemon curd | 200ml double thick cream | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
24 | French Roast Chicken With Whole Garlic Recipe | http://www.taste.com.au/recipes/24/french+roast+chicken+with+whole+garlic | NA | 00:10 | 01:00 | 8 | EASY | 4 | NA | 4.5 | 7 | 7 | 01/06/2006 | 2482 | 11 | 42 | 1 | 1 | 2 | 53 | 211 | 719.56 | You will need unwaxed cooking string. | 1 large lemon | 2 1/2 tablespoons extra-virgin olive oil | 4 garlic cloves, crushed | 1 teaspoon sea salt | 1.7kg free-range whole chicken, cleaned, rinsed, patted dry | 4 sprigs thyme | 4 bulbs garlic | potato mash and steamed green beans, to serve | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
2.2 Do More Ingredients Mean a Higher Difficulty?
## Setting the Difficulty as an Ordered Factor ##
Taste$Difficulty=factor(Taste$Difficulty, levels=
  c("SUPER EASY","EASY", "CAPABLE COOKS", "ADVANCED"), ordered=TRUE)
## Will only Use Complete Observations ##
Taste2=subset(Taste, select=c(Ingredients_N,Difficulty, Rating, Prep))
Taste2=na.omit(Taste2)
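## Changing Preparation Time from "HH:MM" Text to Numeric Minutes ##
## (as.numeric(as.factor(...)) would give factor level codes, not minutes; ##
## this assumes every Prep value looks like "00:20", as in Section 2.1) ##
Prep_parts=strsplit(as.character(Taste2$Prep), ":")
Taste2$Prep2=vapply(Prep_parts, function(p) 60*as.numeric(p[1])+as.numeric(p[2]), numeric(1))
### My First Plot ###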
ggplot(Taste2)+aes(x=Difficulty, y=Ingredients_N, color=Rating)+
geom_point(alpha=.7,size=1,stat="Identity")+geom_jitter()+
scale_colour_gradient(low="gold",high="purple4")+geom_boxplot(alpha=.6)+
theme_light()+ylab("Number of Ingredients Used")+
ggtitle("The Number of Ingredients Used vs Difficulty of the Recipe by Ratings")
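The visual impression from the first plot can also be checked numerically. The following is a minimal sketch and not part of the original analysis: it assumes a data frame shaped like Taste2 (columns Ingredients_N, Difficulty and Rating), and both the helper name summarise_by_difficulty and the small toy data frame are hypothetical, made up only so that the block runs on its own.

```r
# Hypothetical helper (not from the original analysis): one row per
# Difficulty level with the number of recipes, the median number of
# ingredients and the mean rating.
summarise_by_difficulty <- function(df) {
  data.frame(
    Difficulty = levels(df$Difficulty),
    Recipes = as.vector(table(df$Difficulty)),
    Median_Ingredients = as.vector(tapply(df$Ingredients_N, df$Difficulty, median)),
    Mean_Rating = as.vector(tapply(df$Rating, df$Difficulty, mean))
  )
}

# Toy data, invented purely for illustration; these are NOT values from Taste.
toy <- data.frame(
  Ingredients_N = c(4, 6, 8, 10, 12, 14, 15, 17),
  Difficulty = factor(
    c("SUPER EASY", "SUPER EASY", "EASY", "EASY",
      "CAPABLE COOKS", "CAPABLE COOKS", "ADVANCED", "ADVANCED"),
    levels = c("SUPER EASY", "EASY", "CAPABLE COOKS", "ADVANCED"),
    ordered = TRUE),
  Rating = c(3.5, 4.0, 4.0, 4.5, 4.5, 4.5, 5.0, 4.5)
)

toy_summary <- summarise_by_difficulty(toy)
print(toy_summary)
```

On the real data the call would be summarise_by_difficulty(Taste2). A median that rises from one Difficulty level to the next would support what the boxplots suggest.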
### Does a Higher Preparation Time Mean a Higher Rating, Is it Worth it? ###
### My Second Plot ###
ggplot(Taste2)+aes(x=Difficulty, y=Prep2, color=Rating)+
geom_point(alpha=.7,size=1,stat="Identity")+geom_jitter()+
scale_colour_gradient(low="gold",high="purple4")+
geom_boxplot(alpha=.4)+theme_light()+ylab("Preparation time (minutes)")+
ggtitle("Dish Preparation time vs Difficulty of the Recipe by Ratings")+coord_flip()
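The question behind the second plot (does more preparation time, or a higher Difficulty, go with a higher Rating?) could also be summarised with a rank correlation. This is a hedged sketch rather than the original author's method: it assumes a Taste2-like data frame in which Prep2 holds preparation time in minutes and Difficulty is an ordered factor. The helper name rating_rank_correlations and the toy values are hypothetical and chosen only so that the example is self-contained.

```r
# Hypothetical helper (not from the original analysis): Spearman rank
# correlation of Rating with (a) preparation time in minutes and
# (b) the position of the recipe's ordered Difficulty level.
rating_rank_correlations <- function(df) {
  c(
    Prep = cor(df$Prep2, df$Rating, method = "spearman"),
    Difficulty = cor(as.integer(df$Difficulty), df$Rating, method = "spearman")
  )
}

# Toy data, invented purely for illustration; these are NOT values from Taste.
# Rating increases strictly with Prep2, so the Prep correlation is 1 by construction.
toy_prep <- data.frame(
  Prep2 = c(5, 10, 15, 20, 30, 45),
  Difficulty = factor(
    c("SUPER EASY", "SUPER EASY", "EASY", "EASY", "CAPABLE COOKS", "ADVANCED"),
    levels = c("SUPER EASY", "EASY", "CAPABLE COOKS", "ADVANCED"),
    ordered = TRUE),
  Rating = c(3.0, 3.5, 4.0, 4.5, 4.6, 5.0)
)

toy_cor <- rating_rank_correlations(toy_prep)
print(toy_cor)
```

Spearman's method is used here because Rating and Difficulty are ordinal and contain many ties, so only the ordering of the values is compared. On the real data the call would be rating_rank_correlations(Taste2); values near zero would suggest that extra preparation time or difficulty is not rewarded with higher ratings.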