How is data stored?
- On a hard disk… (see here for a little history on computer storage)
- But while we analyze data with R, it is normally held in the computer’s RAM (working memory)
- Q: How much RAM does your computer have?
- Simple ‘big data’ definition: data that is larger than your RAM
- But with big data we may run into various limits:
  - …we can’t load it all into R because it does not fit into RAM
  - …our PC (personal computer) may not have enough hard disk storage
  - …we may need our PC to run all the time to collect the data (e.g. from Twitter), but it heats up our 12 m² student room…
  - …our PC may be too slow to carry out tasks on a big data set.
- R (as software) is designed to analyze data, not to store (big) data → use a database
- Strategy (see video): develop a model on a subset of the data, then scale it up (or don’t, e.g. keep working with a sample)
- Categorization of big data problems:
  - Extract data
  - Compute on the parts
  - Compute on the whole
- …so we have to turn to other tools, e.g. SQL and Google BigQuery
- Various tutorials on working with big data/databases are available, e.g. here.
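To make the "bigger than your RAM" definition concrete, here is a back-of-the-envelope estimate. It is a minimal sketch that assumes every value is stored as an 8-byte double (the way R stores numeric vectors). It ignores overhead such as column names, strings and the temporary copies R often makes while computing, so real memory use will be higher. Python is used only for illustration; the function names are hypothetical.

```python
GIB = 1024 ** 3  # bytes in one gibibyte

def estimated_size_bytes(n_rows: int, n_cols: int, bytes_per_value: int = 8) -> int:
    """Rough size of a table of numbers: one value per cell, 8 bytes per double by default."""
    return n_rows * n_cols * bytes_per_value

def fits_in_ram(n_rows: int, n_cols: int, ram_gib: float, bytes_per_value: int = 8) -> bool:
    """True if the raw values alone are smaller than the RAM (an optimistic check)."""
    return estimated_size_bytes(n_rows, n_cols, bytes_per_value) < ram_gib * GIB

# 100 million rows x 20 numeric columns = 16e9 bytes
size = estimated_size_bytes(100_000_000, 20)
print(f"{size / GIB:.1f} GiB")                     # prints 14.9 GiB
print(fits_in_ram(100_000_000, 20, ram_gib=8))     # prints False
```

So a table that sounds moderate (100 million rows, 20 columns) already exceeds the RAM of a typical 8 GB laptop before R has done any work on it.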
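"Compute on the parts" and "compute on the whole" can be combined when a statistic splits into per-chunk pieces. The following is a minimal sketch in Python with a hypothetical function name, `chunked_mean`. It reads CSV lines a fixed number of rows at a time and keeps only a running sum and count, then combines these at the end. Memory use therefore depends on the chunk size, not on the file size. This works for sums, counts and means. Statistics such as the median do not decompose this simply.

```python
import csv
from itertools import islice

def chunked_mean(lines, column, chunk_size=10_000):
    """Mean of one numeric CSV column, holding at most chunk_size rows in memory.

    `lines` is any iterable of CSV text lines, e.g. an open file handle.
    """
    reader = csv.DictReader(lines)
    total, count = 0.0, 0
    while True:
        chunk = list(islice(reader, chunk_size))  # only this chunk is in memory
        if not chunk:
            break
        # compute on the part: partial sum and count for this chunk
        total += sum(float(row[column]) for row in chunk)
        count += len(chunk)
    # combine the partial results into the result on the whole
    return total / count if count else float("nan")
```

With a real file one would pass an open handle, e.g. `with open("data.csv", newline="") as f: chunked_mean(f, "price")`, where the file and column names are placeholders.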
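The "extract data" strategy usually means letting a database do the filtering and aggregation, then pulling only the (small) result into the analysis session. Below is a minimal sketch that uses SQLite through Python's built-in `sqlite3` module as a local stand-in for a database server or Google BigQuery, which is also queried with SQL. The `tweets` table and its rows are made up for illustration. In R, the DBI package together with RSQLite plays the same role. The `ORDER BY RANDOM()` query shows one way to draw a random subset for the "develop on a sample" strategy. It sorts the whole table, so on very large tables it can be slow.

```python
import sqlite3

def make_example_db() -> sqlite3.Connection:
    """Hypothetical toy table standing in for data too large to load into memory."""
    con = sqlite3.connect(":memory:")  # a real project would use a file or a server
    con.execute("CREATE TABLE tweets (id INTEGER PRIMARY KEY, user TEXT, retweets INTEGER)")
    con.executemany(
        "INSERT INTO tweets (user, retweets) VALUES (?, ?)",
        [("a", 3), ("b", 10), ("a", 5), ("c", 0)],
    )
    con.commit()
    return con

def retweets_per_user(con: sqlite3.Connection) -> list:
    # Aggregation runs inside the database; only one row per user is returned.
    return con.execute(
        "SELECT user, SUM(retweets) FROM tweets GROUP BY user ORDER BY user"
    ).fetchall()

def random_sample(con: sqlite3.Connection, n: int) -> list:
    # Random subset of n rows, e.g. for developing a model before scaling up.
    return con.execute("SELECT * FROM tweets ORDER BY RANDOM() LIMIT ?", (n,)).fetchall()

con = make_example_db()
print(retweets_per_user(con))  # prints [('a', 8), ('b', 10), ('c', 0)]
```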