Data: How is it stored?
- On a hard disk… (see here for a little history on computer storage)
- But while we analyze data with R, it is normally held in the computer’s random access memory (RAM)
- Q: How much RAM does your computer have?
- Simple ‘big data’ definition: data larger than your RAM
- But with big data we may reach various limits
- …we can’t load it all into R because of the RAM limit
- …our PC (personal computer) may not have enough hard disk storage
- …we may need our PC to run all the time (e.g., to collect Twitter data continuously), but that is not energy efficient (…it heats up our room)…
- …our PC may be too slow to do tasks on a big data set.
- R is designed for data analysis, not for big data analysis → use databases
- Strategies (Gold 2019): (1) (Down-)Sample and Model, (2) Chunk and Pull (estimate across parts), (3) Push Compute to Data (estimate within database)
- Generally, we have to resort to some other tools, e.g., SQL and Google BigQuery
- Various tutorials on working with big data/databases, e.g. here.
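A quick back-of-the-envelope check makes the "bigger than your RAM" definition concrete. The sketch below (in Python, standard library only) assumes a purely numeric table where every value is an 8-byte double, as in R's numeric vectors, and ignores object overhead, strings, and the extra copies an analysis often creates; the row, column, and RAM figures are hypothetical.

```python
# Rough check of whether a purely numeric table fits in RAM.
# Assumption: every value is an 8-byte double; object overhead, strings,
# and intermediate copies made during analysis are ignored.

BYTES_PER_DOUBLE = 8


def table_size_gb(n_rows: int, n_cols: int) -> float:
    """Approximate in-memory size of an n_rows x n_cols numeric table (1 GB = 10**9 bytes)."""
    return n_rows * n_cols * BYTES_PER_DOUBLE / 1e9


def is_big_data(n_rows: int, n_cols: int, ram_gb: float) -> bool:
    """'Big data' in the simple sense used here: the table is larger than the available RAM."""
    return table_size_gb(n_rows, n_cols) > ram_gb


if __name__ == "__main__":
    # Hypothetical example: 100 million rows, 20 numeric columns, a laptop with 8 GB of RAM.
    print(f"{table_size_gb(100_000_000, 20):.1f} GB")  # 16.0 GB
    print(is_big_data(100_000_000, 20, ram_gb=8))      # True
```

Since real data usage is higher than this lower bound, a table that is even close to the RAM size is already likely to cause trouble.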
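The three strategies from Gold (2019) can be illustrated on one toy task: estimating the slope of a simple linear regression. This is a minimal sketch in Python with the standard-library `sqlite3` module, where a small in-memory SQLite table stands in for a database too large for RAM. The table `flights`, its columns `distance` and `delay`, the simulated data, and all function names are hypothetical. It illustrates the ideas and is not Gold's own code.

```python
import random
import sqlite3

# Assumption: a small in-memory SQLite table stands in for a database that is
# too big for RAM. Table/column names and the simulated data are hypothetical.


def make_db(n_rows: int = 10_000, seed: int = 1) -> sqlite3.Connection:
    """Create a toy table where delay = 0.01 * distance + noise."""
    rng = random.Random(seed)
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE flights (distance REAL, delay REAL)")
    rows = []
    for _ in range(n_rows):
        d = rng.uniform(100, 3000)
        rows.append((d, 0.01 * d + rng.gauss(0, 5)))
    con.executemany("INSERT INTO flights VALUES (?, ?)", rows)
    return con


def ols_from_sums(n, sx, sy, sxx, sxy):
    """Slope and intercept of y ~ x computed from sufficient statistics."""
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


def sample_and_model(con, k: int = 500):
    """(1) Down-sample: pull k random rows into memory and fit on them only."""
    rows = con.execute(
        "SELECT distance, delay FROM flights ORDER BY RANDOM() LIMIT ?", (k,)
    ).fetchall()
    n = len(rows)
    sx = sum(x for x, _ in rows)
    sy = sum(y for _, y in rows)
    sxx = sum(x * x for x, _ in rows)
    sxy = sum(x * y for x, y in rows)
    return ols_from_sums(n, sx, sy, sxx, sxy)


def chunk_and_pull(con, chunk_size: int = 1_000):
    """(2) Chunk and pull: stream the table in chunks, keeping only running sums."""
    cur = con.execute("SELECT distance, delay FROM flights")
    n = sx = sy = sxx = sxy = 0.0
    while True:
        chunk = cur.fetchmany(chunk_size)
        if not chunk:
            break
        for x, y in chunk:
            n += 1
            sx += x
            sy += y
            sxx += x * x
            sxy += x * y
    return ols_from_sums(n, sx, sy, sxx, sxy)


def push_compute(con):
    """(3) Push compute to data: the database computes the sums; five numbers come back."""
    n, sx, sy, sxx, sxy = con.execute(
        "SELECT COUNT(*), SUM(distance), SUM(delay), "
        "SUM(distance * distance), SUM(distance * delay) FROM flights"
    ).fetchone()
    return ols_from_sums(n, sx, sy, sxx, sxy)


if __name__ == "__main__":
    con = make_db()
    print("sample:", sample_and_model(con))
    print("chunks:", chunk_and_pull(con))
    print("in-db :", push_compute(con))
```

The chunked and in-database versions use every row and should agree up to floating-point rounding, while the sampled version only approximates the slope (and differs from run to run, since `ORDER BY RANDOM()` is not seeded). With a remote SQL backend such as BigQuery, the push-compute variant would amount to sending a similar aggregate query to the server instead of moving the raw data.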