3.8 Data: How is it stored?

  • On a hard disk… (see here for a little history on computer storage)
  • But while we analyze data in R, it is normally held in the computer’s random-access memory (RAM)
    • Q: How much RAM does your computer have?
  • Simple ‘big data’ definition: data too large to fit into your RAM
  • But with big data we may reach various limits
    • …we can’t load it all into R because of the RAM limit
    • …our PC (personal computer) may not have enough hard disk storage
    • …we may need our PC to run all the time (e.g., to collect Twitter data continuously), but that is not energy efficient (…and heats up our room)
    • …our PC may be too slow to do tasks on a big data set
  • R is designed for data analysis, not for big data analysis → use databases
  • Strategies (later!): (1) (Down-)Sample and Model, (2) Chunk and Pull (estimate across parts), (3) Push Compute to Data (estimate within database)
  • Generally, we have to resort to some other tools, e.g., SQL and Google BigQuery
    • Various tutorials on working with big data/databases exist, e.g. here.
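
The three strategies listed above can be illustrated with a small sketch. It is written in Python with the standard-library sqlite3 module purely for illustration; the same ideas carry over to R with a database connection. Everything named here is an assumption made for the example: the in-memory SQLite database stands in for a real (much larger) database, and the table `measurements`, its columns, and the three function names are hypothetical.

```python
import random
import sqlite3

# Hypothetical toy data standing in for a table that would not fit into RAM.
random.seed(42)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (id INTEGER PRIMARY KEY, value REAL)")
conn.executemany(
    "INSERT INTO measurements (value) VALUES (?)",
    ((random.gauss(50, 10),) for _ in range(10_000)),
)
conn.commit()


def sample_and_model(conn, n=500):
    """(1) Pull a random sample of n rows and estimate the mean in memory."""
    # Note: ORDER BY RANDOM() makes the database shuffle the whole table,
    # which is simple but can itself be slow on very large tables.
    rows = conn.execute(
        "SELECT value FROM measurements ORDER BY RANDOM() LIMIT ?", (n,)
    ).fetchall()
    return sum(v for (v,) in rows) / len(rows)


def chunk_and_pull(conn, chunk_size=1_000):
    """(2) Pull the table chunk by chunk and combine the partial results."""
    total, count = 0.0, 0
    cursor = conn.execute("SELECT value FROM measurements")
    while True:
        chunk = cursor.fetchmany(chunk_size)  # at most chunk_size rows in RAM
        if not chunk:
            break
        total += sum(v for (v,) in chunk)
        count += len(chunk)
    return total / count


def push_compute_to_data(conn):
    """(3) Let the database compute the mean; only one number is returned."""
    (mean,) = conn.execute("SELECT AVG(value) FROM measurements").fetchone()
    return mean


sample_mean = sample_and_model(conn)
chunk_mean = chunk_and_pull(conn)
db_mean = push_compute_to_data(conn)
print(sample_mean, chunk_mean, db_mean)
```

Strategies (2) and (3) use every row, so their results should agree up to floating-point rounding, while strategy (1) returns only an estimate whose precision depends on the sample size. The main difference is where the work happens: (1) and (2) move rows into local memory, whereas (3) moves only the final result.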