16 February 2021
This is a basic rundown of how to interact with Google Storage programmatically in R. You need Google Storage in order to use Document AI and other Google APIs at scale, because these services do not accept bulk file submissions directly. Instead they use Google Storage as an intermediary, so you need to know how to get files in and out of Google Storage.
It is possible to bulk upload files to, and download them from, Google Storage in the Google Cloud Console. In fact, for uploads it can sometimes be easier than doing it programmatically. But downloads and deletions are cumbersome if you have a lot of files. And since bulk processing in Document AI can only be done with code, you might as well keep the whole workflow in R.
The biggest hurdle to using any Google API is authentication. It’s daunting for several reasons. For one, it involves abstract new concepts like “service accounts”, “OAuth 2.0”, and “scopes”. For another, the Google Cloud Console is so crowded it’s an absolute nightmare to navigate as a beginner. In addition, different R packages have different procedures for authenticating with Google Cloud Services (GCS).
A full explanation of Google API authentication would fill a small book, but suffice it to say that there are several different ways to authenticate to GCS from R. In the following I will walk you through one of them, the one I think is the simplest and most robust if you are primarily planning to use Google Storage and Google Document AI.
If you already have a Gmail account, you can use that. Or you can create a burner account for your GCS work.
While logged in to your Gmail account, go to the Google Cloud Console. Agree to the terms of service and click “Try for free”.