Pins for Python

The Python pins library provides an easy way to share data sets, models, and other Python objects. Your resources may be text files (CSV, JSON, etc.), Arrow/Feather files, or any other format you use to share data. Pinned objects can be stored on a variety of “boards”, including local folders (to share on a networked drive or with Dropbox), Posit Connect, Amazon S3, and more.

Sharing data can be useful in many situations, for example:

  1. Multiple pieces of content require the same input data. Rather than copying that data, each piece of content references a single source of truth hosted on Posit Connect.

  2. Content depends on data or model objects that need to be regularly updated. Rather than redeploying the content each time the data changes, use a pinned resource and update only the data. The update can run in a scheduled Jupyter Notebook, and your content will read the newest data on each run.

  3. You need to share resources that aren’t structured for traditional tools like databases. For example, models saved as Python objects aren’t easy to store in a database. Rather than using email or file systems to share data files, use Posit Connect to host these resources as pins.

Pins and large data sets

An important factor in deciding whether to use a pin is the size of the data or object. As a rule of thumb, we don’t recommend pinning files larger than 500 MB. If you routinely pin data larger than this, you may need to reconsider your data engineering pipeline.
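One way to apply this rule of thumb is a quick size check before pinning a file. The sketch below is a hypothetical helper (is_small_enough_to_pin is not part of pins); it assumes the data is already saved to disk and treats 500 MB as 500 * 1024 * 1024 bytes.

```python
import os

# Threshold from the rule of thumb above (assumption: binary megabytes).
MAX_PIN_BYTES = 500 * 1024 * 1024


def is_small_enough_to_pin(path: str, limit_bytes: int = MAX_PIN_BYTES) -> bool:
    """Hypothetical helper: return True if the file at `path` is within the limit."""
    return os.path.getsize(path) <= limit_bytes
```

If the check fails, that is a signal to revisit the pipeline (for example, by pinning a summary instead of the raw data) rather than to raise the limit.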

Create a Pin Board

Posit Connect is easy to use as a board for pinning Python objects. Create a board with board = board_rsconnect(). This function takes server_url and api_key arguments, which determine the server you connect to and how you authenticate to it. If api_key is not specified, pins will attempt to read it from the CONNECT_API_KEY environment variable.

Note

To read pickle files from a pin board, you must pass allow_pickle_read=True to board_rsconnect(). The pickle module is not secure, so only read files you trust. For more information, refer to the Python documentation.

import os
from pins import board_rsconnect

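# Read the API key and server URL from environment variables
# instead of hard-coding credentials in your code
API_KEY = os.getenv('CONNECT_API_KEY')
SERVER = os.getenv('CONNECT_SERVER')

# Create a board backed by Posit Connect
board = board_rsconnect(server_url=SERVER, api_key=API_KEY)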

Posit Connect automatically sets these environment variables for deployed content at run time, so you don’t need to hard-code them (never a good practice) or set them in the Vars pane, unless your server administrator has disabled this feature.

Note

The automatic generation of these environment variables may be disabled for security reasons. Reach out to your Posit Connect server administrator or review the Admin Guide for additional details.
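Because these variables may be missing (for example, when running locally or when an administrator has disabled them), it can help to fail early with a clear message instead of letting board creation fail later. This is a minimal sketch using a hypothetical helper, require_connect_env, which is not part of pins.

```python
import os


def require_connect_env(names=("CONNECT_SERVER", "CONNECT_API_KEY")) -> dict:
    """Hypothetical helper: return the named variables, or raise if any is unset or empty."""
    values = {name: os.environ.get(name, "") for name in names}
    missing = [name for name, value in values.items() if not value]
    if missing:
        raise RuntimeError(
            "Missing environment variables: " + ", ".join(missing)
            + ". Set them in the Vars pane or ask your Posit Connect administrator."
        )
    return values
```

The returned values can then be passed as server_url and api_key to board_rsconnect().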

Read and Write Pins

Once you have a pin board, you can write data to it with .pin_write(). It requires three arguments: an object, a name, and a pin type:

from pins.data import mtcars

# Pin the first five rows of the example data set as a CSV file
board.pin_write(mtcars.head(), "hadley/mtcars", type="csv")

The first argument is the object to save, the second gives the name of the pin, and type sets the file format used to store it. On Posit Connect, the name takes the form username/pin_name, and you use this full name to read the pin later. Running the code above should print a success message similar to: Writing to pin 'hadley/mtcars'.

The username you provide (e.g., ‘hadley’) must match the account associated with the API key used to create the board and authenticate to Posit Connect. If it does not match, you will receive an error.
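The naming rule can be sketched as a local check before writing. Both functions below are hypothetical (not part of pins), and the sketch assumes you already know the username tied to your API key; Posit Connect performs its own check regardless.

```python
def split_pin_name(full_name: str) -> tuple[str, str]:
    """Hypothetical helper: split 'username/pin_name' into its two parts."""
    owner, sep, pin = full_name.partition("/")
    if not sep or not owner or not pin or "/" in pin:
        raise ValueError(f"Expected 'username/pin_name', got {full_name!r}")
    return owner, pin


def check_owner(full_name: str, api_key_user: str) -> None:
    """Hypothetical helper: raise if the pin's username differs from the API key's user."""
    owner, _ = split_pin_name(full_name)
    if owner != api_key_user:
        raise PermissionError(
            f"Pin {full_name!r} belongs to {owner!r}, "
            f"but the API key is for {api_key_user!r}"
        )
```

For example, check_owner("hadley/mtcars", "hadley") passes silently, while check_owner("hadley/mtcars", "alice") raises PermissionError.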

After you’ve pinned an object, you can read it back with .pin_read():

board.pin_read("hadley/mtcars")

Pin Metadata

Every pin is accompanied by metadata that you can access with .pin_meta(). By default, this metadata includes:

  • A title, a brief textual description of the dataset.

  • An optional description, where you can provide more details.

  • The created date-time, indicating when the pin was created.

  • The file_size, in bytes, of the underlying files.

  • A unique pin_hash that you can supply to .pin_read() to ensure that you’re reading exactly the data that you expect.

When creating the pin, you can override the default description or provide additional metadata that is stored with the data:

board.pin_write(
    mtcars,
    name="mtcars2",
    type="csv",
    description="Data extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models).",
    metadata={
        "source": "Henderson and Velleman (1981), Building multiple regression models interactively. Biometrics, 37, 391–411."
    }
)

Learn more about Pin Metadata.

Using a Pin

Once a pin has been deployed, it is easy to share it with colleagues.

You can manage content settings for deployed pins just like you would for other content types. For example, you can set access controls to determine who can view and use the resource.

Posit Connect provides a preview of pinned data objects, their metadata, and a direct download link, all available at the content URL:

Example of a Python Pin on Posit Connect.

Updating a Pin

Pins are objects; they are not backed by source code, so they cannot be scheduled directly. A common pattern for updating pinned data on a schedule is to call .pin_write() inside a scheduled Jupyter Notebook. Writing to the same pin multiple times creates a version history, which you can access from the “More” dropdown menu.