Chapter 3 Data


3.1 Open Payments Background

The Open Payments system, published by the Centers for Medicare & Medicaid Services (CMS), aggregates information on financial relationships between “reporting entities” - pharmaceutical industry companies - and “covered recipients” - healthcare practices and physicians (Medicare & Medicaid Services (2022a)). Reporting entities are required to report all financial relationships to CMS; however, reporting is often significantly delayed. As a result, when CMS publishes the data for a calendar year in June of the following year, some records are often still missing. The 2020 data should therefore be interpreted with caution: it is the most recently published year and may not reflect all financial relationships from that calendar year.

CMS subdivides financial relationships into three categories: general, research, and ownership. The official definitions for these payment types are as follows (Medicare & Medicaid Services (2022b)):

  • General: Payments that are not associated with a research study.
  • Research: Payments that are associated with a research study.
  • Ownership: Ownership and investment interest in companies, which describes both the actual dollar amount invested and the value of the ownership or investment interest.

This analysis focuses exclusively on the general payment category for the years 2015 to 2020. The research and ownership categories have limited sample sizes for physicians in primary care practice. Within the general payment category, reporting entities are further required to classify the nature of the payment - the reason the payment or transfer of value was made. The available categories for the nature of a general payment to a physician are:

  • Consulting fees
  • Compensation for services other than consulting, including serving as faculty or as a speaker at an event other than a continuing education program
  • Honoraria
  • Gifts
  • Entertainment
  • Food and beverage
  • Travel and lodging
  • Education
  • Research
  • Charitable contributions
  • Debt forgiveness
  • Royalty or license
  • Current or prospective ownership or investment interest
  • Compensation for serving as faculty or as a speaker for a medical education program
  • Grant
  • Acquisitions

These categories are established by CMS, but the responsibility for classifying each payment falls on the reporting entities. As a result, pharmaceutical companies may apply the categories differently. The reported nature of payment should therefore also be interpreted with caution, as it may not capture the full purpose of a payment.

3.2 Open Payments Dataset

The base data for this analysis are the published general payment files for the years 2015-2020. As these files contain data on payments to both healthcare practices and physicians, the data were filtered to retain only payments to physicians who are medical doctors (not nurses, dentists, etc.). Similarly, variables specific to hospitals were removed. At this stage, each row represents a single payment to a physician.
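
The physician filter described above can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the code used for the analysis. The column names (Covered_Recipient_Type, Physician_Primary_Type, Physician_Profile_ID) and the values "Covered Recipient Physician" and "Medical Doctor" are assumptions about the layout of the general payment files and should be checked against the CMS data dictionary for each program year. The sample rows are invented.

```python
import csv
import io

# Assumed column names and values; verify against the CMS data dictionary.
RECIPIENT_TYPE_COL = "Covered_Recipient_Type"
PHYSICIAN_TYPE_COL = "Physician_Primary_Type"


def keep_medical_doctors(rows):
    """Yield only rows describing a payment to a physician who is a medical doctor."""
    for row in rows:
        is_physician = row.get(RECIPIENT_TYPE_COL, "").strip() == "Covered Recipient Physician"
        is_md = row.get(PHYSICIAN_TYPE_COL, "").strip() == "Medical Doctor"
        if is_physician and is_md:
            yield row


# Invented sample standing in for one year's general payment file.
sample = io.StringIO(
    "Covered_Recipient_Type,Physician_Primary_Type,Physician_Profile_ID,Amount\n"
    "Covered Recipient Physician,Medical Doctor,1001,25.00\n"
    "Covered Recipient Physician,Doctor of Dentistry,1002,40.00\n"
    "Covered Recipient Teaching Hospital,,,5000.00\n"
)

md_payments = list(keep_medical_doctors(csv.DictReader(sample)))
print(len(md_payments))  # only the first sample row passes both checks
```

Checking the recipient type and the physician type separately keeps the two exclusions (hospitals, and physicians who are not medical doctors) easy to audit independently.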

The remaining data cover all physician specialties. As this analysis focuses on primary care physicians, the data were further filtered to physicians whose listed specialty falls within primary care. The specialties included are:

  • General Practice
  • Pediatrics
  • Family Medicine
  • Internal Medicine
  • Family Medicine|Obesity Medicine
  • Internal Medicine|Obesity Medicine
  • Family Medicine|Adolescent Medicine
  • Preventive Medicine|Public Health & General Preventive Medicine
  • Internal Medicine|Adolescent Medicine
  • Pediatrics|Adolescent Medicine
  • Family Medicine|Geriatric Medicine
  • Family Medicine|Adult Medicine
  • Internal Medicine|Geriatric Medicine

Unique Physicians Receiving Payments

Year     Unique Physicians
2015     182,388
2016     128,083
2017     172,989
2018     170,640
2019     157,399
2020     66,860
TOTAL    315,016
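
The TOTAL row is smaller than the sum of the yearly counts, which is consistent with counting each physician once across all years rather than once per year. The sketch below illustrates that counting logic together with an exact-match specialty filter. It is an illustration in Python under stated assumptions, not the analysis code: the record layout (year, physician identifier, specialty string) is hypothetical, the specialty strings are assumed to already be in the form listed above, and the five sample records are invented.

```python
from collections import defaultdict

# Specialties treated as primary care, copied from the list above.
PRIMARY_CARE_SPECIALTIES = {
    "General Practice",
    "Pediatrics",
    "Family Medicine",
    "Internal Medicine",
    "Family Medicine|Obesity Medicine",
    "Internal Medicine|Obesity Medicine",
    "Family Medicine|Adolescent Medicine",
    "Preventive Medicine|Public Health & General Preventive Medicine",
    "Internal Medicine|Adolescent Medicine",
    "Pediatrics|Adolescent Medicine",
    "Family Medicine|Geriatric Medicine",
    "Family Medicine|Adult Medicine",
    "Internal Medicine|Geriatric Medicine",
}

# Hypothetical records: (program year, physician identifier, specialty).
payments = [
    (2015, "A", "Family Medicine"),
    (2015, "A", "Family Medicine"),  # second payment, same physician and year
    (2016, "A", "Family Medicine"),  # same physician in a later year
    (2016, "B", "Pediatrics|Adolescent Medicine"),
    (2016, "C", "Internal Medicine|Cardiovascular Disease"),  # not listed, dropped
]


def unique_physicians(records, specialties):
    """Count distinct physicians per year and across all years."""
    per_year = defaultdict(set)
    overall = set()
    for year, physician_id, specialty in records:
        if specialty in specialties:  # exact match, so subspecialties not listed are excluded
            per_year[year].add(physician_id)
            overall.add(physician_id)
    counts = {year: len(ids) for year, ids in sorted(per_year.items())}
    return counts, len(overall)


counts, total = unique_physicians(payments, PRIMARY_CARE_SPECIALTIES)
print(counts)  # {2015: 1, 2016: 2}
print(total)   # 2
```

In the sample, physician A appears in both years, so the yearly counts sum to 3 while the overall count is 2.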

3.3 Spatial Datasets

To analyze geospatial trends in the Open Payments data, this analysis utilized two datasets: zip_code_db and us_zipcodes().

zip_code_db comes from the zipcodeR package, and it contains detailed information on US zip codes (Rozzi (2021)). Foremost, this dataset provides the longitude and latitude of the centroid of each ZIP code, which are critical for mapping payments to geographic locations. Variables of value include:

  • zipcode: 5 digit U.S. ZIP code
  • major_city: Major city serving the ZIP code
  • common_city_list: List of common cities represented by the ZIP code
  • county: Name of county containing the ZIP code
  • state: Two-letter state abbreviation for the ZIP code location
  • lat: Latitude of the centroid for the ZIP code
  • lng: Longitude of the centroid for the ZIP code
  • timezone: Timezone of the ZIP code
  • area_code_list: List of area codes for telephone numbers within this ZIP code
  • population: Total population of the ZIP code
  • population_density: Population density of the ZIP code (persons per square mile)
  • housing_units: Number of housing units within the ZIP code
  • median_home_value: Median home price within the ZIP code
  • median_household_income: Median household income within the ZIP code

Further, us_zipcodes() provides the spatial geometry necessary for visualizing the boundaries of ZIP codes. From the USAboundaries package, a call to us_zipcodes() returns the contemporary boundaries of US ZIP codes in a Simple Features (sf) object (Mullen & Bratt (2018)).
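
Linking payments to a ZIP-level lookup such as the one described above requires a consistent 5-digit key. The sketch below shows one way this could be done, written in Python with only the standard library; it is not the analysis code. It assumes that recipient ZIP codes may appear in ZIP+4 form or with leading zeros lost, which should be verified against the actual files. The dictionary layout, the field names "zip" and "amount", and the centroid coordinates are hypothetical placeholders.

```python
import re


def normalize_zip(raw):
    """Return a 5-digit ZIP string, or None if the value cannot be interpreted."""
    digits = re.sub(r"\D", "", str(raw))
    if not digits:
        return None
    if len(digits) <= 5:
        return digits.zfill(5)  # restore leading zeros, e.g. "2139" -> "02139"
    if len(digits) == 9:
        return digits[:5]       # ZIP+4: keep the 5-digit prefix
    return None


# Placeholder centroid lookup: ZIP -> (lat, lng). Coordinates are illustrative only.
centroids = {"02139": (42.36, -71.10), "10001": (40.75, -74.00)}

# Invented payment records.
payments = [
    {"zip": "02139-4307", "amount": 25.0},
    {"zip": "2139", "amount": 10.0},
    {"zip": "10001", "amount": 40.0},
    {"zip": "unknown", "amount": 5.0},
]


def total_by_zip(records, lookup):
    """Sum payment amounts per matched ZIP and count records with no match."""
    totals = {}
    unmatched = 0
    for record in records:
        zip5 = normalize_zip(record["zip"])
        if zip5 in lookup:
            totals[zip5] = totals.get(zip5, 0.0) + record["amount"]
        else:
            unmatched += 1
    return totals, unmatched


totals, unmatched = total_by_zip(payments, centroids)
print(totals)     # {'02139': 35.0, '10001': 40.0}
print(unmatched)  # 1
```

Reporting the number of unmatched records alongside the totals makes it visible how many payments would be dropped from any map built on the joined data.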