4.1 The Night Sky
For as long as people have lived, the night sky has filled us with wonder. What are all of those points of light we call stars? Modern times and modern telescopes allow us to examine the sky with greater understanding of what we observe, but with no less wonder. Most of the points of light we see in the sky are stars, many of similar character to our own Sun, but others with great differences in mass, size, and brightness. Some points of light are, in fact, distant galaxies of billions of stars. Astronomers continue to make fantastic discoveries.
One natural question is whether, just as our Sun is orbited by eight planets (sorry Pluto), other stars have planets too. Stars give off their own light, and many of them are visible to the naked eye from the surface of the Earth at night. Planets do not: the only planets we can see with our eyes are some of those in our own solar system, which reflect light from the Sun back to Earth. The visible light reflected by any potential exoplanets, planets outside of our own solar system, is far too faint for us to see in this way.
4.2 Exoplanet Discovery
However, beginning in the late 1980s and early 1990s, astronomers using a variety of instruments and methods observed data that were interpreted, cautiously at first, as evidence of distant planets. The existence of exoplanets, planets orbiting stars other than our own Sun, is now strongly supported by research from multiple independent groups. The 1995 discovery of a planet orbiting a main-sequence star was recognized with a share of the 2019 Nobel Prize in Physics. A main-sequence star is one that falls in a large continuous band in a stellar plot of color versus brightness. As of June 1, 2021, NASA reports the discovery of over 4,400 confirmed exoplanets and thousands of other candidates. NASA maintains a public data base, the NASA Exoplanet Archive, that records exoplanet data. On the web site, the values in each column are referred to as parameters, a word that has a different technical meaning in the field of statistics.
In astronomy, a parameter in this sense is a numerical value that represents a characteristic of the planet, its star, or the stellar system to which it belongs. From a statistical point of view, these characteristics might also be called parameters, but the numerical values in the tables are estimates: statistics calculated from observed data to estimate the unknown parameter values.
4.3 Exoplanet Data
The file exoplanets_default_2021-06-15.csv contains one row for each confirmed exoplanet from the NASA Planetary Systems data base, pulled on June 15, 2021. This file contains the default parameter set, where the values for each variable are from a single study and are, hence, self-consistent. In the full planetary systems data base, there is a single row per planet per study, so planets can appear more than once if they appear in multiple studies. Different studies might make different assumptions and use different raw spectral data, and thus may arrive at different calculated numerical values for the same variable. One of these studies per planet is deemed to be the default case, and that is the data in this first file. As some methods do not allow certain values to be estimated, this file of default values contains substantial missing data, especially in key variables such as the planet mass and radius.
In contrast, the second file, exoplanet_composite_2021-06-15.csv, is from the planetary systems composite data base, in which data from multiple studies are combined. The method used is to begin with the default values and then fill in missing data from other studies. When more than one study offers a value, the value with the smallest estimated error is preferred; in the absence of error estimates, the most recent value is used. This composite data base has fewer missing values, but the values may be inconsistent with one another, as they can be based on different raw data, assumptions, and methods of analysis.
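The fill-in rule for the composite data can be sketched in a few lines of Python. This is only an illustration of the rule as described above, not the archive's actual procedure; the `Estimate` class and the `composite_value` function are hypothetical names, and the tie-breaking details are an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Estimate:
    """One study's estimate of a single variable (hypothetical structure)."""
    value: Optional[float]  # None means the study did not report the variable
    error: Optional[float]  # estimated error, or None if not reported
    year: int               # publication year of the study


def composite_value(default: Estimate, others: List[Estimate]) -> Optional[float]:
    """Keep the default value when present; otherwise fill in from other studies."""
    if default.value is not None:
        return default.value
    candidates = [e for e in others if e.value is not None]
    if not candidates:
        return None  # still missing in the composite data
    with_error = [e for e in candidates if e.error is not None]
    if with_error:
        # Prefer the value with the smallest estimated error.
        return min(with_error, key=lambda e: e.error).value
    # No error estimates at all: fall back to the most recent study.
    return max(candidates, key=lambda e: e.year).value
```

Because each variable is filled in separately, the mass and the radius of one planet in the composite file can come from two different studies, which is the source of the possible inconsistency.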
The first data file has 92 columns of data and the second has 84, but we will examine only a few of them. The variables we will be working with are defined below. If you want more details, please see the NASA exoplanet archive, using the original variable names. The next code block reads in both datasets, selects variables, and renames them. You will be introduced to this code in more detail in the dplyr chapter.
Variable descriptions
planet = Planet Name
star = Star Name
method = Method by which the planet was first identified
year = Discovery Year (the year it was discovered)
p_number = Number of Planets in the System
s_number = Number of Stars in the System
radius = Planet Radius (units: Earth Radius)
mass = Approximate or Minimum Planet Mass (units: Earth Mass)
period = Orbital Period (in Earth days)
eccentricity = Orbital Eccentricity, between 0 and 1 with 0 a perfect circle
spectral_type = Spectral Type, the Morgan-Keenan spectral classification of the star
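As a rough illustration of the select-and-rename step, here is a minimal Python sketch using only the standard library. It is not the book's own code. The original column names in `RENAME` are assumptions based on the NASA Exoplanet Archive naming scheme and should be checked against the header of your own file; `read_exoplanets` is a hypothetical helper, and the sketch also assumes that comment lines in the download begin with `#`.

```python
import csv
import io

# Assumed mapping from archive column names to the short names used in this
# chapter; verify it against the header row of the downloaded file.
RENAME = {
    "pl_name": "planet",
    "hostname": "star",
    "discoverymethod": "method",
    "disc_year": "year",
    "sy_pnum": "p_number",
    "sy_snum": "s_number",
    "pl_rade": "radius",
    "pl_bmasse": "mass",
    "pl_orbper": "period",
    "pl_orbeccen": "eccentricity",
    "st_spectype": "spectral_type",
}


def read_exoplanets(handle):
    """Read CSV text, skip '#' comment lines, keep and rename selected columns."""
    lines = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(lines)
    # Columns absent from the file become empty strings (treated as missing).
    return [{new: row.get(old, "") for old, new in RENAME.items()} for row in reader]


# Tiny in-memory stand-in for a file, with only three of the columns present.
sample = io.StringIO(
    "# comment line\n"
    "pl_name,hostname,disc_year,extra\n"
    "51 Peg b,51 Peg,1995,ignored\n"
)
planets = read_exoplanets(sample)
print(planets[0]["planet"], planets[0]["year"])  # prints: 51 Peg b 1995
```

With a real download, the same function could be called on an open file, for example `read_exoplanets(open("exoplanets_default_2021-06-15.csv", newline=""))`. All values are read as text, so numeric variables such as the radius and mass would still need to be converted.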
4.3.1 Methods of Discovery
There are over ten methods of discovery represented in the data base, the most common of which is the transit method. The transit method works by monitoring the total light output from a star over time. If there is a dip in the amount of light that occurs in a regular periodic manner, this could suggest that a planet crosses between the star and the line of sight of the telescope. For a detection to be credible, observers may need to see the dip in light appear at least three times. The depth of the transit (the fractional reduction in the observed light) may be used to estimate the radius of the planet relative to the radius of the star. Here is a YouTube video to illustrate the transit method.
The second most common method is called radial velocity (RV), which seeks a wobble in the motion of the star, suggesting that the star might be orbited by another massive object. In this method, the light spectrum of the star is observed on multiple nights, sometimes multiple times per night. Analysis of shifts of the observed spectrum over time allows researchers to estimate how fast the star is moving toward or away from the observer. When the star moves away from the observer, the spectrum is shifted toward red, and when it moves toward the observer, the shift is toward blue. These shifts can be used to estimate the radial velocity. A particular periodic pattern versus time suggests the motion of the star is affected by the planetary orbit. Properties of the shape of the curve allow for the planet’s mass to be estimated. A YouTube video illustrates the radial velocity method.
Ideally, we would like to be able to estimate both the mass and the radius of exoplanets. However, some methods are better suited to estimating one than the other. If we could determine a relationship between mass and radius, it would help us estimate the missing value for planets where only one of the two has been measured.
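To see how a transit depth turns into a radius, a common first approximation is that the fraction of light blocked equals the ratio of the two disk areas, so depth ≈ (planet radius / star radius)². The Python sketch below applies that approximation. It ignores effects such as limb darkening and grazing transits, the function names are hypothetical, and the Sun-to-Earth radius ratio of about 109.2 is an approximate constant supplied here, not a value from the data set.

```python
import math

# Approximate ratio of the Sun's radius to the Earth's radius (an assumption
# of this sketch, used only to convert units).
EARTH_RADII_PER_SOLAR_RADIUS = 109.2


def radius_ratio_from_depth(depth):
    """Planet-to-star radius ratio, assuming depth = (R_planet / R_star) ** 2."""
    if not 0 <= depth <= 1:
        raise ValueError("depth must be a fraction between 0 and 1")
    return math.sqrt(depth)


def planet_radius_earth(depth, star_radius_solar):
    """Planet radius in Earth radii, given the star's radius in solar radii."""
    ratio = radius_ratio_from_depth(depth)
    return ratio * star_radius_solar * EARTH_RADII_PER_SOLAR_RADIUS


# A hypothetical 1% dip in the light of a Sun-sized star.
print(round(planet_radius_earth(0.01, 1.0), 1))  # about 10.9 Earth radii
```

A dip of 1% around a Sun-sized star therefore corresponds to a planet roughly the size of Jupiter, while an Earth-sized planet would produce a dip closer to 0.01%, which is why small planets are much harder to detect this way.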
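The link between a spectral shift and a velocity can be illustrated with the non-relativistic Doppler formula, v ≈ c × (observed wavelength − rest wavelength) / rest wavelength. The Python sketch below is a simplified illustration and not the analysis pipeline used by any particular study; the function name and the example wavelengths are hypothetical.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0  # exact by definition of the metre


def radial_velocity(observed_wavelength, rest_wavelength):
    """Approximate line-of-sight velocity in m/s from a wavelength shift.

    Positive values mean the source is moving away (redshift); negative
    values mean it is approaching (blueshift). Both wavelengths must be in
    the same units. Valid only for speeds far below the speed of light.
    """
    if rest_wavelength <= 0:
        raise ValueError("rest_wavelength must be positive")
    shift = (observed_wavelength - rest_wavelength) / rest_wavelength
    return SPEED_OF_LIGHT_M_PER_S * shift


# Hypothetical example: a line at 500 nm observed at 500.00005 nm.
velocity = radial_velocity(500.00005, 500.0)
print(round(velocity, 1), "m/s")
```

The example shift of one part in ten million corresponds to a speed of only about 30 meters per second, which gives a sense of how precisely the spectrum must be measured for this method to work.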
4.3.2 Earth and Jupiter
The planet masses and radii are measured in Earth units, so a planet with mass 10 has a mass ten times that of Earth, and a planet with a radius of 0.8 has an estimated radius that is 80% that of Earth. For massive planets, it may be easier for us to comprehend the values relative to the mass and radius of Jupiter, the largest planet in our solar system. Jupiter has a mass that is about 318 times as large as that of the Earth and a radius that is about 10.97 times as large. We can convert mass and radius from units relative to Earth to units relative to Jupiter by dividing the values provided by the appropriate one of these two numbers.
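The conversion amounts to a single division per variable. Here is a minimal Python sketch using the two ratios quoted above; the function names are hypothetical, and passing missing values through as `None` is a choice made for this illustration.

```python
# Ratios quoted in the text: Jupiter relative to Earth.
JUPITER_MASS_IN_EARTH_MASSES = 318.0
JUPITER_RADIUS_IN_EARTH_RADII = 10.97


def mass_in_jupiter_units(mass_earth):
    """Convert a mass in Earth masses to Jupiter masses; None stays None."""
    if mass_earth is None:
        return None
    return mass_earth / JUPITER_MASS_IN_EARTH_MASSES


def radius_in_jupiter_units(radius_earth):
    """Convert a radius in Earth radii to Jupiter radii; None stays None."""
    if radius_earth is None:
        return None
    return radius_earth / JUPITER_RADIUS_IN_EARTH_RADII


print(mass_in_jupiter_units(636.0))     # 2.0, twice the mass of Jupiter
print(radius_in_jupiter_units(10.97))   # 1.0, the same radius as Jupiter
```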
4.3.3 Mass and Radius
The mass-radius relationship of exoplanets describes how a planet's radius varies with its mass. Modeling this relationship is important for estimation: when one of the two values is missing, we may be able to predict it with a model based on the value of the other. It is also of interest to know both of these variables when categorizing planets, as the typical values for rocky planets, such as the Earth and the other inner planets of the solar system, differ from those of the large outer planets, such as the gas giants Jupiter and Saturn.
4.3.4 Spectral Types
The Morgan-Keenan system classifies stars by spectral class with a single letter from the sequence O, B, A, F, G, K, and M, where O-type stars are the hottest and M-type are the coolest. There are additional letters for classes of stars not in this classical system, such as D for white dwarfs, and S and C for carbon stars. Added to the letter is a number from 0 to 9, with 0 being the hottest and 9 being the coolest within the class. Most classifications use a single digit, but fractional values such as 5.5 are possible.
In addition, there is an added Roman numeral to represent the luminosity class, based on the width of certain absorption lines in the light spectrum. A value of I represents a supergiant and V is a main-sequence star. The classification of the Sun is G2V. The surface temperatures of class G stars range from about 5200 to 6000 kelvins and, at about 5700 kelvins, the Sun is on the hotter side of this range (2 is closer to 0 than it is to 9). The Sun is a main-sequence star, as indicated by the V at the end of the classification.
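A classification such as G2V can be split into its three parts with a regular expression. The Python sketch below is a simplified illustration: `parse_spectral_type` is a hypothetical helper, it recognizes only the classical letters O through M and the luminosity classes I through V, and it ignores everything after the first match, so the more elaborate strings that can appear in real data would need extra handling.

```python
import re

# Letter, optional numeric subclass (possibly fractional), optional luminosity
# class. Longer Roman numerals are listed first so that "IV" is not read as "I".
SPECTRAL_PATTERN = re.compile(
    r"([OBAFGKM])\s*(\d(?:\.\d+)?)?\s*(IV|III|II|I|V)?"
)


def parse_spectral_type(text):
    """Return (letter, subclass, luminosity_class), or None if no match.

    The subclass is a float or None; the luminosity class is a string or None.
    """
    match = SPECTRAL_PATTERN.match(text.strip())
    if match is None:
        return None
    letter, subclass, luminosity = match.groups()
    return letter, (float(subclass) if subclass is not None else None), luminosity


print(parse_spectral_type("G2V"))     # ('G', 2.0, 'V')
print(parse_spectral_type("K0 III"))  # ('K', 0.0, 'III')
print(parse_spectral_type("M5.5"))    # ('M', 5.5, None)
```

Returning `None` for unrecognized or empty strings makes it easy to count how many stars in a data set lack a usable classification.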
4.4 Exoplanet Questions
We will consider the following questions as we explore the exoplanet data sets.
What is the relationship between planetary mass and radius among the confirmed exoplanets? Is there a strong linear relationship or is the relationship more complicated?
Are there collections of planets that do not follow the pattern between mass and radius? Are there other variables which help to distinguish these planets?
How do the equations for predicting mass from radius and radius from mass compare with one another?
In the default data set, what is the relationship between the method used and the frequency of missing mass and radius estimates?
Most planets are named by the name of the star followed by a space and a single letter. Are there exceptions to this norm? Can you make a guess as to why?
What is the relative frequency of spectral classifications of stars in these data sets?
What fraction of missing data from the default data set is filled in by the composite data set?
4.5 The Journey Forward
In later chapters of the book, we will revisit these data sets as we model relationships, such as the one between radius and mass, and as we work with the strings in text variables such as the planet and star names and the spectral type.