Appendix C — Tips on Converting to Python

Translating R code into Python is easier once you know how the two ecosystems map onto each other. This appendix starts with the basics, installing packages and loading libraries, and then compares the R tidyverse with its closest Python counterparts.

C.1 Packages and Libraries

  1. Installing Packages:

    • R:

      install.packages("package_name")
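    • Python (using pip; the leading ! runs a shell command from a notebook cell, so omit it in a terminal):

      !pip install package_name
    • Python (using conda; -y skips the confirmation prompt, which a notebook cell cannot answer):

      !conda install -y package_name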
  2. Loading Libraries:

    • R:

      library(package_name)
    • Python:

      import package_name
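
One difference is worth seeing in code: library() attaches a package's exported functions to the search path, whereas a Python import keeps names behind the module prefix unless you ask otherwise. The sketch below shows the three common import styles. It uses the standard-library statistics module only because it needs no installation; the variable names are illustrative.

```python
# Plain import: names stay behind the full module prefix
import statistics

# Aliased import: same module, shorter prefix (as in "import pandas as pd")
import statistics as st

# Selective import: bring a single function into the current namespace
from statistics import mean

values = [1, 2, 3, 4, 5]

full_prefix = statistics.mean(values)
short_prefix = st.mean(values)
no_prefix = mean(values)

# All three calls reach the same function
print(full_prefix, short_prefix, no_prefix)
```

The selective form is the closest to how functions are called after library() in R, but the prefixed forms make it clearer where each function comes from.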

C.2 Comparing tidyverse with its Python equivalents

  • tidyverse (R): tidyverse is a collection of R packages designed for data science, including dplyr for data manipulation, ggplot2 for data visualization, tidyr for data tidying, etc.

    library(tidyverse)
  • Python Equivalents:

    • pandas: Similar to dplyr, pandas provides powerful data manipulation tools.

      import pandas as pd
    • matplotlib/seaborn: Comparable to ggplot2, these libraries are used for data visualization.

      import matplotlib.pyplot as plt
      import seaborn as sns
    • numpy: Not a direct equivalent of tidyr (reshaping tasks such as pivoting are handled by pandas itself, for example with melt and pivot), but numpy provides the array manipulation and numerical computing that pandas builds on.

      import numpy as np
    • scikit-learn: Provides tools for data preprocessing, modelling, and evaluation, resembling some functionalities of tidyverse packages like modelr.

      from sklearn import ...
    • tidyverse-like package: There isn’t a single package in Python that encompasses the entire functionality of tidyverse, but you can combine pandas, matplotlib/seaborn, numpy, and scikit-learn to achieve similar results.
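
To make the mapping concrete, here is a minimal sketch of how a typical dplyr chain and a tidyr reshape might be written with pandas. The data frame and its column names (country, year, cases, population) are hypothetical and invented for illustration; the R verb each step mirrors is noted in the comments.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only
df = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "year": [2020, 2021, 2020, 2021],
    "cases": [10, 20, 30, 50],
    "population": [100, 100, 200, 200],
})

# dplyr-style chain: filter() %>% mutate() %>% group_by() %>% summarise()
summary = (
    df[df["year"] >= 2020]                                # filter(year >= 2020)
    .assign(rate=lambda d: d["cases"] / d["population"])  # mutate(rate = cases / population)
    .groupby("country", as_index=False)                   # group_by(country)
    .agg(mean_rate=("rate", "mean"))                      # summarise(mean_rate = mean(rate))
)

# tidyr-style reshaping: pivot_wider() and pivot_longer()
wide_df = df.pivot(index="country", columns="year", values="cases")
long_df = wide_df.reset_index().melt(
    id_vars="country", var_name="year", value_name="cases"
)

print(summary)
print(wide_df)
print(long_df)
```

Method chaining inside parentheses plays the role of the pipe operator; this is one common style, not the only way to write it.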

With these equivalences in hand, most R code can be translated library by library: data manipulation and tidying map to pandas, plotting to matplotlib or seaborn, and modelling to scikit-learn.

C.3 Creating Data and Computing Statistics

  1. Creating Basic Data:

    • R:

      # Create a data frame
      data <- data.frame(
        x = c(1, 2, 3, 4, 5),
        y = c(2, 3, 4, 5, 6)
      )
    • Python (using pandas):

      import pandas as pd
      
      # Create a DataFrame
      data = pd.DataFrame({
          'x': [1, 2, 3, 4, 5],
          'y': [2, 3, 4, 5, 6]
      })
  2. Basic Statistics:

    • R:

      # Summary statistics
      summary(data)
    • Python (using pandas):

      # Summary statistics
      print(data.describe())
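
The two summaries are not identical: summary() in R reports the minimum, quartiles, median, mean, and maximum, while describe() also reports the count and standard deviation. When you need a single statistic, the sketch below shows pandas methods alongside the R function each one mirrors. It reuses the data frame from above; the variable names are illustrative.

```python
import pandas as pd

data = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "y": [2, 3, 4, 5, 6],
})

# Full table: count, mean, std, min, 25%, 50%, 75%, max for each column
stats = data.describe()

# Individual statistics, with the R call each one mirrors
mean_x = data["x"].mean()        # mean(data$x)
median_x = data["x"].median()    # median(data$x)
sd_x = data["x"].std()           # sd(data$x); both divide by n - 1
q1_x = data["x"].quantile(0.25)  # quantile(data$x, 0.25)

print(stats)
print(mean_x, median_x, sd_x, q1_x)
```

Note that numpy's np.std divides by n by default, so it will not match R's sd() unless you pass ddof=1; the pandas std method shown here already uses n - 1.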

C.4 Building a Linear Regression Model

  • R:

    # lm() lives in the stats package, which R attaches by default,
    # so this library() call is optional
    library(stats)
    
    # Fit a linear regression model
    lm_model <- lm(y ~ x, data = data)
    
    # Summary of the model
    summary(lm_model)
  • Python (using statsmodels):

    import statsmodels.api as sm
    
    # Add a constant term for intercept
    X = sm.add_constant(data['x'])
    
    # Fit a linear regression model
    lm_model = sm.OLS(data['y'], X).fit()
    
    # Summary of the model
    print(lm_model.summary())
  • Python (using scikit-learn):

    from sklearn.linear_model import LinearRegression
    
    # Initialize the model
    lm_model = LinearRegression()
    
    # Fit the model
    lm_model.fit(data[['x']], data['y'])
    
    # Coefficients
    print("Intercept:", lm_model.intercept_)
    print("Coefficient:", lm_model.coef_)
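
If you want syntax closer to R's y ~ x, statsmodels also offers a formula interface. The following is a minimal sketch, assuming statsmodels and its formula dependency (patsy) are installed; it refits the same model on the data frame from C.3.

```python
import pandas as pd
import statsmodels.formula.api as smf

data = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "y": [2, 3, 4, 5, 6],
})

# The formula string plays the role of y ~ x in lm();
# the intercept is added automatically, so add_constant is not needed
lm_model = smf.ols("y ~ x", data=data).fit()

# params is a pandas Series indexed by term name
intercept = lm_model.params["Intercept"]
slope = lm_model.params["x"]
print(f"Intercept: {intercept:.3f}, slope: {slope:.3f}")
```

Because the example data follow y = x + 1 exactly, the fit is perfect; real data would also warrant a look at lm_model.summary().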

The syntax and libraries differ, but the process is the same in both languages: specify the model, fit it to the data, and inspect the coefficients. Note that statsmodels prints a full summary table much like R’s summary(), while scikit-learn exposes the fitted intercept and coefficients as attributes.

C.5 Example of a Model Workflow

  1. Data Preprocessing:

    import pandas as pd
    from sklearn.preprocessing import StandardScaler

    # Load the data and forward fill missing values
    data = pd.read_csv('data.csv')
    data = data.ffill()  # fillna(method='ffill') is deprecated in recent pandas

    # Standardize the features to zero mean and unit variance
    scaler = StandardScaler()
    scaled_data = scaler.fit_transform(
        data[['feature1', 'feature2', 'feature3']]
    )
  2. Model Selection and Training:

    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error
    
    X = data[['feature1', 'feature2', 'feature3']]
    y = data['DALYs']
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )
    
    model = LinearRegression()
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    
    mse = mean_squared_error(y_test, y_pred)
    print(f'Mean Squared Error: {mse}')
  3. Time Series Forecasting Example:

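    from prophet import Prophet  # the package was named fbprophet before version 1.0

    # Prophet expects a date column named 'ds' and a value column named 'y'
    ts_data = data[['date', 'DALYs']].rename(columns={'date': 'ds', 'DALYs': 'y'})

    # Fit the model and forecast 365 periods (days by default) ahead
    model = Prophet()
    model.fit(ts_data)
    future = model.make_future_dataframe(periods=365)
    forecast = model.predict(future)
    model.plot(forecast)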
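
In steps 1 and 2, scaled_data is computed but the model is trained on the unscaled columns, and the scaler is fitted on all rows before the train/test split. One common way to tie the two steps together is a scikit-learn pipeline, which fits the scaler on the training rows only. The sketch below is an illustration under stated assumptions, not the workflow's required form: because data.csv is not available here, it generates a synthetic stand-in with the same column names, and the coefficients used to simulate DALYs are made up.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for data.csv (hypothetical values, for illustration only)
rng = np.random.default_rng(42)
data = pd.DataFrame(
    rng.normal(size=(200, 3)),
    columns=["feature1", "feature2", "feature3"],
)
data["DALYs"] = (
    2.0 * data["feature1"]
    - 1.0 * data["feature2"]
    + rng.normal(scale=0.1, size=200)
)

X = data[["feature1", "feature2", "feature3"]]
y = data["DALYs"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# The pipeline fits the scaler on the training rows only, then the regression;
# predict() applies the same scaling to the test rows before predicting
pipeline = make_pipeline(StandardScaler(), LinearRegression())
pipeline.fit(X_train, y_train)

mse = mean_squared_error(y_test, pipeline.predict(X_test))
print(f"Mean Squared Error: {mse:.4f}")
```

For ordinary least squares, scaling does not change the predictions, but keeping it inside the pipeline matters as soon as you swap in a regularized or distance-based model.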

Together, these steps form a basic workflow for analyzing DALYs for infectious diseases: cleaning the data, modelling the relationship between predictors and outcome, and forecasting future values.