3  Functions and Loops

3.1 Introduction

In programming, we often perform the same tasks repeatedly. Functions and loops help us write cleaner, shorter, and more efficient code.

  • A function is a block of code that can be called whenever needed to perform a specific task.
  • A loop runs the same code repeatedly without rewriting it.

3.2 What Is a Function?

A function is a block of code designed to perform a specific task. Using functions helps us avoid redundant code.

A function can be pictured as a machine: an input goes in, a fixed rule is applied, and an output comes out. In the accompanying figure, the label “Function Machine” reinforces that the machine applies one specific rule to transform the input into an output. The function in the image is:

f(x)=x+3

This means that any number fed into the machine has 3 added to it before it comes out. For example, an input of 2 produces an output of 5.
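As a minimal sketch, the function machine can be written as a one-line Python function. The name `add_three` is chosen here for illustration and does not appear in the original figure.

```python
def add_three(x):
    """Apply the rule f(x) = x + 3 to a single input."""
    return x + 3

# Feed several inputs through the same "machine"
for value in [0, 2, 10]:
    print(value, "->", add_three(value))
```

Each call applies the same rule, so the loop prints 3, 5, and 13 for the three inputs.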

3.2.1 Function of the Form ax+b

This function takes three numbers, a, x, and b, as inputs and returns the value of a × x + b.

Python Code

# Function to multiply 'a' with 'x' and add 'b'
def function1(a, x, b):
    return a * x + b

# Example usage
print(function1(2, 3, 4))  # Output: (2 * 3) + 4 = 10
10

R Code

# Function to multiply 'a' with 'x' and add 'b'
function1 <- function(a, x, b) {
  return(a * x + b)
}

# Example usage
print(function1(2, 3, 4))  # Output: (2 * 3) + 4 = 10
[1] 10

3.2.2 Value Comparator

This function compares two datasets by calculating the mean, median, and sample standard deviation of each, a common first step in data analysis.

Python Code

import statistics
from tabulate import tabulate

# Function to compare two datasets
def compare_data(group1, group2):
    return {
        "group1": {
            "mean": statistics.mean(group1),
            "median": statistics.median(group1),
            "std_dev": statistics.stdev(group1)
        },
        "group2": {
            "mean": statistics.mean(group2),
            "median": statistics.median(group2),
            "std_dev": statistics.stdev(group2)
        }
    }

# Sample datasets
data1 = [10, 20, 30, 40, 50]
data2 = [15, 25, 35, 45, 55]

# Get results
results = compare_data(data1, data2)

# Convert results to a table format
table = [
    ["Metric", "Group 1", "Group 2"],
    ["Mean", results["group1"]["mean"], results["group2"]["mean"]],
    ["Median", results["group1"]["median"], results["group2"]["median"]],
    ["Standard Deviation", results["group1"]["std_dev"], results["group2"]["std_dev"]]
]

# Print table
print(tabulate(table, headers="firstrow", tablefmt="grid"))
+--------------------+-----------+-----------+
| Metric             |   Group 1 |   Group 2 |
+====================+===========+===========+
| Mean               |   30      |   35      |
+--------------------+-----------+-----------+
| Median             |   30      |   35      |
+--------------------+-----------+-----------+
| Standard Deviation |   15.8114 |   15.8114 |
+--------------------+-----------+-----------+

R Code

# Load library
library(knitr)

# Function to compare two datasets
compare_data <- function(group1, group2) {
  data.frame(
    Statistic = c("Mean", "Median", "Std Dev"),
    Group1 = round(c(mean(group1), median(group1), sd(group1)), 2),
    Group2 = round(c(mean(group2), median(group2), sd(group2)), 2)
  )
}

# Sample data
data1 <- c(10, 20, 30, 40, 50)
data2 <- c(15, 25, 35, 45, 55)

# Print as formatted table
kable(compare_data(data1, data2))
|Statistic | Group1| Group2|
|:---------|------:|------:|
|Mean      |  30.00|  35.00|
|Median    |  30.00|  35.00|
|Std Dev   |  15.81|  15.81|
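Both versions above report the sample standard deviation (dividing by n - 1), which is what `statistics.stdev` in Python and `sd` in R compute. The short sketch below reuses the first sample dataset to contrast it with the population standard deviation from `statistics.pstdev`, which divides by n.

```python
import statistics

data1 = [10, 20, 30, 40, 50]

sample_sd = statistics.stdev(data1)       # divides by n - 1
population_sd = statistics.pstdev(data1)  # divides by n

print(round(sample_sd, 4))      # 15.8114
print(round(population_sd, 4))  # 14.1421
```

The two values differ noticeably for small datasets, so it is worth stating which one a report uses.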

Functions save time by allowing code reuse, improve program organization and readability, and make debugging and future development easier.

3.2.3 Geometric Properties

In the field of computational geometry, functions are essential for converting mathematical expressions into executable code. For example, the formulas for calculating the area and perimeter of various two-dimensional shapes can be implemented as separate functions. This approach makes the development process more efficient and easier to manage. The following sections explain in detail how these geometric formulas are coded, using Python and R as examples.

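| Shape | Area Formula (A) | Perimeter Formula (P) | Variables Description |
|:------|:-----------------|:----------------------|:----------------------|
| Triangle | $A = \tfrac{1}{2}(b \times h)$ | $P = a + b + c$ | b = base, h = height; a, b, c = sides |
| Rectangle | $A = l \times b$ | $P = 2(l + b)$ | l = length, b = breadth |
| Square | $A = s \times s$ | $P = 4 \times s$ | s = side |
| Circle | $A = \pi r^2$ | $P = 2\pi r$ | r = radius, $\pi \approx 3.14$ or $\tfrac{22}{7}$ |
| Ellipse | $A = \pi \times a \times b$ | $P \approx \pi(a + b)$ (rough approximation) | a = semi-major axis, b = semi-minor axis |
| Parallelogram | $A = b \times h$ | $P = 2(a + b)$ | b = base, h = height; a, b = lengths of adjacent sides |
| Rhombus | $A = \tfrac{1}{2}(d_1 \times d_2)$ | $P = 4 \times a$ | $d_1, d_2$ = diagonals, a = side |
| Trapezium | $A = \tfrac{1}{2}(a + b) \times h$ | P = sum of all sides | a, b = lengths of parallel sides, h = height |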


With the formulas above, you can write functions that calculate the area and perimeter of different shapes. This makes your code modular and easier to maintain, and it lets you test individual pieces of logic in isolation. The examples below show one way to implement these calculations in Python and R.

Python Code

import math

# Function to calculate area and perimeter for multiple shapes
def calculate_area_perimeter(shape, **kwargs):
    if shape == "triangle":
        base = kwargs.get("base")
        height = kwargs.get("height")
        side_a = kwargs.get("side_a")
        side_b = kwargs.get("side_b")
        side_c = kwargs.get("side_c")
        area = 0.5 * base * height
        perimeter = side_a + side_b + side_c
    elif shape == "rectangle":
        length = kwargs.get("length")
        breadth = kwargs.get("breadth")
        area = length * breadth
        perimeter = 2 * (length + breadth)
    elif shape == "square":
        side = kwargs.get("side")
        area = side ** 2
        perimeter = 4 * side
    elif shape == "circle":
        radius = kwargs.get("radius")
        area = math.pi * radius ** 2
        perimeter = 2 * math.pi * radius
    elif shape == "ellipse":
        a = kwargs.get("a")
        b = kwargs.get("b")
        area = math.pi * a * b
        perimeter = math.pi * (a + b)  # rough approximation; exact only when a == b
    elif shape == "parallelogram":
        base = kwargs.get("base")
        height = kwargs.get("height")
        side_a = kwargs.get("side_a")
        side_b = kwargs.get("side_b")
        area = base * height
        perimeter = 2 * (side_a + side_b)
    elif shape == "rhombus":
        d1 = kwargs.get("d1")
        d2 = kwargs.get("d2")
        side = kwargs.get("side")
        area = 0.5 * d1 * d2
        perimeter = 4 * side
    elif shape == "trapezium":
        a = kwargs.get("a")
        b = kwargs.get("b")
        height = kwargs.get("height")
        side_a = kwargs.get("side_a")
        side_b = kwargs.get("side_b")
        area = 0.5 * (a + b) * height
        perimeter = a + b + side_a + side_b
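    else:
        # Raise an error (as the R version does) instead of returning a string,
        # which would break callers that index the result like a dictionary
        raise ValueError("Invalid shape. Choose a valid 2D shape.")

    return {"area": area, "perimeter": perimeter}

# Example usage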
result = calculate_area_perimeter("triangle", 
                                  base=6, 
                                  height=4, 
                                  side_a=5, 
                                  side_b=6, 
                                  side_c=7)
print("Triangle-Area & Perimeter:", result["area"], "and", result["perimeter"])
Triangle-Area & Perimeter: 12.0 and 18

R Code

# Function to calculate area and perimeter for multiple shapes
calculate_area_perimeter <- function(shape, ...) {
  args <- list(...)
  
  if (shape == "triangle") {
    base <- args$base
    height <- args$height
    side_a <- args$side_a
    side_b <- args$side_b
    side_c <- args$side_c
    area <- 0.5 * base * height
    perimeter <- side_a + side_b + side_c
  } else if (shape == "rectangle") {
    length <- args$length
    breadth <- args$breadth
    area <- length * breadth
    perimeter <- 2 * (length + breadth)
  } else if (shape == "square") {
    side <- args$side
    area <- side^2
    perimeter <- 4 * side
  } else if (shape == "circle") {
    radius <- args$radius
    area <- pi * radius^2
    perimeter <- 2 * pi * radius
  } else if (shape == "ellipse") {
    a <- args$a
    b <- args$b
    area <- pi * a * b
    perimeter <- pi * (a + b)  # rough approximation; exact only when a == b
  } else if (shape == "parallelogram") {
    base <- args$base
    height <- args$height
    side_a <- args$side_a
    side_b <- args$side_b
    area <- base * height
    perimeter <- 2 * (side_a + side_b)
  } else if (shape == "rhombus") {
    d1 <- args$d1
    d2 <- args$d2
    side <- args$side
    area <- 0.5 * d1 * d2
    perimeter <- 4 * side
  } else if (shape == "trapezium") {
    a <- args$a
    b <- args$b
    height <- args$height
    side_a <- args$side_a
    side_b <- args$side_b
    area <- 0.5 * (a + b) * height
    perimeter <- a + b + side_a + side_b
  } else {
    stop("Invalid shape. Choose a valid 2D shape.")
  }
  
  return(list(area = area, perimeter = perimeter))
}
# Example usage
result <- calculate_area_perimeter("triangle", 
                                   base = 6, 
                                   height = 4, 
                                   side_a = 5, 
                                   side_b = 6, 
                                   side_c = 7)
cat("Triangle - Area & Perimeter:", result$area, "and", result$perimeter, "\n")
Triangle - Area & Perimeter: 12 and 18 
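The single function above handles every shape in one long chain of conditions. An alternative, sketched below for three of the shapes, follows the "separate functions" idea described earlier in this section: one small function per shape, plus a lookup table that picks the right one. The names `rectangle`, `square`, `circle`, `SHAPES`, and `describe` are illustrative and not part of the original code.

```python
import math

def rectangle(length, breadth):
    """Return (area, perimeter) of a rectangle."""
    return length * breadth, 2 * (length + breadth)

def square(side):
    """A square is a rectangle with equal sides."""
    return rectangle(side, side)

def circle(radius):
    """Return (area, circumference) of a circle."""
    return math.pi * radius ** 2, 2 * math.pi * radius

# Dispatch table: shape name -> function that handles it
SHAPES = {"rectangle": rectangle, "square": square, "circle": circle}

def describe(shape, **dimensions):
    """Look up the shape's function and call it with the given dimensions."""
    if shape not in SHAPES:
        raise ValueError(f"Unknown shape: {shape!r}")
    area, perimeter = SHAPES[shape](**dimensions)
    return {"area": area, "perimeter": perimeter}

# Loop over several shapes, one call each
for name, dims in [("rectangle", {"length": 4, "breadth": 3}),
                   ("square", {"side": 5}),
                   ("circle", {"radius": 1})]:
    print(name, describe(name, **dims))
```

With this layout, supporting a new shape means writing one small function and adding one entry to `SHAPES`, and each shape function can be tested on its own.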

3.3 What Is a Loop?

Loops allow us to execute the same code multiple times without rewriting it, which makes them well suited to repetitive calculations in mathematical analysis and data processing. There are two main types of loops:

  • For Loop – Used when the number of repetitions is known.
  • While Loop – Used when repetitions depend on a condition.
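A short sketch of both loop types follows. The variable names and the numbers are made up for illustration.

```python
# For loop: the number of repetitions (5) is known in advance
squares = []
for i in range(1, 6):
    squares.append(i ** 2)
print(squares)  # [1, 4, 9, 16, 25]

# While loop: repeat for as long as a condition holds
balance = 100.0
years = 0
while balance < 200.0:   # keep going until the balance has doubled
    balance *= 1.10      # 10% growth per year
    years += 1
print(years)  # 8
```

In the second loop the number of repetitions is not written anywhere in the code; it follows from the condition, which is the typical reason to choose a while loop.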

3.3.1 Fibonacci Sequence

The Fibonacci sequence is a series of numbers where each number is the sum of the two preceding ones:

$F(n) = F(n-1) + F(n-2)$

Example: $0,1,1,2,3,5,8,13,21,\dots$

Python Code

def fibonacci(n):
    fib_series = [0, 1]
    for i in range(2, n):
        fib_series.append(fib_series[-1] + fib_series[-2])
    return fib_series[:max(n, 0)]  # trim so that n = 0 or n = 1 returns fewer than two terms

print(fibonacci(10))  # Output: [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

R Code

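fibonacci <- function(n) {
  fib_series <- c(0, 1)
  # Guard small n: 3:n would count backwards (3, 2, ...) when n < 3
  if (n <= 2) return(head(fib_series, max(n, 0)))
  for (i in 3:n) {
    fib_series <- c(fib_series, fib_series[i-1] + fib_series[i-2])
  }
  return(fib_series)
}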

print(fibonacci(10))  # Output: 0 1 1 2 3 5 8 13 21 34
 [1]  0  1  1  2  3  5  8 13 21 34
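When the stopping point is a value rather than a count, a while loop fits more naturally than a for loop. The following sketch collects every Fibonacci number that does not exceed a given limit; the function name `fibonacci_up_to` is hypothetical.

```python
def fibonacci_up_to(limit):
    """Return all Fibonacci numbers less than or equal to limit."""
    series = []
    current, following = 0, 1
    while current <= limit:
        series.append(current)
        # Slide the window forward: F(n-1), F(n) -> F(n), F(n+1)
        current, following = following, current + following
    return series

print(fibonacci_up_to(50))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

Only two numbers are kept in memory at any time, so the loop does not need to index back into the list.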

3.3.2 Arithmetic & Geometric Sequences

This function generates a sequence based on the type specified: either an arithmetic sequence or a geometric sequence. For an arithmetic sequence, each term is obtained by adding a constant difference to the previous term. For a geometric sequence, each term is obtained by multiplying the previous term by a constant ratio.
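Written out, with first term $a$, common difference $d$, common ratio $r$, and term index $k = 1, \dots, n$ (the symbols $k$ and $t_k$ are introduced here for illustration), the two rules are:

```latex
\begin{aligned}
\text{arithmetic:}\quad & t_k = a + (k - 1)\,d \\
\text{geometric:}\quad  & t_k = a \, r^{\,k - 1}
\end{aligned}
```

For example, with $a = 1$ and $d = 2$ the third arithmetic term is $1 + 2 \cdot 2 = 5$, and with $a = 1$ and $r = 3$ the third geometric term is $1 \cdot 3^2 = 9$.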

Python Code

def generate_sequence(seq_type, n, a, d=None, r=None):
    """
    Generate an arithmetic or geometric sequence.

    Parameters:
        seq_type (str): Type of sequence - "arithmetic" or "geometric".
        n (int): The number of terms in the sequence.
        a (numeric): The first term of the sequence.
        d (numeric, optional): The common difference (required for arithmetic).
        r (numeric, optional): The common ratio (required for geometric).

    Returns:
        list: A list containing the generated sequence.
    """
    sequence = []
    if seq_type.lower() == "arithmetic":
        if d is None:
            raise ValueError("'d' must be provided for an arithmetic sequence")
        for i in range(n):
            sequence.append(a + i * d)
    elif seq_type.lower() == "geometric":
        if r is None:
            raise ValueError("'r' must be provided for a geometric sequence")
        for i in range(n):
            sequence.append(a * (r ** i))
    else:
        raise ValueError("seq_type must be either 'arithmetic' or 'geometric'")
    return sequence

# Example usage:
print(generate_sequence("arithmetic", 10, 1, d=2))
[1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
print(generate_sequence("geometric", 10, 1, r=3))
[1, 3, 9, 27, 81, 243, 729, 2187, 6561, 19683]

R Code

generate_sequence <- function(seq_type, n, a, d = NULL, r = NULL) {
  #' Generate an arithmetic or geometric sequence.
  #'
  #' @param seq_type specifying the type of sequence:"arithmetic"/"geometric".
  #' @param n The number of terms in the sequence.
  #' @param a The first term of the sequence.
  #' @param d The common difference (required for arithmetic sequences).
  #' @param r The common ratio (required for geometric sequences).
  #'
  #' @return A numeric vector containing the generated sequence.
  
  sequence <- numeric(n)
  if (tolower(seq_type) == "arithmetic") {
    if (is.null(d)) stop("'d' must be provided for an arithmetic sequence.")
    for (i in 1:n) {
      sequence[i] <- a + (i - 1) * d
    }
  } else if (tolower(seq_type) == "geometric") {
    if (is.null(r)) stop("'r' must be provided for a geometric sequence.")
    for (i in 1:n) {
      sequence[i] <- a * (r^(i - 1))
    }
  } else {
    stop("seq_type must be either 'arithmetic' or 'geometric'")
  }
  return(sequence)
}

# Example usage:
print(generate_sequence("arithmetic", 10, 1, d = 2))
 [1]  1  3  5  7  9 11 13 15 17 19
print(generate_sequence("geometric", 10, 1, r = 3))
 [1]     1     3     9    27    81   243   729  2187  6561 19683

3.3.3 Simple Linear Regression

Simple linear regression models the relationship between an independent variable X and a dependent variable Y as a straight line:

Y=aX+b

where:

  • a is the slope
  • b is the intercept
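For $n$ data points $(x_i, y_i)$, the slope and intercept that minimize the sum of squared errors have a closed form. These are the standard least-squares formulas, and they are what the code in this section evaluates term by term:

```latex
a = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2},
\qquad
b = \frac{\sum y_i - a \sum x_i}{n}
```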

Python Code

import numpy as np

# Data (X: study hours, Y: exam scores)
X = np.array([1, 2, 3, 4, 5])
Y = np.array([2, 4, 5, 4, 5])

# Calculate slope (a) and intercept (b)
n = len(X)
sum_x, sum_y = sum(X), sum(Y)
sum_xy = sum(X * Y)
sum_x2 = sum(X ** 2)

a = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b = (sum_y - a * sum_x) / n

print(f"Linear Regression: Y = {a:.2f}X + {b:.2f}")
Linear Regression: Y = 0.60X + 2.20

R Code

# Data
X <- c(1, 2, 3, 4, 5)
Y <- c(2, 4, 5, 4, 5)

# Calculate slope (a) and intercept (b)
n <- length(X)
sum_x <- sum(X)
sum_y <- sum(Y)
sum_xy <- sum(X * Y)
sum_x2 <- sum(X^2)

a <- (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x^2)
b <- (sum_y - a * sum_x) / n

print(paste("Linear Regression: Y =", round(a, 2), "X +", round(b, 2)))
[1] "Linear Regression: Y = 0.6 X + 2.2"
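Since this chapter is about functions and loops, the same calculation can also be packaged as a reusable function that accumulates the sums in an explicit loop. This is a sketch in plain Python without NumPy; the name `fit_line` is an assumption made for this example.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    n = len(xs)
    sum_x = sum_y = sum_xy = sum_x2 = 0.0
    # Accumulate the four sums in a single pass over the data
    for x, y in zip(xs, ys):
        sum_x += x
        sum_y += y
        sum_xy += x * y
        sum_x2 += x * x
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    intercept = (sum_y - slope * sum_x) / n
    return slope, intercept

a, b = fit_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"Linear Regression: Y = {a:.2f}X + {b:.2f}")
```

Note that the denominator is zero when all x values are identical, so this sketch would raise a `ZeroDivisionError` for such input.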

Functions and loops help us create simpler and more efficient code. By understanding these two concepts, we can write better and more readable programs.

3.4 Applications of Functions and Loops

Let’s apply functions and loops to real-world data science tasks:

3.4.1 Creating a Dataset

Python Code

import pandas as pd
import random

def create_employee_dataset(num_employees):
    positions = {
        "Staff": (3000, 5000, 1, 5),
        "Supervisor": (5000, 8000, 5, 10),
        "Manager": (8000, 12000, 10, 15),
        "Director": (12000, 15000, 15, 25)
    }
    
    departments = ["Finance", "HR", "IT", "Marketing", "Operations", "Sales"]
    locations = ["New York", "Los Angeles", "Chicago", "Houston", "Phoenix"]
    
    data = {
        "ID_Number": [],
        "Position": [],
        "Salary": [],
        "Age": [],
        "Experience": [],
        "Department": [],
        "Location": []
    }
    
    for _ in range(num_employees):
        id_number = random.randint(10000, 99999)
        position = random.choice(list(positions.keys()))
        salary = random.randint(positions[position][0], 
                 positions[position][1])
        experience = random.randint(positions[position][2], 
                      positions[position][3])
        age = experience + random.randint(22, 35)  # aligns with experience
        department = random.choice(departments)
        location = random.choice(locations)
        
        data["ID_Number"].append(id_number)
        data["Position"].append(position)
        data["Salary"].append(salary)
        data["Age"].append(age)
        data["Experience"].append(experience)
        data["Department"].append(department)
        data["Location"].append(location)
    
    return pd.DataFrame(data)

# Create the employee dataset
df = create_employee_dataset(20)
print(df)
    ID_Number    Position  Salary  Age  Experience  Department     Location
0       31664     Manager    9469   40          13   Marketing      Phoenix
1       57548     Manager   10709   37          14  Operations      Phoenix
2       13562  Supervisor    5866   45          10       Sales      Houston
3       91248     Manager   11387   38          14       Sales     New York
4       72479    Director   13022   37          15     Finance      Phoenix
5       91592     Manager   10717   42          10          HR      Chicago
6       25909  Supervisor    5419   33           8          IT      Phoenix
7       32553     Manager   11872   43          13  Operations     New York
8       35805    Director   14026   55          24          IT      Phoenix
9       90914  Supervisor    7150   36           7   Marketing  Los Angeles
10      45298  Supervisor    7401   44          10  Operations      Chicago
11      29184       Staff    3792   27           4       Sales  Los Angeles
12      19068    Director   14157   49          23   Marketing     New York
13      28591  Supervisor    5114   33          10          IT  Los Angeles
14      33830    Director   13132   52          21     Finance      Phoenix
15      70002    Director   12521   55          22   Marketing      Houston
16      12044       Staff    4769   38           4  Operations      Houston
17      87472     Manager    8936   46          15          HR      Phoenix
18      79668       Staff    4248   39           4       Sales  Los Angeles
19      44143       Staff    4808   32           2  Operations  Los Angeles
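Because the dataset is drawn at random, each run produces different rows, so the table above will not match your own output. The sketch below shows how a seed makes random draws repeatable; the helper `draw_salaries` and the seed value 42 are illustrative choices, not part of the original function.

```python
import random

def draw_salaries(count, low, high, seed=None):
    """Draw `count` random integer salaries in [low, high]."""
    rng = random.Random(seed)  # private generator; a fixed seed repeats the draws
    return [rng.randint(low, high) for _ in range(count)]

first = draw_salaries(3, 3000, 5000, seed=42)
second = draw_salaries(3, 3000, 5000, seed=42)
print(first == second)  # True: same seed, same numbers
```

Calling `random.seed(42)` before `create_employee_dataset(20)` should have the same effect for the function above, since it draws from the module-level generator.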

R Code

create_employee_dataset <- function(num_employees) {
  # Define positions with corresponding salary and experience ranges
  positions <- list(
    "Staff" = c(3000, 5000, 1, 5),
    "Supervisor" = c(5000, 8000, 5, 10),
    "Manager" = c(8000, 12000, 10, 15),
    "Director" = c(12000, 15000, 15, 25)
  )
  
  # Define additional categorical data: departments and locations
  departments <- c("Finance", "HR", "IT", "Marketing", "Operations", "Sales")
  locations <- c("New York", "Los Angeles", "Chicago", "Houston", "Phoenix")
  
  # Initialize empty vectors for each column
  ID_Number <- integer(num_employees)
  Position <- character(num_employees)
  Salary <- integer(num_employees)
  Age <- integer(num_employees)
  Experience <- integer(num_employees)
  Department <- character(num_employees)
  Location <- character(num_employees)
  
  # Generate data for each employee
  for (i in 1:num_employees) {
    ID_Number[i] <- sample(10000:99999, 1)
    pos <- sample(names(positions), 1)
    Position[i] <- pos
    
    salary_range <- positions[[pos]][1:2]
    Salary[i] <- sample(salary_range[1]:salary_range[2], 1)
    
    exp_range <- positions[[pos]][3:4]
    Experience[i] <- sample(exp_range[1]:exp_range[2], 1)
    
    Age[i] <- Experience[i] + sample(22:35, 1)
    Department[i] <- sample(departments, 1)
    Location[i] <- sample(locations, 1)
  }
  
  # Combine the vectors into a data frame
  df <- data.frame(
    ID_Number = ID_Number,
    Position = Position,
    Salary = Salary,
    Age = Age,
    Experience = Experience,
    Department = Department,
    Location = Location,
    stringsAsFactors = FALSE
  )
  
  return(df)
}

# Example usage:
df <- create_employee_dataset(20)
print(df)
   ID_Number   Position Salary Age Experience Department    Location
1      11803 Supervisor   7547  30          7         IT     Houston
2      48461    Manager   8741  38         13      Sales Los Angeles
3      26434    Manager  11151  46         12 Operations Los Angeles
4      48427      Staff   3146  32          1    Finance     Phoenix
5      65893   Director  13552  50         20      Sales     Chicago
6      92659      Staff   4195  36          3      Sales     Houston
7      27991   Director  12675  44         19         IT     Chicago
8      30970   Director  13565  51         19         IT Los Angeles
9      15621   Director  14047  41         16 Operations     Phoenix
10     42111      Staff   3889  29          2         IT    New York
11     14887   Director  13916  50         17  Marketing Los Angeles
12     25807   Director  13133  49         23 Operations    New York
13     83427      Staff   4196  32          3      Sales     Houston
14     94366    Manager  10143  38         13    Finance     Phoenix
15     69326 Supervisor   7432  28          5  Marketing     Chicago
16     87057 Supervisor   6272  39          8         IT     Chicago
17     42149    Manager  10577  36         11 Operations    New York
18     38515    Manager  11753  39         11    Finance     Houston
19     47482   Director  13966  46         19         IT Los Angeles
20     35140   Director  13960  44         22    Finance     Phoenix

3.4.2 Basic Statistics

Python Code

import pandas as pd
import numpy as np

def manual_statistics(df, column=None):
    def stats_for_column(values):
        # Remove missing values for accurate computations
        values = values.dropna()
        if pd.api.types.is_numeric_dtype(values):
            count = len(values)
            mean_value = np.mean(values)
            median_value = np.median(values)
            variance_value = np.var(values, ddof=1) if count > 1 else 0
            std_dev_value = np.sqrt(variance_value)
            min_value = np.min(values)
            max_value = np.max(values)
            q1 = np.percentile(values, 25)
            q3 = np.percentile(values, 75)
            return {
                "count": count,
                "mean": mean_value,
                "median": median_value,
                "variance": variance_value,
                "std_dev": std_dev_value,
                "min": min_value,
                "q1": q1,
                "q3": q3,
                "max": max_value
            }
        else:
            count = len(values)
            unique_count = values.nunique()
            mode_series = values.mode()
            mode_value = mode_series.iloc[0] if not mode_series.empty else None
            frequency = values.value_counts().to_dict()
            return {
                "count": count,
                "unique": unique_count,
                "mode": mode_value,
                "frequency": frequency
            }

    if column is not None:
        return stats_for_column(df[column])
    else:
        summary = {}
        for col in df.columns:
            summary[col] = stats_for_column(df[col])
        return summary

# Get summary statistics for all columns
stats_all = manual_statistics(df)

# Display the results as tables using pandas' to_markdown() (requires the tabulate package)
for col, stats in stats_all.items():
    print(f"\n### Summary Statistics for '{col}'\n")
    if pd.api.types.is_numeric_dtype(df[col]):
        # Create a DataFrame for numeric statistics with Statistic and Value 
        stats_df = pd.DataFrame({
            "Statistic": list(stats.keys()),
            "Value": list(stats.values())
        })
        print(stats_df.to_markdown(index=False))
    else:
        # For categorical data, create summary table and frequency distribution
        summary_df = pd.DataFrame({
            "Statistic": ["count", "unique", "mode"],
            "Value": [stats["count"], stats["unique"], stats["mode"]]
        })
        freq_dict = stats["frequency"]
        freq_df = pd.DataFrame({
            "Category": list(freq_dict.keys()),
            "Frequency": list(freq_dict.values())
        })
        print(summary_df.to_markdown(index=False))
        print("\n")
        print(freq_df.to_markdown(index=False))

### Summary Statistics for 'ID_Number'

| Statistic   |          Value |
|:------------|---------------:|
| count       |    20          |
| mean        | 49628.7        |
| median      | 39974          |
| variance    |     7.7402e+08 |
| std_dev     | 27821.2        |
| min         | 12044          |
| q1          | 29035.8        |
| q3          | 74276.2        |
| max         | 91592          |

### Summary Statistics for 'Position'

| Statistic   | Value   |
|:------------|:--------|
| count       | 20      |
| unique      | 4       |
| mode        | Manager |


| Category   |   Frequency |
|:-----------|------------:|
| Manager    |           6 |
| Supervisor |           5 |
| Director   |           5 |
| Staff      |           4 |

### Summary Statistics for 'Salary'

| Statistic   |           Value |
|:------------|----------------:|
| count       |    20           |
| mean        |  8925.75        |
| median      |  9202.5         |
| variance    |     1.29651e+07 |
| std_dev     |  3600.7         |
| min         |  3792           |
| q1          |  5342.75        |
| q3          | 12034.2         |
| max         | 14157           |

### Summary Statistics for 'Age'

| Statistic   |    Value |
|:------------|---------:|
| count       | 20       |
| mean        | 41.05    |
| median      | 39.5     |
| variance    | 58.7868  |
| std_dev     |  7.66726 |
| min         | 27       |
| q1          | 36.75    |
| q3          | 45.25    |
| max         | 55       |

### Summary Statistics for 'Experience'

| Statistic   |    Value |
|:------------|---------:|
| count       | 20       |
| mean        | 12.15    |
| median      | 11.5     |
| variance    | 43.2921  |
| std_dev     |  6.57967 |
| min         |  2       |
| q1          |  7.75    |
| q3          | 15       |
| max         | 24       |

### Summary Statistics for 'Department'

| Statistic   | Value      |
|:------------|:-----------|
| count       | 20         |
| unique      | 6          |
| mode        | Operations |


| Category   |   Frequency |
|:-----------|------------:|
| Operations |           5 |
| Marketing  |           4 |
| Sales      |           4 |
| IT         |           3 |
| Finance    |           2 |
| HR         |           2 |

### Summary Statistics for 'Location'

| Statistic   | Value   |
|:------------|:--------|
| count       | 20      |
| unique      | 5       |
| mode        | Phoenix |


| Category    |   Frequency |
|:------------|------------:|
| Phoenix     |           7 |
| Los Angeles |           5 |
| Houston     |           3 |
| New York    |           3 |
| Chicago     |           2 |
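The function above delegates the arithmetic to NumPy. To see what those calls compute, the sketch below derives the mean and sample variance with explicit loops and checks them against Python's `statistics` module. The function name `mean_and_variance` and the sample values are made up for illustration.

```python
import math
import statistics

def mean_and_variance(values):
    """Return (mean, sample variance) computed with explicit loops."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values for a sample variance")
    total = 0.0
    for v in values:
        total += v
    mean = total / n
    squared_diffs = 0.0
    for v in values:
        squared_diffs += (v - mean) ** 2
    return mean, squared_diffs / (n - 1)  # n - 1 matches ddof=1

salaries = [3000, 4500, 5200, 8000, 12500]  # illustrative values only
mean, variance = mean_and_variance(salaries)

# Cross-check against the standard library
assert math.isclose(mean, statistics.mean(salaries))
assert math.isclose(variance, statistics.variance(salaries))
print(mean, round(math.sqrt(variance), 2))
```

The first loop finds the mean and the second measures how far each value lies from it, which is why the variance needs two passes in this form.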

R Code

library(knitr)
library(kableExtra)

manual_statistics <- function(df, column = NULL) {
  # Helper function to compute statistics for a single column
  stats_for_column <- function(values) {
    # Remove NA values for accurate computations
    values <- values[!is.na(values)]
    
    if (is.numeric(values)) {
      count <- length(values)
      mean_value <- mean(values)
      median_value <- median(values)
      variance_value <- if (count > 1) var(values) else 0
      std_dev_value <- sqrt(variance_value)
      min_value <- min(values)
      max_value <- max(values)
      q1 <- as.numeric(quantile(values, 0.25))
      q3 <- as.numeric(quantile(values, 0.75))
      
      return(list(
        count    = count,
        mean     = mean_value,
        median   = median_value,
        variance = variance_value,
        std_dev  = std_dev_value,
        min      = min_value,
        q1       = q1,
        q3       = q3,
        max      = max_value
      ))
    } else {
      count <- length(values)
      unique_count <- length(unique(values))
      tab <- table(values)
      mode_value <- names(tab)[which.max(tab)]
      frequency <- as.list(tab)
      
      return(list(
        count     = count,
        unique    = unique_count,
        mode      = mode_value,
        frequency = frequency
      ))
    }
  }
  
  # If a specific column is provided, compute statistics only for that column.
  if (!is.null(column)) {
    return(stats_for_column(df[[column]]))
  } else {
    # Otherwise, compute statistics for each column in the DataFrame.
    summary <- list()
    for (col in names(df)) {
      summary[[col]] <- stats_for_column(df[[col]])
    }
    return(summary)
  }
}

# Compute summary statistics for all columns
stats_all <- manual_statistics(df)

# Loop to display the results for each column with DT::datatable
for (col in names(stats_all)) {
  cat(paste0("<h3>Summary Statistics for '", col, "'</h3>"))
  
  col_stats <- stats_all[[col]]
  
  if (is.numeric(df[[col]])) {
    stats_df <- data.frame(
      Statistic = names(col_stats),
      Value = as.numeric(unlist(col_stats)),
      stringsAsFactors = FALSE
    )
    print(DT::datatable(stats_df, 
                        caption = paste("Summary for", col),
                        options = list(pageLength = 5, autoWidth = TRUE)))
  } else {
    summary_df <- data.frame(
      Statistic = c("count", "unique", "mode"),
      Value = c(col_stats$count, col_stats$unique, col_stats$mode),
      stringsAsFactors = FALSE
    )
    freq_df <- as.data.frame(do.call(rbind, col_stats$frequency))
    freq_df <- cbind(Category = rownames(freq_df), freq_df)
    rownames(freq_df) <- NULL
    names(freq_df)[2] <- "Frequency"
    
    print(DT::datatable(summary_df, 
                        caption = paste("Summary for", col),
                        options = list(pageLength = 5, autoWidth = TRUE)))
    cat("<br>")
    print(DT::datatable(freq_df, 
                        caption = paste("Frequency Distribution for", col),
                        options = list(pageLength = 5, autoWidth = TRUE)))
  }
  
  cat("<br><br>")
}