Data Generator Documentation

generate_age()

Generate a realistic age for a customer. The age is drawn from a normal distribution with mean 35 and standard deviation 10, truncated to an integer, and then clamped to the range 18 to 70 (inclusive).

Returns:

int: An integer representing the generated age, constrained between 18 and 70.

Source code in CRR/db/data_generator.py
def generate_age() -> int:
    """
    Generate a realistic age for a customer. The age is drawn from a normal distribution
    centered around 35 with a standard deviation of 10, but is then clamped to be within
    the range of 18 to 70.

    Returns:
        int: An integer representing the generated age, constrained between 18 and 70.
    """
    age = int(random.normalvariate(35, 10))
    return max(18, min(age, 70))  # Ensure age is within a realistic range
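
To illustrate the effect of truncation and clamping, the following minimal sketch reproduces the function body using only the standard library and tallies a seeded sample. The helper name `summarize_ages`, the seed, and the sample size are assumptions made for illustration and are not part of `data_generator.py`.

```python
import random
from collections import Counter


def generate_age() -> int:
    """Same logic as the documented function: sample, truncate, clamp."""
    age = int(random.normalvariate(35, 10))
    return max(18, min(age, 70))


def summarize_ages(n: int, seed: int = 0) -> Counter:
    """Hypothetical helper: count how often each age appears in n draws."""
    random.seed(seed)  # seeding is only for a repeatable illustration
    return Counter(generate_age() for _ in range(n))


if __name__ == "__main__":
    counts = summarize_ages(10_000)
    print("min age:", min(counts), "max age:", max(counts))
    # Clamping moves every draw below 18 onto the boundary value 18.
    print("share at lower bound:", counts[18] / 10_000)
```

Because roughly 5% of a normal distribution with mean 35 and standard deviation 10 lies below 19, the value 18 is noticeably overrepresented compared with its neighbours. The upper bound of 70 is 3.5 standard deviations above the mean and is rarely reached.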

generate_customer(customer_id)

Generate a dictionary representing a customer with various personal details.

Parameters:

customer_id (int, required): The unique identifier for the customer.

Returns:

dict (Dict[str, Union[str, int]]): A dictionary containing the customer's ID, full name, email address, age, phone number, address, and marital status ("Yes" or "No"). The ID and age are integers; all other values are strings.

Source code in CRR/db/data_generator.py
# `str or float` evaluates to plain `str`; Union (from typing) states the real value types.
def generate_customer(customer_id: int) -> Dict[str, Union[str, int]]:
    """
    Generate a dictionary representing a customer with various personal details.

    Parameters:
        customer_id (int): The unique identifier for the customer.

    Returns:
        dict: A dictionary containing the customer's ID, full name, email address, age,
              phone number, address, and marital status.
    """
    return {
        "CustomerID": customer_id,
        "FullName": fake.name(),
        "EmailAddress": fake.email(),
        "Age": generate_age(),
        "PhoneNumber": fake.phone_number(),
        "Address": fake.address(),
        "Married": fake.random_element(elements=("Yes", "No"))
    }
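
The returned record mixes integer and string values, so a `TypedDict` can describe its shape more precisely than a generic `Dict` annotation. The sketch below is illustrative only: `CustomerRecord` and `is_customer_record` are hypothetical names, and the sample values are hand-written placeholders rather than Faker output.

```python
from typing import TypedDict, get_type_hints


class CustomerRecord(TypedDict):
    """Hypothetical type describing the dictionary built by generate_customer."""
    CustomerID: int
    FullName: str
    EmailAddress: str
    Age: int
    PhoneNumber: str
    Address: str
    Married: str  # "Yes" or "No"


def is_customer_record(record: dict) -> bool:
    """Check that a dict has exactly the expected keys and value types."""
    expected = get_type_hints(CustomerRecord)
    if set(record) != set(expected):
        return False
    return all(isinstance(record[key], expected[key]) for key in expected)


if __name__ == "__main__":
    # Hand-written placeholder values, not real generator output.
    sample: CustomerRecord = {
        "CustomerID": 1,
        "FullName": "Jane Doe",
        "EmailAddress": "jane.doe@example.com",
        "Age": 41,
        "PhoneNumber": "555-0100",
        "Address": "1 Example Street",
        "Married": "No",
    }
    print(is_customer_record(sample))
```

A check like this could be run over generated rows before they are loaded into a database, so that a key rename or a type change is caught early.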

generate_order(order_id, customer_id, product_id, order_date, quantity)

Generate a dictionary representing an order, linking a customer to a product and including order details.

Parameters:

order_id (int, required): The unique identifier for the order.
customer_id (int, required): The unique identifier of the customer placing the order.
product_id (int, required): The unique identifier of the ordered product.
order_date (str, required): The date the order was placed.
quantity (int, required): The quantity of the product ordered.

Returns:

dict (Dict[str, Union[int, str]]): A dictionary containing the order's ID, the customer's ID, the order date, the product's ID, and the quantity of the product ordered. The order date is a string; all other values are integers.

Source code in CRR/db/data_generator.py
# `str or float` evaluates to plain `str`; Union (from typing) states the real value types.
def generate_order(order_id: int, customer_id: int, product_id: int, order_date: str, quantity: int) -> Dict[str, Union[int, str]]:
    """
    Generate a dictionary representing an order, linking a customer to a product and including order details.

    Parameters:
        order_id (int): The unique identifier for the order.
        customer_id (int): The unique identifier of the customer placing the order.
        product_id (int): The unique identifier of the ordered product.
        order_date (str): The date the order was placed.
        quantity (int): The quantity of the product ordered.

    Returns:
        dict: A dictionary containing the order's ID, the customer's ID, the order date,
              the product's ID, and the quantity of the product ordered.
    """
    return {
        "OrderID": order_id,
        "CustomerID": customer_id,
        "OrderDate": order_date,
        "ProductID": product_id,
        "Quantity": quantity
    }
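
Since `generate_order` only packages its arguments, the caller chooses the IDs, date, and quantity. A minimal sketch of one way to do that with the standard library follows. The helper `generate_orders`, the 2023 date window, the `YYYY-MM-DD` date format, IDs starting at 1, and the quantity range of 1 to 5 are all assumptions for illustration, not behavior documented for the project.

```python
import random
from datetime import date, timedelta
from typing import Dict, List, Union

Order = Dict[str, Union[int, str]]


def generate_order(order_id: int, customer_id: int, product_id: int,
                   order_date: str, quantity: int) -> Order:
    """Same dictionary layout as the documented function."""
    return {
        "OrderID": order_id,
        "CustomerID": customer_id,
        "OrderDate": order_date,
        "ProductID": product_id,
        "Quantity": quantity,
    }


def generate_orders(n_orders: int, n_customers: int, n_products: int,
                    start: date = date(2023, 1, 1), days: int = 365) -> List[Order]:
    """Hypothetical helper: build n_orders orders with random foreign keys."""
    orders = []
    for order_id in range(1, n_orders + 1):
        # Pick a day offset in [0, days) from the start date.
        order_day = start + timedelta(days=random.randrange(days))
        orders.append(generate_order(
            order_id=order_id,
            customer_id=random.randint(1, n_customers),  # IDs assumed to start at 1
            product_id=random.randint(1, n_products),
            order_date=order_day.isoformat(),  # assumed YYYY-MM-DD format
            quantity=random.randint(1, 5),  # assumed quantity range
        ))
    return orders


if __name__ == "__main__":
    for order in generate_orders(3, n_customers=10, n_products=4):
        print(order)
```

Drawing `customer_id` and `product_id` from the same ranges used to create customers and products keeps every order pointing at an existing record.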

generate_product(product_id)

Generate a dictionary representing a product with a name and price. The price range depends on the product ID modulo 3: 20 to 1000 when the remainder is 0, 5 to 100 when it is 1, and 1 to 50 when it is 2.

Parameters:

product_id (int, required): The unique identifier for the product.

Returns:

dict (Dict[str, Union[int, str, float]]): A dictionary containing the product's ID, name, and price. The price is rounded to two decimal places, and its range depends on whether the product ID modulo 3 equals 0, 1, or 2.

Source code in CRR/db/data_generator.py
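# `str or float` evaluates to plain `str`; Union (from typing) states the real value types.
def generate_product(product_id: int) -> Dict[str, Union[int, str, float]]: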
    """
    Generate a dictionary representing a product with a name and price.
    The price range varies depending on the product ID.

    Parameters:
        product_id (int): The unique identifier for the product.

    Returns:
        dict: A dictionary containing the product's ID, name, and price. Price varies
              depending on whether the product ID modulo 3 equals 0, 1, or 2.
    """
    # Different price ranges for different types of products, indexed by product_id % 3
    price_ranges = ((20, 1000), (5, 100), (1, 50))
    price_range = price_ranges[product_id % 3]
    return {
        "ProductID": product_id,
        "ProductName": fake.word().capitalize(),
        "Price": round(random.uniform(*price_range), 2)
    }
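
The tier selection can be isolated and checked on its own. This sketch uses two hypothetical helpers, `price_range_for` and `sample_price`, that mirror the modulo rule above; it leaves out the Faker-generated product name so that it runs with the standard library alone.

```python
import random
from typing import Tuple

# Price tiers indexed by product_id % 3, copied from the documented function.
PRICE_RANGES = ((20, 1000), (5, 100), (1, 50))


def price_range_for(product_id: int) -> Tuple[int, int]:
    """Hypothetical helper: return the (low, high) price bounds for an ID."""
    return PRICE_RANGES[product_id % 3]


def sample_price(product_id: int) -> float:
    """Draw a price uniformly from the ID's tier and round it to two decimals."""
    return round(random.uniform(*price_range_for(product_id)), 2)


if __name__ == "__main__":
    for product_id in range(1, 7):
        low, high = price_range_for(product_id)
        print(product_id, (low, high), sample_price(product_id))
```

Product IDs 1 to 6 cycle through the tiers in the order (5, 100), (1, 50), (20, 1000), so consecutive IDs produce an even mix of mid-priced, cheap, and expensive products.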