CSV Generator Documentation

adjust_order_quantity(customer_id, num_customers)

Adjust order quantities based on customer segmentation to simulate different purchasing behaviors.

Parameters:

Name           Type  Description                                Default
customer_id    int   Unique identifier for the customer.        required
num_customers  int   Total number of customers in the dataset.  required

Returns:

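Type  Description
int   The quantity of the product ordered: a random integer from 1 to 5 when customer_id <= num_customers // 2, otherwise a random integer from 5 to 10. Customers in the upper half of the ID range therefore receive larger quantities, which simulates bulk buying by less frequent customers.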

Source code in CRR/db/GenerateCsv.py
def adjust_order_quantity(customer_id: int, num_customers: int) -> int:
    """
    Adjust order quantities based on customer segmentation to simulate different purchasing behaviors.

    Parameters:
        customer_id (int): Unique identifier for the customer.
        num_customers (int): Total number of customers in the dataset.

    Returns:
        int: The quantity of the product ordered, with higher quantities for less frequent customers
             to simulate bulk buying behavior.
    """
    return random.randint(1, 5) if customer_id <= num_customers // 2 else random.randint(5, 10)
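
As an illustration of the split, the sketch below copies the one-line body from the listing above and collects the values each half of the ID range can receive. The sample size and `num_customers = 10` are arbitrary demo values, not taken from the project.

```python
import random


def adjust_order_quantity(customer_id: int, num_customers: int) -> int:
    # Same expression as the source listing: the first half of the ID range
    # draws from 1-5, the remaining IDs draw from 5-10 (both bounds inclusive).
    return random.randint(1, 5) if customer_id <= num_customers // 2 else random.randint(5, 10)


num_customers = 10  # arbitrary demo value
low_half = {adjust_order_quantity(1, num_customers) for _ in range(1000)}
high_half = {adjust_order_quantity(10, num_customers) for _ in range(1000)}

print(sorted(low_half))   # every value lies within 1..5
print(sorted(high_half))  # every value lies within 5..10
```

Because `random.randint` includes both endpoints, a quantity of 5 can come from either segment.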

generate_customers(num_customers)

Generate a DataFrame of customers with detailed personal and contact information.

Parameters:

Name           Type  Description                           Default
num_customers  int   The number of customers to generate.  required

Returns:

Type       Description
DataFrame  A DataFrame containing detailed information for each customer, including unique identifiers and personal attributes.

Source code in CRR/db/GenerateCsv.py
def generate_customers(num_customers: int) -> pd.DataFrame:
    """
    Generate a DataFrame of customers with detailed personal and contact information.

    Parameters:
        num_customers (int): The number of customers to generate.

    Returns:
        pd.DataFrame: A DataFrame containing detailed information for each customer, including
                      unique identifiers and personal attributes.
    """
    customers = [generate_customer(i) for i in range(1, num_customers + 1)]
    return pd.DataFrame(customers)

generate_orders(num_customers, num_products, num_orders)

Generate a DataFrame of orders, linking customers and products with additional details like order dates and quantities.

Parameters:

Name           Type  Description                                    Default
num_customers  int   Total number of customers in the dataset.      required
num_products   int   Total number of products available for order.  required
num_orders     int   Number of orders to generate.                  required

Returns:

Type       Description
DataFrame  A DataFrame containing order details, linking customer IDs with product IDs and including the date and quantity of each order.

Source code in CRR/db/GenerateCsv.py
def generate_orders(num_customers: int, num_products: int, num_orders: int) -> pd.DataFrame:
    """
    Generate a DataFrame of orders, linking customers and products with additional details like order dates and quantities.

    Parameters:
        num_customers (int): Total number of customers in the dataset.
        num_products (int): Total number of products available for order.
        num_orders (int): Number of orders to generate.

    Returns:
        pd.DataFrame: A DataFrame containing order details, linking customer IDs with product IDs and including
                      the date and quantity of each order.
    """
    orders = []
    for i in range(1, num_orders + 1):
        customer_id = random.choice(range(1, num_customers + 1))
        product_id = random.choice(range(1, num_products + 1))
        start_date, end_date = modify_order_dates(customer_id, num_customers)
        order_date = fake.date_between(start_date=start_date, end_date=end_date)
        quantity = adjust_order_quantity(customer_id, num_customers)
        orders.append(generate_order(i, customer_id, product_id, order_date, quantity))
    return pd.DataFrame(orders)
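
The loop above picks the customer and the product for each order uniformly at random, so segmentation shows up only in the date window and the quantity, not in how many orders a customer receives. A minimal sketch of that sampling step, using only the standard library, might look as follows. `sample_order_keys` and its `seed` parameter are hypothetical names introduced here for reproducibility; they are not part of GenerateCsv.py, and the sketch leaves out the date and quantity steps.

```python
import random
from collections import Counter
from typing import List, Optional, Tuple


def sample_order_keys(num_customers: int, num_products: int, num_orders: int,
                      seed: Optional[int] = None) -> List[Tuple[int, int, int]]:
    """Return (order_id, customer_id, product_id) triples drawn as in the loop above."""
    rng = random.Random(seed)  # local generator, so the demo leaves global random state alone
    keys = []
    for order_id in range(1, num_orders + 1):
        # randint(1, n) has the same distribution as random.choice(range(1, n + 1))
        customer_id = rng.randint(1, num_customers)
        product_id = rng.randint(1, num_products)
        keys.append((order_id, customer_id, product_id))
    return keys


keys = sample_order_keys(num_customers=6, num_products=3, num_orders=600, seed=0)
orders_per_customer = Counter(customer_id for _, customer_id, _ in keys)
print(sorted(orders_per_customer.items()))  # counts are of similar size for every customer
```

Passing the same `seed` twice yields the same triples, which can help when a generated dataset needs to be reproduced.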

generate_products(num_products)

Generate a DataFrame of products with variable pricing and naming.

Parameters:

Name          Type  Description                          Default
num_products  int   The number of products to generate.  required

Returns:

Type       Description
DataFrame  A DataFrame containing product details such as product ID, name, and price.

Source code in CRR/db/GenerateCsv.py
def generate_products(num_products: int) -> pd.DataFrame:
    """
    Generate a DataFrame of products with variable pricing and naming.

    Parameters:
        num_products (int): The number of products to generate.

    Returns:
        pd.DataFrame: A DataFrame containing product details such as product ID, name, and price.
    """
    products = [generate_product(i) for i in range(1, num_products + 1)]
    return pd.DataFrame(products)

modify_order_dates(customer_id, num_customers)

Modify order dates to introduce variation based on customer segments, simulating different levels of customer engagement over time.

Parameters:

Name           Type  Description                                Default
customer_id    int   Unique identifier for the customer.        required
num_customers  int   Total number of customers in the dataset.  required

Returns:

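Type             Description
Tuple[str, str]  Start and end of the window from which random order dates are drawn, as relative date strings: ('-30d', 'today') for the first third of customer IDs, ('-90d', 'today') for the second third, and ('-1y', 'today') for the rest. More engaged segments therefore get more recent order dates.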

Source code in CRR/db/GenerateCsv.py
def modify_order_dates(customer_id: int, num_customers: int) -> Tuple[str, str]:
    """
    Modify order dates to introduce variation based on customer segments, simulating different
    levels of customer engagement over time.

    Parameters:
        customer_id (int): Unique identifier for the customer.
        num_customers (int): Total number of customers in the dataset.

    Returns:
        Tuple[str, str]: Start and end dates for generating random order dates, with more recent
                         dates for more engaged customer segments.
    """
    if customer_id <= num_customers // 3:
        return '-30d', 'today'
    elif customer_id <= 2 * num_customers // 3:
        return '-90d', 'today'
    else:
        return '-1y', 'today'
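
To make the segment boundaries concrete, the sketch below repeats the branching logic from the listing and maps every ID of a nine-customer dataset to the start of its window. The dataset size is an arbitrary demo value.

```python
from typing import Tuple


def modify_order_dates(customer_id: int, num_customers: int) -> Tuple[str, str]:
    # Same thresholds as the source listing (integer division).
    if customer_id <= num_customers // 3:
        return '-30d', 'today'
    elif customer_id <= 2 * num_customers // 3:
        return '-90d', 'today'
    else:
        return '-1y', 'today'


num_customers = 9  # arbitrary demo value
windows = {cid: modify_order_dates(cid, num_customers)[0]
           for cid in range(1, num_customers + 1)}
print(windows)
# {1: '-30d', 2: '-30d', 3: '-30d', 4: '-90d', 5: '-90d', 6: '-90d', 7: '-1y', 8: '-1y', 9: '-1y'}
```

When `num_customers` is not divisible by 3, integer division makes the segments uneven: with 10 customers the thresholds are 3 and 6, so the last segment holds four IDs.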

save_to_csv(df, filename)

Save a DataFrame to a CSV file.

Parameters:

Name      Type       Description                                                   Default
df        DataFrame  The DataFrame to save.                                        required
filename  str        The path or filename to which the DataFrame should be saved.  required

Returns:

Type  Description
None  Nothing is returned; the file is written without the index column.

Source code in CRR/db/GenerateCsv.py
def save_to_csv(df: pd.DataFrame, filename: str):
    """
    Save a DataFrame to a CSV file.

    Parameters:
        df (pd.DataFrame): The DataFrame to save.
        filename (str): The path or filename to which the DataFrame should be saved.

    Returns:
        None
    """
    df.to_csv(filename, index=False)
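
A short round trip can show what `index=False` means for the resulting file. This is a sketch that assumes pandas is installed; the column names and the temporary file name are hypothetical and chosen only for the demo.

```python
import os
import tempfile

import pandas as pd


def save_to_csv(df: pd.DataFrame, filename: str) -> None:
    # Same call as the source listing; index=False omits the row index column.
    df.to_csv(filename, index=False)


# Hypothetical columns, not the ones produced by GenerateCsv.py.
df = pd.DataFrame({"ProductID": [1, 2], "Price": [10.0, 4.5]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "products.csv")
    save_to_csv(df, path)
    with open(path) as fh:
        header = fh.readline().strip()
    restored = pd.read_csv(path)

print(header)  # ProductID,Price
```

The header row contains only the DataFrame's own columns, so reading the file back does not produce an extra unnamed index column.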