Eventum

E-Commerce Transactions → CSV

Generate a shaped CSV dataset of purchases, refunds, and chargebacks for analytics, database seeding, or ML training.

Build a generator that produces an e-commerce transaction dataset as a CSV file. Transactions are purchases, refunds, and chargebacks in a weighted mix of 80% purchases, 15% refunds, and 5% chargebacks. The result is a ready-to-import CSV in which each amount is drawn from the price range of the product on the same row, so product and amount stay consistent.

What you'll build

The generator uses:

  • linspace input — 10,000 timestamps spread evenly across a date range.
  • chance picking mode — weighted random selection between three transaction types.
  • file output with the default plain formatter, so each rendered event becomes one CSV row.
  • JSON samples — product catalog loaded from a file.
  • shared state — running revenue counter across all events.

The entire dataset is generated in one burst using sample mode (live_mode: false), so 10,000 rows complete in seconds.

Prerequisites

Project structure

ecommerce-csv/
  generator.yml
  templates/
    purchase.jinja
    refund.jinja
    chargeback.jinja
  data/
    products.json

Build it

Create the project directory

mkdir -p ecommerce-csv/{templates,data}
cd ecommerce-csv

Create the product catalog

A JSON sample file with product names, categories, and price ranges. Each transaction picks a random product from this list.

data/products.json
[
  { "name": "Wireless Mouse", "category": "Electronics", "min_price": 15, "max_price": 45 },
  { "name": "USB-C Hub", "category": "Electronics", "min_price": 25, "max_price": 80 },
  { "name": "Desk Lamp", "category": "Home", "min_price": 20, "max_price": 60 },
  { "name": "Notebook Set", "category": "Office", "min_price": 8, "max_price": 25 },
  { "name": "Bluetooth Speaker", "category": "Electronics", "min_price": 30, "max_price": 120 },
  { "name": "Coffee Mug", "category": "Home", "min_price": 10, "max_price": 30 },
  { "name": "Backpack", "category": "Accessories", "min_price": 35, "max_price": 90 },
  { "name": "Webcam", "category": "Electronics", "min_price": 40, "max_price": 150 },
  { "name": "Plant Pot", "category": "Home", "min_price": 12, "max_price": 35 },
  { "name": "Mechanical Keyboard", "category": "Electronics", "min_price": 60, "max_price": 200 }
]
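Before wiring the catalog into the generator, it can be worth sanity-checking it: a row whose min_price exceeds max_price, or a name containing a comma, would produce broken rows. The following is a minimal Python sketch that is not part of Eventum; the function name `validate_catalog` and the set of required keys are our own assumptions, based on the fields the templates read.

```python
# Keys the three templates read from each product row (assumed from the templates).
REQUIRED_KEYS = {"name", "category", "min_price", "max_price"}


def validate_catalog(products):
    """Return a list of problems found in the catalog; an empty list means it looks usable."""
    if not isinstance(products, list) or not products:
        return ["catalog must be a non-empty JSON array"]
    problems = []
    for i, product in enumerate(products):
        if not isinstance(product, dict):
            problems.append(f"row {i}: expected a JSON object")
            continue
        missing = REQUIRED_KEYS - set(product)
        if missing:
            problems.append(f"row {i}: missing keys {sorted(missing)}")
            continue
        if product["min_price"] > product["max_price"]:
            problems.append(f"row {i} ({product['name']}): min_price exceeds max_price")
        # The templates write fields unquoted, so a comma would shift every later column.
        if "," in product["name"] or "," in product["category"]:
            problems.append(f"row {i}: name and category must not contain commas")
    return problems
```

Usage: `validate_catalog(json.loads(Path("data/products.json").read_text()))` should return an empty list for the catalog above.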

Write transaction templates

Each template produces a single CSV row. All three templates share the same column order: transaction_id, timestamp, type, customer_id, customer_name, product, category, amount, currency.

Purchase — the most common transaction. Picks a random product and generates a price within its range.

templates/purchase.jinja
{% set product = module.rand.choice(samples.products) %}
{% set amount = module.rand.number.floating(product.min_price, product.max_price) %}
{% do shared.set("revenue", shared.get("revenue", 0) + amount) %}
{{ module.rand.crypto.uuid4() }},{{ timestamp.strftime("%Y-%m-%d %H:%M:%S") }},purchase,C-{{ module.rand.number.integer(10000, 99999) }},{{ module.faker.locale.en.name() }},{{ product.name }},{{ product.category }},{{ "%.2f" | format(amount) }},USD

JSON sample rows support named access via object keys: product.name, product.category, product.min_price, product.max_price.
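To see what the purchase template computes for each timestamp, here is a rough Python equivalent. It is only a sketch: `random.uniform` and `uuid.uuid4` loosely stand in for `module.rand.number.floating` and `module.rand.crypto.uuid4`, and the hard-coded `NAMES` list is a hypothetical substitute for the Faker call.

```python
import random
import uuid
from datetime import datetime

# Hypothetical stand-in for module.faker.locale.en.name().
NAMES = ["John Smith", "Maria Garcia", "Alex Johnson", "Sarah Williams"]


def purchase_row(timestamp: datetime, products: list[dict], rng=random):
    """Build one CSV row in the same column order as templates/purchase.jinja."""
    product = rng.choice(products)  # like module.rand.choice(samples.products)
    # Price drawn uniformly from the chosen product's range.
    amount = rng.uniform(product["min_price"], product["max_price"])
    fields = [
        str(uuid.uuid4()),                         # transaction_id
        timestamp.strftime("%Y-%m-%d %H:%M:%S"),   # timestamp
        "purchase",                                # type
        f"C-{rng.randint(10000, 99999)}",          # customer_id (randint includes both bounds)
        rng.choice(NAMES),                         # customer_name
        product["name"],
        product["category"],
        f"{amount:.2f}",                           # amount, two decimals like "%.2f" | format
        "USD",
    ]
    # Return the unrounded amount too, since that is what the template adds to shared state.
    return ",".join(fields), amount
```

Passing a seeded `random.Random(...)` as `rng` makes the output reproducible, which is handy when comparing against real generator output by eye.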

Refund — returns a product. The amount is negative.

templates/refund.jinja
{% set product = module.rand.choice(samples.products) %}
{% set amount = module.rand.number.floating(product.min_price, product.max_price) %}
{% do shared.set("revenue", shared.get("revenue", 0) - amount) %}
{{ module.rand.crypto.uuid4() }},{{ timestamp.strftime("%Y-%m-%d %H:%M:%S") }},refund,C-{{ module.rand.number.integer(10000, 99999) }},{{ module.faker.locale.en.name() }},{{ product.name }},{{ product.category }},-{{ "%.2f" | format(amount) }},USD

Chargeback — disputed transaction. Rare (5% of total).

templates/chargeback.jinja
{% set product = module.rand.choice(samples.products) %}
{% set amount = module.rand.number.floating(product.min_price, product.max_price) %}
{% do shared.set("revenue", shared.get("revenue", 0) - amount) %}
{{ module.rand.crypto.uuid4() }},{{ timestamp.strftime("%Y-%m-%d %H:%M:%S") }},chargeback,C-{{ module.rand.number.integer(10000, 99999) }},{{ module.faker.locale.en.name() }},{{ product.name }},{{ product.category }},-{{ "%.2f" | format(amount) }},USD

The shared.set("revenue", ...) call tracks a running total across all events. This doesn't appear in the CSV — it's an example of cross-template coordination. You could use it in a summary template or inspect it via the Studio State tab.
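The coordination described above boils down to a read-modify-write on one shared key. Below is a minimal Python model; the `SharedState` class is hypothetical and only mimics the `get`/`set` calls used in the templates. Note that the templates accumulate the unrounded amount, so the stored total can differ by a few cents from the sum of the CSV's amount column.

```python
class SharedState:
    """Hypothetical stand-in for Eventum's `shared` object: a dict with get/set."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value):
        self._data[key] = value


# Purchases add revenue; refunds and chargebacks subtract it, as in the three templates.
SIGN = {"purchase": 1, "refund": -1, "chargeback": -1}


def record(shared, tx_type, amount):
    """Mirror the `{% do shared.set("revenue", ...) %}` line of the matching template."""
    shared.set("revenue", shared.get("revenue", 0) + SIGN[tx_type] * amount)


shared = SharedState()
for tx_type, amount in [("purchase", 32.15), ("purchase", 89.50), ("refund", 42.30), ("chargeback", 10.00)]:
    record(shared, tx_type, amount)
print(f"{shared.get('revenue'):.2f}")  # prints 69.35
```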

Configure the generator

The chance picking mode assigns a probability to each template. The chances must sum to 1.0.
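Conceptually, chance mode is a weighted random pick per timestamp. The Python sketch below approximates it with `random.choices`; it is an illustration under that assumption, not Eventum's implementation, and the tolerance used to check the sum is our own choice.

```python
import math
import random

CHANCES = {"purchase": 0.80, "refund": 0.15, "chargeback": 0.05}


def check_chances(chances):
    """Raise if the chances do not add up to 1.0."""
    total = sum(chances.values())
    # Float sums can drift (0.1 + 0.2 != 0.3), so compare with a small tolerance.
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError(f"chances sum to {total}, expected 1.0")


def pick_templates(n, chances, rng=random):
    """Pick a template name for each of n timestamps, weighted by its chance."""
    check_chances(chances)
    return rng.choices(list(chances), weights=list(chances.values()), k=n)
```

With 10,000 picks the purchase count lands near 8,000 but rarely exactly on it, which is why the distribution check later in this guide says "approximately".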

generator.yml
input:
  - linspace:
      start: "2025-01-01T00:00:00"
      end: "2025-12-31T23:59:59"
      count: 10000
      endpoint: true

event:
  template:
    mode: chance
    samples:
      products:
        type: json
        source: data/products.json
    templates:
      - purchase:
          template: templates/purchase.jinja
          chance: 0.80
      - refund:
          template: templates/refund.jinja
          chance: 0.15
      - chargeback:
          template: templates/chargeback.jinja
          chance: 0.05

output:
  - file:
      path: transactions.csv
      write_mode: overwrite

Key decisions:

  • linspace spreads 10,000 timestamps evenly across the entire year 2025 — about one transaction every 53 minutes.
  • chance mode: 80% of timestamps render the purchase template, 15% refund, 5% chargeback.
  • The file output uses write_mode: overwrite so each run starts fresh. The default plain formatter passes each CSV row through as-is.
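The spacing figure follows from dividing the date range by count - 1 gaps, because endpoint: true makes the end timestamp the last sample. The sketch below assumes Eventum's linspace behaves like numpy.linspace (evenly spaced values, end included when endpoint is true); it is plain Python, not Eventum's code.

```python
from datetime import datetime, timedelta


def linspace_timestamps(start, end, count, endpoint=True):
    """Evenly spaced timestamps, assuming numpy.linspace-style semantics."""
    total = (end - start).total_seconds()
    # With endpoint=True the end is the last sample, so there are count - 1 gaps.
    intervals = (count - 1 if endpoint else count) or 1
    # Multiply before dividing so the final timestamp lands exactly on `end`.
    return [start + timedelta(seconds=total * i / intervals) for i in range(count)]


start = datetime(2025, 1, 1, 0, 0, 0)
end = datetime(2025, 12, 31, 23, 59, 59)
stamps = linspace_timestamps(start, end, 10000)
step = (end - start).total_seconds() / (10000 - 1)
print(f"{step:.1f} s = {step / 60:.1f} min")  # 3153.9 s = 52.6 min
print([ts.strftime("%H:%M:%S") for ts in stamps[:4]])  # ['00:00:00', '00:52:33', '01:45:07', '02:37:41']
```

Because strftime drops fractional seconds, consecutive rows in the CSV show gaps of 52:33 or 52:34 rather than a round 53 minutes.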

Run it

Use eventum generate in sample mode for fast, one-shot generation:

eventum generate --path generator.yml --id ecommerce --live-mode false

The --live-mode false flag releases all 10,000 timestamps instantly instead of waiting for their scheduled wall-clock times. The entire dataset generates in a few seconds.

The templates emit data rows only, so add a CSV header with GNU sed and check the output:

sed -i '1i transaction_id,timestamp,type,customer_id,customer_name,product,category,amount,currency' transactions.csv
head -5 transactions.csv

Example output (IDs, names, products, and amounts vary between runs):

transaction_id,timestamp,type,customer_id,customer_name,product,category,amount,currency
f47ac10b-58cc-4372-a567-0e02b2c3d479,2025-01-01 00:00:00,purchase,C-38472,John Smith,Wireless Mouse,Electronics,32.15,USD
8a3b5c7d-1234-4e6f-9abc-def012345678,2025-01-01 00:52:33,purchase,C-91034,Maria Garcia,Webcam,Electronics,89.50,USD
c9d1e2f3-4567-4890-abcd-ef1234567890,2025-01-01 01:45:07,refund,C-55219,Alex Johnson,Desk Lamp,Home,-42.30,USD
a1b2c3d4-5678-4901-bcde-f12345678901,2025-01-01 02:37:41,purchase,C-72841,Sarah Williams,Backpack,Accessories,67.80,USD
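On systems where sed -i behaves differently (BSD and macOS sed expect an argument after -i, possibly empty, and a backslash-newline after the i command), the header can be prepended with a short Python sketch instead. `prepend_header` is a hypothetical helper, not part of Eventum; it reads the whole file into memory, which is fine at 10,000 rows, and skips the write when the header is already present so a second run does not duplicate it.

```python
from pathlib import Path

HEADER = "transaction_id,timestamp,type,customer_id,customer_name,product,category,amount,currency"


def prepend_header(path, header=HEADER):
    """Insert the CSV header as the first line unless it is already there."""
    csv_path = Path(path)
    body = csv_path.read_text()
    if body.startswith(header + "\n"):
        return False  # header already present; writing again would duplicate it
    csv_path.write_text(header + "\n" + body)
    return True
```

Usage: `prepend_header("transactions.csv")` right after each `eventum generate` run.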

You should see approximately 8,000 purchases, 1,500 refunds, and 500 chargebacks. Check the distribution:

tail -n +2 transactions.csv | cut -d',' -f3 | sort | uniq -c | sort -rn
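The counts will not be exactly 8,000, 1,500, and 500. Assuming each timestamp is an independent weighted draw, each count is roughly binomial: for purchases the standard deviation is sqrt(10000 × 0.8 × 0.2) = 40, so a value between about 7,880 and 8,120 (±3σ) is unremarkable. The Python sketch below performs the same count as the shell pipeline and flags any type outside a ±4σ band; the band width and function names are our own choices, not something Eventum defines.

```python
import csv
import math
from collections import Counter

CHANCES = {"purchase": 0.80, "refund": 0.15, "chargeback": 0.05}


def check_distribution(rows, chances=CHANCES, sigmas=4.0):
    """Map each type to (observed, expected, within sigmas of expected)."""
    counts = Counter(row["type"] for row in rows)
    n = sum(counts.values())
    report = {}
    for tx_type, p in chances.items():
        expected = n * p
        sd = math.sqrt(n * p * (1 - p))  # binomial standard deviation, assuming independent draws
        observed = counts.get(tx_type, 0)
        report[tx_type] = (observed, round(expected), abs(observed - expected) <= sigmas * sd)
    return report


def check_file(path):
    # DictReader takes its field names from the header line added with sed.
    with open(path, newline="") as f:
        return check_distribution(csv.DictReader(f))
```

Usage: `print(check_file("transactions.csv"))`.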

Going further

  • Add customer loyalty — use shared state to track repeat customers by keeping a customer pool and reusing IDs with some probability.
  • Seasonal patterns — replace linspace with time-patterns using a triangular distribution to create Black Friday and holiday spikes.
  • Multiple currencies — add params for currency and run multiple generators in parallel with different currency settings.
  • Database seeding — swap the file output for ClickHouse to populate a transactions table directly.
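The customer loyalty idea can be prototyped outside Eventum first. The sketch below keeps a pool of previously seen IDs and reuses one with probability `reuse_p`; the function name, the plain-list pool, and the 0.3 default are hypothetical choices. In a real generator the pool would live in shared state rather than a Python list.

```python
import random


def next_customer_id(pool, reuse_p=0.3, rng=random):
    """Return a customer ID, reusing one from `pool` with probability reuse_p."""
    if pool and rng.random() < reuse_p:
        return rng.choice(pool)  # repeat customer
    # New customer, in the same C-NNNNN format the templates use.
    customer_id = f"C-{rng.randint(10000, 99999)}"
    if customer_id not in pool:  # avoid double entries that would bias reuse
        pool.append(customer_id)
    return customer_id
```

Raising `reuse_p` makes the dataset skew toward a smaller set of loyal customers, which is useful for testing cohort or retention queries.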
