E-Commerce Transactions → CSV
Generate a shaped CSV dataset of purchases, refunds, and chargebacks for analytics, database seeding, or ML training.
Build a generator that produces an e-commerce transaction dataset as a CSV file. Transactions include purchases, refunds, and chargebacks with realistic distributions — 80% purchases, 15% refunds, 5% chargebacks. The result is a ready-to-import CSV with correlated fields (customer, product, amount).
What you'll build
The generator uses:
- linspace input: 10,000 timestamps spread evenly across a date range.
- chance picking mode: weighted random selection between three transaction templates.
- file output: each rendered event is written as one CSV row by the default plain formatter.
- JSON samples: a product catalog loaded from a file.
- shared state: a running revenue counter across all events.
The entire dataset is generated in one burst using sample mode (live_mode: false), so 10,000 rows complete in seconds.
Prerequisites
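Project structure
ecommerce-csv/
├── generator.yml
├── data/
│   └── products.json
└── templates/
    ├── purchase.jinja
    ├── refund.jinja
    └── chargeback.jinja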
Build it
Create the project directory
mkdir -p ecommerce-csv/{templates,data}
cd ecommerce-csv
Create the product catalog
Save the catalog below as data/products.json. It is a JSON sample file with product names, categories, and price ranges; each transaction picks a random product from this list.
[
{ "name": "Wireless Mouse", "category": "Electronics", "min_price": 15, "max_price": 45 },
{ "name": "USB-C Hub", "category": "Electronics", "min_price": 25, "max_price": 80 },
{ "name": "Desk Lamp", "category": "Home", "min_price": 20, "max_price": 60 },
{ "name": "Notebook Set", "category": "Office", "min_price": 8, "max_price": 25 },
{ "name": "Bluetooth Speaker", "category": "Electronics", "min_price": 30, "max_price": 120 },
{ "name": "Coffee Mug", "category": "Home", "min_price": 10, "max_price": 30 },
{ "name": "Backpack", "category": "Accessories", "min_price": 35, "max_price": 90 },
{ "name": "Webcam", "category": "Electronics", "min_price": 40, "max_price": 150 },
{ "name": "Plant Pot", "category": "Home", "min_price": 12, "max_price": 35 },
{ "name": "Mechanical Keyboard", "category": "Electronics", "min_price": 60, "max_price": 200 }
]
Write transaction templates
Each template produces a single CSV row. All three templates share the same column order: transaction_id, timestamp, type, customer_id, customer_name, product, category, amount, currency.
Purchase (templates/purchase.jinja): the most common transaction. It picks a random product and generates a price within that product's range.
{% set product = module.rand.choice(samples.products) %}
{% set amount = module.rand.number.floating(product.min_price, product.max_price) %}
{% do shared.set("revenue", shared.get("revenue", 0) + amount) %}
{{ module.rand.crypto.uuid4() }},{{ timestamp.strftime("%Y-%m-%d %H:%M:%S") }},purchase,C-{{ module.rand.number.integer(10000, 99999) }},{{ module.faker.locale.en.name() }},{{ product.name }},{{ product.category }},{{ "%.2f" | format(amount) }},USD
JSON sample rows support named access via object keys: product.name, product.category, product.min_price, product.max_price.
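To make the row layout concrete, the purchase logic can be mirrored in plain Python. This is only an illustrative sketch, not how Eventum renders templates: the `purchase_row` helper is hypothetical, the customer name is a fixed placeholder where the template calls Faker, and only two catalog entries are copied in.

```python
import random
import uuid
from datetime import datetime

# Two entries copied from the product catalog above.
PRODUCTS = [
    {"name": "Wireless Mouse", "category": "Electronics", "min_price": 15, "max_price": 45},
    {"name": "Desk Lamp", "category": "Home", "min_price": 20, "max_price": 60},
]


def purchase_row(timestamp: datetime, rng: random.Random) -> str:
    """Build one purchase row in the tutorial's column order (hypothetical helper)."""
    product = rng.choice(PRODUCTS)
    # Uniform price inside the chosen product's range, as in the template.
    amount = rng.uniform(product["min_price"], product["max_price"])
    fields = [
        str(uuid.uuid4()),
        timestamp.strftime("%Y-%m-%d %H:%M:%S"),
        "purchase",
        f"C-{rng.randint(10000, 99999)}",
        "Jane Doe",  # placeholder; the template uses a Faker-generated name
        product["name"],
        product["category"],
        f"{amount:.2f}",
        "USD",
    ]
    return ",".join(fields)


row = purchase_row(datetime(2025, 1, 1), random.Random(42))
print(row)
```

Category and amount both follow from the chosen product, while the customer ID and name are drawn independently for every row.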
Refund (templates/refund.jinja): returns a product. The amount is written with a leading minus sign and subtracted from the running revenue.
{% set product = module.rand.choice(samples.products) %}
{% set amount = module.rand.number.floating(product.min_price, product.max_price) %}
{% do shared.set("revenue", shared.get("revenue", 0) - amount) %}
{{ module.rand.crypto.uuid4() }},{{ timestamp.strftime("%Y-%m-%d %H:%M:%S") }},refund,C-{{ module.rand.number.integer(10000, 99999) }},{{ module.faker.locale.en.name() }},{{ product.name }},{{ product.category }},-{{ "%.2f" | format(amount) }},USD
Chargeback (templates/chargeback.jinja): a disputed transaction. Rare (5% of total).
{% set product = module.rand.choice(samples.products) %}
{% set amount = module.rand.number.floating(product.min_price, product.max_price) %}
{% do shared.set("revenue", shared.get("revenue", 0) - amount) %}
{{ module.rand.crypto.uuid4() }},{{ timestamp.strftime("%Y-%m-%d %H:%M:%S") }},chargeback,C-{{ module.rand.number.integer(10000, 99999) }},{{ module.faker.locale.en.name() }},{{ product.name }},{{ product.category }},-{{ "%.2f" | format(amount) }},USD
The shared.set("revenue", ...) call tracks a running total across all events. The total does not appear in the CSV; it is an example of cross-template coordination. You could use it in a summary template or inspect it via the Studio State tab.
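Because refunds and chargebacks are written with a negative amount, a comparable net total can also be recovered from the finished file by summing the amount column. The sketch below assumes header-less rows in the tutorial's column order; `net_revenue` is a hypothetical helper and the four rows are illustrative. Expect small differences from the shared counter, since the CSV stores amounts rounded to two decimals.

```python
import csv
import io

AMOUNT_COLUMN = 7  # zero-based index of `amount` in the tutorial's column order

# Illustrative rows in the generated format (no header line).
ROWS = """\
f47ac10b-58cc-4372-a567-0e02b2c3d479,2025-01-01 00:00:00,purchase,C-38472,John Smith,Wireless Mouse,Electronics,32.15,USD
8a3b5c7d-1234-4e6f-9abc-def012345678,2025-01-01 00:53:00,purchase,C-91034,Maria Garcia,Webcam,Electronics,89.50,USD
c9d1e2f3-4567-4890-abcd-ef1234567890,2025-01-01 01:46:00,refund,C-55219,Alex Johnson,Desk Lamp,Home,-42.30,USD
a1b2c3d4-5678-4901-bcde-f12345678901,2025-01-01 02:39:00,purchase,C-72841,Sarah Williams,Backpack,Accessories,67.80,USD
"""


def net_revenue(csv_text: str) -> float:
    """Sum the signed amount column; refunds and chargebacks reduce the total."""
    reader = csv.reader(io.StringIO(csv_text))
    # `if row` skips blank lines, which csv.reader yields as empty lists.
    return round(sum(float(row[AMOUNT_COLUMN]) for row in reader if row), 2)


print(net_revenue(ROWS))
```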
Configure the generator
The chance picking mode assigns a probability to each template. The chances must sum to 1.0.
input:
  - linspace:
      start: "2025-01-01T00:00:00"
      end: "2025-12-31T23:59:59"
      count: 10000
      endpoint: true
event:
  template:
    mode: chance
    samples:
      products:
        type: json
        source: data/products.json
    templates:
      - purchase:
          template: templates/purchase.jinja
          chance: 0.80
      - refund:
          template: templates/refund.jinja
          chance: 0.15
      - chargeback:
          template: templates/chargeback.jinja
          chance: 0.05
output:
  - file:
      path: transactions.csv
      write_mode: overwrite
Key decisions:
- linspace spreads 10,000 timestamps evenly across the entire year 2025, about one transaction every 53 minutes.
- chance mode: 80% of timestamps render the purchase template, 15% refund, 5% chargeback.
- The file output uses write_mode: overwrite so each run starts fresh. The default plain formatter passes each CSV row through as-is.
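The spacing and the expected mix can be sanity-checked outside Eventum. The following is a rough sketch in plain Python, assuming linspace with endpoint: true places the first and last timestamps exactly on the start and end values, and using `random.choices` as a stand-in for the chance picking mode; neither is taken from Eventum's implementation.

```python
import random
from collections import Counter
from datetime import datetime

start = datetime.fromisoformat("2025-01-01T00:00:00")
end = datetime.fromisoformat("2025-12-31T23:59:59")
count = 10_000

# With the endpoint included, 10,000 points leave 9,999 equal gaps.
step = (end - start) / (count - 1)
timestamps = [start + i * step for i in range(count)]
print(step)  # a little under 53 minutes

chances = {"purchase": 0.80, "refund": 0.15, "chargeback": 0.05}
# The chances should sum to 1.0 (allowing for float rounding).
assert abs(sum(chances.values()) - 1.0) < 1e-9

# Stand-in for chance mode: one weighted pick per timestamp.
rng = random.Random(0)
picks = rng.choices(list(chances), weights=list(chances.values()), k=count)
counts = Counter(picks)
print(counts)
```

The counts land near 8,000 / 1,500 / 500 but vary from run to run, which is why the tutorial describes the final distribution as approximate.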
Run it
Use eventum generate in sample mode for fast, one-shot generation:
eventum generate --path generator.yml --id ecommerce --live-mode false
The --live-mode false flag releases all 10,000 timestamps instantly instead of waiting for their scheduled wall-clock times. The entire dataset generates in a few seconds.
Add a CSV header and check the output (the sed one-liner uses GNU sed syntax; BSD/macOS sed needs a different form):
sed -i '1i transaction_id,timestamp,type,customer_id,customer_name,product,category,amount,currency' transactions.csv
head -5 transactions.csv
transaction_id,timestamp,type,customer_id,customer_name,product,category,amount,currency
f47ac10b-58cc-4372-a567-0e02b2c3d479,2025-01-01 00:00:00,purchase,C-38472,John Smith,Wireless Mouse,Electronics,32.15,USD
8a3b5c7d-1234-4e6f-9abc-def012345678,2025-01-01 00:53:00,purchase,C-91034,Maria Garcia,Webcam,Electronics,89.50,USD
c9d1e2f3-4567-4890-abcd-ef1234567890,2025-01-01 01:46:00,refund,C-55219,Alex Johnson,Desk Lamp,Home,-42.30,USD
a1b2c3d4-5678-4901-bcde-f12345678901,2025-01-01 02:39:00,purchase,C-72841,Sarah Williams,Backpack,Accessories,67.80,USD
You should see approximately 8,000 purchases, 1,500 refunds, and 500 chargebacks. Check the distribution:
tail -n +2 transactions.csv | cut -d',' -f3 | sort | uniq -c | sort -rn
Going further
- Add customer loyalty: use shared state to track repeat customers by keeping a customer pool and reusing IDs with some probability.
- Seasonal patterns: replace linspace with time-patterns using a triangular distribution to create Black Friday and holiday spikes.
- Multiple currencies: add params for currency and run multiple generators in parallel with different currency settings.
- Database seeding: swap the file output for ClickHouse to populate a transactions table directly.
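If a direct database output is not set up, the finished CSV can also be imported with ordinary tooling. As a minimal sketch, the snippet below loads it into SQLite using only the Python standard library. SQLite is swapped in here purely for illustration (it is not the ClickHouse output mentioned above), the table layout and the `seed_transactions` and `type_counts` helpers are hypothetical, and the header row is assumed to be present.

```python
import csv
import sqlite3
from typing import Dict, Iterable


def seed_transactions(conn: sqlite3.Connection, lines: Iterable[str]) -> int:
    """Create a `transactions` table and insert every CSV row; returns the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transactions ("
        "transaction_id TEXT PRIMARY KEY, timestamp TEXT, type TEXT, "
        "customer_id TEXT, customer_name TEXT, product TEXT, "
        "category TEXT, amount REAL, currency TEXT)"
    )
    reader = csv.reader(lines)
    next(reader)  # skip the header row added with sed
    rows = [tuple(r) for r in reader if r]
    conn.executemany(
        "INSERT INTO transactions VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", rows
    )
    conn.commit()
    return len(rows)


def type_counts(conn: sqlite3.Connection) -> Dict[str, int]:
    """Count rows per transaction type, mirroring the shell distribution check."""
    return dict(conn.execute("SELECT type, COUNT(*) FROM transactions GROUP BY type"))
```

To use it, open transactions.csv with `open("transactions.csv", newline="")`, pass the file object together with `sqlite3.connect("transactions.db")` to `seed_transactions`, and then call `type_counts` on the same connection to confirm the mix of transaction types.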