Data Seeding
Data seeding populates your database with initial data—whether it’s reference data for production, realistic test data for staging, or development fixtures for local testing. This guide covers environment-specific seed scripts and idempotent patterns.
Why Seed Data?
Different environments need different data:
flowchart LR
subgraph Local["Local Development"]
L1["Test users"]
L2["Sample posts"]
L3["Mock data"]
end
subgraph Staging["Staging"]
S1["QA test accounts"]
S2["Realistic data sets"]
S3["Edge case data"]
end
subgraph Production["Production"]
P1["Admin users"]
P2["Default categories"]
P3["System config"]
end
| Environment | Seed Purpose | Data Characteristics |
|---|---|---|
| Local | Development testing | Fake data, many records, diverse scenarios |
| Staging | QA and integration testing | Realistic data, test accounts, edge cases |
| Production | Initial setup | Minimal essential data, real admin accounts |
Seed Scripts Per Environment
Create separate seed scripts for each environment to handle their unique requirements.
Project Structure
Organize your seed scripts in a dedicated directory:
scripts/├── seed/│ ├── index.ts # Seed runner (entry point)│ ├── local.ts # Local development seeds│ ├── staging.ts # Staging environment seeds│ ├── production.ts # Production seeds (minimal)│ └── data/│ ├── users.ts # User seed data│ └── posts.ts # Post seed dataBase Seed Types
Define shared types for your seed data:
import type { InferInsertModel } from 'drizzle-orm';import type { users, posts } from '../../src/db/schema';
export type NewUser = InferInsertModel<typeof users>;export type NewPost = InferInsertModel<typeof posts>;
export interface SeedContext { environment: 'local' | 'staging' | 'production'; verbose: boolean;}Local Development Seeds
Local seeds include extensive test data for development:
import { drizzle } from 'drizzle-orm/d1';import { users, posts } from '../../src/db/schema';import type { NewUser, NewPost } from './types';
const localUsers: NewUser[] = [ { name: 'Alice Developer', email: 'alice@localhost.test', createdAt: Date.now() }, { name: 'Bob Tester', email: 'bob@localhost.test', createdAt: Date.now() }, { name: 'Charlie Admin', email: 'charlie@localhost.test', createdAt: Date.now() },];
const localPosts: NewPost[] = [ { title: 'Getting Started with Drizzle', content: 'This is a sample post for local development...', authorId: 1, createdAt: Date.now() }, { title: 'D1 Database Tips', content: 'Here are some tips for working with D1...', authorId: 1, createdAt: Date.now() }, { title: 'Testing Your API', content: 'Learn how to test your API endpoints...', authorId: 2, createdAt: Date.now() },];
export async function seedLocal(db: ReturnType<typeof drizzle>) { console.log('🌱 Seeding local database...');
// Insert users const insertedUsers = await db.insert(users).values(localUsers).returning(); console.log(` ✓ Inserted ${insertedUsers.length} users`);
// Insert posts const insertedPosts = await db.insert(posts).values(localPosts).returning(); console.log(` ✓ Inserted ${insertedPosts.length} posts`);
console.log('✅ Local seeding complete');}Staging Seeds
Staging seeds include realistic test accounts and QA data:
import { drizzle } from 'drizzle-orm/d1';import { users, posts } from '../../src/db/schema';import type { NewUser, NewPost } from './types';
const stagingUsers: NewUser[] = [ { name: 'QA Tester', email: 'qa@staging.example.com', createdAt: Date.now() }, { name: 'Staging Admin', email: 'admin@staging.example.com', createdAt: Date.now() },];
export async function seedStaging(db: ReturnType<typeof drizzle>) { console.log('🌱 Seeding staging database...');
const insertedUsers = await db.insert(users).values(stagingUsers).returning(); console.log(` ✓ Inserted ${insertedUsers.length} users`);
console.log('✅ Staging seeding complete');}Production Seeds
Production seeds are minimal—only essential reference data:
import { drizzle } from 'drizzle-orm/d1';import { users } from '../../src/db/schema';import type { NewUser } from './types';
// Only essential data for productionconst productionUsers: NewUser[] = [ { name: 'System Admin', email: 'admin@yourdomain.com', createdAt: Date.now() },];
export async function seedProduction(db: ReturnType<typeof drizzle>) { console.log('🌱 Seeding production database...');
const insertedUsers = await db.insert(users).values(productionUsers).returning(); console.log(` ✓ Inserted ${insertedUsers.length} admin users`);
console.log('✅ Production seeding complete');}Idempotent Seeding Pattern
Idempotent seeds can run multiple times without creating duplicate data. This is crucial for reliability.
Why Idempotency Matters
flowchart TD
A["Run seed script"] --> B{Data exists?}
B -->|No| C["Insert seed data"]
B -->|Yes| D["Skip or update"]
C --> E["✅ Seeding complete"]
D --> E
Without idempotency:
- Running seeds twice creates duplicate data
- Failed partial seeds leave inconsistent state
- Team members get different data
With idempotency:
- Seeds are safe to run anytime
- Recovery from failures is straightforward
- Consistent state across all environments
Pattern 1: Check Before Insert
The simplest approach—check if data exists before inserting:
import { eq } from 'drizzle-orm';import { drizzle } from 'drizzle-orm/d1';import { users } from '../../src/db/schema';
export async function seedUsersIdempotent(db: ReturnType<typeof drizzle>) { // Check if seed data already exists const existingUsers = await db.select().from(users).limit(1);
if (existingUsers.length > 0) { console.log(' ⏭️ Users already seeded, skipping...'); return; }
// Insert seed data await db.insert(users).values([ { name: 'Alice', email: 'alice@example.com', createdAt: Date.now() }, { name: 'Bob', email: 'bob@example.com', createdAt: Date.now() }, ]);
console.log(' ✓ Users seeded');}Pattern 2: Upsert by Unique Key
Use unique constraints to update existing records or insert new ones:
import { eq, sql } from 'drizzle-orm';import { drizzle } from 'drizzle-orm/d1';import { users } from '../../src/db/schema';import type { NewUser } from './types';
async function upsertUser( db: ReturnType<typeof drizzle>, user: NewUser) { // Check if user exists by email (unique key) const existing = await db .select() .from(users) .where(eq(users.email, user.email)) .limit(1);
if (existing.length > 0) { // Update existing user await db .update(users) .set({ name: user.name }) .where(eq(users.email, user.email)); console.log(` ↻ Updated user: ${user.email}`); } else { // Insert new user await db.insert(users).values(user); console.log(` ✓ Created user: ${user.email}`); }}
export async function seedUsersUpsert(db: ReturnType<typeof drizzle>) { const seedUsers: NewUser[] = [ { name: 'Alice', email: 'alice@example.com', createdAt: Date.now() }, { name: 'Bob', email: 'bob@example.com', createdAt: Date.now() }, ];
for (const user of seedUsers) { await upsertUser(db, user); }}Pattern 3: Using SQLite’s INSERT OR REPLACE
For simpler cases, use SQLite’s conflict resolution:
import { sql } from 'drizzle-orm';import { drizzle } from 'drizzle-orm/d1';
export async function seedWithConflictResolution(db: ReturnType<typeof drizzle>) { // Using raw SQL for INSERT OR IGNORE // This skips records that would violate unique constraints await db.run(sql` INSERT OR IGNORE INTO users (name, email, created_at) VALUES ('Alice', 'alice@example.com', ${Date.now()}), ('Bob', 'bob@example.com', ${Date.now()}) `);
console.log(' ✓ Users seeded (duplicates ignored)');}Pattern 4: Marker-Based Seeding
Track which seed versions have been applied:
import { eq, sql } from 'drizzle-orm';import { drizzle } from 'drizzle-orm/d1';import { sqliteTable, text, integer } from 'drizzle-orm/sqlite-core';
// Create a seed tracking tableconst seedMarkers = sqliteTable('_seed_markers', { id: integer('id').primaryKey({ autoIncrement: true }), name: text('name').notNull().unique(), appliedAt: integer('applied_at').notNull(),});
async function isSeedApplied( db: ReturnType<typeof drizzle>, seedName: string): Promise<boolean> { const result = await db .select() .from(seedMarkers) .where(eq(seedMarkers.name, seedName)) .limit(1); return result.length > 0;}
async function markSeedApplied( db: ReturnType<typeof drizzle>, seedName: string) { await db.insert(seedMarkers).values({ name: seedName, appliedAt: Date.now(), });}
export async function seedWithMarkers(db: ReturnType<typeof drizzle>) { const SEED_NAME = 'initial-users-v1';
// Check if this seed was already applied if (await isSeedApplied(db, SEED_NAME)) { console.log(` ⏭️ Seed "${SEED_NAME}" already applied, skipping...`); return; }
// Apply seed data await db.run(sql` INSERT INTO users (name, email, created_at) VALUES ('Alice', 'alice@example.com', ${Date.now()}), ('Bob', 'bob@example.com', ${Date.now()}) `);
// Mark seed as applied await markSeedApplied(db, SEED_NAME); console.log(` ✓ Seed "${SEED_NAME}" applied`);}Complete Seed Script Example
Here’s a complete, production-ready seed runner:
import { drizzle } from 'drizzle-orm/d1';import { eq } from 'drizzle-orm';import { users, posts } from '../../src/db/schema';import type { NewUser, NewPost } from './types';
// =============================================================================// Seed Data// =============================================================================
const seedData = { local: { users: [ { name: 'Alice Developer', email: 'alice@localhost.test', createdAt: Date.now() }, { name: 'Bob Tester', email: 'bob@localhost.test', createdAt: Date.now() }, { name: 'Charlie Admin', email: 'charlie@localhost.test', createdAt: Date.now() }, ] as NewUser[], posts: [ { title: 'Getting Started', content: 'Welcome to the app!', authorId: 1, createdAt: Date.now() }, { title: 'Advanced Tips', content: 'Here are some tips...', authorId: 2, createdAt: Date.now() }, ] as NewPost[], }, staging: { users: [ { name: 'QA Tester', email: 'qa@staging.example.com', createdAt: Date.now() }, { name: 'Staging Admin', email: 'admin@staging.example.com', createdAt: Date.now() }, ] as NewUser[], posts: [] as NewPost[], }, production: { users: [ { name: 'System Admin', email: 'admin@yourdomain.com', createdAt: Date.now() }, ] as NewUser[], posts: [] as NewPost[], },};
// =============================================================================// Idempotent Seed Functions// =============================================================================
async function seedUsers( db: ReturnType<typeof drizzle>, usersToSeed: NewUser[]): Promise<number> { let seededCount = 0;
for (const user of usersToSeed) { // Check if user exists by email const existing = await db .select() .from(users) .where(eq(users.email, user.email)) .limit(1);
if (existing.length === 0) { await db.insert(users).values(user); seededCount++; } }
return seededCount;}
async function seedPosts( db: ReturnType<typeof drizzle>, postsToSeed: NewPost[]): Promise<number> { let seededCount = 0;
for (const post of postsToSeed) { // Check if post exists by title and author const existing = await db .select() .from(posts) .where(eq(posts.title, post.title)) .limit(1);
if (existing.length === 0) { await db.insert(posts).values(post); seededCount++; } }
return seededCount;}
// =============================================================================// Main Seed Runner// =============================================================================
type Environment = 'local' | 'staging' | 'production';
export async function seed( db: ReturnType<typeof drizzle>, environment: Environment = 'local') { console.log(`\n🌱 Starting ${environment} database seed...\n`);
const data = seedData[environment];
if (!data) { throw new Error(`Unknown environment: ${environment}`); }
// Seed users const usersSeeded = await seedUsers(db, data.users); if (usersSeeded > 0) { console.log(` ✓ Seeded ${usersSeeded} new user(s)`); } else { console.log(` ⏭️ Users already seeded`); }
// Seed posts (if any) if (data.posts.length > 0) { const postsSeeded = await seedPosts(db, data.posts); if (postsSeeded > 0) { console.log(` ✓ Seeded ${postsSeeded} new post(s)`); } else { console.log(` ⏭️ Posts already seeded`); } }
console.log(`\n✅ ${environment} seeding complete\n`);}
// =============================================================================// CLI Entry Point// =============================================================================
// This would be called from your package.json scripts// Example: ts-node scripts/seed/index.ts localPackage.json Scripts
Add seed commands to your package.json:
{ "scripts": { "seed:local": "wrangler d1 execute DB --local --file=scripts/seed-local.sql", "seed:staging": "wrangler d1 execute DB --remote --env staging --file=scripts/seed-staging.sql", "seed:production": "wrangler d1 execute DB --remote --file=scripts/seed-production.sql" }}SQL-Based Seeds
For simpler projects, use SQL files directly:
-- Local development seed data-- Idempotent: uses INSERT OR IGNORE to skip existing records
-- Seed usersINSERT OR IGNORE INTO users (id, name, email, created_at)VALUES (1, 'Alice Developer', 'alice@localhost.test', 1705312200000), (2, 'Bob Tester', 'bob@localhost.test', 1705312200000), (3, 'Charlie Admin', 'charlie@localhost.test', 1705312200000);
-- Seed postsINSERT OR IGNORE INTO posts (id, title, content, author_id, created_at)VALUES (1, 'Getting Started with Drizzle', 'This is a sample post...', 1, 1705312200000), (2, 'D1 Database Tips', 'Here are some tips...', 1, 1705312200000), (3, 'Testing Your API', 'Learn how to test...', 2, 1705312200000);-- Staging seed data-- Idempotent: uses INSERT OR IGNORE to skip existing records
INSERT OR IGNORE INTO users (id, name, email, created_at)VALUES (1, 'QA Tester', 'qa@staging.example.com', 1705312200000), (2, 'Staging Admin', 'admin@staging.example.com', 1705312200000);-- Production seed data (minimal)-- Idempotent: uses INSERT OR IGNORE to skip existing records
INSERT OR IGNORE INTO users (id, name, email, created_at)VALUES (1, 'System Admin', 'admin@yourdomain.com', 1705312200000);Running Seeds
flowchart TD
A["Choose environment"] --> B{Which env?}
B -->|Local| C["pnpm seed:local"]
B -->|Staging| D["pnpm seed:staging"]
B -->|Production| E["pnpm seed:production"]
C --> F["Uses --local flag"]
D --> G["Uses --env staging"]
E --> H["Uses --remote flag"]
F --> I["Seeds local SQLite"]
G --> J["Seeds staging D1"]
H --> K["Seeds production D1"]
Local Development
# Apply seed data to local databasepnpm seed:localExpected output:
🌀 Executing on local database DB...Running query: INSERT OR IGNORE INTO users...✅ Query succeededRunning query: INSERT OR IGNORE INTO posts...✅ Query succeededStaging Environment
# Apply seed data to stagingpnpm seed:stagingExpected output:
🌀 Executing on remote database DB (f1e2d3c4-...)Running query: INSERT OR IGNORE INTO users...✅ Query succeededProduction Environment
# Apply seed data to production (use with caution!)pnpm seed:productionBest Practices
1. Always Use Idempotent Patterns
Every seed script should be safe to run multiple times:
// ✅ Good: Checks before insertingconst existing = await db.select().from(users).where(eq(users.email, email));if (existing.length === 0) { await db.insert(users).values(user);}
// ❌ Bad: Will fail or create duplicates on re-runawait db.insert(users).values(user);2. Keep Production Seeds Minimal
Production seeds should only include essential reference data:
// ✅ Good: Only essential dataconst productionSeeds = [ { name: 'System Admin', email: 'admin@company.com' },];
// ❌ Bad: Test data in productionconst productionSeeds = [ { name: 'Test User 1', email: 'test1@example.com' }, { name: 'Test User 2', email: 'test2@example.com' }, // ... 50 more test users];3. Use Realistic Data for Staging
Staging should mirror production scenarios:
// ✅ Good: Realistic staging dataconst stagingUsers = [ { name: 'QA Lead', email: 'qa-lead@staging.company.com' }, { name: 'Edge Case User', email: 'special+chars@staging.company.com' }, { name: '', email: 'empty-name@staging.company.com' }, // Test edge case];4. Version Your Seed Data
Track seed versions for reproducibility:
export const SEED_VERSION = '1.0.0';
// Include version in seed markersawait markSeedApplied(db, `users-${SEED_VERSION}`);5. Log Seed Operations
Always provide feedback during seeding:
console.log(`🌱 Starting ${environment} seed...`);console.log(` ✓ Seeded ${count} users`);console.log(` ⏭️ Posts already exist, skipping`);console.log(`✅ Seeding complete`);Quick Reference
| Task | Command |
|---|---|
| Seed local database | pnpm seed:local |
| Seed staging database | pnpm seed:staging |
| Seed production database | pnpm seed:production |
| Execute raw SQL locally | pnpm wrangler d1 execute DB --local --command "..." |
| Execute raw SQL on staging | pnpm wrangler d1 execute DB --remote --env staging --command "..." |
Summary
Data seeding is essential for consistent development and testing:
- Environment-specific seeds ensure each environment has appropriate data
- Idempotent patterns make seeds safe to run multiple times
- Production seeds should be minimal and contain only essential data
- Staging seeds should include realistic test scenarios
- Local seeds can be extensive for thorough development testing
By following these patterns, you’ll have reliable, repeatable database seeding across all environments.