Infrastructure Plan · v1.0

OneSummer Infrastructure

Scale-to-zero architecture for extreme seasonal load. COPPA-compliant from day one.

Stack: SvelteKit + Docker + Postgres
Peak Season: Feb – May
Off-Season Cost: $0 – $20 / mo
Peak Cost: $50 – $200 / mo
Compliance: COPPA · FERPA-adjacent
00

Architecture Diagram

The architecture is split into three capability zones: a static CDN edge for the SvelteKit frontend, a serverless/container API tier that scales to zero in the off-season, and a managed database tier with connection pooling to absorb burst traffic during Feb–May.

System Architecture — OneSummer
Client
  • Browser: SvelteKit SPA
  • Mobile Web: responsive PWA
CDN Edge
  • Static CDN Host: Netlify, Vercel, or Cloudflare Pages
  • Global PoPs: CDN + TLS, DDoS protection
API / Serverless (scale to zero)
  • Container API: Fly.io, Railway, or Cloud Run; Docker, scale-to-zero
  • Auth Service: Clerk or Auth0; parental consent flow
  • Object Storage: S3, R2, or GCS; docs and profile images
  • Email: Resend or Postmark; transactional
Data Tier
  • Connection Pooler: PgBouncer or Supabase Pooler; transaction mode
  • Postgres Primary: Neon, Supabase, or RDS; managed, autoscale
  • Read Replica: search / reports; peak season only
  • Redis Cache: Upstash (per-req); sessions, rate limit
Seasonality is the primary design constraint

80% of all traffic arrives February through May. The architecture must cost near nothing in summer and fall while being capable of handling concurrent application spikes during peak admission season — without manual intervention.

01

Design Principles

Estimated monthly cost: Sep $0–5 · Oct $0–5 · Nov $0–5 · Dec $0–5 · Jan $10–30 · Feb $50–120 · Mar $80–200 · Apr $80–180 · May $40–100 · Jun $5–15 · Jul $0–5 · Aug $0–5

Scale to Zero

Every component that can auto-scale to zero must do so. No standing infrastructure during Jun–Jan except the database (minimum tier).

🔒

Security by Default

COPPA compliance is non-negotiable. Children's PII must be encrypted at rest and in transit, with parental consent gating all data collection.

Managed Over Self-Hosted

Prefer managed services to minimize operational burden. A small team should not be running Postgres or Redis servers manually.

🔁

Vendor Agnostic Design

Capabilities are defined first; vendor names are recommendations. Avoid proprietary lock-in at the data and compute layers.

02

Frontend Hosting

With adapter-static, the SvelteKit frontend compiles to static assets (HTML, CSS, JS) at build time; routes that need server rendering can run as SSR edge functions instead. Static assets are served from a global CDN with no compute cost per request, which makes this the most cost-effective option, and it scales to any traffic level automatically.

💡
SSR vs. Static

For OneSummer, use static prerendering for all public-facing pages (discovery, camp profiles, marketing) and client-side rendering behind authentication. This eliminates serverless function invocations for the vast majority of page loads, reducing cost and latency.

Recommended: Netlify

Netlify's free tier covers 100 GB bandwidth and 300 build minutes per month — sufficient for off-season and early growth. Their SvelteKit adapter is well-maintained and build previews per PR are included at no cost.

Alternatives

Provider | Free Tier | SvelteKit Support | Trade-offs
Netlify (recommended) | 100 GB BW, 300 build min | Official adapter | Best DX, easy forms/functions
Vercel | 100 GB BW, hobby unlimited | First-class | Better if using edge functions heavily
Cloudflare Pages | Unlimited BW | Via adapter-cloudflare | Best performance/cost at scale; DX slightly rougher
AWS Amplify / S3+CloudFront | 12-mo free tier | Manual config | Most control; most setup overhead

Seasonal Configuration

  • No seasonal tuning required — CDN cost scales linearly with traffic and is effectively zero during low season on free/pro tiers.
  • Set aggressive cache headers: Cache-Control: public, max-age=31536000, immutable on hashed assets.
  • Enable HTTP/2 and Brotli compression (default on all recommended providers).
  • Use _headers / netlify.toml to set X-Frame-Options, Content-Security-Policy, and Permissions-Policy at the CDN edge — zero compute cost.
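The caching rule above can be sketched as a small helper that picks a Cache-Control value per path. This is a minimal sketch, assuming content-hashed assets live under /_app/immutable/ (SvelteKit's default build layout); the function name and the fallback policy for unhashed files are illustrative assumptions, not part of the plan.

```typescript
// Assumption: hashed build assets are served from this prefix.
const IMMUTABLE_PREFIX = "/_app/immutable/";

function cacheControlFor(path: string): string {
  if (path.startsWith(IMMUTABLE_PREFIX)) {
    // Hashed filenames change whenever content changes, so a one-year TTL is safe.
    return "public, max-age=31536000, immutable";
  }
  // Unhashed files (HTML, manifest) are revalidated on every request.
  return "public, max-age=0, must-revalidate";
}

console.log(cacheControlFor("/_app/immutable/chunks/index.abc123.js"));
console.log(cacheControlFor("/index.html"));
```

Keeping the long TTL restricted to hashed paths avoids serving a stale index.html after a deploy.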
03

Backend / API

The API is a Docker container running a standard HTTP server (Fastify, Hono, or similar). The critical requirement is scale-to-zero: during the off-season, the container should cost nothing when idle. This narrows the field to platforms with built-in zero-scaling.

Cold Start Awareness

Scale-to-zero platforms impose cold start latency (typically 500ms–2s for Node containers). For OneSummer's use case — asynchronous application submissions, not real-time gaming — this is an acceptable trade-off. Implement a lightweight health-check keep-alive for the peak Feb–May window only if cold starts cause user-visible delays.

Recommended: Fly.io

Fly.io supports Docker containers with native scale-to-zero via min_machines_running = 0. Machines wake on the first inbound request. The free tier includes 3 shared-CPU VMs and 3 GB storage — enough for the API + background workers.

Alternatives

Provider | Scale-to-Zero | Free Tier | Notes
Fly.io (recommended) | Yes | 3 VMs, 3 GB | Low cold starts; great CLI; global regions
Railway | Yes | $5 credit/mo | Excellent DX; simpler ops than Fly
Google Cloud Run | Yes | 2M req/mo free | Best at true serverless scale; GCP ecosystem
AWS App Runner | Min 1 instance | None | Does not fully scale to zero; ~$7/mo minimum
Azure Container Apps | Yes | 180,000 vCPU-sec | Good scale-to-zero; KEDA-based autoscaling

Container Configuration

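# fly.toml (peak season config)
[http_service]
  internal_port        = 3000
  force_https          = true
  auto_stop_machines   = "stop"  # stop idle machines
  auto_start_machines  = true    # wake on the first inbound request
  min_machines_running = 0       # ← scale to zero off-season

  [http_service.concurrency]
    type       = "requests"
    soft_limit = 200   # route to another machine above this
    hard_limit = 250

[[vm]]
  size   = "shared-cpu-1x"
  memory = "512mb"

# Burst capacity for Feb–May is the number of machines created for the app,
# e.g. `fly scale count 10`. Autostart only wakes machines that already exist.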

Object Storage

Application documents, uploaded profile photos, and camp media are stored in S3-compatible object storage separate from the container. Cloudflare R2 is recommended for zero egress fees. Alternatives: AWS S3, Backblaze B2.

Email Delivery

Transactional emails (application confirmations, parental consent requests, status updates) require a dedicated provider. Resend offers 3,000 free emails/month with React Email template support. Alternative: Postmark.

04

Database

OneSummer requires a managed PostgreSQL database. The database cannot scale to zero — it must persist all user data year-round — but it can run on a minimal instance during the off-season and scale up for Feb–May. Connection pooling is critical: serverless/container platforms open many short-lived connections; without pooling, Postgres will exhaust its connection limit under modest load.

Connection Exhaustion is a Real Production Failure Mode

A single Fly.io machine at 250 concurrent requests, each holding a Postgres connection, will hit a max_connections limit almost immediately on a small instance. PgBouncer in transaction mode allows thousands of application-level requests to share a small pool of actual Postgres connections. This is not optional for the Feb–May peak season.
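The arithmetic behind this warning can be worked through directly. The sketch below assumes max_connections = 100, which is the stock PostgreSQL default; a managed small instance may use a different value. The pool size of 20 and the 250-request figure come from this plan; the function names are illustrative.

```typescript
// Assumption: stock PostgreSQL default; check the actual instance setting.
const MAX_CONNECTIONS = 100;

// Without a pooler, every in-flight request can hold its own connection.
function directConnections(machines: number, concurrentPerMachine: number): number {
  return machines * concurrentPerMachine;
}

// With a transaction-mode pooler, Postgres sees at most the pool size,
// no matter how many clients are connected to the pooler.
function pooledConnections(clientConnections: number, poolSize: number): number {
  return Math.min(clientConnections, poolSize);
}

const direct = directConnections(1, 250);
const pooled = pooledConnections(direct, 20);
console.log({ direct, exceedsLimit: direct > MAX_CONNECTIONS, pooled });
```

One machine at its hard limit already asks for 250 connections against a limit of 100, while the pooled path needs only 20.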

Recommended: Neon

Neon is a serverless Postgres provider that separates storage from compute. The free tier includes 0.5 GB storage, autoscaling compute, and a built-in connection pooler (PgBouncer). Compute scales to zero after a period of inactivity — during the off-season, the database compute cost approaches zero while data remains durable.

Alternatives

Provider | Pooler Included | Scale-to-Zero | Free Tier | Notes
Neon (recommended) | Yes (PgBouncer) | Compute only | 0.5 GB, 0.25 vCPU | Best for serverless; branching for dev/staging
Supabase | Yes (Supavisor) | Project pause | 500 MB, pauses after inactivity | Also provides Auth, Storage, Realtime
PlanetScale (Vitess) | Yes | No | 5 GB | MySQL-compatible; not Postgres
AWS RDS / Aurora Serverless v2 | RDS Proxy ($) | Aurora Serverless | None | Most control; higher cost; ideal at growth scale
Fly Postgres | Self-managed | No | Included in Fly plan | Not fully managed; avoid unless ops-mature

Connection Pooling Architecture

# Application → PgBouncer → Postgres
# Use TRANSACTION mode for serverless/container workloads

DATABASE_URL="postgresql://user:pass@pooler.neon.tech:5432/onesummer?pgbouncer=true&sslmode=require"
DATABASE_DIRECT_URL="postgresql://user:pass@ep-xxx.neon.tech:5432/onesummer?sslmode=require"
#                  ↑ direct connection for migrations only

# Recommended pool settings (Neon default)
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 20   # actual Postgres connections
📒
Neon Database Branching

Neon's branching feature creates instant, copy-on-write database snapshots. Use this for: (1) staging branch that mirrors production schema, (2) per-PR preview database branches in CI, and (3) safe migration testing before applying to production.

Migration Strategy

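Use Drizzle ORM or Prisma for schema management. Run migrations via the direct (non-pooled) connection URL only. Do not run migrations through the pooler in transaction mode: migration tools commonly rely on session-level state such as advisory locks, which transaction pooling does not preserve across transactions, and long-running DDL statements can be interrupted by pooler timeouts.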
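One way to make the "direct URL for migrations" rule hard to get wrong is to select the connection string by purpose in a single place. This is a minimal sketch: DATABASE_URL and DATABASE_DIRECT_URL are the variable names from the pooling configuration above, while the Purpose type and function name are hypothetical.

```typescript
type Purpose = "app" | "migration";

function connectionUrlFor(
  purpose: Purpose,
  env: Record<string, string | undefined>
): string {
  // Migrations bypass the pooler; request handling goes through it.
  const key = purpose === "migration" ? "DATABASE_DIRECT_URL" : "DATABASE_URL";
  const url = env[key];
  if (!url) {
    // Fail loudly rather than silently falling back to the other URL.
    throw new Error(`${key} is not set`);
  }
  return url;
}

// Example with placeholder values (not real hosts or credentials).
const sampleEnv = {
  DATABASE_URL: "postgresql://pooled.example/onesummer",
  DATABASE_DIRECT_URL: "postgresql://direct.example/onesummer",
};
console.log(connectionUrlFor("migration", sampleEnv));
```

In a real service the second argument would typically be process.env.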

Backup Policy

  • Neon provides point-in-time recovery (PITR) up to 7 days on free tier, 30 days on Pro.
  • Take a manual pg_dump backup before every major migration and store in object storage.
  • Set up a daily automated backup job via GitHub Actions cron during peak season.
05

CI / CD Pipeline

All deployments flow through GitHub Actions. The pipeline enforces test passage, security scanning, and preview deployment before any changes reach production.

Pull Request (feature branch) → CI Checks (lint, type, test) → Preview Deploy (staging + DB branch) → Merge to main (auto-deploy prod) → Smoke Tests (post-deploy checks)

GitHub Actions Workflow Structure

# .github/workflows/
ci.yml           # runs on every PR
  ├─ lint (eslint + prettier)
  ├─ type-check (tsc --noEmit)
  ├─ unit tests (vitest)
  ├─ integration tests (against Neon branch)
  └─ security scan (npm audit + semgrep)

preview.yml      # runs on PR open/update
  ├─ build Docker image
  ├─ deploy API to Fly.io preview app
  ├─ deploy frontend to Netlify draft URL
  └─ create/update Neon DB branch

deploy.yml       # runs on merge to main
  ├─ build + push Docker image to registry
  ├─ run database migrations (direct URL)
  ├─ deploy to Fly.io production
  ├─ deploy frontend to Netlify production
  └─ post-deploy smoke tests

backup.yml       # cron: 0 2 * * * (02:00 UTC daily, peak season)
  └─ pg_dump → compress → upload to R2

Container Registry

Push Docker images to GitHub Container Registry (ghcr.io) — free for public repos, $0.008/GB for private. Alternative: Docker Hub. Tag images with the Git SHA for deterministic rollbacks.

Secrets Management

Store all credentials in GitHub Actions Secrets (never in code or .env committed to the repo). Use environment-scoped secrets in GitHub to prevent staging secrets from reaching production jobs. Rotate the Neon database password and Fly API token quarterly.

06

Environments

Environment | Purpose | Frontend | API | Database | Trigger
Production (prod) | Live user traffic | Netlify prod domain | Fly.io prod app | Neon main branch | Merge to main
Staging (stage) | Pre-release validation | Netlify branch deploy | Fly.io staging app | Neon staging branch | Merge to staging
Preview (preview) | Per-PR review | Netlify deploy preview | Fly.io ephemeral app | Neon PR branch | PR opened/updated
Local (local) | Developer machine | localhost:5173 | localhost:3000 | Docker Postgres or Neon dev branch | Manual
💡
Preview Environments and COPPA

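Preview environments must never contain real user PII. Use synthetic seed data only. A Neon branch is a copy-on-write copy of its parent, data included, so create preview branches from a parent that holds only the schema and synthetic data, never from the production branch. Populate that parent with db:seed using synthetic test data. Document this in the contributing guide so all developers follow it.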
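A seed script for preview databases might generate rows like the following. This is a sketch only: the record shape and field names are hypothetical, and the addresses use example.com, a domain reserved for documentation, so no generated address can reach a real inbox.

```typescript
interface SeedApplicant {
  displayName: string;
  parentEmail: string;
  birthYear: number;
}

// Deterministic output: the same count always yields the same rows,
// which keeps preview environments reproducible across CI runs.
function makeSeedApplicants(count: number): SeedApplicant[] {
  const rows: SeedApplicant[] = [];
  for (let i = 1; i <= count; i++) {
    rows.push({
      displayName: `Test Camper ${i}`,
      parentEmail: `parent${i}@example.com`,
      birthYear: 2010 + (i % 8), // spread across 2010..2017
    });
  }
  return rows;
}

console.log(makeSeedApplicants(3));
```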

07

Monitoring & Alerting

Observability for a seasonal platform has two distinct modes: a low-overhead baseline during the off-season, and active monitoring during the Feb–May peak window.

🔍

Error Tracking

Sentry (recommended) — free tier covers 5K errors/month. Captures frontend and backend exceptions with stack traces and context. Alternative: Highlight.io.

📊

Metrics & APM

Fly.io built-in metrics for CPU/memory/request latency. For more depth: Grafana Cloud free tier (10K metrics). Alternative: Datadog (expensive at scale).

🟢

Uptime / Synthetic

Better Uptime or UptimeRobot — free tier checks every 3 minutes. Alert on HTTP 5xx from the API health endpoint. Critical during Feb–May.

📋

Structured Logging

Fly.io log drain → Logtail / Better Stack. Free tier: 1 GB/month. Use structured JSON logs with request IDs to correlate frontend errors to API calls.
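A structured log line with a request ID can be as small as the sketch below. The field names (ts, level, msg, requestId) and both function names are illustrative assumptions; use whatever fields the log backend indexes.

```typescript
import { randomUUID } from "node:crypto";

type Level = "debug" | "info" | "warn" | "error";

// Build one JSON log line. Core fields are written last so that
// caller-supplied fields cannot overwrite them.
function logLine(
  level: Level,
  msg: string,
  requestId: string,
  fields: Record<string, unknown> = {}
): string {
  return JSON.stringify({ ...fields, ts: new Date().toISOString(), level, msg, requestId });
}

// Reuse the ID sent by the frontend when present, otherwise mint one.
function requestIdFrom(headerValue: string | undefined): string {
  return headerValue && headerValue.length > 0 ? headerValue : randomUUID();
}

const exampleId = requestIdFrom(undefined);
console.log(logLine("info", "application submitted", exampleId, { route: "/applications", status: 201 }));
```

If the frontend sends the same ID with its error reports, a single search on requestId ties a browser error to the API call behind it.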

Alerting Thresholds

Metric | Warning | Critical | Action
API error rate | > 1% | > 5% | PagerDuty / Slack
P95 API latency | > 800ms | > 2000ms | Scale up + investigate
DB connection usage | > 70% | > 90% | Increase pool size
Disk usage (DB) | > 70% | > 90% | Upgrade storage tier
Fly machine count | ≥ 8 machines | | Capacity review
Uptime check failure | 1 failure | 3 consecutive | Immediate page
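The percentage and latency thresholds in the table above could be encoded as data and evaluated by one function. This is a sketch under stated assumptions: the metric key names are hypothetical, and only the four rows with both a warning and a critical numeric threshold are included.

```typescript
type Severity = "ok" | "warning" | "critical";

interface Threshold {
  warning: number;
  critical: number;
}

// Latency is in milliseconds; the other values are percentages.
const THRESHOLDS: Record<string, Threshold> = {
  apiErrorRatePct: { warning: 1, critical: 5 },
  p95LatencyMs: { warning: 800, critical: 2000 },
  dbConnectionUsagePct: { warning: 70, critical: 90 },
  dbDiskUsagePct: { warning: 70, critical: 90 },
};

function classify(metric: string, value: number): Severity {
  const t = THRESHOLDS[metric];
  if (!t) throw new Error(`unknown metric: ${metric}`);
  // The table uses strict ">" comparisons, so a value equal to a threshold stays below it.
  if (value > t.critical) return "critical";
  if (value > t.warning) return "warning";
  return "ok";
}

console.log(classify("p95LatencyMs", 900));
```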

Seasonal Runbook

  • January 15: Enable uptime monitoring alerts. Review Fly.io autoscale limits. Verify connection pooler is healthy.
  • February 1: Switch to active monitoring mode. Set Sentry alert frequency to immediate. Enable daily DB backup job.
  • May 31: Disable expensive alerting. Reduce DB to minimum tier. Pause read replica.
  • Off-season: Weekly uptime check is sufficient. Sentry digest mode.
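The runbook dates can be turned into a function that tells scheduled jobs which monitoring mode applies. This is a minimal sketch: the mode names are hypothetical, and dates are evaluated in UTC as a simplifying assumption.

```typescript
type Mode = "off-season" | "pre-season" | "peak";

// Jan 15 to Jan 31: alerts enabled; Feb 1 to May 31: active monitoring;
// everything else: off-season baseline.
function monitoringMode(date: Date): Mode {
  const month = date.getUTCMonth() + 1; // 1..12
  const day = date.getUTCDate();
  if (month >= 2 && month <= 5) return "peak";
  if (month === 1 && day >= 15) return "pre-season";
  return "off-season";
}

console.log(monitoringMode(new Date(Date.UTC(2025, 2, 15))));
```

A cron-triggered workflow could call this to decide whether to run the daily backup or switch Sentry to digest mode.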
08

Cost Projections

All estimates assume a single-founder or small team operating at early-stage scale (thousands of users, not millions). Costs grow linearly with adoption; the architecture supports this without structural changes.

Monthly Infrastructure Cost — Annual View
Sep $5 · Oct $5 · Nov $5 · Dec $5 · Jan $25 · Feb $90 · Mar $140 · Apr $130 · May $70 · Jun $12 · Jul $5 · Aug $5
Legend: off-season ($0–$10/mo) · ramp ($10–$30/mo) · peak season ($50–$200/mo)

Cost Breakdown by Service

Service | Provider | Off-Season | Peak Season | Scaling Trigger
Frontend CDN | Netlify | $0 | $0 – $19/mo | BW > 100 GB/mo
Container API | Fly.io | $0 (scaled to zero) | $10 – $80/mo | Concurrent requests
Postgres (compute) | Neon | $0 – $5/mo | $19 – $69/mo | Manual tier upgrade
Postgres (storage) | Neon | $0 – $3/mo | $3 – $15/mo | Data volume
Redis Cache | Upstash | $0 | $0 – $10/mo | Requests > 10K/day
Object Storage | Cloudflare R2 | $0 | $0 – $5/mo | Storage > 10 GB
Email | Resend | $0 | $0 – $20/mo | Emails > 3K/mo
Auth | Clerk | $0 | $0 – $25/mo | MAU > 10K
Monitoring | Sentry + Logtail | $0 | $0 – $26/mo | Error/log volume
Estimated Total | | $0 – $8/mo | $50 – $269/mo |
💰
Estimated Annual Spend: $200 – $600

The annual total depends heavily on user adoption during the Feb–May window. At early stage (hundreds of applicants), expect the lower bound. At thousands of concurrent users, expect $400–$600/year — still dramatically less than a traditional always-on infrastructure model that would cost $2,000–$6,000+ annually for the same capability.
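As a check on the annual range, the point estimates from the Monthly Infrastructure Cost chart can be summed directly. The snippet below is plain arithmetic on those twelve chart values; the variable names are illustrative.

```typescript
// Monthly point estimates in USD, in the chart's Sep-to-Aug order.
const monthlyCost: Array<[string, number]> = [
  ["Sep", 5], ["Oct", 5], ["Nov", 5], ["Dec", 5], ["Jan", 25], ["Feb", 90],
  ["Mar", 140], ["Apr", 130], ["May", 70], ["Jun", 12], ["Jul", 5], ["Aug", 5],
];

const annualTotal = monthlyCost.reduce((sum, [, cost]) => sum + cost, 0);

const peakMonths = new Set(["Feb", "Mar", "Apr", "May"]);
const peakTotal = monthlyCost
  .filter(([month]) => peakMonths.has(month))
  .reduce((sum, [, cost]) => sum + cost, 0);

console.log(annualTotal); // 497
console.log(peakTotal);   // 430
```

The chart values total $497 for the year, inside the $200 – $600 range, with $430 of that falling in Feb–May.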

09

Security & COPPA Compliance

COPPA applies if any user is under 13

The Children's Online Privacy Protection Act (COPPA) requires verifiable parental consent before collecting any personal information from children under 13. Violations carry civil penalties up to $51,744 per violation per child. This is not a future concern — it must be addressed before the first real user touches the product.

COPPA Compliance Infrastructure

Age Gating

Collect date of birth at registration. If the user is under 13, immediately pause data collection, store only the age flag (not the DoB), and route to the parental consent flow. Do not create a full profile until consent is verified.
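The age check itself is a small date calculation whose only persisted output is a boolean. This is a sketch with hypothetical function names; dates are compared in UTC as a simplifying assumption, and the DoB is used only as an input and never stored.

```typescript
// Whole years completed between dob and today.
function ageOn(dob: Date, today: Date): number {
  let age = today.getUTCFullYear() - dob.getUTCFullYear();
  const beforeBirthday =
    today.getUTCMonth() < dob.getUTCMonth() ||
    (today.getUTCMonth() === dob.getUTCMonth() && today.getUTCDate() < dob.getUTCDate());
  if (beforeBirthday) age -= 1;
  return age;
}

// The only value the registration flow should keep for a child.
function isUnder13(dob: Date, today: Date = new Date()): boolean {
  return ageOn(dob, today) < 13;
}

const dobExample = new Date(Date.UTC(2012, 2, 15)); // 15 March 2012
console.log(isUnder13(dobExample, new Date(Date.UTC(2025, 2, 14)))); // true: still 12
console.log(isUnder13(dobExample, new Date(Date.UTC(2025, 2, 15)))); // false: turns 13
```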

Parental Consent Flow

Send a verifiable parental consent email (VPCE) to the provided parent email. The parent clicks a unique tokenized link, reviews what data will be collected, and explicitly consents. This token is single-use with a 72-hour expiry.
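A single-use token with a 72-hour expiry can be sketched as follows. An in-memory Map stands in for a database table here, and every name in the block is hypothetical. Storing only a SHA-256 hash of the token is an added precaution that the plan does not specify.

```typescript
import { createHash, randomBytes } from "node:crypto";

const TTL_MS = 72 * 60 * 60 * 1000; // 72-hour expiry

interface ConsentRecord {
  childId: string;
  expiresAt: number;
  used: boolean;
}

// Keyed by token hash so the raw token never sits in storage.
const consentStore = new Map<string, ConsentRecord>();

function hashToken(token: string): string {
  return createHash("sha256").update(token).digest("hex");
}

function issueConsentToken(childId: string, now: number = Date.now()): string {
  const token = randomBytes(32).toString("hex");
  consentStore.set(hashToken(token), { childId, expiresAt: now + TTL_MS, used: false });
  return token; // embedded in the emailed link
}

// Returns the child ID on success, or null if unknown, expired, or already used.
function redeemConsentToken(token: string, now: number = Date.now()): string | null {
  const record = consentStore.get(hashToken(token));
  if (!record || record.used || now > record.expiresAt) return null;
  record.used = true;
  return record.childId;
}
```

The `now` parameter exists so expiry can be tested without waiting; production callers would omit it.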

PII Isolation

Children's PII must be stored in a separate, encrypted Postgres schema with stricter access controls. Application-level code must pass through a COPPA-aware data access layer that logs all reads/writes to child records.

Data Minimization

Collect only what is necessary. Do not run analytics pixels, session recording tools, or third-party ad trackers on any page a child might view. Block all third-party scripts from the application domain.

Infrastructure Security Controls

Control | Implementation | Layer
TLS everywhere | Enforced at CDN and Fly.io ingress; no HTTP allowed | Network
Secrets management | GitHub Actions Secrets; Fly.io secrets; never in env files | Application
Database encryption at rest | Neon encrypts all storage with AES-256 by default | Data
Database encryption in transit | Require SSL on all Postgres connections; ?sslmode=require | Data
Authentication | Clerk (recommended): handles session tokens, MFA, OAuth | Application
Rate limiting | Upstash Redis rate limiter at API middleware; 100 req/min per IP | API
CSRF protection | SvelteKit CSRF built-in for form actions; API uses Bearer tokens | Application
Content Security Policy | Strict CSP header via netlify.toml; no unsafe-inline | Frontend
Input validation | Zod schemas at API boundary; never trust client data | Application
SQL injection prevention | Parameterized queries only via ORM; no raw string interpolation | Data
Dependency scanning | Dependabot + npm audit in CI; weekly automated PR | Pipeline
Object storage ACLs | All buckets private by default; presigned URLs for user uploads | Data
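The rate-limiting control in the table can be illustrated with a fixed-window counter. This is a sketch, not the Upstash implementation: an in-memory Map stands in for Redis, so counts are per-process, and the class and method names are hypothetical.

```typescript
class FixedWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly windows = new Map<string, { start: number; count: number }>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request is allowed, false if it should get HTTP 429.
  allow(ip: string, now: number = Date.now()): boolean {
    const entry = this.windows.get(ip);
    if (!entry || now - entry.start >= this.windowMs) {
      // First request from this IP, or the previous window has elapsed.
      this.windows.set(ip, { start: now, count: 1 });
      return true;
    }
    if (entry.count >= this.limit) return false;
    entry.count += 1;
    return true;
  }
}

// 100 requests per minute per IP, as in the table.
const limiter = new FixedWindowLimiter(100, 60000);
console.log(limiter.allow("203.0.113.7"));
```

With multiple API machines the counter has to live in shared storage, which is the role Redis plays in the plan.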

COPPA Data Handling Schema

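-- Separate schema for child PII
CREATE SCHEMA child_data;

-- Minimal illustrative table; the real PII columns are omitted here
CREATE TABLE child_data.profiles (
  id             uuid DEFAULT gen_random_uuid() PRIMARY KEY,
  parent_user_id uuid NOT NULL
);

-- Row-level security: only the owning parent can access a row
ALTER TABLE child_data.profiles ENABLE ROW LEVEL SECURITY;
-- Table owners bypass RLS by default; FORCE applies the policy to them too
ALTER TABLE child_data.profiles FORCE ROW LEVEL SECURITY;

-- The API must run SET LOCAL app.current_user_id = '<uuid>' inside each
-- transaction; a session-level SET does not persist under transaction pooling
CREATE POLICY parent_owns_child
  ON child_data.profiles
  USING (parent_user_id = current_setting('app.current_user_id')::uuid);

-- Audit log every access to child records
CREATE TABLE child_data.access_log (
  id          uuid DEFAULT gen_random_uuid() PRIMARY KEY,
  child_id    uuid NOT NULL,
  accessor_id uuid NOT NULL,
  action      text NOT NULL,
  accessed_at timestamptz DEFAULT now()
);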

Privacy Policy Requirements

The COPPA privacy notice must be separate from the general privacy policy and written in plain language understandable to parents. It must describe: (1) what information is collected from children, (2) how it is used, (3) whether it is disclosed to third parties, and (4) the parent's rights to review, delete, and withdraw consent. Consult a privacy attorney before launch.

10

Disaster Recovery

Disaster recovery for OneSummer is primarily a data recovery problem. The frontend and API are stateless and redeploy from Git in under 5 minutes. The database is the only component that requires a formal recovery procedure.

Recovery Time Objective (RTO)

Target: < 30 minutes for full service restoration during peak season. The API and frontend can be redeployed in < 5 minutes; database restoration from a recent backup is the dominant recovery time.

Recovery Point Objective (RPO)

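Target: < 1 hour of data loss during peak season. Neon PITR is what meets this target; the daily pg_dump backups are a fallback for a provider-level outage and on their own imply up to 24 hours of loss. During off-season, an RPO of 24 hours is acceptable because traffic is near zero.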

Failure Scenarios and Responses

Scenario | Detection | Response | Est. RTO
API container crash / OOM | Uptime check + Sentry | Fly.io auto-restarts; if persistent, roll back Docker image to previous SHA | 2–5 min
Bad deployment (regression) | Post-deploy smoke tests | fly deploy --image ghcr.io/onesummer/api:<prev-sha> | 3–5 min
Database corruption / bad migration | Error spike + manual detection | Use Neon PITR to restore to pre-migration timestamp; re-apply clean migration | 15–30 min
Neon regional outage | DB connection failure alerts | Restore most recent pg_dump backup to Supabase or RDS emergency instance | 30–60 min
CDN / Netlify outage | Uptime check | Point DNS to Cloudflare Pages fallback (keep repo connected to both) | 5–10 min
Credential compromise | Unusual access patterns / manual report | Rotate all secrets immediately; invalidate all user sessions via Clerk dashboard; audit access logs | 10–20 min

Database Recovery Runbook

# Step 1: Identify the restore point
# For Neon PITR — use the Neon console or CLI
neon branches create \
  --name recovery-attempt \
  --parent main \
  --timestamp "2025-03-15T14:30:00Z"

# Step 2: Verify data integrity on the recovery branch
psql "$RECOVERY_DATABASE_URL" -c "SELECT count(*) FROM applications;"

# Step 3: Promote the recovery branch to production
# (swap the DATABASE_URL environment variable in Fly.io)
fly secrets set DATABASE_URL="$RECOVERY_DATABASE_URL" -a onesummer-api

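# Step 4: Restart the API machines if they are not already restarting
# (fly secrets set redeploys the app's machines by default)
fly apps restart onesummer-api

# Step 5: Verify application health (--fail exits non-zero on HTTP errors)
curl --fail --silent --show-error https://api.onesummer.com/health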

Annual DR Test

Run a full database recovery drill each January (before peak season begins). Restore the production database to a staging environment using a 30-day-old backup and verify application functionality. Document the test results and update this runbook with any lessons learned.

11

Launch Checklist

Complete all items before accepting real user data. Items marked with a legal or compliance tag require external review.

Infrastructure
  • Production domain configured with DNS pointing to Netlify; HTTPS enforced via HSTS with max-age=31536000
  • Fly.io production app deployed; min_machines_running = 0 confirmed; burst capacity of 10 machines confirmed
  • Neon production database provisioned; connection pooler URL tested; migrations applied clean on main branch
  • All secrets stored in GitHub Actions Secrets and Fly.io secrets; no credentials in code or committed .env files
  • Cloudflare R2 bucket created with private ACL; presigned URL generation tested end-to-end
  • Upstash Redis instance provisioned; rate limiting middleware tested and confirmed blocking at threshold
CI / CD
  • All three GitHub Actions workflows (ci.yml, preview.yml, deploy.yml) passing on a test PR and merge
  • Post-deploy smoke tests hitting at least: health endpoint, auth flow, DB read, DB write
  • Rollback procedure tested: deploy an intentionally broken image, confirm failure detection, execute rollback, confirm recovery
  • Dependabot enabled on the repo with weekly schedule for both npm and Docker base image
Security
  • Content Security Policy header validated with CSP Evaluator; no unsafe-inline or unsafe-eval
  • All API endpoints authenticated; no unauthenticated routes expose PII
  • SQL injection test suite passing; no raw string interpolation in query builder
  • Database row-level security policies verified on child_data schema
  • Penetration test or security review completed (even informal — OWASP Top 10 checklist minimum)
COPPA / Legal (requires attorney review)
  • Age gate implemented and tested: users under 13 are blocked from completing profile until parental consent is verified
  • Parental consent email flow tested end-to-end: token generation, email delivery, consent recording, account activation
  • COPPA-compliant privacy notice published — written for parent audience, reviewed by privacy counsel
  • Data deletion mechanism implemented: parent can request deletion of all child data via a documented process; deletion confirmed within 10 business days
  • No third-party analytics, tracking pixels, or session recording active on any page reachable by a child account
  • Terms of Service and Privacy Policy finalized and linked from footer, registration flow, and cookie banner
Monitoring
  • Sentry initialized in both frontend and API; test error captured and visible in dashboard
  • Uptime monitoring configured for api.onesummer.com/health and onesummer.com; SMS/Slack alert tested
  • Logtail log drain connected to Fly.io; structured logs visible with request ID correlation
  • Daily DB backup cron job enabled and first backup confirmed in R2 storage
Disaster Recovery
  • Neon PITR confirmed working: restored staging to a point 1 hour prior; application loaded correctly
  • Runbook for all failure scenarios reviewed and accessible to all team members (not just the person who wrote it)
  • Emergency contact list current: Neon support, Fly.io support, domain registrar, privacy counsel
🌞
Target: Launch by January 15

To capture the full Feb–May peak season, all checklist items must be complete and the system load-tested before January 15. Use Neon branching and Fly.io preview apps to iterate rapidly on staging without risk to production. The architecture is designed to stay out of your way so you can focus on the product.