Multi-Tenant from Day One (And Why It’s Worth the Pain): Building merchi.ai Chapter 6

    Merchi Team

    The Nightmare of Data Leakage in 2026

    The scariest bug in the life of a SaaS founder isn’t a site-wide crash or a broken checkout. It is the “Cross-Pollination” bug: the moment Customer A logs in and sees a glimpse of Customer B’s data. In the world of 2026, where merchi.ai handles proprietary Writing Knowledge, sensitive brand taxonomies, and pre-release product imagery for global retailers, a data leak isn’t just a technical failure; it is a terminal breach of trust. As a small team, we realised that we couldn’t rely on human diligence alone to prevent this. We needed an architectural “hard-stop.”

    In the current landscape of Agentic E-commerce, where AI agents are querying systems to make purchasing decisions, data isolation is no longer a “nice-to-have” feature. It is a core infrastructure requirement. If an agent representing a discount retailer accidentally accesses the pricing strategy or luxury brand voice of a high-end competitor stored in the same database, the legal and commercial fallout would be catastrophic. This is why we made the difficult decision to build merchi.ai as a strictly multi-tenant system from the very first commit.

    Building for multi-tenancy from day one is undoubtedly painful. It slows down initial development, complicates every database schema, and adds layers of abstraction to your authentication flow. However, “bolting it on later” is a recipe for architectural debt that most startups never recover from. By the time you have 10,000 SKUs per customer across multiple languages, trying to retroactively isolate that data is like trying to un-bake a cake.

    For merchi.ai, multi-tenancy is our security foundation. It allows us to promise our enterprise clients that their unique brand “soul”, the Writing Knowledge that makes their generated copy sound human, is logically isolated at the database level from every other tenant on the platform. This chapter explores the technical “triple-lock” system we built using Supabase, PostgreSQL, and JWT claims so that a single application bug cannot expose one tenant’s data to another.

    The Core Strategy: The omnipresent tenant_id

    The foundation of our isolation strategy is simple: every single table in our database, from products and runs to writing_knowledge and assets, contains a tenant_id column. There are no exceptions. If a piece of data exists in the merchi.ai ecosystem, it belongs to a specific tenant. This “flat” structure ensures that we never have to perform complex recursive joins to determine who owns a particular record.

    However, having a tenant_id column is only useful if you actually use it in your queries. In a standard application, you rely on the developer to remember to add .eq('tenant_id', current_tenant) to every single database call. But humans are fallible, especially at 2 AM during a high-traffic launch. To solve this, we implemented PostgreSQL Row Level Security (RLS) as our primary safety net. RLS is a database-level feature that acts as a final guard, operating independently of our application code.

    With RLS enabled, even if our Next.js application code has a critical bug that omits the tenant filter, the database returns only the rows that belong to the requester’s own tenant, and an empty set if the request carries no valid tenant claim. It never returns another tenant’s data. The database itself is “aware” of who is asking for the data and refuses to show anything that doesn’t match the requester’s identity. This moves the responsibility of security from the ephemeral application layer to the hardened database layer.
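
    The safety-net behaviour can be illustrated with a small in-memory sketch in TypeScript. This is not how PostgreSQL implements RLS, and the `Row` type, the `selectWithRls` helper, and the sample tenants are hypothetical names made up for the example. It only shows the idea that the tenant check is applied by the data layer no matter what the caller's own filter does.

```typescript
// Illustrative model only: PostgreSQL enforces RLS inside the database,
// not in application code. All names below are hypothetical.
type Row = { id: number; tenantId: string; title: string };

const products: Row[] = [
  { id: 1, tenantId: "tenant-a", title: "Linen shirt" },
  { id: 2, tenantId: "tenant-a", title: "Wool coat" },
  { id: 3, tenantId: "tenant-b", title: "Cordless drill" },
];

// Stand-in for the database: the tenant check always runs first, and the
// caller-supplied filter (which may be missing or buggy) runs afterwards.
function selectWithRls(
  table: Row[],
  jwtTenantId: string | null,
  appFilter: (row: Row) => boolean = () => true,
): Row[] {
  return table
    .filter((row) => jwtTenantId !== null && row.tenantId === jwtTenantId)
    .filter(appFilter);
}

// The application "forgot" its tenant filter: only tenant-a rows come back.
const forgotFilter = selectWithRls(products, "tenant-a");

// The application asks for another tenant's rows: nothing comes back.
const wrongTenant = selectWithRls(
  products,
  "tenant-a",
  (row) => row.tenantId === "tenant-b",
);

// The request carries no tenant claim at all: nothing comes back.
const noClaim = selectWithRls(products, null);

console.log(forgotFilter.length, wrongTenant.length, noClaim.length); // 2 0 0
```

    In this model a forgotten filter degrades to "the tenant sees all of its own rows", which is a functional bug but not a leak.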

    Implementing this required a rigorous migration strategy. Every time we add a new feature, like our Web Scraping tool or our Multi-language generation engine, the first step is defining the RLS policies for the associated tables. It forces us to think about data ownership before we even write the first line of frontend code. This discipline is what allows a small team to manage data for massive retailers with total confidence.

    The Identity Flow: From Auth to Postgres Policy

    To make RLS work, the database needs to know the tenant_id of the current user. In merchi.ai, this identity flow begins with Supabase Auth. When a user logs in, the authentication server generates a JSON Web Token (JWT). We customise this token to include the tenant_id inside the app_metadata claim. This turns the JWT into a portable, cryptographically signed proof of identity that travels with every request.

    Once the user makes a request to our API or frontend, the flow looks like this:

    1. Authentication: The user sends their JWT.
    2. Middleware Validation: Our Next.js middleware extracts the tenant_id from the JWT and validates that the user is active.
    3. Database Handshake: When the application queries the database, it passes the JWT.
    4. RLS Enforcement: PostgreSQL extracts the tenant_id from the JWT using the auth.jwt() function and compares it against the tenant_id column of the rows being accessed.
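
    Step 2 of the flow above might be sketched as follows. This is a hedged illustration rather than our actual middleware: the `JwtClaims` shape, the `tenantIdFromClaims` helper, and the expiry check (used here as a simple stand-in for "the user is active") are assumptions. Signature verification is assumed to have happened before this function is called.

```typescript
// Hypothetical shape of the decoded JWT payload. Only app_metadata.tenant_id
// comes from the text above; the other fields are assumptions.
type JwtClaims = {
  sub: string;
  exp: number; // expiry, in seconds since the Unix epoch
  app_metadata?: { tenant_id?: unknown };
};

// The RLS policy casts the claim to uuid, so reject anything that is not
// UUID-shaped before it ever reaches the database.
const UUID_PATTERN =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Returns the tenant id for an already-verified token, or null if the
// request should be rejected.
function tenantIdFromClaims(claims: JwtClaims, nowSeconds: number): string | null {
  if (claims.exp <= nowSeconds) return null; // expired token
  const tenantId = claims.app_metadata?.tenant_id;
  if (typeof tenantId !== "string" || !UUID_PATTERN.test(tenantId)) return null;
  return tenantId.toLowerCase();
}

const nowSeconds = 1700000000;

const validTenant = tenantIdFromClaims(
  {
    sub: "user-1",
    exp: nowSeconds + 3600,
    app_metadata: { tenant_id: "3F2504E0-4F89-11D3-9A0C-0305E82C3301" },
  },
  nowSeconds,
);

const missingClaim = tenantIdFromClaims(
  { sub: "user-2", exp: nowSeconds + 3600 },
  nowSeconds,
);

const expiredToken = tenantIdFromClaims(
  {
    sub: "user-3",
    exp: nowSeconds - 1,
    app_metadata: { tenant_id: "3f2504e0-4f89-11d3-9a0c-0305e82c3301" },
  },
  nowSeconds,
);

console.log(validTenant, missingClaim, expiredToken);
```

    Returning null for every failure keeps the middleware fail-closed: a request without a usable tenant claim is rejected rather than passed on with an undefined tenant.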

    A typical RLS policy in merchi.ai looks like this:

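    -- Requires RLS to be switched on for the table:
    --   ALTER TABLE public.products ENABLE ROW LEVEL SECURITY;
    -- Read access only: a row is visible when its tenant_id equals the
    -- tenant_id claim carried in the caller's JWT.
    CREATE POLICY "Users can only view their own tenant data"
    ON public.products
    FOR SELECT
    USING (
      tenant_id = (auth.jwt() -> 'app_metadata' ->> 'tenant_id')::uuid
    );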

    This policy is transparent to the developer. When we write select * from products in our code, PostgreSQL automatically appends the equivalent of WHERE tenant_id = '...' behind the scenes. It is a “Zero-Trust” infrastructure approach: we don’t trust the application, we don’t trust the middleware, and we don’t trust the developer. We only trust the signed claim in the JWT and the policy in the database.

    This architecture is particularly robust for our Writing Knowledge system. Since brand tone and taxonomy are the most sensitive parts of our platform, RLS ensures that these configurations are never “leaked” during the assembly of an AI prompt. Even if our prompt-assembly logic were compromised, the database would still block any attempt to fetch a configuration belonging to a different tenant_id.

    The Admin Paradox: Service Roles and Background Workers

    While strict isolation is great for security, it creates a challenge for admin operations and background workers. At merchi.ai, we use Trigger.dev to handle the massive ingestion of ZIP files and CSVs. These workers often need to perform cross-tenant operations, such as calculating global usage stats or cleaning up orphaned assets in Supabase Storage.

    To handle this, we use the Supabase service role. The service role key bypasses RLS entirely, allowing our administrative tasks to see the entire database. However, this is a “nuclear option” that must be handled with extreme care. We never use the service role key in our frontend or our main web-app workspace. It is strictly reserved for our isolated API workspace and our background worker environment.

    When a Trigger.dev task starts a bulk processing run for a specific tenant, it is passed the tenant_id as part of its payload. Even though the worker has the power to see everything, we still write our worker code to be “tenant-aware,” explicitly filtering every query. This gives us two layers of protection: the manual filter in the code and the architectural separation of the service role.
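
    A minimal sketch of what “tenant-aware” worker code means in practice is shown below. The `BulkRunPayload` and `Asset` shapes and the `assetsForRun` helper are hypothetical, and the in-memory array stands in for a table read with service-role privileges. The real Trigger.dev task signature is not shown in this chapter, so none is assumed here.

```typescript
// Hypothetical payload and row shapes, for illustration only.
type BulkRunPayload = { tenantId: string; runId: string };
type Asset = { id: string; tenantId: string; runId: string };

// Stand-in for a table read with service-role privileges: the worker can
// see every tenant's rows, so nothing below is protected by RLS.
const allAssets: Asset[] = [
  { id: "a1", tenantId: "tenant-a", runId: "run-1" },
  { id: "a2", tenantId: "tenant-a", runId: "run-2" },
  { id: "b1", tenantId: "tenant-b", runId: "run-1" },
];

function assetsForRun(table: Asset[], payload: BulkRunPayload): Asset[] {
  if (!payload.tenantId) {
    // Fail closed: a missing tenant id must never widen the query.
    throw new Error("bulk run payload is missing tenantId");
  }
  // Explicit tenant filter on every read, even though the role could skip it.
  return table.filter(
    (asset) => asset.tenantId === payload.tenantId && asset.runId === payload.runId,
  );
}

const scoped = assetsForRun(allAssets, { tenantId: "tenant-a", runId: "run-1" });
console.log(scoped.map((asset) => asset.id)); // [ 'a1' ]
```

    Both tenants have a `run-1` in the sample data on purpose: filtering by run alone would return `b1` as well, which is exactly the mistake the explicit tenant filter guards against.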

    Handling failures in this admin layer is where Trigger.dev’s observability shines. If an admin-level cleanup task fails, we can see exactly which tenant’s data was being processed and why. Because the service role has elevated privileges, we maintain a strict audit log of every cross-tenant query. This ensures that even our “God-mode” operations are transparent and accountable, maintaining the high security standards required by our legal and corporate counsel.

    Evolution: Scaling for the Agencies of 2026

    When we first built merchi.ai, we started with a simple “One User = One Tenant” model. This worked perfectly for individual founders and single-brand retailers. But as we move further into 2026, we’ve seen the rise of the AI-Native Agency: small, highly efficient teams that manage merchandising for dozens of different brands. These users need to switch between tenants without logging out and back in.

    We are currently migrating toward a more flexible user_tenants junction table. This architecture separates “Identity” from “Access.” A user logs in once to prove who they are, and our database then determines which tenants they have permission to manage. This allows an agency user to have “Editor” access to a fashion brand’s workspace and “Viewer” access to a hardware wholesaler’s workspace simultaneously.

    This transition adds complexity to our JWT and RLS logic. Instead of a single tenant_id in the JWT, we move toward a “Selected Tenant” pattern. The user selects which tenant they are currently working in, and our middleware updates the active session. The RLS policies then validate that the user’s ID is actually linked to that specific tenant_id in the junction table.
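
    The membership check behind the “Selected Tenant” pattern could look roughly like this. It is an in-memory sketch under stated assumptions: the `UserTenant` column names and the helper functions are hypothetical, and in the real system the equivalent check would live in an RLS policy against the user_tenants table rather than in application code.

```typescript
// Hypothetical in-memory model of the user_tenants junction table.
// The role names come from the text; the column names are assumptions.
type Role = "viewer" | "editor";
type UserTenant = { userId: string; tenantId: string; role: Role };

const userTenants: UserTenant[] = [
  { userId: "agency-user", tenantId: "fashion-brand", role: "editor" },
  { userId: "agency-user", tenantId: "hardware-wholesaler", role: "viewer" },
];

// Resolves the role a user holds in the tenant they selected, or null when
// no membership row links the two. A null result means "deny".
function roleInSelectedTenant(
  memberships: UserTenant[],
  userId: string,
  selectedTenantId: string,
): Role | null {
  for (const membership of memberships) {
    if (membership.userId === userId && membership.tenantId === selectedTenantId) {
      return membership.role;
    }
  }
  return null;
}

// Only editors may write; viewers and non-members may not.
function canWrite(role: Role | null): boolean {
  return role === "editor";
}

const fashionRole = roleInSelectedTenant(userTenants, "agency-user", "fashion-brand");
const hardwareRole = roleInSelectedTenant(userTenants, "agency-user", "hardware-wholesaler");
const strangerRole = roleInSelectedTenant(userTenants, "agency-user", "some-other-tenant");

console.log(fashionRole, hardwareRole, strangerRole); // editor viewer null
console.log(canWrite(fashionRole), canWrite(hardwareRole), canWrite(strangerRole)); // true false false
```

    The important property is that the selected tenant is only a request. It is honoured if, and only if, a membership row links that user to that tenant.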

    This “Agency-Ready” architecture is a strategic play. By supporting multi-tenant users, we allow agencies to process 125 years of human labour for all their clients through a single merchi.ai interface. It turns our platform into a multi-tenant operating system for the entire merchandising industry, providing the scale and isolation required for the next decade of retail.

    Performance vs. Peace of Mind: The Testing Burden

    Is there a cost to this level of security? Absolutely. Every RLS policy adds a small amount of overhead to every database query. When you’re fanning out 10,000 tasks to process a bulk ZIP upload, those milliseconds add up. We’ve had to be incredibly thoughtful about indexing. If you have an RLS policy filtering by tenant_id, that column must be indexed on every single table, or your performance will fall off a cliff.

    Testing also becomes significantly more complex. Every new feature requires “Isolation Testing.” We don’t just test if a feature works; we test if it doesn’t work for someone else. Our automated test suite creates two separate tenants with similar data and ensures that Tenant A can never see Tenant B’s output. This doubles our testing volume, but it is the only way to sleep at night when you’re a small team.
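
    The shape of such an isolation test is sketched below against an in-memory stand-in for the data layer. The `GeneratedCopy` type and the `seedTenant`, `readAs`, and `assertIsolated` helpers are hypothetical. A real suite would run the same assertions against the database with two signed-in test users, so that the RLS policies themselves are exercised.

```typescript
// Hypothetical row shape for generated output.
type GeneratedCopy = { tenantId: string; sku: string; text: string };

function seedTenant(tenantId: string): GeneratedCopy[] {
  // Both tenants get the same SKUs on purpose, so a leak cannot hide
  // behind data that merely looks different.
  return ["SKU-1", "SKU-2"].map((sku) => ({
    tenantId,
    sku,
    text: `copy for ${tenantId}`,
  }));
}

const seededRows: GeneratedCopy[] = seedTenant("tenant-a").concat(seedTenant("tenant-b"));

// Stand-in for "query as this tenant"; in production this is the job of RLS.
function readAs(tenantId: string): GeneratedCopy[] {
  return seededRows.filter((row) => row.tenantId === tenantId);
}

function assertIsolated(reader: string, other: string): void {
  const visible = readAs(reader);
  // Positive check: the feature still works for its owner.
  if (visible.length === 0) {
    throw new Error(`${reader} cannot see its own rows`);
  }
  // Negative check: nothing belonging to the other tenant is visible.
  const leaked = visible.filter((row) => row.tenantId === other);
  if (leaked.length > 0) {
    throw new Error(`${reader} can see ${leaked.length} rows of ${other}`);
  }
}

// Check both directions: A must not see B, and B must not see A.
assertIsolated("tenant-a", "tenant-b");
assertIsolated("tenant-b", "tenant-a");
console.log("isolation checks passed");
```

    Pairing the positive check with the negative one matters: a query that returns nothing for everyone would otherwise pass the isolation test while the feature itself is broken.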

    We also use PostgreSQL’s EXPLAIN ANALYZE command to monitor the performance of our policies. Sometimes, an overly complex RLS policy (like one that requires joining three other tables to check a permission) can slow down the entire platform. We’ve learnt to keep our policies “flat” and efficient, prioritising the tenant_id check over more complex role-based access control where possible.

    The “pain” of this setup is real, but the ROI is undeniable. When an enterprise client asks us, “How do you ensure our competitors won’t see our brand voice configuration?”, we don’t give them a vague promise. We show them the PostgreSQL policy definitions. We show them the JWT claims. We show them an architecture where data isolation is baked into the very atoms of the system.

    Conclusion: Security as a Competitive Advantage

    Multi-tenancy from day one was the most difficult architectural decision we made for merchi.ai, but it has become our greatest competitive advantage. In a market flooded with “leaky” AI wrappers, we offer a fortress. By combining the cryptographic certainty of JWTs with the database-level enforcement of RLS, we’ve built a platform where trust is a technical guarantee, not a marketing slogan.

    We’ve now covered the foundation (Tech Stack), the routing (OpenRouter), the scale (Async Pipeline), and the security (Multi-tenancy). In the next chapter, we will return to the heart of the product: Writing Knowledge. We will explore the Data Schema: how we ground the AI in context so that it writes in the structure customers need.
