Optimly AI · Founder & CEO
Data Engineering Intern — AI Brand Trust Registry
Status: Open — hiring now
Type: Internship, full-time (40 hrs/week); 12 weeks, with flexible start/end dates. Part-time during term considered for the right local candidate.
Location: Seattle preferred (in-person collaboration with the founder); open to US-remote with Pacific Time overlap.
Reports to: Apurva Luty, CEO/Founder, working alongside the data engineering owner.
Compensation: $45–$55/hour, depending on experience. Strong performers will be considered for a full-time return offer.
Contact: [email protected] — direct applications welcome
About Optimly
Our mission is to make AI models scrutinizable — to turn the black box into a glass box. AI systems describe the world, recommend products, and increasingly act on our behalf, and nobody outside the labs can see inside them. We're changing that.
At the center is the AI Brand Trust Registry — the verified, structured brand data substrate that AI shopping agents read from. It has scaled from 112 to ~70,000 scored brand profiles since launching in March 2026, and it powers live AI-agent traffic (~1,500 fetches/day from ChatGPT alone). We're VC-backed with paying customers and real inbound momentum.
Why this internship is exciting
The Registry is growing fast, and that scale has created a backlog of concrete data work: bringing in new structured sources, catching duplicate and malformed entities before they reach customers, and building the monitoring that tells us the data is healthy. You'll work directly with our data engineering owner on these problems — real production work that AI agents and paying brands depend on, not a throwaway side project.
What you'll work on
You'll take ownership of scoped pieces of the data pipeline, with mentorship and review from the data engineering owner:
Ingestion connectors. Build and harden ingestion for new structured sources — commercial enrichment APIs, government datasets, channel-partner catalog feeds. You'll write the connector, handle the noisy edge cases, and document the source's quirks.
Deduplication and entity-quality tooling. Brand entities are messy — duplicates, aliases, parent-child variants. We have an entity-resolution system in production; you'll build the tooling around it: candidate-pair review interfaces, dedup QA checks, and test fixtures that catch regressions as we scale past 100K entities.
Data quality monitoring. Help build the observability layer — freshness checks, confidence-distribution dashboards, and alerting that flags when something upstream breaks.
LLM-assisted data cleanup. Prototype and evaluate LLM prompts for disambiguation and structured extraction inside the pipeline, measuring accuracy against a ground-truth set.
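To give a flavor of the dedup QA work described above, here is a minimal Python sketch of candidate-pair generation: brand names are normalized to a blocking key, and any two entities sharing a key become a pair for review. This is an illustration under stated assumptions, not Optimly's production entity-resolution method. The function names, the suffix list, and the sample brands are all hypothetical.

```python
import re
from collections import defaultdict
from itertools import combinations

# Hypothetical list of legal suffixes to strip; a real list would be longer.
LEGAL_SUFFIXES = {"inc", "llc", "ltd", "co", "corp"}


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, drop trailing legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]+", " ", name.lower()).split()
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def candidate_pairs(brands: dict[int, str]) -> list[tuple[int, int]]:
    """Return sorted ID pairs whose normalized names collide."""
    groups: dict[str, list[int]] = defaultdict(list)
    for brand_id, name in brands.items():
        groups[normalize(name)].append(brand_id)
    pairs: list[tuple[int, int]] = []
    for ids in groups.values():
        # Every pair inside a group is a candidate duplicate for human review.
        pairs.extend(combinations(sorted(ids), 2))
    return sorted(pairs)


# Made-up sample data (assumption: entities are keyed by integer ID).
brands = {1: "Acme, Inc.", 2: "ACME", 3: "Acme Co", 4: "Globex LLC"}
print(candidate_pairs(brands))  # [(1, 2), (1, 3), (2, 3)]
```

Exact-key blocking like this is cheap but misses aliases and parent-child variants, which is one reason review interfaces and regression fixtures matter as the entity count grows.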
Tech stack
Core database: PostgreSQL on Google Cloud SQL
LLM orchestration: OpenAI, Anthropic, Google, Perplexity via an AI gateway
Must-have skills
Solid SQL and PostgreSQL fundamentals. You can write queries with joins and aggregations by hand and are comfortable digging into a schema. Bonus for JSONB or query optimization.
Python or TypeScript. Comfortable writing scripts and small services in at least one; willing to ramp into the other.
Some data ingestion or scripting experience. A class project, internship, or personal project where you pulled data from a messy API or file and cleaned it into a usable shape.
Curiosity about LLMs in pipelines — using a model for extraction or disambiguation, not just as a chat feature.
High agency. You ask good questions, flag when something looks off, and can make progress on a scoped task without constant direction.
Nice to have
Coursework or a project involving record linkage, entity matching, or knowledge graphs
Exposure to data observability or testing tools
Currently pursuing a BS/MS in CS, data science, or a related field