Data Engineer Internship

Apurva Luty

Optimly AI · Founder & CEO

Job description

Data Engineering Intern — AI Brand Trust Registry

Status: Open — hiring now

Type: Internship, full-time (40 hrs/week); 12 weeks, with flexible start/end dates. Part-time during term considered for the right local candidate.

Location: Seattle preferred (in-person collaboration with the founder); open to US-remote with Pacific Time overlap.

Reports to: Apurva Luty, CEO/Founder. You'll work alongside the data engineering owner.

Compensation: $45–$55/hour, depending on experience. Strong performers will be considered for a full-time return offer.

Contact: [email protected] — direct applications welcome

About Optimly

Our mission is to make AI models scrutinizable — to turn the black box into a glass box. AI systems describe the world, recommend products, and increasingly act on our behalf, and nobody outside the labs can see inside them. We're changing that.

At the center is the AI Brand Trust Registry — the verified, structured brand data substrate that AI shopping agents read from. It has scaled from 112 to ~70,000 scored brand profiles since launching in March 2026, and it powers live AI-agent traffic (~1,500 fetches/day from ChatGPT alone). We're VC-backed with paying customers and real inbound momentum.

Why this internship is exciting

The Registry is growing fast, and that scale has created a backlog of concrete data work: bringing in new structured sources, catching duplicate and malformed entities before they reach customers, and building the monitoring that tells us the data is healthy. You'll work directly with our data engineering owner on these problems — real production work that AI agents and paying brands depend on, not a throwaway side project.

What you'll work on

You'll take ownership of scoped pieces of the data pipeline, with mentorship and review from the data engineering owner:

Ingestion connectors. Build and harden ingestion for new structured sources — commercial enrichment APIs, government datasets, channel-partner catalog feeds. You'll write the connector, handle the noisy edge cases, and document the source's quirks.
Deduplication and entity-quality tooling. Brand entities are messy — duplicates, aliases, parent-child variants. We have an entity-resolution system in production; you'll build the tooling around it: candidate-pair review interfaces, dedup QA checks, and test fixtures that catch regressions as we scale past 100K entities.
Data quality monitoring. Help build the observability layer — freshness checks, confidence-distribution dashboards, and alerting that flags when something upstream breaks.
LLM-assisted data cleanup. Prototype and evaluate LLM prompts for disambiguation and structured extraction inside the pipeline, measuring accuracy against a ground-truth set.

Tech stack

Core database: PostgreSQL on Google Cloud SQL
LLM orchestration: OpenAI, Anthropic, Google, Perplexity via an AI gateway

Must-have skills

Solid SQL and PostgreSQL fundamentals. You can write queries with joins and aggregations by hand and are comfortable digging into a schema. Bonus for JSONB or query optimization.
Python or TypeScript. Comfortable writing scripts and small services in at least one; willing to ramp into the other.
Some data ingestion or scripting experience. A class project, internship, or personal project where you pulled data from a messy API or file and cleaned it into a usable shape.
Curiosity about LLMs in pipelines — using a model for extraction or disambiguation, not just as a chat feature.
High agency. You ask good questions, flag when something looks off, and can make progress on a scoped task without constant direction.

Nice to have

Coursework or a project involving record linkage, entity matching, or knowledge graphs
Exposure to data observability or testing tools
Currently pursuing a BS/MS in CS, data science, or a related field

More information

Minimum education level: Master's

Experience level: Entry-level or graduates

Job skills: Entity resolution at scale; Deep PostgreSQL expertise; Strong Python or TypeScript; Structured data ingestion experience; Experience with LLMs

Certifications: Data Engineering Certification; PostgreSQL Certification; Python Programming Certification; AWS Certified Data Analytics; Structured Data Management Certification

Languages: English

Company overview

Optimly AI

Information Technology and Services

Optimly helps brands track and optimize how they're represented in AI-generated answers across tools like ChatGPT, Gemini, and Perplexity. We offer a RAG-powered simulator that shows marketers how their content appears—and how to improve it.