About ZoomInfo
ZoomInfo is building the next-generation go-to-market platform using high-quality GTM data, agentic workflows, and a robust intelligence layer to give sales, marketing, and revenue operations teams a competitive advantage.
About the Applied AI Team
The Applied AI team builds the intelligence layer that sits between ZoomInfo's high-quality data and the application layer through which customers engage. Using a product-led growth model, this team leverages customer engagement as input to build better recommendations, scoring, classification, and generative models.
What you will do:
Foundation Data Quality Enhancement
- Improve data quality for ZoomInfo's foundation datasets including firmographics, demographics, C-suite profiles, workforce information, titles, skill sets, scoops, intent signals, and web-extracted data
- Design and implement data validation pipelines and quality metrics to ensure high-fidelity information across millions of records
Embedding and Model Development
- Build and fine-tune embedding models using large language models (Llama) and small language models (*BERT*) for various text understanding tasks
- Develop language-agnostic clustering and classification models using vector search technologies
- Optimize embedding models for production deployment at petabyte scale
Named Entity Recognition & Data Extraction
- Build high-recall NER models to extract people, organizations, locations, and industry-specific entities from web-extracted data
- Develop robust data extraction pipelines that process diverse web content and structure unstructured information
Agentic Workflows & Evaluation
- Design and implement agentic workflows focused on web extraction, NER, and entity resolution
- Create comprehensive evaluation frameworks for agent performance and reliability
- Collaborate on agent optimization and performance tuning
Scalable Production Systems
- Deploy and maintain ML models serving millions of users daily with sub-second latency requirements
- Work with engineering teams to ensure models integrate seamlessly into ZoomInfo's platform architecture
- Monitor model performance, implement automated retraining pipelines, and design cost-aware training and inference workflows
- Use integrated CI/CD and testing workflows for seamless deployment
Cross-Functional Collaboration & Prototyping
- Partner with product managers and engineering teams to translate business requirements into ML solutions
- Prototype and benchmark emerging AI/infra tech
- Present findings and technical solutions to stakeholders across the organization
What you bring:
Experience & Education
- 3-5 years (or 1+ years post-PhD) of hands-on ML/NLP experience with demonstrated impact on production systems. A master's degree and a background in Computer Science or an allied data science/engineering discipline are preferred.
- Strong background in transformer architectures, embedding models, and vector search technologies
- Experience with named entity recognition, summarization and data extraction at scale is a plus
Technical Skills
- Proficiency in PyTorch or TensorFlow for model development and fine-tuning
- Experience with vector databases (Pinecone, Weaviate, FAISS, OpenSearch) and hybrid retrieval systems
- Strong software engineering skills in Python; familiarity with Go/Java is a plus
- Knowledge of MLOps tools: Docker, Kubernetes, GitOps, feature stores, model registries
Applied AI Expertise
- Hands-on experience with LLM fine-tuning techniques (LoRA, quantization, distillation) is a plus
- Understanding of agentic workflows and multi-agent systems
- Experience building language-agnostic ML solutions and cross-lingual models
- Knowledge of entity resolution and knowledge graph concepts
Collaboration & Communication
- Ability to work effectively in cross-functional teams and communicate technical concepts to non-technical stakeholders
- Experience mentoring junior team members and contributing to team knowledge sharing
- Strong problem-solving skills and ability to work independently with guidance from team leads
Preferred Qualifications
- Experience processing large-scale unstructured data
- Background in information retrieval and search systems
- Familiarity with MLOps concepts, A/B testing and experimental design for ML systems
- Knowledge of data quality frameworks and validation methodologies