The primary purpose of the Data Scientist position is to support research scientists in the development, validation, and deployment of high-quality machine learning models as part of the Tumor Measurements Initiative. In this role, the candidate will focus on the following areas:
• Support the development and translation of state-of-the-art machine learning algorithms, following best practices and standards.
• Oversee the lifecycle of AI models, encompassing training, evaluation, deployment, monitoring, and maintenance.
• Maintain diligent records of model development experiments, track data and model lineage, and create comprehensive model scorecards.
• Provide rigorous testing, versioning, and documentation, ensuring impact, risk mitigation, and reproducibility.
• Collaborate with ML Ops Engineers to enable rapid experimentation and impact by creating data and ML pipelines, automated testing, rapid deployments, and model monitoring.
• Develop and support a culture of responsible AI by minimizing bias, enhancing fairness, and maximizing transparency in AI models.
• Educate and train research data scientists on ML engineering best practices and responsible AI practices.
• Engage with stakeholders to gather requirements, convey AI concepts understandably, incorporate feedback, and ensure sustainable impact.
Technical Expertise
Train and deploy production-quality machine learning models regularly.
Apply deep learning, computer vision, and generative AI methods.
Design, develop, and maintain scalable data pipelines, feature management, data labeling, ML artifact management, and analytics.
Manage data, code, and models (e.g., using Git, Hugging Face, MLflow, or other tools).
Work with Docker, Kubernetes, and other containerization technologies.
Leverage on-premise, cloud-based, and hybrid computing environments.
Develop and implement methods and tools to quantitatively evaluate bias, fairness, and equity.
Analytical Skills
Perform testing, debugging, and code quality checks.
Work with medical imaging data and understand medical imaging workflows.
Familiarity with healthcare data standards and ontologies, such as DICOM, HL7, and FHIR.
Familiarity with healthcare data privacy regulations, such as HIPAA and/or GDPR.
Professionalism: Oral and Written
Gather initial requirements, analyze clinical data, design and develop ML solutions, perform feasibility testing of proposed solutions, and evaluate and interpret the results.
Transfer knowledge, expertise, and methodologies by proactively providing technical assistance to researchers and peers.
Concisely and clearly present technical and non-technical progress updates in project meetings as well as at external meetings, workshops, conferences, etc.
Communicate effectively and cooperatively with leaders, peers, end users, and support teams when required.
Other duties as assigned.
Education Required: Bachelor's degree in Computer Science, Software Engineering, Data Science, Physics, Math & Statistics, or another related engineering discipline.
Preferred Education: Master’s-level degree.
Experience Required: Three years of experience in machine learning engineering, data science, data engineering, and/or software engineering. With a Master's degree, one year of experience required. With a PhD, no experience required.
Preferred Experience: Experience with Azure and proficiency in cloud-native tools and services such as Azure Arc, Azure ML, and Azure Cognitive Compute (or similar).
It is the policy of The University of Texas MD Anderson Cancer Center to provide equal employment opportunity without regard to race, color, religion, age, national origin, sex, gender, sexual orientation, gender identity/expression, disability, protected veteran status, genetic information, or any other basis protected by institutional policy or by federal, state, or local laws unless such distinction is required by law. http://www.mdanderson.org/about-us/legal-and-policy/legal-statements/eeo-affirmative-action.html