Job description:
NLP Engineer / Machine Learning Engineer - Document Understanding &
Knowledge Graphs

Overview
We're looking for a hands-on NLP/ML engineer to lead the development of an intelligent
document understanding pipeline for extracting structured data from complex, unstructured RFQ
documents (40 to 100+ pages, in German and English). You will be responsible for building
scalable systems that combine document parsing, layout analysis, entity extraction, and
knowledge graph construction, ultimately feeding downstream systems (e.g., analytics and LLM
applications).

Key Responsibilities
- Design and implement document hierarchy and section segmentation pipelines using
  layout-aware models (e.g., DocLayout-YOLO, LayoutLM, Donut).
- Build multilingual entity recognition and relation extraction systems across both English
  and German texts.
- Use tools like NLTK, transformers, and spaCy to develop custom tokenization, parsing,
  and information extraction logic.
- Construct and maintain knowledge graphs representing semantic relationships
  between extracted elements using graph data structures and graph databases (e.g.,
  Neo4j).
- Integrate outputs into structured, LLM-friendly formats (e.g., JSON, Markdown) for
  downstream extraction of building material elements.
- Collaborate with product and domain experts to align on information schema, ontology,
  and validation methods.

What We're Looking For
- Strong experience in NLP, document understanding, and information extraction
  from unstructured/multilingual documents.
- Proficiency in Python, with experience using libraries such as transformers, spaCy,
  and NLTK.
- Hands-on experience with layout-aware models like DocLayout-YOLO, LayoutLM,
  Donut, or similar.
- Familiarity with knowledge graphs and graph databases such as Neo4j, RDF