Toni Gruetze

Data Scientist, Architect, Consultant

Toni is a computer scientist with years of experience in distributed systems (Hadoop, Spark), machine learning, and artificial intelligence. He has organized several seminars on data mining with distributed systems and has collaborated with small and large companies on various projects. His strength lies in accompanying data mining projects from planning to production and in scaling complex analyses on cluster systems. He has also published various peer-reviewed papers in the research areas of text and web mining.

Germany, Berlin
Text Mining
Natural Language Processing
Data Warehouse
Distributed Computing
Data Mining
MySQL
Google Cloud Platform (GCP)
Toni is currently not available.

Core Skills

Technical Skills (years of experience)

Machine Learning: 10
Data Integration: 10
SQL: 10
Python: 5
Java: 10

Work Experience

Data Engineer & Software Architect

2017 - present

Freelancer at German Universal Bank

  • Analyzed a stream of news articles and aligned them with business entities in a graph database
  • Developed a web app to explore the resulting news graph
  • Topics: NLP, Machine Learning, Reinforcement Learning, Data Integration
  • Technologies: Python, Java, JanusGraph, Kafka, Docker, GCP, Spring Boot

Data Engineer & Software Architect: Negative News Screening

2017 - 2019

German Universal Bank

  • A stream of news articles was analyzed and aligned with business entities in a graph database. In addition, a web app was developed to explore the resulting news graph.
  • Tasks:
    - Designed ETL pipelines based on Apache Kafka to integrate business entities and article texts
    - Implemented NLP modules that find business entity mentions in news articles (named entity recognition and linking, sentiment analysis, document classification)
    - Created a concept for the data lineage of ETL pipeline steps
    - Designed the graph data model
    - Optimized the exploratory web app's queries in the graph database
    - Developed a REST API for the graph explorer web app
  • Impact:
    - Developed the first cloud computing pilot and introduced it to the customer
    - Designed a successful demo for the supervisory board's risk management team
    - Trained the web app team to use Spring Boot and JanusGraph
    - Assisted the data science team in developing advanced matching models
  • Topics: NLP, Machine Learning, Data Integration, Data Modeling
  • Technologies: Python, Java, Apache Kafka, GCP, Docker, spaCy, Gremlin, JanusGraph, Apache Cassandra, Elasticsearch, Spring Boot

Software Architect and Team Lead: Data Ingestion and Analysis

2016 - 2017

Commerzbank AG

  • Development of a system to build, curate, explore, and analyze domain-specific knowledge graphs from structured and unstructured data sources
  • Topics: Distributed Computing, Duplicate Detection, NLP, Machine Learning
  • Technologies: Scala, Apache Spark, Apache Cassandra, React, Jenkins, sbt

Researcher: Text Mining

2012 - 2017

Hasso Plattner Institute

  • The research focused on showing that knowledge represented in user-generated content from different social media services can significantly improve a range of natural language processing and text mining tasks
  • Topics: Text Mining, NLP, Machine Learning
  • Technologies: Java, Python, R, spaCy, scikit-learn, Weka, Keras, pandas, ggplot2, PostgreSQL

Team Lead: Big Data Analytics for Health Data

2009 - 2017

Hasso Plattner Institute

  • Optimization of medical care with a system tailored to analyzing large historical treatment data from health insurers
  • Evaluation of different platforms with respect to their scalability
  • Project Partner: Elsevier Health Analytics
  • Topics: Big Data Analytics, Distributed Computing, Data Warehouse, Data Mining
  • Technologies: Java, HPCC, PostgreSQL, i2b2

Software Architect and Team Lead: Data Ingestion and Analysis

2009 - 2017

Hasso Plattner Institute

  • Development of a system to build, curate, explore, and analyze domain-specific knowledge graphs from structured and unstructured data sources
  • Project Partner: Commerzbank AG
  • Topics: Distributed Computing, Duplicate Detection, NLP, Machine Learning
  • Technologies: Scala, Apache Spark, Apache Cassandra, React

Researcher

2009 - 2017

Hasso Plattner Institute

  • Conducted research aimed at showing that knowledge represented in user-generated content from different social media services can significantly improve a range of natural language processing and text mining tasks
  • A selection of additional lectures and projects in cooperation with industrial partners is listed below

Lecturer: Mining Massive Datasets, Seminar

2009 - 2017

Hasso Plattner Institute

  • Students had to approach challenging big data problems using distributed computing frameworks such as Apache Spark or Apache Flink and Amazon Web Services
  • Topics: Big Data, Machine Learning, Data Mining
  • Technologies: Apache Spark, Apache Flink

Lecturer: Distributed Big Data Analytics, Seminar

2009 - 2017

Hasso Plattner Institute

  • Each student group had to compare the performance of the two distributed computing frameworks Apache Spark and Apache Flink on one challenging big data problem (e.g., Graph Mining or Text Mining)
  • Topics: Distributed Computing, Big Data Analytics, Data Mining
  • Technologies: Apache Spark, Apache Flink

Lecturer: Mining Massive Datasets

2016

Hasso Plattner Institute

  • Students had to approach challenging big data problems using distributed computing frameworks such as Apache Spark or Apache Flink and Amazon Web Services
  • Topics: Big Data, Machine Learning, Data Mining
  • Technologies: Scala, Java, Apache Spark, AWS

Team Lead: Big Data Analytics for Health Data

2014 - 2015

Elsevier Health Analytics

  • Aimed to optimize medical care with a system tailored to analyzing large historical treatment data from health insurers
  • Evaluated different platforms with respect to their scalability
  • Topics: Big Data Analytics, Distributed Computing, Data Warehouse, Data Mining
  • Technologies: Java, R, HPCC, PostgreSQL, i2b2

Lecturer: Distributed Big Data Analytics

2015

Hasso Plattner Institute

  • Each student group had to compare the performance of the two distributed computing frameworks Apache Spark and Apache Flink on one challenging big data problem (e.g., Graph Mining or Text Mining)
  • Topics: Distributed Computing, Big Data Analytics, Data Mining
  • Technologies: Scala, Java, Apache Spark, Apache Flink, AWS

Software Developer: Preventive Maintenance

2006 - 2009

Decision Optimization

  • Enabling preventive maintenance decisions for gene analysis equipment by training machine learning models that predict failures
  • Project Partner: SigmaQuest, Inc.
  • Topics: Machine Learning, Predictive Maintenance
  • Technologies: Java, Weka, Oracle

Software Developer

2005 - 2006

Siemens R&D

  • Managing guidelines, tolerances, and limits of complex steam turbines for use in engineering and simulation applications
  • Topics: Information Management
  • Technologies: C#, Oracle

Education & Certificates

Doctor of Engineering - Information Systems (Dr.-Ing. / Ph.D.)

2012 - 2018

Hasso Plattner Institute, University of Potsdam

Master of Science - IT-Systems Engineering (M.Sc.)

2009 - 2011

Hasso Plattner Institute, University of Potsdam

Dipl.-Inf. (FH)

2003 - 2008

Hochschule Zittau/Görlitz

Languages

English: Native or bilingual proficiency
German: Professional working proficiency
French: Elementary proficiency
Greek: Elementary proficiency
