Jakub Hantabal and Laura Johanesová @ PyConSK 2024

Jakub Hantabal and Laura Johanesová

Slovakia

Jakub: I am a biomedical data scientist focusing on cancer research, currently studying for a Master’s degree in Precision Cancer Medicine at the University of Oxford. Passionate about popularizing science and educating the next generation of data scientists, I am a co-founder of the Data Science Academy NGO, where we organize educational events and build a community of data enthusiasts. I am also a partner at MHG Consulting, a boutique life science consultancy focused on Central European clients. I am always on the lookout for opportunities to positively impact society.

Laura: I am a Data Analyst and Junior Software Developer with expertise in biomedical data processing and software development for photoplethysmography analysis. I am also currently studying at the University of Vienna. I co-founded the Data Science Academy NGO and have a strong record in strategic planning, curriculum development and fostering collaborative innovation in healthcare technology.

Innovative data science education: The story of Data Science Academy (Talk)

English language

Jakub Hantabal and Laura Johanesová

Taking the first step in learning data science in Python can be daunting. To help people overcome this hurdle, we co-founded Data Science Academy - an NGO that provides accessible data science education and builds a collaborative community of data enthusiasts. Over the past year, we organized two week-long bootcamps, Winter Data School and Summer Data School, as well as five shorter workshops, where we showed the magic of Python and inspired people to continue their programming journey.

In this talk, we will share the story of how we set up Data Science Academy, what we do, and how we motivate people to take up data science. We will discuss why quality education and keeping attendees focused require the physical presence of multiple lecturers, and how to make the transition to Python straightforward by drawing parallels with Excel and selecting the key concepts to explain. Lastly, we will demonstrate our unique teaching materials built on Jupyter Notebook, show how we use them in our workshops, and share how participants at our events have used them to advance their careers and set up new collaborations.

Uncovering patterns in a dataset via data visualization and accessible machine learning (Workshop)

English language

Jakub Hantabal and Laura Johanesová

In this workshop, we will learn how to gain insight into a dataset. We will use a dataset that everyone can relate to - Pokemon! This workshop will be entirely practical - you will code along with us in a Jupyter notebook. We will explain the theory and rationale behind the code as we write it, and the audience is encouraged to ask questions along the way.

We will simulate the job of a real-world data analyst who is tasked with investigating a question: Does higher health correlate with a better attack?

To accomplish this, we will progress through the dataset from basic cleanup all the way to fitting a machine learning model. We will first clean and filter the dataset and determine whether the data is even usable. Here, we will learn how to evaluate the quality of the data: summary statistics, missing values and how to handle them (imputation), data distribution, and so on. We will look into the HP and Attack data and ascertain whether what we’re given is sufficient to answer our question. Can we add another feature to these two? We’ll also look at selecting additional features.
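As a rough illustration of the data quality checks described above, here is a minimal sketch using pandas. The small table, its values, and the choice of median imputation are assumptions made for this example only; the workshop's actual dataset and imputation strategy may differ.

```python
import pandas as pd

# Hypothetical stand-in for the workshop dataset; names and values are made up.
df = pd.DataFrame(
    {
        "Name": ["A", "B", "C", "D", "E"],
        "HP": [45.0, 60.0, None, 80.0, 39.0],
        "Attack": [49.0, 62.0, 82.0, None, 52.0],
    }
)

numeric_cols = ["HP", "Attack"]

# Summary statistics (count, mean, std, quartiles) for the numeric columns.
summary = df[numeric_cols].describe()

# Number of missing values in each numeric column.
missing_before = df[numeric_cols].isna().sum()

# Median imputation: fill each missing value with the median of its column.
medians = df[numeric_cols].median()
cleaned = df.fillna(medians)

print(summary)
print(missing_before)
print(cleaned)
```

Median imputation is only one option; dropping incomplete rows or using the mean are common alternatives, and the right choice depends on how much data is missing and why.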

We will then progress to data visualization - we’ll touch on which plot to use for which purpose and learn the Python syntax for each (histogram, scatter plot, correlation plot, box-and-whisker plot, violin plot, and histogram with rug plot). We will visualize the data in multiple ways.
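To give a flavour of the plotting syntax, here is a minimal sketch of three of the plot types listed above, assuming matplotlib as the plotting library. The HP and Attack values are made up for illustration, and the workshop may use a different library or layout.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend for scripts; omit this line in a notebook
import matplotlib.pyplot as plt

# Hypothetical HP and Attack values, made up for this example.
hp = [45, 60, 79, 80, 39, 58, 44, 59]
attack = [49, 62, 82, 83, 52, 64, 48, 63]

fig, (ax_hist, ax_scatter, ax_box) = plt.subplots(1, 3, figsize=(12, 4))

# Histogram: distribution of a single variable.
ax_hist.hist(hp, bins=5)
ax_hist.set_xlabel("HP")
ax_hist.set_ylabel("Count")

# Scatter plot: relationship between two variables.
ax_scatter.scatter(hp, attack)
ax_scatter.set_xlabel("HP")
ax_scatter.set_ylabel("Attack")

# Box-and-whisker plot: spread of each variable side by side.
ax_box.boxplot([hp, attack])
ax_box.set_xticks([1, 2])
ax_box.set_xticklabels(["HP", "Attack"])

fig.tight_layout()
```

In a Jupyter notebook the figure is displayed inline after the cell runs; in a script, `fig.savefig(...)` writes it to a file instead.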

Concluding the workshop, we will move on to simple machine learning - you will be introduced to supervised learning, classification and regression. We will discuss how to choose the right model to train and look at applying regression models (linear regression) and classification models (logistic regression, random forest) to our dataset. We will measure the performance of these models, and draw conclusions from their outputs.

By the end of the workshop, we will have answered the initial question and gained insights into the Pokemon dataset, and we will see how powerful Python is for accessible data analysis.