Talking Shop: An Interview with Alex Arbatskiy, Senior Software Engineer
What is your role at 7Park Data?
I’m a software engineer on the Platform Team, working closely with data science to transform raw data into predictive insights.
How does your work help 7Park’s customers?
The data we receive from our partners is unstructured and messy, without consistent naming, classification and panel characteristics. Data mining and preprocessing are essential, and my team builds machine learning, knowledge management, web scraping, and data wrangling tools so data science, product and research can create reliable, predictive insights from the messy inputs.
What’s the most challenging project you’ve worked on at 7Park?
The most recent challenge was cleansing the Email Receipt dataset, which contains hundreds of merchants, thousands of product names, and millions of rows.
Each merchant has its own unique categorization of products, and the task was to build a product taxonomy and assign a category to each item included in an email receipt. The goal of this project is to allow clients to conduct more granular analysis of purchasing trends at hundreds of merchants, including Lululemon and Wayfair.
For this task, we did a lot of web crawling, trained ML classifiers, cleansed the data, and finally connected our feed to an interactive dashboard used by our clients. ML was a key part of this project, and we are constantly experimenting with different natural language processing (NLP) models for other projects.
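To make the categorization task concrete, here is a minimal sketch of assigning a taxonomy category to a raw product name. It is a simple keyword baseline, not the trained classifiers described above, and it does not reflect 7Park's actual pipeline. The `TAXONOMY` mapping, its keywords, and the `categorize` function are all hypothetical names chosen for illustration.

```python
import re

# Hypothetical taxonomy: category -> keywords. Illustrative only;
# the real taxonomy and merchant data are not described in the interview.
TAXONOMY = {
    "apparel": {"leggings", "jacket", "shirt", "shorts"},
    "furniture": {"sofa", "table", "chair", "desk"},
}


def categorize(product_name: str) -> str:
    """Return the category with the most keyword hits, or 'unknown'."""
    # Lowercase and keep alphabetic tokens only, so punctuation and
    # sizes in messy product names do not block a match.
    tokens = re.findall(r"[a-z]+", product_name.lower())
    scores = {
        category: sum(token in keywords for token in tokens)
        for category, keywords in TAXONOMY.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


if __name__ == "__main__":
    for name in ['High-Rise Leggings 25"', "Mid-Century Dining Table", "Gift Card"]:
        print(name, "->", categorize(name))
```

A baseline like this is cheap to build and easy to inspect, but it breaks down as the number of merchants and naming variations grows, which is one reason a learned classifier can be the better fit at scale.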
What do you believe is the biggest misconception about Data, AI or ML?
The first question you need to ask yourself before building any ML model is, “Can it be solved without using ML?”
Often the problem can be solved without sophisticated algorithms. And even with the best algorithms, there is still a lot of art and decision-making in exploratory analysis, feature engineering, and modeling, because no single technique works equally well in every case.
One trend I’m following is Automated Machine Learning (AutoML). It has the potential to run experiments faster and with greater sophistication while still allowing a human to play a role in the decision-making process.
What are some of the trends or industry changes that you’re paying close attention to?
In my opinion, using machine learning tools is becoming an essential skill for engineers. My team and I periodically check for new machine learning frameworks and pre-trained models. We also monitor what the tech giants are working on and apply the latest technologies to our daily tasks.
Many companies have tried to close the gap between the moment a data scientist trains a model and the moment that model is deployed to production. The rise of MLOps and DataOps has solved most of these problems, and these tools are definitely on our technology radar.
What hobby have you picked up during quarantine?
I’m a coffee fan. Working from home, I miss my favorite coffee shop across from the 7Park office. I recently bought a coffee machine and am perfecting my rosetta. Latte art is not as easy as it looks. It requires a lot of practice and strictly following an algorithm; I’m lucky because those skills are very familiar to software engineers like me.