Sandro Saitta has a diverse background in data mining and computer science, having gained experience in various industries and in academia. He has worked at companies such as Nespresso and Expedia to innovate processes using data science. Sandro also co-founded the Swiss Association for Analytics, which serves as a hub for data scientists to connect, share and learn.
Sandro Saitta’s Introduction to Data Science
How did you begin your career in data science?
I studied computer science at university. For my Master's thesis, I worked on a project to predict airborne pollen concentration. This motivated me to start a PhD in data mining (as the field was called 15 years ago). My objective was to learn the field and be ready to apply data science to solve business challenges.
What inspired you to learn more about data mining?
I was amazed by the fact that a computer can learn. The possibility of teaching a computer to improve at a specific task through experience was impressive to me. I was curious to understand algorithms such as decision trees and support vector machines. It was really interesting to discover new algorithms and how they could solve business challenges. Following data mining blogs and reading books on machine learning was also an inspiration to see what others were doing.
What do you feel are some common misconceptions about data mining or your work in general?
I see mainly three misconceptions about data science.
First, some people believe that data science is about tuning algorithms to make predictions. This is a tiny part of the whole data science process. Most of the effort spent by data scientists – especially in industry – is about transforming a business problem into a data mining one, pre-processing the data and deploying systems into production.
Second, there is a misconception about the concept of automating data science. This is related to the first point. Since data science is much more than selecting and tuning a machine learning algorithm, it is not yet a process that can be automated.
Third, the original skepticism around data science has evolved into unrealistic expectations. Data science and machine learning are far from what people expect of artificial intelligence. This can lead to strong disappointment for stakeholders, and it is the role of data scientists to set realistic expectations from the very beginning of their project.
You have quite a bit of experience working both in research and industry. Can you tell us a bit more about different projects you’ve worked on that have contributed to data science?
I had the chance to work on plenty of exciting data science projects. In the telco industry, I worked on a project to increase the click-through rate of ad campaigns using behavioral targeting. We were able to link the offline world (CRM data) with the online one (user behavior on the website). This was a new and powerful approach to get a comprehensive customer profile.
In the chemicals sector, we performed ink authentication using a data mining approach. This allowed us to achieve a much higher accuracy than manually defined rules. The main challenge was to propose a solution that worked on a hand-held device with limited memory and disk space.
In the online travel business, my team developed a machine learning approach to detect duplicates within a hotel database. The algorithm used text mining and allowed fuzzy matching of character strings. Using this technique, we improved the duplicate detection rate by 10%.
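To give a rough idea of what fuzzy matching of character strings can look like, here is a minimal sketch in Python. It is not the approach used in the project described above, whose details are not given here. It simply compares normalized names with the standard library's `difflib.SequenceMatcher`. The hotel names, the `normalize` and `find_duplicates` helpers and the 0.85 threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(name: str) -> str:
    """Lowercase the name and replace punctuation with single spaces."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in name.lower())
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """Return a similarity score between 0.0 and 1.0 for two names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def find_duplicates(names, threshold=0.85):
    """Return (name_a, name_b, score) for every pair scoring at or above threshold.

    The threshold is a hypothetical value; in practice it would be tuned
    on labeled examples of known duplicates.
    """
    pairs = []
    for a, b in combinations(names, 2):
        score = similarity(a, b)
        if score >= threshold:
            pairs.append((a, b, score))
    return pairs


if __name__ == "__main__":
    # Made-up hotel names, for illustration only.
    hotels = [
        "Hotel Bellevue Palace",
        "Hotel Bellevue-Palace",
        "Grand Hotel Kempinski",
    ]
    for a, b, score in find_duplicates(hotels):
        print(f"{a!r} ~ {b!r} (score {score:.2f})")
```

Comparing every pair is only practical for small lists. A real hotel database would likely need a way to restrict comparisons to plausible candidates (for example, hotels in the same city) and would use more fields than the name alone.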
What are some tools that you use to conduct your research?
On the academic side, I used a mix of Matlab and Java. In industry, the choice of tool mainly depends on what is currently used in the company. I worked with Matlab at SICPA as it was already used in the R&D team. At Expedia, I programmed in R for the same reason. In the case of Swisscom and Nespresso, I used SAS.
What inspired you to found the Swiss Association for Analytics?
In 2011, I was invited to speak at BAQMAR in Belgium. I discovered a team of passionate people organizing analytics events. Back in Switzerland, I thought that such events were missing. At that time, there was no association dedicated to data mining and machine learning. I decided to create the Swiss Association for Analytics – with other passionate data scientists – to fill this gap. Today, there are plenty of meetups in Switzerland, which is very good for the health of data science in the country.
What advice would you give to students who aspire to be data scientists?
At the time I started, there were no dedicated data science courses at university. Today’s experienced data scientists have a background in mathematics, physics or computer science, for example. Master's programs in data science are now popping up. Take the opportunity to get formal training in the field. In addition, I would advise reading data science books. Read plenty of them. After experience and formal education, books are the best way to learn data science.