Dr. Yanjie Fu received his Ph.D. degree in Information Technology from Rutgers, the State University of New Jersey, in 2016, the B.E. degree in Computer Science from the University of Science and Technology of China in 2008, and the M.E. degree in Computer Science from the Chinese Academy of Sciences in 2011. He is currently an Assistant Professor at the Missouri University of Science and Technology (formerly the University of Missouri-Rolla). His research interests include urban computing, data mining, and big data analytics. He has research experience in industry research labs, including Microsoft Research Asia, Huawei Research Labs, and the IBM Thomas J. Watson Research Center.
Yanjie Fu’s Introduction to Data Science
How did you begin your career in data science?
In 2011, I joined Rutgers University as a PhD student. At Rutgers, I started my career in data science by working with my advisor, Prof. Hui Xiong, who is a Dean's Research Professor at Rutgers Business School and an ACM Distinguished Scientist. I received rigorous academic training in the foundations, algorithms, and applications of data science. I also worked on several data science projects, including mobile recommender systems, in-app behavior analysis, and retail return and refund analysis.
What do you feel are some common misconceptions about data science or your work in general?
Big Data is often mistakenly equated with sheer data volume. However, Big Data is not just about "big." Big Data should refer to the capability of exploring data that are large-volume, real-time, and heterogeneous (e.g., multi-domain, multi-source, multi-format, multi-dimensional). Urban and mobile data are heterogeneous because such data are usually crowd-sourced, large-scale, geo-tagged, time-stamped, and collectively related. It is important to develop algorithms and tools that address the challenges of data heterogeneity in urban computing.
What inspired you to learn more about Urban Computing?
Cities are growing faster than at any time in history. However, cities face many challenges, such as traffic congestion, air and water pollution, noise, and increased gas consumption. These problems will only worsen as cities grow. With the advent of mobile, sensing, and internet technologies, large-scale urban and mobile data, such as Point of Interest data, mobile check-in data, and taxi GPS data, have been collected from buildings, vehicles, mobile devices, and people. This raises the question: can we use data mining techniques to create win-win solutions that improve urban environments, human quality of life, and city operations? In summer 2012, I joined the Urban Computing Group of Microsoft Research Asia as a research intern. That internship experience turned my research attention to urban computing, which is now the main focus of my research.
What value do you hope your research into Urban Computing will eventually provide?
First, I hope my research can systematically contribute to the theory, algorithms, and applications of urban intelligence. Second, I hope my research can provide an in-depth and unique understanding of the nature and mechanisms of urban phenomena, ultimately making our cities traceable and predictable. I believe in what I do.
What are some tools that you use to conduct your research?
I use Python and PHP to crawl and preprocess spatio-temporal and socio-textual data, Python and R to develop analytic and statistical approaches, PyTorch and TensorFlow to build deep learning models, MATLAB to visualize analysis results, R Shiny to build systems that demonstrate our preliminary results, and LaTeX and Atom to write papers.
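As an illustration of the first step in this pipeline, here is a minimal sketch of preprocessing geo-tagged, time-stamped check-in data in Python. The records, venue names, coordinates, and grid size below are invented for illustration; they stand in for the kind of crowd-sourced urban data described above.

```python
from datetime import datetime
from collections import Counter

# Hypothetical sample of geo-tagged, time-stamped check-in records:
# (venue, latitude, longitude, timestamp). Values are illustrative only.
checkins = [
    ("cafe", 40.7421, -74.1726, "2016-03-01 08:15:00"),
    ("gym",  40.7418, -74.1731, "2016-03-01 08:40:00"),
    ("cafe", 40.7425, -74.1720, "2016-03-01 17:05:00"),
    ("park", 40.7500, -74.1600, "2016-03-02 17:30:00"),
]

def grid_cell(lat, lon, size=0.01):
    """Snap a coordinate onto a coarse grid so nearby check-ins group together."""
    return (round(lat / size), round(lon / size))

# Count check-ins per (grid cell, hour of day) to surface
# spatio-temporal activity patterns in the raw records.
counts = Counter()
for venue, lat, lon, ts in checkins:
    hour = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").hour
    counts[(grid_cell(lat, lon), hour)] += 1
```

The resulting `counts` table (cells keyed by grid index and hour) is the kind of aggregated input that the analytic and statistical models mentioned next would consume.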
What do you look for when hiring Teaching Assistants and Research Assistants?
I like students who have strong programming skills and are hard-working, self-motivated, and passionate.
What advice would you give to students who aspire to be data scientists?
First, learn solid statistical and algorithmic foundations of data science. Second, learn how to formulate real-world problems into data mining tasks. Lastly, learn to deal with large-scale noisy data using strong programming skills and a variety of analytic tools.