Nate George is Assistant Professor of Data Science and Math at Regis University CC&IS, where he is researching the potential of neural networks for interpreting brain scans. He received his PhD in chemical engineering from the University of California, Santa Barbara. His background is in materials science, working on LEDs and solar cells, but he now works primarily in data science.
Nate George’s Introduction to Data Science
How did you begin your career in data science?
I began my data science career near the end of graduate school. I took a molecular simulations course from Professor Scott Shell at UCSB. It was one of my favorite courses in grad school. We used Python, NumPy, and Fortran to simulate atomic motions on the picosecond time scale. Near the end of grad school, I tried to build an automated stock trading application. Not much good came out of it at the time, but it helped develop my coding skills quite a bit.
The next step in my data science journey involved working at a solar cell factory in Milpitas, CA. We had hundreds of gigabytes of data, and no one was really doing much with it. I wrote thousands of lines of Python code to extract, transform, and load (ETL) it, process and clean it, and make some nice charts and summary statistics.
After deciding in mid-2016 to fully pursue data science, I took the Udacity Machine Learning Nanodegree (MLND). I completed the MLND in about a month because it was all I was doing; completing the nanodegree while working a job could take 6 to 12 months. After Udacity, I did the Galvanize program in Denver for three months, where I worked on several small projects while applying for jobs and attending interviews.
To aid in my job hunt, I created web scraping software that scraped dice.com for job postings related to data science. Then, I wrote code to plot the distribution of salaries and skills requested in the job postings. Additionally, I made an XGBoost model to predict the salary for jobs where no salary was listed. I had a site up displaying the results, but I have since taken it down. That project took weeks, and I actually used it as part of a job application.
What inspired you to learn more about data science?
A friend of mine from graduate school completed the Data Incubator. I needed to get back into the working world after failing at entrepreneurship. My skills were in LEDs, solid state chemistry, and fancy scientific equipment, all of which are typically not in high demand and are concentrated in a few specific locations, most of them places where I don’t want to live. Data science is much more general and easier to do remotely. I like programming a lot as well, and of course it’s high-paying. After speaking with my friend about it and doing some of my own research, I decided it was a good idea to pursue data science.
What do you feel are some common misconceptions about data science or your work in general?
Some people may think they are data scientists because that is their job title. It’s a very trendy job title to have, but being able to make a few graphs in Excel and use pivot tables doesn’t make you a data scientist. There is a hilarious Stack Exchange post about someone interviewing for a data scientist position, claiming the title of ‘senior data scientist,’ and saying their main tool is Excel. If you don’t know stats very well or don’t know how to use R and Python, you’re probably not a true data scientist.
Another common misconception is that you can spend a few months and suddenly become a data scientist, regardless of your background. It takes a little longer than that, honestly. You can “fake it until you make it” and get a job as a data scientist after only a few months of training, but you won’t really have a big enough knowledge base until you’ve been working on data science projects for a year or two at least.
What value do you hope your research into neural networks will eventually provide?
I am using neural networks to predict brain states from electroencephalography (EEG) data. The idea is to move towards a technology dubbed “neural lace” by Iain M. Banks in his Culture book series. Neural lace is a device that interfaces with your brain, giving you the intelligence of the entire internet at electron/photon speeds.
Other, more immediate impacts would include finding out how best to design neural networks for predicting brain states. This is useful for predicting speech and visualized images from thoughts and for controlling prosthetics, and it could hopefully lead to understanding how to input thoughts into the brain as well. Predicting speech and images from thoughts would allow us to improve our communication speeds by orders of magnitude, and it is one of the areas of neural networks I’m very interested in. It would also help people with “locked-in” syndrome, such as stroke victims, and people with conditions like ALS, such as Stephen Hawking.
What are some tools that you use to conduct your research?
Python (it’s the top data science language according to my dice.com data of over 100k job postings), Ubuntu, Keras, TensorFlow, CUDA, an Nvidia Titan X, a beefy desktop rig with 128GB RAM and an M.2 PCIe solid-state drive, Jupyter notebooks, IPython, and the Atom IDE. I also have a laptop with 32GB RAM and a 4GB graphics card. I don’t use Windows anymore, unless I have to for some software reason.
What do you look for when hiring Teaching Assistants and Research Assistants?
Currently, I only hire unpaid research assistants. I need someone who can code in Python, preferably with some knowledge of neural networks. They have to be able to work independently on tasks with somewhat vague descriptions (e.g., code a 5-layer dense neural net and get a test set accuracy on a dataset), and have an agreeable personality. They need to comprehend math and matrices so that they can write code in NumPy and Keras. Other traits I look for include a strong desire to learn more about neural networks and to make an impact on the world.
What advice would you give to students who aspire to be data scientists?
First, you should think for a bit about whether you really want to be a data scientist. Data scientists spend the majority of their time on computers, often in solitude. And data science presents tough intellectual problems that can make your brain hurt. If you don’t like computers, working alone, or pushing your brain to new limits, maybe data science isn’t for you. However, if you like solving puzzles, math, programming, and computers, then data science is probably a good fit and you should go for it.
When learning, train from many different sources. You will have to teach yourself most new concepts eventually, but of course, you should start with guided instruction; it’s a much more efficient way to start out. A lot of universities offer degrees in data science, and you should definitely consider them. At Regis we offer both in-person and fully online (remote) Master of Science degrees in Data Science, which are accredited by the Higher Learning Commission (HLC). I have also found that DataCamp and Udacity have many quality courses for data science and machine learning.
If you don’t have a math or science-related degree, you may need to get a BS or MS in data science to be taken seriously as an applicant. Alternatively, you can work extremely hard, and land an entry-level job without a math or science degree. For example, a friend of mine from Galvanize has a BA and MA in music, and was able to get a job as a data scientist. It’s an entry-level job though, and he worked extremely hard to get it. My BS and PhD in chemical engineering are credentials that were highly valued by recruiters and interviewers.
Lastly, don’t give up, and take breaks when things get frustrating. At times you will want to scream from frustration due to code not working. If you can scream, go for it! Catharsis can be useful for dealing with frustration. I’d suggest shying away from yelling at your computer in coffee shops or public places, though.