Top 100 Data Science Resources for 2018


Whether you’re just getting started in data science or gearing up to get a Masters in Data Science or Business Analytics, there’s always more to learn. This guide gathers 2018’s top data science resources on the web for learners at all stages. MastersInDataScience.com’s editors selected these resources based on their relevance to data science and how recently they have been updated. From weekly newsletters to Python tutorials, you’ll find it all below.

Online Courses and Tutorials

Dataquest – Those interested in a career as a data scientist, data analyst, or data engineer can learn through interactive coding challenges. Students can work on problems and real data science projects.

Udemy – Known as the world’s largest online learning marketplace, Udemy offers digital, self-paced courses on any topic. Its data science selection covers everything from machine learning to data visualization to Python programming.

Coursera – Coursera partners with universities and institutions across the world to offer online courses. Registrants have access to recorded video lectures, auto-graded and peer-reviewed assignments, and community discussion forums.

DataCamp – This website is a great resource for beginners and seasoned experts alike who are looking to advance their skills. DataCamp has interactive R and Python courses related to data science, statistics, and machine learning.

Udacity – Earn a nanodegree in data analysis to start a career in data science. Udacity offers interactive content like quizzes, videos, and hands-on programs through its curriculum.

edX – This site has free online data science courses to build your skills and advance your knowledge. Learn data science, statistics, and business intelligence through courses from top universities and global partners on edX.

Cognitive Class – Learn skills in data science, AI, big data, and blockchain with practice labs and videos. Cognitive Class also offers some courses in Japanese, Spanish and Russian for international users.

Springboard – This online bootcamp covers the data science process, from statistics to machine learning and data storytelling. You can specialize in a field like machine learning or NLP and graduate from the program with a portfolio ready for interviews.

Data School (YouTube) – This YouTube channel belongs to Kevin Markham, a full-time data science educator based in Washington, DC. His videos cover a range of data science topics like using open source tools, Python and R.

Siraj Raval (YouTube) – Siraj Raval’s YouTube channel features videos to inspire and educate developers to build Artificial Intelligence. He creates engaging videos teaching viewers how to use AI to make games, music, chatbots, art, and more.

IntelliPaat – IntelliPaat tutorials cover topics like big data, business intelligence, and databases. You can learn from industry experts and get certified after successfully completing a course.

Lynda – Through individual, corporate, academic and government subscriptions, members can watch Lynda’s quality video courses taught by thought leaders in the field. You can choose the course duration and skill level to suit your professional needs.

Kaggle – Kaggle is suitable for beginners wanting to get their feet wet in a number of data science specializations. You can choose a track in machine learning, pandas, data visualization, R, SQL, or deep learning.

TensorFlow – TensorFlow is an open-source machine learning library. Its website contains in-depth tutorials on how to use it for several basic tasks.

Data Science Research

Data Science for Social Good – This website is managed by the Center for Data Science and Public Policy at the University of Chicago. It highlights socially beneficial data science projects for non-profit, government, and social organizations.

International Journal of Data Science and Analytics – This online journal welcomes experimental and theoretical discoveries in data science and advanced analytics with their real-life applications. This is the first scientific journal in data science and analytics.

CODATA Data Science Journal – This online journal is open access and includes papers on the descriptions of data systems, their implementations and their publication, applications, and so on. It has a particular focus on the principles, policies and practices for open data.

Data Science Foundation Whitepapers – The Data Science Foundation features whitepapers that are written by members of the Foundation and open to peer review by other members. Read about recent findings and opinions of thought leaders in the field. Take advantage of free individual membership and participate in the community.

IBM Big Data & Analytics Hub – These white papers follow recent trends in Big Data in industry. Readers can subscribe to receive new reports and white papers in their inbox.

Institute for Data Science – Fellows of the Berkeley Institute for Data Science share the results from data-intensive research on today’s major issues. The Institute has 6 working groups covering topics like Working Culture and Education.

EPSRC Centre for Doctoral Training – This website highlights current PhD data science research from the Centre. Faculty engage in research in data science areas, combining theory with application.

Data Science Blogs

Edwin Chen’s Blog – Edwin studied math and linguistics at MIT, and has a background doing quant trading at Clarium, ads at Twitter, data science at Dropbox, and stats/ML at Google. His blog holds insightful information on AI, human computation, and data.

DataScience.com – The DataScience.com blog has articles that cater to data science, business, and IT teams. Experts also have a chance to share their insight with thousands of site readers.

Data Science 101 – Ryan Swanstrom holds a PhD in Computational Science and Statistics and has been posting on his blog since 2012. He shares data science resources, news, and research.

The Shape of Data – Although now a software engineer at Google, Jesse Johnson began as a math professor. His blog explores and explains the basic ideas that underlie modern data analysis.

Planet Big Data – Planet Big Data compiles blogs about big data, Hadoop, and related topics. It includes posts by bloggers worldwide.

Big Data Blog – The conference itself presents the latest discoveries in big data, offers actionable insights, and showcases best practices. On the blog you can find articles on these topics that save a trip to the conference.

Chris Albon’s Blog – Chris Albon is a data scientist and political scientist with over 10 years of experience in statistics, artificial intelligence, and software engineering. He specializes in political, social, and humanitarian efforts, and shares hundreds of notes on these topics.

NYC Data Academy Blog – This blog has articles on R, web scraping, machine learning, and meetups. It also outlines some capstone projects from participants in their training program.

DBMS2 – This blog is for people interested in database and analytic technologies, and features commentary from industry professionals. Its author, Curt Monash, has been following the industry for over 30 years.

Data36 – Tomi Mester’s blog gives insight into online data analysts’ best practices. You will find articles, online courses, and videos about data analysis, A/B testing, research, and data science.

Operational Database Management Systems – This site is a resource portal for big data, new data management technologies and data science. The editor is a professor of Database and Information Systems at Frankfurt University.

Yanir Seroussi’s Blog – Yanir Seroussi is an experienced data scientist and software engineer with a strong background in programming, computer science, machine learning, and statistics. His blog includes thoughts on topics from isolated data problems to building production systems.

Simply Statistics – This blog is written by three biostatistics professors: Jeff Leek, Roger Peng, and Rafa Irizarry. They post ideas, contribute to discussion of science/popular writing, link to articles, and share advice with up-and-coming statisticians.

What’s the Big Data? – Gil Press runs his own consulting practice, gPress, providing writing, research and marketing services. His blog explores big data’s impact on information technology, the business world, government agencies, and our lives.

Towards Data Science – This blog serves as a platform for thousands of people to exchange ideas and to expand our understanding of data science. You can read a diverse collection of thoughts from data scientists of diverse specializations.

Alexis Perrier’s Blog – Alexis Perrier is a data science consultant who helps companies large and small profit from machine learning. As a data science instructor, he shares his knowledge on topics ranging from linear regression to deep learning on his blog.

Algobeans – Algobeans was created by data science enthusiasts, Annalyn from the University of Cambridge and Kenneth from Stanford University. They created the site to give everyone access to data science in simplified terms.

Ben Frederickson’s Blog – Ben Frederickson is a software developer based in Vancouver, Canada. He shares projects and findings about software development and data science on his blog.

Daniel Nee’s Blog – Daniel Nee has a background in Machine Learning and Computer Science, and works as a Data Scientist. His blog features posts about his experience, useful tools and techniques, and other interests.

Data Blogger – This blog neatly organizes its posts by category like technology, do-it-yourself, and data science. There is also a Q&A section so you can ask any data science question you may have.

Data Mining Research – Sandro Saitta started this blog as a PhD student in Switzerland. The blog began by discussing data mining research issues. Posts now cover research issues, recent applications, important events, interviews with leading figures, current trends, book reviews, and more.

Data Double Confirm – Hui Xiang Chua works as a research analyst in Singapore when she’s not sharing knowledge on her blog. Data Double Confirm documents her learning journey in data science, and is a great resource for everything from data collection, data preparation, and data visualization to basic statistical analysis and modelling.

Data Meets Media – This unique blog blends together television, movies, and data science. Posts cover the topics individually and also explore where they intersect.

DataAspirant – Saimadhu Polamuri is a self-taught data scientist, data science educator, and the founder of DataAspirant. This blog serves as a data science resource for beginners.

Data Science E-Books

Journey to Data Scientist: Interviews with More Than Twenty Amazing Data Scientists – When author Kate Strachnyi wanted to learn more about data science, she went straight to the source. In a series of more than twenty interviews, she asks leading data scientists questions about starting in the field and the future of the industry.

Learn Python the Hard Way – Newly updated for Python 3, this is the original and still the most popular way for total beginners to finally learn how to code. Learn Python The Hard Way takes you from absolute zero to being able to read and write basic Python, so that you can then understand other books on Python.

O’Reilly Free Data Science Library – This library compiles the best data insights from O’Reilly editors, authors, and Strata speakers for you in one place, so you can dive deep into the latest of what’s happening in data science and big data.

Bayesian Reasoning and Machine Learning – The book targets students with backgrounds in computer science, engineering, applied statistics, physics, and bioinformatics who want to gain knowledge of Machine Learning. The author introduces fundamental concepts in inference using layman’s terms and a low level of algebra and calculus.

Guide to Data Mining – This free book takes a learning-by-doing approach to explain basic data mining techniques. Guide to Data Mining introduces practical data mining, collective intelligence, and building recommendation systems.

Interpretable Machine Learning – This online book is about making machine learning models and their decisions interpretable. It’s suitable for machine learning practitioners, data scientists, statisticians and anyone else interested in making machine decisions more human.

The Data Science Handbook – The Data Science Handbook is a compilation of thorough interviews with 25 accomplished data scientists with their insights, stories, and advice. While this book isn’t a tutorial on data science topics, it gives practical career insight into a variety of industries.

Art of Data Science – This book describes the process of analyzing data. The authors have extensive experience both managing data analysts and conducting their own data analyses.

The Data Analytics Handbook – This Handbook takes an in-depth look at the data science industry through interviews with data scientists, data analysts, CEOs, managers, and researchers at the cutting edge of the data science industry.

Numsense! Data Science for the Layman: No Math Added – As the title implies, this book breaks down data science for people of all backgrounds, leaving out the quantitative jargon. Aspiring students, enterprising business professionals, or other eager learners can find tutorials and easy-to-understand explanations.

D3 Tips and Tricks v4.x – This book includes tips and tricks for using d3.js (version 4), one of the leading data visualization tools for the web. It’s aimed at getting you started and moving you forward.

Data Mining Algorithms In R – Those who know the programming language R and wish to learn more about data mining will benefit from this WikiBook. Understanding how the algorithms work will help grow your understanding of data mining.

Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management, 3rd Edition – Learn to use data mining for marketing or sales purposes. You’ll pick up advice for improving response rates to direct marketing campaigns, identifying new customer segments, and estimating credit risk.

Data Science Community

Analytics Vidhya – This knowledge and community portal serves both beginners and professionals from analytics, data science & data engineering communities to enhance their careers. The portal is simple and serves the community through the latest blogs, discussions, machine learning hackathons, data science trainings, meetups and jobs.

Data Science Central – Data Science Central is the industry’s online resource for big data practitioners. The website provides a community experience that includes a robust editorial platform, social interaction, forum-based technical support, the latest in technology, tools and trends and industry job opportunities.

KDnuggets – KDnuggets provides a platform to connect on Business Analytics, Big Data, Data Mining, Data Science, and Machine Learning. The site has sections for news, stories, opinions, tutorials, meetups, and more.

Data Science on Reddit – For those familiar with Reddit, this subreddit provides a comfortable forum to share opinions and find others interested in data science. Discover trending topics or ask any burning data science questions.

Smart Data Collective – SmartData Collective is an editorially independent, moderated community providing enterprise leaders access to the latest trends in Business Intelligence and Data Management. This site serves as a platform for recognized, global experts to share their insights through peer contributions, custom content publishing and alignment with industry leaders.

Data Tau – Data Tau is a platform that allows users to share links to data science related resources, tutorials and projects. Its very simple interface also lets you comment on shared links.

Codementor – Learn about the latest trends in Data Science. You can also find tutorials, posts, and mentors to help you develop your database or programming skills for free.

Medium – Data Science – Read stories about Data Science contributed by a diverse set of thought leaders on Medium. Discover topics that matter most to you like machine learning, big data, artificial intelligence, data visualization, and Python.

D Zone: Big Data Zone – DZone.com is one of the world’s largest online communities and a leading publisher of knowledge resources for software developers. This site brings together thousands of developers to read about the latest technology trends, methodologies, and best practices through shared knowledge.

Kaggle – Learn to do data science and machine learning, play with data, and connect with other data scientists. Kaggle also hosts competitions to challenge data scientists and machine learning professionals from around the world.

Cross Validated – Cross Validated is a Q&A site for people interested in statistics, machine learning, data analysis, data mining, and data visualization. Ask your own questions, or help out fellow learners with your own knowledge.

Data.world – Discover and share data, connect with interesting people, and work together to solve problems faster. When you share data, community members can help clean it, add annotations, scripts and visualizations, or just upvote and discuss it.

Analytic Bridge – This site focuses on data analytics and business intelligence. Discover classifieds, webinars, jobs and more.

Data Science Community – DataScience.Community displays the top must-read articles every day. Find jobs, look for candidates, or display a portfolio page to show off your work.

Data Science Degrees

Master of Science in Analytics (MSAn) from American University – Through a combination of collaborative online classes, self-paced coursework, and hands-on learning experiences, students become experts in evidence-based data gathering, data modeling, and quantitative analysis. Visit onlinebusiness.american.edu to learn more about Analytics@American.

Sponsored Program

Master of Information and Data Science (MIDS) from UC Berkeley – Drawing upon the social sciences, computer science, statistics, management and law, the program prepares students to solve real-world problems by deriving insights from complex and unstructured data. Students in the program benefit from UC Berkeley’s strong ties to the Bay Area and Silicon Valley. Visit datascience.berkeley.edu to learn more about datascience@berkeley.

Sponsored Program

Master of Science in Applied Data Science from Syracuse – Featuring live online classes, interactive coursework and opportunities to network, the online learning format facilitates collaboration, problem-solving and in-depth analysis within an interdisciplinary curriculum. Students focus on pulling insights to drive business decision-making and operational processes using applications of data. Visit datascience.syr.edu to learn more about DataScience@Syracuse.

Sponsored Program

Master of Science in Business Analytics – This program helps data-driven thinkers develop or sharpen their ability to interpret complex data and guide their organizations in making more informed and actionable decisions. Through an action-oriented online learning format, students develop and hone their expertise in areas such as predictive analytics, data modeling, and information systems. Visit onlinebusiness.syr.edu to learn more about BusinessAnalytics@Syracuse.

Sponsored Program

Master of Science in Data Science from Southern Methodist University – Designed for working professionals looking to advance their careers, this program focuses on statistics, computer science, strategic behavior and data visualization skills so students can drive decision-making and advance in careers across industries. The program blends live online classes, self-paced coursework and in-person learning experiences with classmates and faculty. Visit datascience.smu.edu to learn more about DataScience@SMU.

Sponsored Program

Coding With R and Python

Revolutions – Revolutions is a blog dedicated to news and information of interest to members of the R community. This blog is updated every US workday, with contributions from various authors.

How to Think Like a Computer Scientist – This book is meant to provide you with an interactive experience as you learn to program in Python. You can read the text, watch videos, and write and execute Python code.

Data School – Kevin Markham, a data scientist and teacher, has resources accessible to data scientists at all levels of knowledge and experience. You can sign up for the blog’s newsletter or browse the blog posts available on the site.

Flowing Data – FlowingData explores how statisticians, designers, data scientists, and others use analysis, visualization, and exploration to understand data and ourselves. Learn to visualize your data like an expert with these practical how-tos for presentation, analysis, and understanding.

Mode’s Python Tutorial – Learn Python for business analysis using real-world data. No coding experience necessary. This site also has a SQL tutorial, discussion board, and career portal.

R-Bloggers – R-Bloggers.com is a blog aggregator of content contributed by bloggers who write about R. Apart from a gallery of R blog posts to browse, this site also features R tutorials and job postings.

blogR – This site includes R tips and tricks from a scientist. All R Markdown docs with full R code can be found at the author’s GitHub (link included on his blog).

Newsletters

Mode Analytics – Receive cutting-edge data news in your inbox. Stay in the know with a regular selection of the best analytics and data science pieces, plus occasional news from Mode.

Data Elixir – Data Elixir is a weekly newsletter of curated data science news and resources from around the web. If you miss an issue, you can catch up on any week’s news on their site.

Data Science Weekly – This is a free weekly newsletter featuring curated news, articles, guides, and jobs related to Data Science. The newsletter’s goal is to help you keep up with all the latest developments without the hassle of doing it yourself.

Inside Big Data – Subscribe to this free insideBIGDATA Newsletter, written by Rich Brueckner, who was recently named by Forbes as “one of the top 20 most influential people in Big Data.” You will also get insights from veteran writer and data scientist Daniel Gutierrez, plus thought leadership from many industry experts.

O’Reilly Data Newsletter – Receive weekly insight from industry insiders, plus exclusive content, offers, and more on the topic of data. This newsletter has been rated 4 out of 5 stars by over 1900 reviewers.

The Data Science Round Up – With over 7,500 subscribers, this newsletter delivers the internet’s most useful data science articles. You can also catch up on past issues on the site for a roundup of diverse articles.

Free Data Science Tools

Tableau – Tableau Software is most likely recognizable to anyone involved with data visualizations. It makes analyzing data fast and easy for users of all levels.

Bokeh – Bokeh is an interactive visualization library that targets modern web browsers for presentation. Its goal is to provide elegant, concise construction of versatile graphics, and to extend this capability with high-performance interactivity over very large or streaming datasets.

Apache Hadoop – The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.

D3.js – D3.js is a JavaScript library for manipulating documents based on data. D3 helps you bring data to life using HTML, SVG, and CSS.

Jupyter – Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. Users can clean and transform data, do numerical simulation, statistical modeling, data visualization, machine learning, and more.

OpenRefine – OpenRefine is a powerful tool for working with messy data. This tool allows users to clean data, transform it from one format into another, and extend it with web services and external data.

Orange – Orange is an open source machine learning and data visualization tool for novice and expert users. It includes interactive data analysis workflows with a large toolbox.

KNIME – KNIME lets data scientists blend tools and data types seamlessly. It supports fluid movement from prototyping new analytics approaches to creating production deployments for users across your global enterprise.

DataMelt – DataMelt is free mathematics software for scientists, engineers and students. It can be used for numeric computation, statistics, symbolic calculations, data analysis and data visualization.

RapidMiner – RapidMiner is a software platform for data science teams that unites data prep, machine learning, and predictive model deployment. It eliminates the complexities of cutting-edge data science by making it easy to use the latest machine learning algorithms and technologies like TensorFlow, Hadoop, and Spark.