Are you into Formula 1? Take a look at their data

Auto racing has embraced Big Data and data science. This web page at AWS discusses their partnership with Formula 1, in which each race car generates over 1.1 million data points per second that are transmitted from the car to the pits. If you want to play around with some of the data, there is a Python module named FastF1 that provides access to the data and includes examples of analyses you can run on it. Another example using this data is covered in this Medium post.
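To give a feel for what working with the module looks like, here is a minimal sketch, assuming the `fastf1` package is installed (`pip install fastf1`) and you have an internet connection. The helper names `format_lap_time` and `fastest_lap` are made up for this illustration and are not part of the library; the sketch relies only on the library's `get_session`, `Session.load`, `laps` and `pick_fastest` calls.

```python
from datetime import timedelta


def format_lap_time(lap_time: timedelta) -> str:
    """Format a lap time as M:SS.mmm (hypothetical helper, not part of FastF1)."""
    total_ms = round(lap_time.total_seconds() * 1000)
    minutes, rem_ms = divmod(total_ms, 60_000)
    seconds, millis = divmod(rem_ms, 1000)
    return f"{minutes}:{seconds:02d}.{millis:03d}"


def fastest_lap(year: int, event: str, session_code: str) -> str:
    """Return 'DRIVER M:SS.mmm' for the fastest lap of one session.

    Assumes fastf1 is installed and the timing data can be downloaded.
    """
    import fastf1  # third-party; imported here so the helper above works without it

    session = fastf1.get_session(year, event, session_code)
    session.load()  # downloads timing data for the session (can take a while)
    lap = session.laps.pick_fastest()
    # LapTime is a pandas Timedelta, which behaves like datetime.timedelta
    return f"{lap['Driver']} {format_lap_time(lap['LapTime'])}"
```

For example, `fastest_lap(2021, "Monza", "Q")` would request the qualifying session of the 2021 race at Monza. The first call downloads the session data, so expect it to be slow.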

A different set of historical data on Formula 1 is also available through Kaggle.


Kaggle Competition Jigsaw Rate Severity of Toxic Comments

The definition of a toxic comment on the Internet is subjective. Each individual may have their own bar set differently. But which comment is truly worse? Can you help determine the ‘severity’ of a comment?

In this competition, Jigsaw returns to the discussions from Wikipedia Talk pages. You will score a set of about fourteen thousand comments. Your scores will be compared to human rankings performed on comment pairs. In this way, the focus is on the severity of comment toxicity — from innocuous to outrageous, where the middle matters as much as the extremes.
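One way to picture "scores compared to human rankings on comment pairs" is a simple pairwise agreement rate: for each pair a human ranked, check whether your scores put the two comments in the same order. The sketch below is only an illustration under that assumption; the competition's exact scoring formula is not given here, and the comment IDs, scores and the function name `pairwise_agreement` are invented for the example.

```python
from typing import Dict, List, Tuple


def pairwise_agreement(
    scores: Dict[str, float],
    pairs: List[Tuple[str, str]],
) -> float:
    """Fraction of human-ranked pairs that the scores order the same way.

    Each pair is (less_toxic_id, more_toxic_id) as judged by a human.
    A pair counts as agreement when the less toxic comment has the
    strictly lower score.
    """
    if not pairs:
        raise ValueError("need at least one ranked pair")
    agree = sum(1 for less, more in pairs if scores[less] < scores[more])
    return agree / len(pairs)


# Hypothetical data: three comments with model scores (higher = more toxic)
scores = {"c1": 0.1, "c2": 0.7, "c3": 0.4}
# Hypothetical human judgments, written as (less toxic, more toxic)
pairs = [("c1", "c2"), ("c1", "c3"), ("c2", "c3")]

agreement = pairwise_agreement(scores, pairs)
print(f"Agreement with human rankings: {agreement:.2f}")
```

Because only the ordering within each pair matters, the absolute scale of the scores is irrelevant. That is why the middle of the range counts as much as the extremes.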

Total Prizes: $50,000

Entry Deadline: January 31, 2022

Kaggle Competitions

Are you interested in testing your data science skills? Kaggle offers a no-setup, customizable Jupyter Notebooks environment, with access to free GPUs and a huge repository of community-published data and code. Thousands of data sets are available for exploration and data play. Kaggle also partners with other organizations to host competitions: some are for learning, such as this House Price prediction competition, and others offer cash prizes, such as this NFL Big Data Bowl 2022, which investigates special teams performance. Kaggle even offers free micro-courses on Pandas, Machine Learning and more!

You can register for a free Kaggle account using any Gmail account.


What is Data Science?

If you Google for a definition, you will no doubt find Venn diagrams with three circles (Venn diagrams are those figures with overlapping circles that show the relationships between things). If you Google for “data science Venn diagram”, you will find some where folks went wild and drew a dozen overlapping circles, but most have three (maybe because people who write definitions, such as academics, always seem to want to describe things as three-legged stools).

Recently Datanami published an interview with Jeffrey Ullman of Stanford, a big name in computer science (particularly databases) who won the 2020 Turing Award (think of it as the Nobel Prize of computer science). The article is short but interesting, and he points out that everyone has their own diagram that emphasizes their own domain!

You can find the article at this link.