Math for Data Science

Introduction to the Class - Math Self Assessment

Author

Joanna Bieri
DATA100

Important Information

Why is Math so important?

If you want to do data science or generally understand trends in the world (mathematical modeling), then one of the best things you can do is get good at a few important areas of mathematics. Data science is a discipline that sits between math and computer science, with applications in almost any field (Art, Medicine, Engineering, Business, Economics, etc.). How much math you need to know depends on what type of data science you want to do.

What is this class for?

The goal of this class is to go over some important areas of mathematics AND to show how to use Python programming to interact with and better understand mathematical ideas. I am expecting that everyone in the class is somewhat comfortable with high school math, but maybe needs some review. Some of the topics we cover might be review for you and some might be brand new.

Here are some things we will cover:

  • Foundational Mathematics: Algebra, Functions, Graphing, Linear Systems.
  • Calculus: Limits of Functions, What is a Derivative, Optimization, What is an Integral.
  • Probability: Joint, Union, and Conditional Probabilities, Probability Distributions.
  • Linear Algebra: Matrix and Vector calculations, Determinants, Eigenvalues, Solving large Linear Systems.

We will just do an overview of these topics and focus on the interpretation of results. For a deeper, more theoretical understanding of any of these topics, I strongly recommend a minor in Mathematics:

Take: MATH 121, 122, 221, 241, 311, 312.

How are these topics used in Data Science?

From: https://www.multiverse.io/en-US/blog/how-much-math-data-science

Linear algebra (MATH 241)

Some consider Linear Algebra the mathematics of data and the foundation of machine learning. Data Scientists manipulate and analyze raw data through matrices, rows, and columns of numbers or data points.

Datasets usually take the form of matrices. Data Scientists store and manipulate data inside them, and they use linear algebra during the process. For example, linear algebra is a core component of data preprocessing, the process of organizing raw data so that it can be read and understood by machines.

At a minimum, Data Scientists should know Matrices and Vectors and how to apply linear algebra principles to solve data problems.
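To make the idea of "a dataset as a matrix" concrete, here is a minimal sketch using NumPy. The numbers, the column meanings (hours studied, quiz score), and the `weights` vector are all made up for illustration; they are not from a real dataset.

```python
import numpy as np

# Hypothetical dataset: each row is one student,
# the columns are (hours studied, quiz score).
data = np.array([
    [1.0, 60.0],
    [2.0, 70.0],
    [3.0, 80.0],
])

# Mean of each column (one mean per feature).
column_means = data.mean(axis=0)

# Centering (subtracting the column means) is a common preprocessing step.
centered = data - column_means

# A matrix-vector product combines the columns using assumed weights.
weights = np.array([10.0, 0.5])
weighted_scores = data @ weights

print(column_means)     # [ 2. 70.]
print(weighted_scores)  # [40. 55. 70.]
```

Each row of `data` is a vector, and operations such as centering or weighting act on the whole matrix at once instead of one number at a time.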

Calculus

Data Scientists use calculus to analyze rates of change and relationships within datasets. These math skills help them understand how a change in one variable — such as changing customer preferences — affects another variable, like sales revenue.

Before you begin your data science journey, you should master the two main branches of calculus: differential and integral.

Differential calculus (MATH 118/119, 120, 121)

Differential calculus studies how quickly quantities change. Data Scientists should learn its foundational concepts, including limits and derivatives. Python libraries like NumPy and SymPy can speed up this learning process by performing complex calculations efficiently.
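As a small illustration of limits and derivatives, the sketch below uses SymPy on the function \(f(x)=x^2\). The function is just an example chosen for this note, and `h` is a helper symbol introduced here.

```python
import sympy as sp

x, h = sp.symbols("x h")
f = x**2

# Derivative computed directly.
derivative = sp.diff(f, x)

# The same derivative from the limit definition:
# the limit of (f(x + h) - f(x)) / h as h goes to 0.
limit_definition = sp.limit(((x + h)**2 - x**2) / h, h, 0)

# Rate of change of f at the point x = 3.
slope_at_3 = derivative.subs(x, 3)

print(derivative)  # 2*x
print(slope_at_3)  # 6
```

Both approaches should agree: the derivative is the limit of the average rate of change as the step `h` shrinks to zero.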

Data professionals apply differential calculus to optimize machine learning models and functions. For instance, gradient descent uses the derivative (gradient) of the error between the predicted and actual results to decide how to adjust the parameters of a model. This method allows neural networks and other types of algorithms to adjust their parameters iteratively, reducing errors and improving performance.
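Here is a minimal sketch of gradient descent on a single parameter. The error function \((w-3)^2\), the starting guess, and the learning rate are assumptions chosen for illustration, not values from any real model.

```python
def error(w):
    # Hypothetical error function; its smallest value is at w = 3.
    return (w - 3) ** 2


def error_derivative(w):
    # Derivative of (w - 3)^2 with respect to w.
    return 2 * (w - 3)


w = 0.0              # starting guess (assumed)
learning_rate = 0.1  # step size (assumed)

# Repeatedly step in the direction that decreases the error.
for step in range(100):
    w = w - learning_rate * error_derivative(w)

print(round(w, 4))  # 3.0
```

At each step the derivative tells us which direction is "downhill", so the parameter moves closer to the value where the error is smallest.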

Integral calculus (MATH 122, 221)

Integral calculus analyzes the accumulation of quantities over a specific interval. To effectively apply this technique, you must understand definite and indefinite integrals. Familiarity with Python libraries like SciPy can also help you calculate integrals.

Data professionals use this branch of mathematics to solve many problems in data science, such as forecasting the demand for a product and analyzing revenue. Machine learning algorithms also use integral calculus to calculate probability and variance.
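The difference between an indefinite and a definite integral can be sketched in a few lines. This example uses SymPy (the symbolic package used in this class) rather than SciPy, and the function \(x^2\) and the interval from 0 to 3 are chosen only for illustration.

```python
import sympy as sp

x = sp.symbols("x")

# Indefinite integral: an antiderivative of x^2
# (SymPy leaves off the constant of integration).
indefinite = sp.integrate(x**2, x)

# Definite integral: the accumulated area under x^2 from x = 0 to x = 3.
definite = sp.integrate(x**2, (x, 0, 3))

print(indefinite)  # x**3/3
print(definite)    # 9
```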

Probability and statistics (MATH 111, 311 and 312)

Probability and statistics go hand in hand. Data professionals use these mathematical foundations to analyze information and forecast events.

Statistics is the branch of mathematics that collects and analyzes large data sets to extract meaningful insights from them. Data Scientists use statistics to:

  • Collect, review, analyze, and form insights from data
  • Identify and translate data patterns into actionable business insights
  • Answer questions by creating experiments, analyzing and interpreting datasets
  • Understand machine learning and predictive models

Here are a few examples of statistics principles you’ll need to know to break into the data science field:

  • Descriptive statistics - Analyzes a dataset to summarize its main characteristics, like mean and mode
  • Inferential statistics - Extrapolates from known data to make predictions or generalizations about a larger population
  • Linear regression - Models the relationship between a dependent variable and one or more independent variables
  • Statistical experiments - Know how to create statistical hypotheses, do A/B testing and other experiments, and form conclusions

In contrast, probability is the likelihood that an event will occur. Data professionals use this method to analyze risk, forecast trends, and predict the outcomes of business decisions.

Data Scientists need to know these basics of probability:

  • Distributions - Summarizes all the possible values in a dataset and the frequency with which they occur
  • Statistical significance - Measures the likelihood that a relationship or result isn’t random
  • Bayes’ Theorem - A mathematical formula used to calculate the likelihood of an event based on prior knowledge and the probabilities of related events
  • Hypothesis testing - Determines whether your assumptions about a particular population or dataset are supported by evidence
  • Probability theory - Calculates the likelihood of different outcomes of random events or uncertain situations
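As a quick illustration of Bayes’ Theorem, \(P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}\), here is a small sketch of a medical-test style calculation. All three input numbers are invented for this example.

```python
# Hypothetical inputs (made up for illustration):
prior = 0.01                # P(condition): 1% of people have the condition
sensitivity = 0.95          # P(positive | condition)
false_positive_rate = 0.05  # P(positive | no condition)

# Total probability of a positive test, P(positive).
p_positive = sensitivity * prior + false_positive_rate * (1 - prior)

# Bayes' Theorem: P(condition | positive).
posterior = sensitivity * prior / p_positive

print(round(posterior, 3))  # 0.161
```

Even with a fairly accurate test, the chance of having the condition after a positive result is only about 16% in this made-up scenario, because the condition itself is rare.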

Keep in mind that how much math you need to know may also depend on your role. For example, a junior Data Analyst focuses more on analyzing trends. Although they still need to know how to extract data and interpret information, they work less with complex mathematical concepts. Unless they need to work with machine learning algorithms, they’ll use math for data science less than a senior-level Data Scientist.

What if I am not very confident in my math skills?

An important goal of this class is to give you the confidence to learn/review math on your own. It is totally okay to forget math that you learned in the past! It is totally okay to be completely confused by something that you learned in middle school! What is NOT okay is to give up :)

This class should be a safe place for exploring and learning mathematical ideas. It is everyone’s job to ask and generously respond to questions.

Review and Self Assessment

Below is a list of problems to help you assess where you might need more review. Attempt each of the problems, then look at the answers below. Make a note of a few things:

  • Which problems gave you the least/most anxiety?
  • For which problems did you feel you understood the concepts, but just rushed the calculation?
  • Were any of the problems fun to figure out?

  1. Rounding and Significant Figures:

      1. Round 799 to the nearest ten
      1. Round 94,449 to the nearest thousand
      1. Estimate the sum of 38+99+21+14 by rounding to the nearest ten.
      1. Estimate the quotient of 48/8 by rounding to the nearest ten.
      1. How many significant figures does a number reported as 3.42100 have?
      1. How many significant figures does a number reported as 342 have?

  1. Application Problems

      1. Thirty identical chairs cost \$1680. What is the cost of one chair?
      1. A cashier receives three \$50 bills to pay for a purchase of \$123. What is the change?
      1. What is the number of square yards in a field that measures 30 yards by 41 yards?

  1. Fractions and percents

      1. Reduce \(\frac{12}{20}\)
      1. Multiply \(\frac{6}{4}\cdot\frac{2}{3}\)
      1. Find the reciprocal of \(\frac{8}{3}\)
      1. Add and Simplify \(\frac{1}{4}+\frac{-2}{5}\)
      1. Change the mixed number to an improper fraction \(2\frac{1}{5}\)
      1. Divide and simplify \(\frac{\frac{1}{3}}{\frac{2}{7}}\)
      1. Convert \(\frac{10}{15}\) to a decimal
      1. Find 75% of 40
      1. If a class has 2 teachers and 30 students, what is the student to teacher ratio?

  1. Algebra

      1. In which quadrant does the point \((2,-6)\) lie?
      1. Factor \(x^2-16\) completely
      1. Solve \(4(3x+2)-(x+5)=-3\)
      1. Simplify the exponents \(\left(\frac{2x^3y^2}{z}\right)^3\)
      1. Simplify \(\frac{3x}{2y}\cdot\frac{8y^2}{27x}\)
      1. Solve for x and y: \[2x+y=3\] \[x-3y=12\]
      1. Evaluate \((5\sqrt{3x})^{2}\)
      1. Simplify \(\sqrt{x^2-4x+4}\)
      1. Solve for x: \(\log_2 x = 3\)
      1. Solve for x: \(e^x=1\)

  1. Geometry and Trigonometry

      1. What is the area of a circle with radius \(r=3\) meters?
      1. What is the length of the side of a square with area 4?
      1. One complete revolution around a circle measures how many degrees? How many radians?
      1. If a right triangle has sides \(a=2\) and \(b=3\), then what is the length of the hypotenuse?
      1. For what values of \(x\) does the function \(y=\sin(x)\) equal zero?
      1. Draw a graph of the function \(y=x^2-4\). What are the zeros?

  • Solutions 1: 800, 94000, 170, 5, 6, 3
  • Solutions 2: 56, 27, 1230
  • Solutions 3: 3/5, 1, 3/8, -3/20, 11/5, 7/6, 0.6666..., 30, 15:1
  • Solutions 4: fourth, (x+4)(x-4), x=-6/11, (8x^9y^6) / (z^3), 4y/9, x=3 and y=-3, 75x, |x-2|, x=8, x=0
  • Solutions 5: 9pi square meters, 2, 360 degrees or 2pi radians, sqrt(13) (about 3.61), x = n*pi for any integer n (..., -pi, 0, pi, 2pi, ...), x=2 and x=-2
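If you want to double-check a few of the algebra and geometry answers yourself, here is a short sketch using SymPy. The variable names are just labels chosen for this note.

```python
import sympy as sp

x, y = sp.symbols("x y")

# Factor x^2 - 16.
factored = sp.factor(x**2 - 16)

# Solve 4(3x + 2) - (x + 5) = -3 by moving everything to one side.
linear_solution = sp.solve(4 * (3 * x + 2) - (x + 5) + 3, x)

# Solve the system 2x + y = 3 and x - 3y = 12.
system_solution = sp.solve([2 * x + y - 3, x - 3 * y - 12], [x, y])

# Hypotenuse of a right triangle with sides 2 and 3 (Pythagorean theorem).
hypotenuse = sp.sqrt(2**2 + 3**2)

print(factored)
print(linear_solution)  # [-6/11]
print(system_solution)  # {x: 3, y: -3}
print(hypotenuse)       # sqrt(13)
```

Try the problems by hand first. A check like this is most useful for spotting where a calculation went wrong, not for replacing the work.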

What are some areas of math that you would like a bit of review or refresher?

What was your last math class and how did you do?

Why use Python?

We are going to use Python like a graphing calculator: to do some of the more tedious calculations for us and to show graphs and interact with applications. Here are some things it can do:

# Round numbers
round(14.9858187509,3)
14.986
# Add, Subtract, Multiply, Divide
(31+4)*2/(9-1)
8.75
# Exponents 3^2 = 3**2
3**2
9
# We need packages to do other more complicated things - here are two we will use

# Numbers
import numpy as np
# Symbols
import sympy as sp
# Logs
np.log(2)
0.6931471805599453
# Square roots
np.sqrt(4)
2.0
# Functions
np.sin(3)
0.1411200080598672
# Symbols
x = sp.symbols("x")
4*x+3

\(\displaystyle 4 x + 3\)

# Solve for 4x+3=0
sp.solve(4*x+3,x)
[-3/4]
# Plot a function
from sympy.plotting import plot
plot(4*x+3)