Doing Data Science

Status: Unfinished

Statistical inference
This overall process of going from the world to the data, and then from the data back to the world, is the field of statistical inference.

Observation
An observation is a recorded characteristic of an object.

Modeling

  • A model is our attempt to understand and represent the nature of reality through a particular lens, be it architectural, biological, or mathematical.
  • How do you have any clue whatsoever what functional form the data should take? Truth is, it’s part art and part science.

Probability Distribution

So what is the process of doing data science?

The population is the full set of objects we are interested in; its size is denoted N.
A sample is a subset of the population.
An observation can be thought of as the quantified or classified characteristics we capture or extract from an object.
The observations collected from the sampled objects are what we usually call the data (as far as I know).
But the data by themselves do not say anything. We need a way to simplify them, to make them more comprehensible and easier to use for explaining the rules of the real world. That is the role of the estimator, or statistical estimator.

A statistical estimator is a function or rule for calculating an estimate of a given quantity based on observed data. First, it is a function derived from the observed data; discovering this function is modeling.
Second, it is used to calculate the estimate of the given quantity.
The activities above form the process of statistical inference: start from the real world and go to the data, then go from the data back to the real world.
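
The chain from population to sample to estimate can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of heights, the sizes N and n, and the choice of the sample mean as the estimator are all made up for the example and do not come from the book.

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: N = 10,000 heights in cm (made-up values).
N = 10_000
population = [random.gauss(170, 10) for _ in range(N)]

# Sample: a subset of the population, drawn without replacement.
n = 500
sample = random.sample(population, n)


def sample_mean(observations):
    """Estimator: a rule that maps observed data to an estimate."""
    return sum(observations) / len(observations)


# From the data back to the world: estimate the population mean.
estimate = sample_mean(sample)

# The true value is normally unknown; here we can compute it only
# because the population was simulated.
true_mean = sum(population) / N

print(f"estimate from sample: {estimate:.2f}")
print(f"true population mean: {true_mean:.2f}")
```

The estimate is computed from 500 observations only, yet it should land close to the mean of all 10,000 objects. That gap between what we observe and what we want to know is the gap statistical inference tries to bridge.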

Modeling

Modeling is man-made; it aims to eliminate the unimportant details of the objects and abstract information from the data.
Can I say that modeling is choosing estimators, then combining and optimizing them?
Before coding, we need to take a look at our data or draw a map of it, and try to understand which variable is the cause and which is the effect.
In statistical modeling, people use a function to represent the relationship between the data.
Greek letters represent the parameters; Latin letters represent the data.
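
As an illustration of this notation, a simple linear model could be written as follows. The choice of a linear form is an assumption made for the example: x and y (Latin) stand for observed data, \beta_0 and \beta_1 (Greek) are the parameters to be estimated, and \epsilon is a noise term for whatever the function does not capture.

```latex
y = \beta_0 + \beta_1 x + \epsilon
```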
Modeling is part art and part science.
So there is no single model that is the best.
We can start with a simple one: take a look at the histogram and scatter plot, and find a simple model. Sometimes, for example, a simple model can explain 90% of the relationship in the data, while a complex one improves that by only 2 points, to 92%.
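
The trade-off between a simple and a complex model can be checked with a small experiment. The sketch below is an assumption-laden illustration, not the book's method: the data are synthetic, the helper names `fit_polynomial` and `r_squared` are hypothetical, and R² is used as one possible measure of "how much of the relationship is explained". It fits a straight line and a quadratic to the same data and compares the two scores.

```python
import random


def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations.

    Returns coefficients [c0, c1, ...] for c0 + c1*x + c2*x**2 + ...
    Suitable for small degrees only.
    """
    m = degree + 1
    # Augmented matrix [A^T A | A^T y], where A[i][j] = xs[i] ** j.
    aug = [
        [sum(x ** (r + c) for x in xs) for c in range(m)]
        + [sum(y * x ** r for x, y in zip(xs, ys))]
        for r in range(m)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(m):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[r][m] / aug[r][r] for r in range(m)]


def r_squared(xs, ys, coeffs):
    """Share of the variance in ys explained by the fitted polynomial."""
    mean_y = sum(ys) / len(ys)
    predictions = [sum(c * x ** k for k, c in enumerate(coeffs)) for x in xs]
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


random.seed(0)  # fixed seed so the sketch is reproducible

# Synthetic data: mostly linear, with a slight curve and some noise.
xs = [i / 10 for i in range(100)]
ys = [2.0 + 1.5 * x + 0.05 * x ** 2 + random.gauss(0, 1.0) for x in xs]

r2_linear = r_squared(xs, ys, fit_polynomial(xs, ys, 1))
r2_quadratic = r_squared(xs, ys, fit_polynomial(xs, ys, 2))

print(f"R^2, simple (linear) model:     {r2_linear:.3f}")
print(f"R^2, complex (quadratic) model: {r2_quadratic:.3f}")
```

Because the quadratic model contains the linear one as a special case, its R² on the same data can never be lower. The useful question is whether the gain is large enough to justify the extra complexity.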
