Data science

Data Science has emerged as a new field that aims to establish processes and practices for exploring and analyzing data and for generating models that support description and prediction across a wide range of data types. Ultimately, these processes and practices should improve the performance and efficacy of organizations and the quality of life of citizens.

Data Science models and transforms data to support the decision process through computational thinking, enabling data-driven decision making.

Data Scientist

Professional of the decade

Profile:

Data Science in Practice

If you torture the data long enough, it will confess. - Ronald Coase

Data management: several general or specialized platforms for all kinds of data

Data mining: several implementations of each technique

User expertise: does the data scientist need to program?

NO! (S)he just needs to think algorithmically.

Lemonade in the context of data science

Enablers:

Motivations

Data mining

Machine learning

Data science 101

Techniques, algorithms and models

How to choose between the different available techniques?

Is my data set ready for what I want to do?

How to formulate the right question about the data?

Predict and evaluate an answer
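The last two steps, predicting an answer and evaluating it, can be illustrated with a held-out test set: the model is fitted on one part of the data and scored only on rows it has never seen. Below is a minimal Python sketch under stated assumptions. The toy data set, the threshold rule used as the model (a one-feature "decision stump"), and all function names are hypothetical choices for illustration, not a method prescribed by these slides.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows with a fixed seed and split into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def predict(threshold, x):
    """Hypothetical one-feature model: answer 1 when x reaches the threshold."""
    return 1 if x >= threshold else 0

def accuracy(threshold, rows):
    """Fraction of (x, label) rows that the threshold rule classifies correctly."""
    hits = sum(1 for x, y in rows if predict(threshold, x) == y)
    return hits / len(rows)

def fit_threshold(train):
    """Try every training value as a threshold; keep the most accurate one."""
    candidates = sorted({x for x, _ in train})
    return max(candidates, key=lambda t: accuracy(t, train))

# Hypothetical toy data: (feature, label) pairs where the label is 1 for x >= 50.
data = [(x, 1 if x >= 50 else 0) for x in range(100)]
train, test = train_test_split(data)
threshold = fit_threshold(train)

# Evaluate on the held-out rows, which played no part in choosing the threshold.
print("threshold:", threshold)
print("test accuracy:", accuracy(threshold, test))
```

Training accuracy alone would overstate how good the answer is, because the threshold was chosen to fit those very rows; the held-out score is the honest estimate.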

Standing on the shoulders of giants (or Ctrl+C, Ctrl+V)

Copy workflows

Use external tutorials

Repositories of machine learning experiments

Resources

Kaggle

"Cortana Intelligence Gallery enables our growing community of developers and data scientists to share their analytics solutions".

Graph analysis

https://blog.cloudera.com/blog/2016/10/how-to-do-scalable-graph-analytics-with-apache-spark/
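The linked post works with Apache Spark. As a framework-free illustration of one common graph-analytics measure, here is a minimal PageRank sketch by power iteration in plain Python; it does not use Spark or its graph APIs, and the toy graph and function name are hypothetical.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a directed edge list of (src, dst) pairs."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}  # start from a uniform distribution
    for _ in range(iterations):
        # Every node receives the "random jump" share of the rank.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = out_links[node]
            if targets:
                # Split this node's rank evenly across its outgoing links.
                share = damping * rank[node] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for dst in nodes:
                    new_rank[dst] += damping * rank[node] / n
        rank = new_rank
    return rank

# Hypothetical toy graph: "d" links to "c", but nothing links to "d".
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "c")]
ranks = pagerank(edges)
for node, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(node, round(score, 4))
```

Node "c" scores highest because it is the only node with two incoming links, while "d" keeps only the random-jump share. Handling dangling nodes this way keeps the scores summing to 1.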

Regression

https://hortonworks.com/tutorial/predicting-airline-delays-using-sparkr/
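The linked tutorial applies regression to airline delays with SparkR. To show only the underlying idea, here is a minimal sketch of simple linear regression fitted by ordinary least squares in plain Python; the data points and function names are hypothetical and unrelated to the tutorial's data set.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)                       # spread of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))  # co-variation
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def predict_line(intercept, slope, x):
    """Predict y for a new x with the fitted line."""
    return intercept + slope * x

# Hypothetical toy data lying exactly on y = 1 + 2x.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
intercept, slope = fit_line(xs, ys)
print(intercept, slope)                    # 1.0 2.0
print(predict_line(intercept, slope, 10))  # 21.0
```

Real data will not lie on a line, so the fitted model should be evaluated on held-out rows (for example by its mean squared error) before its predictions are trusted.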

Sentiment analysis

https://hortonworks.com/tutorial/sentiment-analysis-with-apache-spark/
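The linked tutorial performs sentiment analysis with Apache Spark. A much simpler technique, lexicon-based scoring, conveys the basic idea and is sketched below in plain Python. This is not the tutorial's method; the tiny lexicon and its scores are hypothetical placeholders.

```python
import re

# Hypothetical toy lexicon: word -> polarity score.
LEXICON = {"good": 1, "great": 2, "love": 2, "bad": -1, "terrible": -2, "hate": -2}

def sentiment(text):
    """Sum the lexicon scores of the words; label the text by the sign of the total."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(LEXICON.get(word, 0) for word in words)
    if score > 0:
        return "positive", score
    if score < 0:
        return "negative", score
    return "neutral", score

print(sentiment("I love this great phone"))      # ('positive', 4)
print(sentiment("Terrible service, I hate it"))  # ('negative', -4)
print(sentiment("It arrived on Tuesday"))        # ('neutral', 0)
```

The approach ignores context: "not good" still scores as positive. Handling negation, sarcasm, or domain-specific vocabulary is where trained models usually replace hand-made lexicons.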