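I don't know what the situation is in China, but I can talk about the United States. People generally see software engineers and data scientists as having different responsibilities, but the boundary between the two may not be very clear.
Software engineering is a profession that has existed for many years, and its work has established norms: you write code, use OOP, write unit tests, debug of course, know version control, and deploy. The division of labor is also fine-grained.
Data science, by contrast, is a relatively new field. Concretely, a data scientist has to process large amounts of data, which means dealing with issues such as formats, erroneous records, and language; this often takes up half of the effort. Then they have to extract information or knowledge from the data, which involves a good deal of mathematics, statistical models, or machine learning methods, so mathematical ability is an important requirement. They also need business sense and should follow the news.
Because data scientists need to program, their work overlaps with that of software engineers. My job title is software engineer, but the work feels more like a researcher's; it is really a data scientist's job.
The work of data scientists is not very standardized. "Analyzing the Analyzers", written last year by Harlan Harris, Sean Murphy, and Marck Vaisman, includes the following figure:
In it, the Data Developer corresponds to a Software Developer and differs somewhat from the Data Researcher (i.e. the Data Scientist). Note, though, that times are still changing and the field has not yet stabilized, so this kind of chart will keep changing.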
See also: What is a data scientist? 14 definitions of a data scientist! (Big Data Made Simple)
A data scientist is a person who has the knowledge and skills to conduct sophisticated and systematic analyses of data. A data scientist extracts insights from data sets for product development, and evaluates and identifies strategic opportunities.
Other popular definitions:
1. "There's a joke running around on Twitter that the definition of a data scientist is 'a data analyst who lives in California,'" -- Malcolm Chisholm
2. "A data scientist is that unique blend of skills that can both unlock the insights of data and tell a fantastic story via the data," -- DJ Patil
3. "Data scientists are involved with gathering data, massaging it into a tractable form, making it tell its story, and presenting that story to others," -- Mike Loukides
4. “A data scientist is a rare hybrid, a computer scientist with the programming abilities to build software to scrape, combine, and manage data from a variety of sources and a statistician who knows how to derive insights from the information within. S/he combines the skills to create new prototypes with the creativity and thoroughness to ask and answer the deepest questions about the data and what secrets it holds,” -- Jake Porway
5. Data scientists are “analytically-minded, statistically and mathematically sophisticated data engineers who can infer insights into business and other complex systems out of large quantities of data,” -- Steve Hillion
6. "A data scientist is someone who blends math, algorithms, and an understanding of human behavior with the ability to hack systems together to get answers to interesting human questions from data," -- Hilary Mason
7. Data scientist is a "change agent." "A data scientist is part digital trendspotter and part storyteller stitching various pieces of information together." -- Anjul Bhambhri
8. "The definition of “data scientist” could be broadened to cover almost everyone who works with data in an organization. At the most basic level, you are a data scientist if you have the analytical skills and the tools to ‘get’ data, manipulate it and make decisions with it." -- Pat Hanrahan
9. "By definition all scientists are data scientists. In my opinion, they are half hacker, half analyst, they use data to build products and find insights. It’s Columbus meet Columbo – starry eyed explorers and skeptical detectives." -- Monica Rogati
10. "A data scientist is someone who can obtain, scrub, explore, model and interpret data, blending hacking, statistics and machine learning. Data scientists not only are adept at working with data, but appreciate data itself as a first-class product." -- Daniel Tunkelang
11. An ideal data scientist is “someone who has both the engineering skills to acquire and manage large data sets, and also has the statistician’s skills to extract value from the large data sets and present that data to a large audience.” -- John Rauser
12. Data scientist is "someone who can bridge the raw data and the analysis - and make it accessible. It's a democratising role; by bringing the data to the people, you make the world just a little bit better," -- Simon Rogers
13. "A data scientist is an engineer who employs the scientific method and applies data-discovery tools to find new insights in data. The scientific method—the formulation of a hypothesis, the testing, the careful design of experiments, the verification by others—is something they take from their knowledge of statistics and their training in scientific disciplines. The application (and tweaking) of tools comes from their engineering, or more specifically, computer science and programming background. The best data scientists are product and process innovators and sometimes, developers of new data-discovery tools," -- Gil Press
14. "A data scientist represents an evolution from the business or data analyst role. The formal training is similar, with a solid foundation typically in computer science and applications, modeling, statistics, analytics and math. What sets the data scientist apart is strong business acumen, coupled with the ability to communicate findings to both business and IT leaders in a way that can influence how an organization approaches a business challenge. Good data scientists will not just address business problems, they will pick the right problems that have the most value to the organization," -- IBM researchers