Data Scientist Position
- Analyze and model structured data using advanced statistical methods and implement algorithms and software needed to perform analyses.
- Build recommendation engines, spam classifiers, sentiment analyzers, and classifiers for unstructured and semi-structured data.
- Cluster large amounts of user-generated content and process data in large-scale environments using Hadoop and Spark.
- Apply machine learning, natural language processing, and statistical analysis methods, such as classification, collaborative filtering, association rules, sentiment analysis, topic modeling, time-series analysis, regression, statistical inference, and validation methods.
- Drive client engagements focused on Big Data and Advanced Business Analytics in diverse domains such as product development, marketing research, public policy, optimization, and risk management; communicate results and educate others through reports and presentations.
- Perform exploratory data analyses, generate and test working hypotheses, prepare and analyze historical data, and identify patterns.
- Five years of professional experience working as a Data Scientist.
- Experience with command-line scripting, data structures, and algorithms, and the ability to work in a Linux environment, processing large amounts of data in a cloud environment.
- Master's degree or PhD from an accredited college/university in Computer Science, Statistics, Mathematics, Engineering, Bioinformatics, Physics, Operations Research, or a related field, with a minimum of two years of relevant experience (strong mathematical background with the ability to understand algorithms and methods from both a mathematical and an intuitive viewpoint).
- Strong knowledge in at least one of the following fields: machine learning, data visualization, statistical modeling, data mining, or information retrieval.
- Strong data extraction and processing skills; experience with MapReduce, Pig, and/or Hive preferred.
- Proficiency in analysis packages (e.g. R) and programming languages (e.g. Java, Python, Ruby), as well as the ability to implement, maintain, and troubleshoot big data infrastructure, including distributed processing paradigms, stream processing, and data stores such as Hadoop, Storm, SQL, and Solr.
5th Floor
39 Rue de la Cité
Zeta Building Unit #23, 191