Why a DNA data breach is much worse than a credit card leak

You can’t change your DNA
By Angela Chen (@chengela) | Jun 6, 2018, 3:54pm EDT

This week, DNA testing service MyHeritage revealed that hackers had breached 92 million of its accounts. Though the hackers only accessed encrypted emails and passwords — so they never reached the actual genetic data — there’s no question that this type of hack will happen more frequently as consumer genetic testing becomes more and more popular. So why would hackers want DNA information specifically? And what are the implications of a big DNA breach?

One simple reason is that hackers might want to sell DNA data back for ransom, says Giovanni Vigna, a professor of computer science at UC Santa Barbara and co-founder of cybersecurity company Lastline. Hackers could threaten to revoke access or post the sensitive information online if not given money; one Indiana hospital paid $55,000 to hackers for this very reason. But there are reasons genetic data specifically could be lucrative. “This data could be sold on the down-low or monetized to insurance companies,” Vigna adds. “You can imagine the consequences: One day, I might apply for a long-term loan and get rejected because deep in the corporate system, there is data that I am very likely to get Alzheimer’s and die before I would repay the loan.”

Read more

The (Data Science) Notebook: A Love Story by David Wallace

Computational notebooks for data science have exploded in popularity in recent years, and there’s a growing consensus that the notebook interface is the best environment to communicate the data science process and share its conclusions. We’ve seen this growth firsthand: notebook support in Mode has quickly become one of our most adopted features since it launched in 2016.

This growth trend is corroborated by GitHub, the world’s most popular code repository. The number of Jupyter (then called IPython) notebooks hosted on GitHub has climbed from 200,000 in 2015 to almost two million today. Data from the nbestimate repository shows that the number of Jupyter notebooks hosted on GitHub is growing exponentially.

This trend raises a question: what’s driving the rapid adoption of the notebook interface as the preferred environment for data science work?

Inspired by an Analog Ancestor

The notebook interface draws inspiration (unsurprisingly) from the research lab notebook. In academic research, the methodology, results, and insights from experiments are sequentially documented in a physical lab notebook. This style of documentation is a natural fit for academic research because experiments must be intelligible, repeatable, and searchable.

Read more

Resources for Data Science Job Seekers

February 12, 2018 | Sadavath Sharma — Analyst

Getting your first job in data science can be a full-time job all on its own. Simply finding a job post worth applying to can be a chaotic pursuit (though we’ve tried to make that part easier with the Mode Analytics Data Jobs Board). Once you’ve found a job posting that looks like it could be a fit, you need to make sure you stand out from the crowd of other applicants.

As a data science job applicant, there are two stages to your search. First, you need to get an interview. To do that, you need evidence that you can fill the role. This is where your resume, your portfolio, and (unfortunately) your online presence come in. There are serious issues with looking up candidates on search engines, ranging from creating unconscious bias to opening up murky legal situations, but it happens (not here at Mode, though). For better or worse, it’s worth taking a quick look at your name’s search results to get a sense of what people might find.

Read more

Thinking in SQL vs Thinking in Python

July 7, 2016 | Benn Stancil — Chief Analyst at Mode

Over the years, I’ve used a variety of languages and tools to analyze data. As I think back on my time using each tool, I’ve come to realize that each encourages a different mental framework for solving analytical problems. Being conscious of these frameworks—and the behaviors they promote—can be just as important as mastering the technical features of a new language or tool.

I was first introduced to data analysis about ten years ago as a college student (my time studying the backs of baseball cards notwithstanding). In school, and later as an economics researcher, I worked almost exclusively in two tools—Excel and R—which both worked well with CSVs or Excel files downloaded from government data portals or academic sources.

Read more

Simple Analytics is Good for Business

A research paper by renowned consultancy Aberdeen Group reveals that “[data] complexity is often best answered with simplicity.” Several new surveys conducted by the group reveal some interesting findings with regard to the costs and benefits of using an integrated tool for data preparation, querying, and visualization, as opposed to the “assembly line” approach of dividing these tasks between various proprietary data warehouse (DW), ETL, and visualization tools.

Read more

Bringing Augmented Reality to Real Eyeglasses

If reality isn’t cutting it for you, just hold on; engineers are working on augmenting it. At least, they hope to show you more than what would normally be before your eyes, by adding systems to ordinary eyeglasses that would display images and data to enhance your experience.

“I believe in full augmentation,” says Ulrich Simon, vice president of corporate research and technology at Carl Zeiss, in Jena, Germany. An example of full augmentation, he says, might be a surgeon who looks at a patient he’s about to operate on, and sees the MRI image of the patient overlaid on her body.

Read more

Google Is About to Supercharge Its TensorFlow Open Source AI

Google’s free, open source framework TensorFlow is about to get more powerful.

Last year Google opened TensorFlow to the entire world. This meant that any individual, company, or organization could build their own AI applications using the same software Google uses to fuel everything from photo recognition to automated email replies. But there was a catch.

Read more

The dplyr package for R

dplyr: A Grammar of Data Manipulation

A fast, consistent tool for working with data-frame-like objects, both in memory and out of memory.

When working with data, you must:
Figure out what you want to do.
Precisely describe what you want in the form of a computer program.
Execute the code.
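
To make those three steps concrete, here is a minimal, hypothetical sketch using a few of dplyr’s core verbs on the built-in mtcars data frame (the filter threshold and column choices are purely illustrative):

library(dplyr)

# Step 1: decide what we want -- average fuel economy of heavier cars, by cylinder count.
# Step 2: describe it precisely as a program, using dplyr verbs.
result <- mtcars %>%
  filter(wt > 3) %>%                 # keep cars heavier than 3,000 lbs
  group_by(cyl) %>%                  # group by number of cylinders
  summarise(avg_mpg = mean(mpg))     # average miles per gallon per group

# Step 3: execute the code and inspect the result.
print(result)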

Read more

ggvis data visualization

ggvis is a data visualization package for R which lets you:
Declaratively describe data graphics with a syntax similar in spirit to ggplot2.
Create rich interactive graphics that you can play with locally in RStudio or in your browser.
Leverage shiny’s infrastructure to publish interactive graphics usable from any browser (either within your company or to the world).

ggvis combines the best of R (e.g. every modelling function you can imagine) and the best of the web (everyone has a web browser). Data manipulation and transformation are done in R, and the graphics are rendered in a web browser using Vega. For RStudio users, ggvis graphics display in a viewer panel, which is possible because RStudio is a web browser.
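
A small, hedged sketch of that division of labor (data handled in R, rendering in the browser), using the built-in mtcars data frame:

library(ggvis)

# Declaratively describe a scatterplot with a smoothed trend line.
# The data is prepared in R; the graphic is rendered in the browser via Vega.
mtcars %>%
  ggvis(x = ~wt, y = ~mpg) %>%
  layer_points() %>%
  layer_smooths()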

Read more

How to load Shiny into RStudio

Shiny is an R package that makes it easy to build interactive web applications (apps) straight from R.

To install the shiny package, open your RStudio environment and type the usual command into the R console:

install.packages("shiny")

To load Shiny, type:

library(shiny)

The Shiny package comes with eleven built-in examples. Each example is self-contained and demonstrates how Shiny works.

The first example you may want to try is called Hello Shiny, an example plot of an R dataset with a configurable number of bins. Users can change the number of bins by moving a slider, and the application will immediately respond to the input.

To run Hello Shiny, type:

runExample("01_hello")
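
For orientation, the overall shape of a Hello-Shiny-style app looks roughly like the following sketch: a user interface with a slider and a plot output, plus a server function that redraws a histogram whenever the slider moves. This is a simplified approximation, not the exact code shipped with the package.

library(shiny)

# User interface: a slider for the number of bins and a placeholder for the plot.
ui <- fluidPage(
  sliderInput("bins", "Number of bins:", min = 1, max = 50, value = 30),
  plotOutput("distPlot")
)

# Server logic: rebuild the histogram whenever input$bins changes.
server <- function(input, output) {
  output$distPlot <- renderPlot({
    waiting <- faithful$waiting
    breaks  <- seq(min(waiting), max(waiting), length.out = input$bins + 1)
    hist(waiting, breaks = breaks, main = "Old Faithful waiting times")
  })
}

# Launch the app.
shinyApp(ui = ui, server = server)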