Welcome to my Blog

The Data Post


Why a DNA data breach is much worse than a credit card leak

You can’t change your DNA
By Angela Chen (@chengela), Jun 6, 2018, 3:54pm EDT

This week, DNA testing service MyHeritage revealed that hackers had breached 92 million of its accounts. Though the hackers only accessed encrypted emails and passwords — so they never reached the actual genetic data — there’s no question that this type of hack will happen more frequently as consumer genetic testing becomes more and more popular. So why would hackers want DNA information specifically? And what are the implications of a big DNA breach?

One simple reason is that hackers might want to sell DNA data back for ransom, says Giovanni Vigna, a professor of computer science at UC Santa Barbara and co-founder of cybersecurity company Lastline. Hackers could threaten to revoke access or post the sensitive information online if not given money; one Indiana hospital paid $55,000 to hackers for this very reason. But there are reasons genetic data specifically could be lucrative. “This data could be sold on the down-low or monetized to insurance companies,” Vigna adds. “You can imagine the consequences: One day, I might apply for a long-term loan and get rejected because deep in the corporate system, there is data that I am very likely to get Alzheimer’s and die before I would repay the loan.”


MIT fed an AI data from Reddit, and now it only thinks about murder

Norman is a disturbing demonstration of the consequences of algorithmic bias
By Bijan Stephen, Jun 7, 2018, 11:11am EDT

For some, the phrase “artificial intelligence” conjures nightmare visions: something out of the 2004 Will Smith flick I, Robot, perhaps, or the ending of Ex Machina, like a boot smashing through the glass of a computer screen to stamp on a human face, forever. Even people who study AI have a healthy respect for the field’s ultimate goal, artificial general intelligence (AGI), an artificial system that mimics human thought patterns. Computer scientist Stuart Russell, who literally wrote the textbook on AI, has spent his career thinking about the problems that arise when a machine’s designer directs it toward a goal without considering whether its values are fully aligned with humanity’s.

A number of organizations have sprung up in recent years to guard against that possibility, including OpenAI, a research group co-founded (and later left) by techno-billionaire Elon Musk “to build safe [AGI], and ensure AGI’s benefits are as widely and evenly distributed as possible.” What does it say about humanity that we’re scared of artificial general intelligence because it might deem us cruel and unworthy and therefore deserving of destruction? (On its site, OpenAI doesn’t seem to define what “safe” means.)

This week, researchers at MIT unveiled their latest creation: Norman, a disturbed AI. (Yes, he’s named after the character in Hitchcock’s Psycho.) They write:


Saudi Arabia’s Global and Regional Economy Shifts in 2018

By Sahil Bali

Saudi Arabia is an Arab state in Western Asia constituting the bulk of the Arabian Peninsula. With a land area of approximately 2,150,000 km² (830,000 sq mi), Saudi Arabia is geographically the second-largest state in the Arab world, after Algeria. It is home to Islam’s two most sacred mosques: Masjid al-Haram in Mecca, the destination of the annual Hajj pilgrimage, and Medina’s Masjid an-Nabawi, the burial site of the Prophet Muhammad. Riyadh, the capital, is a skyscraper-filled metropolis. Saudi Arabia is a desert country with coastlines on both the Red Sea and the Persian Gulf. Nature has gifted this Arab state with rich oil resources: it holds the second-largest oil reserves in the world.


9 Useful R Data Visualization Libraries for Any Discipline

By Asha Hill — Customer Success Analyst at Mode

If you’ve visited the CRAN repository of R packages lately, you might have noticed that the number of available packages has now topped a dizzying 12,550. This means there are packages for practically any data visualization task you can imagine, from visualizing cancer genomes to graphing the action of a book.

For new R coders, or anyone looking to hone their R data viz chops, CRAN’s repository may seem like an embarrassment of riches—there are so many data viz packages out there, it’s hard to know where to start.

To provide one path through the labyrinth, today we’re giving an overview of 9 useful interdisciplinary R data visualization packages. We’ve noted the ones you can take for a spin without the hassle of running R locally, using Mode R Notebooks.


Python for Big Data Analytics and the Role of R

Two Popular Open-Source Programming Languages to Consider for Your Data Science Toolkit
R and Python are two very popular open-source programming languages for data analysis. Users frequently debate which tool is more valuable; however, both languages offer key features and can be used to complement one another. A common perception is that R offers more depth when it comes to data analysis, data modeling, and machine learning, while Python is easier to learn and tends to present graphs in a slightly more polished way.[1, 2] Using an interface for calling R from Python, such as the rpy2 package, lets users reap the benefits of both of these powerful, popular tools for data science. Even if you choose not to combine the two, their distinct strengths make both languages important parts of a data science toolkit.

Why Python?

The (Data Science) Notebook: A Love Story

By David Wallace

Computational notebooks for data science have exploded in popularity in recent years, and there’s a growing consensus that the notebook interface is the best environment to communicate the data science process and share its conclusions. We’ve seen this growth firsthand: notebook support in Mode quickly became one of our most adopted features after it launched in 2016.

This growth trend is corroborated by GitHub, the world’s most popular code-hosting platform. The number of Jupyter notebooks (then called IPython notebooks) hosted on GitHub has climbed from 200,000 in 2015 to almost two million today, and data from the nbestimate repository suggests that this growth is exponential.
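The two endpoints quoted above already imply a steep rate. As a back-of-envelope sketch (our arithmetic, not from the original post), assume "today" means 2018, so the climb from 200,000 to roughly two million notebooks took three years. The constant compound rate that fits those two points works out to about 115% per year, or a doubling time of roughly 0.9 years. Two data points cannot by themselves show that growth is exponential; that claim rests on the fuller nbestimate series.

```python
import math

# Figures quoted in the post. The end year of 2018 is an ASSUMPTION:
# the post only says "today". Change YEARS if the post is from another year.
START_COUNT = 200_000
END_COUNT = 2_000_000
YEARS = 2018 - 2015


def annual_growth_rate(start, end, years):
    """Constant compound annual growth rate that takes `start` to `end`."""
    return (end / start) ** (1 / years) - 1


def doubling_time(rate):
    """Years needed to double at a constant compound growth rate."""
    return math.log(2) / math.log(1 + rate)


rate = annual_growth_rate(START_COUNT, END_COUNT, YEARS)
print(f"implied growth: {rate:.0%} per year")            # implied growth: 115% per year
print(f"doubling time: {doubling_time(rate):.1f} years")  # doubling time: 0.9 years
```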

This trend raises a question: What’s driving the rapid adoption of the notebook interface as the preferred environment for data science work?

Inspired by an Analog Ancestor

The notebook interface draws inspiration (unsurprisingly) from the research lab notebook. In academic research, the methodology, results, and insights from experiments are sequentially documented in a physical lab notebook. This style of documentation is a natural fit for academic research because experiments must be intelligible, repeatable, and searchable.
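That sequential, write-it-down-as-you-go structure carries over into the file format itself. As a rough sketch (our illustration, not part of the original post), a Jupyter notebook is stored as JSON whose `cells` list keeps markdown (narrative) cells and code cells in the order they were written. The snippet below builds a minimal notebook in nbformat version 4 using only Python's standard library; the cell contents are hypothetical, and real notebooks usually carry more metadata (kernel and language info) than shown here.

```python
import json


def markdown_cell(text):
    """A narrative cell, like a prose entry in a lab notebook."""
    return {"cell_type": "markdown", "metadata": {}, "source": text}


def code_cell(code):
    """An unexecuted code cell: no outputs and no execution count yet."""
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": code,
    }


# Hypothetical cells recording method, computation, and result in order.
notebook = {
    "cells": [
        markdown_cell("## Method\nCount the survey responses."),
        code_cell("responses = [3, 5, 4]\nlen(responses)"),
        markdown_cell("## Result\nThree responses were recorded."),
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}

serialized = json.dumps(notebook, indent=1)
# Saving `serialized` as e.g. demo.ipynb should let Jupyter open it.
print([cell["cell_type"] for cell in json.loads(serialized)["cells"]])
```

The order of the `cells` list is the whole point: reading it top to bottom retraces the analysis the same way a reader would page through a lab notebook.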


Resources for Data Science Job Seekers

February 12, 2018 | Sadavath Sharma — Analyst

Getting your first job in data science can be a full-time job all on its own. Simply finding a job post worth applying to can be a chaotic pursuit (though we’ve tried to make that part easier with the Mode Analytics Data Jobs Board). Once you’ve found a job posting that looks like a good fit, you need to make sure you stand out from the crowd of other applicants.

As a data science job applicant, you’ll go through two stages in your search. First, you need to get an interview. To do that, you need evidence that you can fill the role. This is where your resume, your portfolio, and (unfortunately) your online presence come in. Looking up candidates on search engines raises serious issues, ranging from unconscious bias to murky legal situations, but it happens (though not here at Mode). For better or worse, it’s worth taking a quick look at your name’s search results to get a sense of what people might find.


Thinking in SQL vs Thinking in Python

July 7, 2016 | Benn Stancil — Chief Analyst at Mode

Over the years, I’ve used a variety of languages and tools to analyze data. As I think back on my time using each tool, I’ve come to realize that each encourages a different mental framework for solving analytical problems. Being conscious of these frameworks—and the behaviors they promote—can be just as important as mastering the technical features of a new language or tool.

I was first introduced to data analysis about ten years ago as a college student (my time studying the backs of baseball cards notwithstanding). In school, and later as an economics researcher, I worked almost exclusively in two tools, Excel and R, both of which worked well with CSVs or Excel files downloaded from government data portals or academic sources.
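To make the contrast between mental frameworks concrete, here is a minimal sketch (our illustration, not an example from the original post) of one question, total spend per customer, answered two ways: declaratively in SQL through Python's built-in sqlite3 module, and step by step in plain Python. The `orders` table and its rows are hypothetical. The SQL version describes what the result should be (group by customer, sum the amounts) and leaves the how to the database engine; the Python version spells out the loop and the running totals explicitly.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (customer, amount) pairs.
ORDERS = [("alice", 30.0), ("bob", 12.5), ("alice", 20.0), ("carol", 7.5), ("bob", 2.5)]


def totals_sql(rows):
    """Set-based thinking: declare the result and let the engine compute it."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
        result = conn.execute(
            "SELECT customer, SUM(amount) FROM orders "
            "GROUP BY customer ORDER BY customer"
        ).fetchall()
    finally:
        conn.close()
    return dict(result)


def totals_python(rows):
    """Procedural thinking: walk the rows and update running totals by hand."""
    totals = defaultdict(float)
    for customer, amount in rows:
        totals[customer] += amount
    return dict(sorted(totals.items()))


# Both print {'alice': 50.0, 'bob': 15.0, 'carol': 7.5}
print(totals_sql(ORDERS))
print(totals_python(ORDERS))
```

Neither version is "right"; the point is that the SQL query nudges you to think in whole sets of rows, while the Python loop nudges you to think one record at a time.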


101+ Infographic Tools And Resources

Below is a roundup of helpful tools, resources, and articles for creating infographics that people love. It is broken down into categories according to each stage of the infographic creation process, from brainstorming to distribution, so you can skip to the categories that interest you most.

Infographic Tools & Resources for Ideas/Inspiration

  1. Why ideas are the most important piece of an infographic: Find out what makes a great infographic idea.
  2. 16 exercises to come up with great infographic ideas: Our favorite tips and tricks.
  3. 5 ways to know if your idea will work: Framework to vet your ideas.
  4. Alltop: An aggregator of the Internet’s most popular stories.
  5. Answer the Public: Visualizations of the questions people ask Google.
  6. Brainpickings: An inventory of cross-disciplinary interestingness, spanning art, science, design, history, philosophy, and more.
  7. BuzzSumo: Insights on the most-shared content on any topic.
  8. Dadaviz: Charts on a variety of subjects.
  9. Daily Infographic: Design inspiration updated each day.

Data Will Save Music

The writing is on the wall.
The music industry is dying.
Nobody buys music.
It’s the Wild West.
The last one might be true. But the rest? Not exactly.

In the Wild West, the winner of a shootout was always the one who was best armed and able to take the best shot. Nowadays, artists and executives need that same kill-or-be-killed attitude. It’s time to upgrade the arsenal.

A quote attributed to Leonardo da Vinci helps bridge this analogy:

“Principles for the Development of a Complete Mind: Study the science of art. Study the art of science. Develop your senses — especially learn how to see. Realize that everything connects to everything else.”

Science + Art. That’s the future of the music (and entertainment) industry.
