Welcome to my Blog

The Data Post


What is the future of analytics?

6 trends shaping the future of data analytics

Organizations are demanding more from their data analytics efforts, wanting immediate insights that will help drive business decisions. In response, many are adopting new technologies such as machine learning, deep learning and natural language processing. Eric Mizell, vice president of global solution engineering at Kinetica, discusses what role these technologies will play in shaping the future of data analytics.

The algorithmic economy comes of age

“Organizations are dealing with a tsunami of data,” Mizell stresses. “The speed, size, shape of data generated by newer sources such as sensors, mobile apps, social media, machine logs, and connected devices far outpaces the ability of current systems and humans to comprehend, draw insights, and act on data. Organizations should look at algorithmic approaches such as machine learning, deep learning, and natural language processing (NLP) to automate insight discovery at scale.” 

Data and analytics architectures evolve

“Data and analytics architectures must evolve for the hybrid world,” Mizell says. “Cloud and on-premise, data in motion and at rest, transactional and analytic databases, in-memory and spinning disk, real-time and batch, AI and BI – all need to co-exist and interoperate. Organizations must look to bring together workload-specific, complementary analytic solutions to analyze all data, gain insights, and act. They must look at open, standard-based solutions that use APIs, micro-services, programming languages, and connectivity to seamlessly integrate with existing infrastructure and deliver business value while preserving existing investments.”

Need for speed

“From high-speed Internet to 5G networks to high-speed trading, ‘speed’ is a critical element for business success,” Mizell confirms. “Customers demand instant gratification and enterprises need fresh data and real-time insights to deliver business value. Gone are the days of nightly batch processing and waiting for hours or days to get answers to critical business questions. Technology executives need to build real-time analytic pipelines to simultaneously ingest, analyze, visualize, and act on data in motion and at rest and deliver fresh, timely insights to capitalize on fast-moving business opportunities.”

GPUs vs. CPUs

“CPUs have been the workhorse of business applications for decades,” Mizell says. “However, big data’s volume, variety, and velocity–coupled with shrinking insight shelf life–require organizations to investigate other technologies to address the compute bottleneck. GPUs, with thousands of processing cores per chip vs. 16 to 32 for CPUs, have emerged as the “go to” alternative to process complex data at scale. Organizations must investigate GPU-based analytic technologies that deliver performance, flexibility, and ease-of-use to modernize the analytic infrastructure.”

From control to collaboration

“Data and analytics must be pervasively available across the organization for maximum business value,” Mizell stresses. “Everyone in an organization–data scientists, business analysts, and business users regardless of their technical skills–must have fast, easy, and self-service access to data and analytics for data-insight-driven decision making. Organizations need to adopt analytic technologies that democratize analytics, data science, and machine learning to establish a data-insight-driven culture. Analytic technologies must be flexible to balance analytic innovation with guard rails of security, scalability, and availability.”

The impact of the Internet of Things

“The Internet of Things (IoT) will fundamentally transform how organizations do data and analytics,” Mizell says. “With the nexus of people, devices, and data, IoT will have a profound impact on every industry and every line of business. Organizations will have to figure out how to sense, interpret, and respond to data in motion and rest, in real-time and at scale. Organizations must evolve their data and analytics architectures to seamlessly ingest the tsunami of IoT data, combine it with data at rest for contextual insights, and act in real-time and at scale to maximize business value. They must look at analytic solutions that deliver exponential scale and flexibility to manage the IoT data cost-effectively.”

This article first appeared on Information-Management.com.

Why a DNA data breach is much worse than a credit card leak

You can’t change your DNA
By Angela Chen (@chengela), Jun 6, 2018, 3:54pm EDT

This week, DNA testing service MyHeritage revealed that hackers had breached 92 million of its accounts. Though the hackers only accessed encrypted emails and passwords — so they never reached the actual genetic data — there’s no question that this type of hack will happen more frequently as consumer genetic testing becomes more and more popular. So why would hackers want DNA information specifically? And what are the implications of a big DNA breach?

One simple reason is that hackers might want to sell DNA data back for ransom, says Giovanni Vigna, a professor of computer science at UC Santa Barbara and co-founder of cybersecurity company Lastline. Hackers could threaten to revoke access or post the sensitive information online if not given money; one Indiana hospital paid $55,000 to hackers for this very reason. But there are reasons genetic data specifically could be lucrative. “This data could be sold on the down-low or monetized to insurance companies,” Vigna adds. “You can imagine the consequences: One day, I might apply for a long-term loan and get rejected because deep in the corporate system, there is data that I am very likely to get Alzheimer’s and die before I would repay the loan.”

Read more

MIT fed an AI data from Reddit, and now it only thinks about murder

Norman is a disturbing demonstration of the consequences of algorithmic bias
By Bijan Stephen Jun 7, 2018, 11:11am EDT

For some, the phrase “artificial intelligence” conjures nightmare visions — something out of the ’04 Will Smith flick I, Robot, perhaps, or the ending of Ex Machina — like a boot smashing through the glass of a computer screen to stamp on a human face, forever. Even people who study AI have a healthy respect for the field’s ultimate goal, artificial general intelligence, or an artificial system that mimics human thought patterns. Computer scientist Stuart Russell, who literally wrote the textbook on AI, has spent his career thinking about the problems that arise when a machine’s designer directs it toward a goal without thinking about whether its values are all the way aligned with humanity’s.

A number of organizations have sprung up in recent years to combat that potential, including OpenAI, a working research group that was founded (then left) by techno-billionaire Elon Musk “to build safe [AGI], and ensure AGI’s benefits are as widely and evenly distributed as possible.” What does it say about humanity that we’re scared of general artificial intelligence because it might deem us cruel and unworthy and therefore deserving of destruction? (On its site, OpenAI doesn’t seem to define what “safe” means.)

This week, researchers at MIT unveiled their latest creation: Norman, a disturbed AI. (Yes, he’s named after the character in Hitchcock’s Psycho.) They write:

Read more

Saudi Arabia’s Global and Regional Economy Shifts in 2018

By Sahil Bali

Saudi Arabia is an Arab state in Western Asia constituting the bulk of the Arabian Peninsula. With a land area of approximately 2,150,000 km² (830,000 sq mi), Saudi Arabia is geographically the second-largest state in the Arab world after Algeria. Saudi Arabia is home to Islam’s two most sacred mosques: Masjid al-Haram, in Mecca, the destination of the annual Hajj pilgrimage, and Medina’s Masjid an-Nabawi, the burial site of the prophet Muhammad. Riyadh, the capital, is a skyscraper-filled metropolis. Saudi Arabia is a desert country encompassing most of the Arabian Peninsula, with Red Sea and Persian Gulf coastlines. Nature has gifted this Arab state with rich oil resources, and it controls the second-largest oil reserves in the world.

Read more

9 Useful R Data Visualization Libraries for Any Discipline

By Asha Hill — Customer Success Analyst at Mode

If you’ve visited the CRAN repository of R packages lately, you might have noticed that the number of available packages has now topped a dizzying 12,550. This means there are packages for practically any data visualization task you can imagine, from visualizing cancer genomes to graphing the action of a book.

For new R coders, or anyone looking to hone their R data viz chops, CRAN’s repository may seem like an embarrassment of riches—there are so many data viz packages out there, it’s hard to know where to start.

To provide one path through the labyrinth, today we’re giving an overview of 9 useful interdisciplinary R data visualization packages. We’ve noted the ones you can take for a spin without the hassle of running R locally, using Mode R Notebooks.

Read more

Python for Big Data Analytics and the Role of R

Two Popular Open-Source Programming Languages to Consider for Your Data Science Toolkit
R and Python are two very popular open-source programming languages for data analysis. Users frequently debate which tool is more valuable; however, both languages offer key features and can be used to complement one another. A common perception is that R offers more depth when it comes to data analysis, data modeling and machine learning, but Python is easier to learn and tends to present graphs in a slightly more polished way [1,2]. Using the interface Python offers for calling R allows users to reap the benefits of both of these powerful, popular tools for data science. Even if you choose not to combine the two, the different ways in which these two languages are valuable make them both important parts of a data science toolkit.

Why Python?
Read more

The (Data Science) Notebook: A Love Story by David Wallace

Computational notebooks for data science have exploded in popularity in recent years, and there’s a growing consensus that the notebook interface is the best environment to communicate the data science process and share its conclusions. We’ve seen this growth firsthand; notebook support in Mode has quickly become one of our most adopted features since it launched in 2016.

This growth trend is corroborated by GitHub, the world’s most popular code repository. The number of Jupyter (then called IPython) notebooks hosted on GitHub has climbed from 200,000 in 2015 to almost two million today. Data from the nbestimate repository shows that the number of Jupyter notebooks hosted on GitHub is growing exponentially.
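As a rough illustration of what those two figures would imply if the growth were steadily exponential, the sketch below derives an annual growth factor and a doubling time. It is not taken from the original post or from nbestimate: the end count is rounded up to two million, and the three-year span is an assumption, since the text only says "today".

```python
import math

# Figures quoted in the text above.
start_count = 200_000    # notebooks on GitHub in 2015
end_count = 2_000_000    # "almost two million", rounded up here

# Assumption: "today" is taken to be 3 years after 2015; adjust as needed.
years_elapsed = 3

# Constant yearly multiplier that turns start_count into end_count.
annual_factor = (end_count / start_count) ** (1 / years_elapsed)

# Time needed for the count to double at that constant rate.
doubling_time_years = math.log(2) / math.log(annual_factor)

print(f"implied annual growth factor: {annual_factor:.2f}")
print(f"implied doubling time: {doubling_time_years:.2f} years")
```

Changing `years_elapsed` shows how sensitive the implied rate is to the assumed time span.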

This trend raises a question: What’s driving the rapid adoption of the notebook interface as the preferred environment for data science work?

Inspired by an Analog Ancestor

The notebook interface draws inspiration (unsurprisingly) from the research lab notebook. In academic research, the methodology, results, and insights from experiments are sequentially documented in a physical lab notebook. This style of documentation is a natural fit for academic research because experiments must be intelligible, repeatable, and searchable.

Read more

Resources for Data Science Job Seekers

February 12, 2018 | Sadavath Sharma — Analyst

Getting your first job in data science can be a full-time job all on its own. Simply finding a job post worth applying to can be a chaotic pursuit (though we’ve tried to make that part easier with the Mode Analytics Data Jobs Board). Once you’ve found a job posting that looks like it could be a fit, you need to make sure you stand out from the crowd of other applicants.

As a data science job applicant, there are two stages to your search. First, you need to get an interview. To do that, you need documentation that you can fill the role. This is where your resume, your portfolio, and (unfortunately) your online presence come in. There are serious issues with looking up candidates on search engines, which range from creating unconscious bias to opening up murky legal situations, but it happens (not here at Mode though). For better or worse, it’s worth taking a quick look at your name’s search results to get a sense for what people might find.

Read more

Thinking in SQL vs Thinking in Python

July 7, 2016 | Benn Stancil — Chief Analyst at Mode

Over the years, I’ve used a variety of languages and tools to analyze data. As I think back on my time using each tool, I’ve come to realize that each encourages a different mental framework for solving analytical problems. Being conscious of these frameworks—and the behaviors they promote—can be just as important as mastering the technical features of a new language or tool.

I was first introduced to data analysis about ten years ago as a college student (my time studying the backs of baseball cards notwithstanding). In school, and later as an economics researcher, I worked almost exclusively in two tools—Excel and R—which both worked well with CSVs or Excel files downloaded from government data portals or academic sources.
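The post's title contrasts SQL and Python. As a minimal sketch of how the two framings can differ, the same per-region total is computed below once as a declarative SQL query and once as an explicit Python loop. This example is not from the original post: the `sales` table, its columns, and the sample rows are invented for illustration.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (region, amount) rows, invented for illustration.
rows = [("east", 10.0), ("west", 4.0), ("east", 6.0), ("west", 1.0)]

# SQL framing: describe the result you want and let the engine compute it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
sql_totals = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
)
conn.close()

# Python framing: spell out the steps that build the result.
py_totals = defaultdict(float)
for region, amount in rows:
    py_totals[region] += amount

# Both approaches arrive at the same totals.
print(sql_totals == dict(py_totals))
```

The SQL version states what the output should look like, while the Python version states how to accumulate it, which is one way to read the "different mental framework" idea.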

Read more

101+ Infographic Tools And Resources

Below is a roundup of helpful tools, resources, and articles for creating infographics that people love. It is broken down into categories according to each stage of the infographic creation process, from brainstorming to distribution, so you can skip to the categories that interest you most.

Infographic Tools & Resources for Ideas/Inspiration

  1. Why ideas are the most important piece of an infographic: Find out what makes a great infographic idea.
  2. 16 exercises to come up with great infographic ideas: Our favorite tips and tricks.
  3. 5 ways to know if your idea will work: Framework to vet your ideas.
  4. Alltop: An aggregator of the Internet’s most popular stories.
  5. Answer the Public: Visualizations of the questions people ask Google.
  6. Brainpickings: An inventory of cross-disciplinary interestingness, spanning art, science, design, history, philosophy, and more.
  7. BuzzSumo: Insights on the most-shared content on any topic.
  8. Dadaviz: Charts on a variety of subjects.
  9. Daily Infographic: Design inspiration updated each day.
  10. Read more