Traits of meaningful data.

Sanyukta Suman
Published in Analytics Vidhya
3 min read · Nov 16, 2020

Photo by Markus Spiske on Unsplash

For data analysis to be fruitful, the data ought to have certain attributes; low-quality data yields little insight. The higher the quality of the data, the greater the potential for discovery.

Take a few moments to build a list of the characteristics that you think make data most useful for analysis.

According to Stephen Few, the author of Now You See It, ideal data should have the following characteristics:

  1. High Volume
  2. Historical
  3. Consistent
  4. Multivariate
  5. Atomic
  6. Clean
  7. Dimensionally structured

Let’s look at each of these characteristics, one at a time:

1. High Volume

The more information that is available to us, the more likely it is that we will have what we need while pursuing specific questions or just searching for patterns that are important.

2. Historical

When choosing data, much insight is gained from examining how information has changed through time. The more historical information that is available, the more we can make sense of the present by seeing the evolving pattern. Even when we focus on what is going on right now, knowing the background story of the data helps the analyst gain more insight from it.
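As a minimal sketch of what history adds, the snippet below computes the change from one period to the next. The monthly revenue figures and the `period_changes` helper are hypothetical, made up purely for illustration.

```python
# Hypothetical monthly revenue figures, oldest first (illustrative values only).
monthly_revenue = [
    ("2020-06", 100.0),
    ("2020-07", 110.0),
    ("2020-08", 99.0),
]


def period_changes(series):
    """Return (period, percent change versus the previous period) pairs."""
    changes = []
    # Pair each observation with the one that follows it.
    for (_, previous), (period, current) in zip(series, series[1:]):
        percent = round((current - previous) / previous * 100, 1)
        changes.append((period, percent))
    return changes


print(period_changes(monthly_revenue))
```

A single month on its own says little; with the earlier months available, the same figure can be read as a rise or a fall.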

3. Consistent

Things change over time, and when they do, the data changes with the situation. The ever-changing data of the stock market is a good example. If data such as revenue has not been adjusted to reflect these changes, examining it becomes complicated and confusing. It is usually best to constrain the data to what reflects the purpose of the problem definition.

4. Multivariate

We can examine two types of variables: quantitative and categorical (qualitative). A variable is an aspect of something that changes (that is, varies). Quantitative variables are expressed as numbers, and categorical variables are expressed as words. When trying to answer the question we have posed, we often need to expand the number of variables we are examining. The more variables the data contains, the richer our opportunity to make sense of it.
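The distinction between the two kinds of variable can be sketched in a few lines. The records and the `variable_type` helper below are hypothetical, and the rule used (all numbers means quantitative, anything else means categorical) is a simplifying assumption.

```python
def variable_type(values):
    """Label a column 'quantitative' if every value is a number, else 'categorical'."""
    is_number = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    return "quantitative" if is_number else "categorical"


# Hypothetical multivariate records: one categorical and two quantitative variables.
records = [
    {"region": "North", "units_sold": 12, "price": 9.5},
    {"region": "South", "units_sold": 7, "price": 11.0},
]

# Turn the rows into columns, then label each column.
columns = {name: [row[name] for row in records] for name in records[0]}
types = {name: variable_type(values) for name, values in columns.items()}
print(types)
```

Each extra column is another variable that can be compared against the others, which is where the richer opportunity comes from.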

5. Atomic

Most analysis involves information that has been aggregated to a summarized or generalized level. At times, however, we need information at the finest level of detail possible. For example, a text analyst spends much of the time translating sentences into numeric formats and can lose sight of the emotional component of those sentences. Atomic data is therefore data specified down to the lowest level of detail, so that we can understand what is going on.
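One way to see why atomic data matters: detailed records can always be summarized in whichever way a question requires, but a summary cannot be turned back into detail. The transactions and the `total_by` helper below are hypothetical and only illustrate that idea.

```python
from collections import defaultdict

# Hypothetical atomic records: one row per individual transaction.
transactions = [
    {"day": "Mon", "product": "A", "amount": 20},
    {"day": "Mon", "product": "B", "amount": 5},
    {"day": "Tue", "product": "A", "amount": 15},
]


def total_by(rows, key):
    """Aggregate the 'amount' field of atomic rows by the given key."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row["amount"]
    return dict(totals)


by_day = total_by(transactions, "day")          # summarize by day
by_product = total_by(transactions, "product")  # or by product, from the same rows
print(by_day, by_product)
```

If only the daily totals had been stored, the per-product view could not be reconstructed from them.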

6. Clean

The quality of our research can never be higher than the quality of our data. We cannot draw a reliable conclusion that depends on unformatted, dirty data. Successful business decisions cannot be made with inaccurate, incomplete or misleading data. People need data that they can trust to be reliable and clean so that business goals and objectives can be further explored.
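A basic cleanliness check can be sketched as a scan for missing values before any analysis starts. The order records, field names, and the `find_problems` helper are hypothetical; real checks would also cover formats, ranges, and duplicates.

```python
def find_problems(rows, required_fields):
    """Return (row index, field, reason) for each missing required value."""
    problems = []
    for index, row in enumerate(rows):
        for field in required_fields:
            value = row.get(field)
            # Treat None and empty strings as missing.
            if value is None or value == "":
                problems.append((index, field, "missing"))
    return problems


# Hypothetical order records with two deliberately incomplete rows.
orders = [
    {"customer": "Asha", "amount": 40.0},
    {"customer": "", "amount": 15.0},
    {"customer": "Ravi", "amount": None},
]

problems = find_problems(orders, ["customer", "amount"])
print(problems)
```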

7. Dimensionally structured

I have tried to understand data that is expressed in unfamiliar dimensions. When data is not well structured, the effort is frustrating, discouraging, and sometimes a waste of time. Human senses are constrained to view the world in three dimensions, so the tools and graphs we have built so far are comprehensible only when the data is presented in a three-dimensional structure. When data is structured, it is easier for us to understand and easier to make software understand it.

Sanyukta Suman
Analytics Vidhya

Engineer + Loves Computer Vision, ML, Programming, Robotics and Technology. https://sanyuktasuman.com.np