How does data mining affect privacy?

You're casually browsing an online retailer one day, looking for new office furniture. The next thing you know, ads for that brand — showing the exact pieces you were looking at — start appearing on other websites you visit. A coincidence? Not quite. 

Many websites, including e-commerce sites, keep track of what you do and view when you're online. The sites can then use that information to advertise to you. Sometimes, you see marketing for products you've shown interest in. Other times, you're shown products the company thinks you might be interested in based on your past behavior.

Websites keep track of your activity online

If you feel unsettled by the idea that companies keep tabs on you, you're not alone. Tracking and targeting have become prominent privacy issues in recent years. In 2012, the New York Times published a story about how retailers use data to make predictions about customers. One big-name retailer even ventured predictions about whether or not its customers were pregnant, and sent them coupons for baby supplies. Brands, the article noted, must tread a fine line between privacy invasion and smart marketing. Their job is to use analytics on your shopping habits and demographics to present targeted, personalized offers you'll genuinely want. In other words, brands need to be helpful, not invasive.

How do retailers and other companies get your information, anyway? People provide information about themselves all the time through their internet behavior and transactions, and even through some offline activity. Companies track that behavior and collect the resulting data. Analyzing it to extract useful patterns is called data mining. Although data mining has its benefits, it also raises concerns about privacy and control.

Bottom line: You can keep your private data private and limit the information that gets into the hands of others.

What is data mining?

Who is doing the data mining, and what they're doing with the data, matters

With data mining, companies transform raw data into information they can use, which is why this activity is also called knowledge discovery in databases. Data mining has grown considerably in recent years.

On its own, data mining isn't "good" or "bad." What a company does with the information it gathers is what matters. The process can be used to predict meteorological trends, for example. It can also help companies make decisions that reduce waste or improve efficiencies. Companies can use data mining to detect fraud or to filter out spam.

Another important factor is who is doing the data mining. In the wrong hands, data mining can be risky for consumers. Bad actors can sell the information they glean or can use the information to blackmail or threaten others. In some cases, data mining that seems innocuous can actually pose a privacy concern. Social media companies, for example, collect a lot of data about their users. The companies can mine that data to pull out critical details about the individuals who use the networks, then sell that information.

Data mining relies on three disciplines to identify patterns and trends:

  • Statistics
  • Machine learning
  • Artificial intelligence (AI)

Statistics uses numbers to study the relationships between data, while machine learning relies on algorithms that make predictions based on what they've learned from a data set. AI provides the "what's next" component of data mining, which gives companies or individuals an idea of what makes the data relevant or valuable.

How does data mining work?

Data mining relies on algorithms to transform raw data into usable information. The methods it uses include the following: 

1. Association rules

Data mining is often used to better understand people's buying and consumption habits

Association rules look for relationships between the variables in a data set, typically combinations of items or events that frequently occur together, and help companies understand how different things relate. One of the more common uses of association rules in data mining is to better understand people's buying and consumption habits.

For example, it might be the case that female customers between the ages of 30 and 40 are more likely to purchase shampoo and conditioner at the same time. With that information, a retailer can promote conditioner to customers looking at shampoo or vice versa. 
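Association rules are commonly scored with three measures. Support is how often items appear together. Confidence is how often the rule holds when its left-hand side appears. Lift is confidence relative to how popular the right-hand item is on its own. The minimal sketch below mines single-item rules from made-up shopping baskets. The basket data, the thresholds, and the helper names are illustrative assumptions. Production tools use algorithms such as Apriori or FP-Growth to scale beyond item pairs.

```python
from itertools import combinations

# Hypothetical shopping baskets (toy data, not real sales figures).
BASKETS = [
    {"shampoo", "conditioner", "soap"},
    {"shampoo", "conditioner"},
    {"shampoo", "conditioner", "toothpaste"},
    {"soap", "toothpaste"},
    {"toothpaste"},
    {"shampoo", "soap"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    hits = sum(1 for basket in baskets if itemset <= basket)
    return hits / len(baskets)


def rule_metrics(antecedent, consequent, baskets):
    """Support, confidence, and lift for the rule antecedent -> consequent."""
    both = support(antecedent | consequent, baskets)
    confidence = both / support(antecedent, baskets)
    lift = confidence / support(consequent, baskets)
    return both, confidence, lift


def mine_pair_rules(baskets, min_support=0.4, min_confidence=0.6):
    """Return one-item -> one-item rules that clear both thresholds."""
    items = sorted(set().union(*baskets))
    rules = []
    for a, b in combinations(items, 2):
        for lhs, rhs in ((a, b), (b, a)):
            sup, conf, lift = rule_metrics({lhs}, {rhs}, baskets)
            if sup >= min_support and conf >= min_confidence:
                rules.append((lhs, rhs, sup, conf, lift))
    return rules


if __name__ == "__main__":
    for lhs, rhs, sup, conf, lift in mine_pair_rules(BASKETS):
        print(f"{lhs} -> {rhs}: support={sup:.2f}, confidence={conf:.2f}, lift={lift:.2f}")
```

With these toy baskets, only the shampoo/conditioner pair clears the thresholds. The rule conditioner -> shampoo has confidence 1.00, and shampoo -> conditioner has confidence 0.75. Both have a lift of 1.50, meaning the two items are bought together 1.5 times as often as chance alone would predict. That is the kind of signal a retailer could use to cross-promote them.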

Beyond shopping, there are other circumstances when association rules can be useful. For instance, doctors can use association rules when diagnosing patients. A medical provider can compare a patient's current symptoms to similar symptoms experienced by others and then record the patient's diagnosis. As more cases are diagnosed, the algorithm can update itself to better reflect these associations, making the practice more accurate over time. Note that in this case, because of strict healthcare privacy laws, patient data would need to remain anonymous.

Recommendation engines also often rely on association rules. Services like Netflix and Hulu recommend shows and movies to viewers based on their watch histories and those of users with similar viewing habits.

2. Decision tree

A decision tree represents a series of decisions as a tree-shaped chart. Each branch point asks a question about one attribute of the data, and each leaf gives a final prediction. Decision trees are commonly used to build regression or classification models. For example, a decision tree can help a retailer predict sales performance by evaluating a product's price in relation to the average income of the target consumer.

Another use of a decision tree in data mining occurs when financial institutions evaluate loan applications. A decision tree would classify applicant data to determine whether the individual is a "safe" borrower, a "risky" borrower, or someone the bank shouldn't lend to at all.
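To make the loan example concrete, here is a minimal sketch of how a decision tree can be learned from labeled examples. It uses the Gini-impurity splitting rule associated with CART. The applicant records, the two features (income and debt-to-income ratio), and the labels are invented for illustration. A real lender would use far more data and would typically rely on a library such as scikit-learn's `DecisionTreeClassifier` rather than hand-rolled code.

```python
from collections import Counter

# Hypothetical applicants: (annual income in $1,000s, debt-to-income ratio) -> label.
APPLICANTS = [
    ((95, 0.10), "safe"),
    ((80, 0.15), "safe"),
    ((70, 0.20), "safe"),
    ((60, 0.45), "risky"),
    ((55, 0.40), "risky"),
    ((30, 0.35), "decline"),
    ((25, 0.60), "decline"),
    ((20, 0.50), "decline"),
]


def gini(labels):
    """Gini impurity: 0.0 when every label in the group is the same."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def best_split(rows):
    """Find the (feature, threshold) that most reduces weighted Gini impurity."""
    best = None
    best_score = gini([label for _, label in rows])
    for feature in range(len(rows[0][0])):
        for threshold in sorted({x[feature] for x, _ in rows}):
            left = [label for x, label in rows if x[feature] <= threshold]
            right = [label for x, label in rows if x[feature] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score:
                best, best_score = (feature, threshold), score
    return best


def build_tree(rows, depth=0, max_depth=3):
    """Grow a tree of nested dicts; each leaf is a class label."""
    labels = [label for _, label in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return majority
    split = best_split(rows)
    if split is None:
        return majority
    feature, threshold = split
    left_rows = [r for r in rows if r[0][feature] <= threshold]
    right_rows = [r for r in rows if r[0][feature] > threshold]
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build_tree(left_rows, depth + 1, max_depth),
        "right": build_tree(right_rows, depth + 1, max_depth),
    }


def predict(tree, x):
    """Walk from the root to a leaf, answering one threshold question per level."""
    while isinstance(tree, dict):
        tree = tree["left"] if x[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree


if __name__ == "__main__":
    tree = build_tree(APPLICANTS)
    print(predict(tree, (50, 0.30)))  # a new, hypothetical applicant
```

The `max_depth` limit keeps the tree small. Deep trees can memorize their training data instead of learning general patterns.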

3. K nearest neighbor (KNN)

KNN is a machine-learning method that classifies data points based on their proximity to other, already-labeled data points. The method works under the assumption that similar data points will be located near each other. After calculating the distance between a new data point and the existing ones, KNN looks at the k closest neighbors and assigns the new point the category that is most common among them. For numeric predictions, it uses the average of those neighbors' values instead.

Recommendation engines might also use KNN to suggest titles to users. For example, if a Netflix viewer watched a horror movie, Netflix might recommend other titles in the horror genre. But if the horror film was directed by a woman, Netflix might use KNN to take the process a step further and recommend other movies directed by women.
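Here is a minimal sketch of KNN in Python, using Euclidean distance via `math.dist` (Python 3.8+). The viewer data, the two features (hours of horror and comedy watched), and the choice of k=3 are hypothetical. They were chosen only to illustrate the mechanics. A real recommendation engine would use many more features and data points.

```python
import math
from collections import Counter

# Hypothetical viewers: (hours of horror watched, hours of comedy watched) -> favorite genre.
VIEWERS = [
    ((9.0, 1.0), "horror"),
    ((8.0, 2.0), "horror"),
    ((7.5, 0.5), "horror"),
    ((1.0, 8.0), "comedy"),
    ((2.0, 9.0), "comedy"),
    ((0.5, 7.0), "comedy"),
]


def k_nearest(point, labeled, k):
    """Return the k labeled examples closest to `point` by Euclidean distance."""
    return sorted(labeled, key=lambda item: math.dist(point, item[0]))[:k]


def knn_classify(point, labeled, k=3):
    """Classification: majority vote among the k nearest neighbors."""
    votes = Counter(label for _, label in k_nearest(point, labeled, k))
    return votes.most_common(1)[0][0]


def knn_regress(point, labeled, k=3):
    """Numeric prediction: average of the k nearest neighbors' values."""
    neighbors = k_nearest(point, labeled, k)
    return sum(value for _, value in neighbors) / len(neighbors)


if __name__ == "__main__":
    print(knn_classify((8.5, 1.5), VIEWERS))  # a viewer who mostly watches horror
    print(knn_classify((1.5, 8.5), VIEWERS))  # a viewer who mostly watches comedy
```

An odd k avoids tied votes when there are only two categories. `knn_regress` shows the averaging variant used when the thing being predicted is a number, such as a rating, rather than a category.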

4. Neural networks

Neural networks can help market products, predict better prices, or increase sales figures

Neural networks are a form of AI that process data by imitating the connectivity of a human brain. The networks consist of layers of nodes. Each node takes inputs, multiplies them by weights, adds a bias, and produces an output. When that output exceeds a certain threshold, the node activates and sends data to the next layer. While the mechanisms behind this technology may sound complex, here's what you need to know in the context of data mining: A neural network looks for patterns in big data sets. The information gleaned by the networks can help a company market products to consumers, predict optimal prices, or improve sales figures.
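The node-and-layer mechanics can be shown with a tiny two-layer network. In this sketch, each node computes a weighted sum of its inputs plus a bias and fires only if the result exceeds a threshold of zero. The weights are hand-picked assumptions rather than learned values. Real networks adjust their weights automatically during training on large data sets. This example computes exclusive or (XOR), a pattern no single threshold node can capture, which is why the extra layer matters.

```python
def node(inputs, weights, bias, threshold=0.0):
    """One node: weighted sum of inputs plus bias, then a step (threshold) activation."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if total > threshold else 0


def layer(inputs, params):
    """Run every node in a layer on the same inputs; params is a list of (weights, bias)."""
    return [node(inputs, weights, bias) for weights, bias in params]


# Hand-picked weights (illustrative only; a trained network would learn its own).
HIDDEN = [
    ([1.0, 1.0], -0.5),  # fires if at least one input is on (OR)
    ([1.0, 1.0], -1.5),  # fires only if both inputs are on (AND)
]
OUTPUT = [([1.0, -1.0], -0.5)]  # "OR but not AND", i.e. exclusive or (XOR)


def network(x1, x2):
    """Pass two binary inputs through the hidden layer, then the output layer."""
    hidden = layer([x1, x2], HIDDEN)
    return layer(hidden, OUTPUT)[0]


if __name__ == "__main__":
    for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(a, b, "->", network(a, b))
```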

The data mining process

Before a company can use a data-mining algorithm or technique, it needs to get its hands on data to mine. For this reason, certain businesses specialize in gathering people's data. These companies might collect information from websites, voter polls, or public records. To transform the raw data into usable information, they usually follow this process:

Data collection

An individual or company that's going to perform data mining must first access relevant data. In some cases, data is freely given by individuals. For example, people give away a lot of information when they sign up for social media accounts. They often fill in their profiles with their location, age and birthday, alma mater, and children's names. 

In other circumstances, someone who wants to perform data mining needs to dig deeper to get relevant information. They might sort through public records or buy a data set from a company that tracks consumer behavior. 

Data management

Collected data can be saved in the cloud or in local storage

Collected data must be stored and managed, and this often happens in the cloud or on a local server. 

Data analysis

Companies usually analyze the data using the algorithms and methods described above. How the data gets analyzed depends on a business's goals and the reason for gathering the data in the first place.

Data sorting

After analysis, a data miner needs to sort the data based on the results. Sorting the data allows the data miner to make sense of it and determine how to use it.

Data presentation

Once sorting is complete, the next step is to present the data. At this point, it should be cleaned up and organized in a way that makes its purpose clear. For example, a retailer might decide to price its new product at $20 based on data analysis. A marketer might send a consumer a coupon for baby formula based on an analysis of that individual's past purchases.
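The five steps can be sketched end to end in a few lines. In this minimal sketch, an in-memory SQLite database (Python's built-in `sqlite3` module) stands in for cloud or local storage, and total spend per customer stands in for the analysis. The `RAW_PURCHASES` records and the `run_pipeline` helper are hypothetical. A real data-mining pipeline would add cleaning, validation, and access controls.

```python
import sqlite3

# Data collection: hypothetical raw purchase records (customer, product, price).
RAW_PURCHASES = [
    ("cust-1", "shampoo", 8.50),
    ("cust-2", "desk chair", 129.00),
    ("cust-1", "conditioner", 9.25),
    ("cust-3", "shampoo", 8.50),
    ("cust-2", "desk lamp", 34.99),
]


def run_pipeline(records):
    # Data management: store the records, here in a throwaway in-memory database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE purchases (customer TEXT, product TEXT, price REAL)")
    conn.executemany("INSERT INTO purchases VALUES (?, ?, ?)", records)

    # Data analysis and sorting: purchases and total spend per customer, highest first.
    rows = conn.execute(
        "SELECT customer, COUNT(*), SUM(price) AS total "
        "FROM purchases GROUP BY customer ORDER BY total DESC"
    ).fetchall()
    conn.close()

    # Data presentation: turn the sorted results into readable summary lines.
    return [f"{customer}: {count} purchase(s), ${total:.2f}" for customer, count, total in rows]


if __name__ == "__main__":
    for line in run_pipeline(RAW_PURCHASES):
        print(line)
```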

Data mining vs. big data

"Big" refers to the size, the speed, and the variety of available data

Big data, data breaches, and data mining all relate to the same topic, but they aren't the same things. The term "big data" is typically used to refer to an amount of data that is large enough to make it challenging, if not impossible, to analyze using traditional methods. The "big" in big data refers to the size of the data, the speed at which it travels, and the variety of available data types. 

The rise in popularity of internet-connected devices such as smart refrigerators, smart thermostats, and sensors with radio-frequency identification (RFID) means that a lot of data travels back and forth very quickly. There are many different types of data floating around out there. Data mining techniques help companies process and make sense of big data.

A data breach can occur when the systems that store big data or run data mining have a security vulnerability. Legitimate data mining is not a data breach. However, a bad actor can gain access to data illegitimately and then use data mining to clean it up for nefarious purposes.

How companies use data mining

Data mining has been widely adopted by organizations in various industries, from healthcare to retail and education to finance. Some of the ways companies use data mining include:

1. Education

Schools often use data mining to evaluate their student populations. A university or college can use it to see which courses are most in-demand and popular. The school can then add more sections of the most popular courses to its schedule while reducing or eliminating courses that have lower levels of enrollment. 

Schools can also use data mining to assess how students perform or evaluate what conditions lead to better performance. It could be that students enrolled in online courses end up with higher grades than students in on-campus courses, for example. The information allows a school to make decisions about the types and formats of classes it offers. 

2. Marketing

Data mining for marketing

One of the biggest uses of data mining is in marketing. Thanks to the internet, companies collect a wealth of data about current and potential customers. A business can then use that data to target specific consumers or shape a particular customer's experience. Some of the ways a company might use data mining for marketing include:

  • Upselling or cross-selling
  • Creating customer loyalty programs
  • Targeting particular demographics
  • Email marketing
  • Social media outreach

3. Fraud detection

Data mining can help a company detect instances of fraud and take action. For example, data mining can flag fake or suspicious social media accounts, allowing a network to delete or shut down those accounts. Data mining can detect suspicious behavior or fraudulent social media practices such as buying followers, sending spam comments, or harassing other users. 

Financial institutions also often mine data to spot fake accounts or suspicious behavior. A credit card company might notice that a customer has suddenly and atypically rung up thousands of dollars in charges in just a few hours. The credit card company can then temporarily freeze the customer's account while it reaches out to them to verify whether fraud has occurred.
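One simple way to flag a sudden, atypical run of charges is a z-score check against the customer's own spending history, combined with a check on the total of the recent burst. The sketch below assumes that approach. The function names, the 3-standard-deviation cutoff, and the 10x burst multiplier are illustrative assumptions, and real card issuers use far richer fraud models.

```python
from statistics import mean, stdev


def flag_unusual_charges(history, recent, z_cutoff=3.0):
    """Return recent charges more than z_cutoff standard deviations above the usual amount.

    history: past charge amounts for this customer (at least two values)
    recent: new charge amounts to screen
    """
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Every past charge was identical, so anything larger stands out.
        return [amount for amount in recent if amount > mu]
    return [amount for amount in recent if (amount - mu) / sigma > z_cutoff]


def should_freeze(history, recent, z_cutoff=3.0, burst_multiplier=10.0):
    """Suggest a temporary freeze if any charge is an outlier or the burst total dwarfs typical spend."""
    outliers = flag_unusual_charges(history, recent, z_cutoff)
    burst = sum(recent) > burst_multiplier * mean(history)
    return bool(outliers) or burst


# Hypothetical charge history for one customer, in dollars.
HISTORY = [42.10, 18.75, 63.00, 25.40, 51.20, 30.00, 47.80]

if __name__ == "__main__":
    print(should_freeze(HISTORY, [1200.00, 950.00, 35.00]))  # a sudden spree of large charges
    print(should_freeze(HISTORY, [45.00, 22.50]))            # ordinary spending
```

A flag like this would only trigger the temporary freeze and outreach. Because honest customers sometimes make large purchases, the issuer still confirms with the customer before treating the charges as fraud.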

4. Operational improvement

Organizations can also use data mining to improve their operations. For example, data mining can help a company see where it's inefficient or where there tend to be slow-downs or delays. Thanks to data mining, a company can streamline its payroll or improve other processes to save time and money and better serve customers.

Data mining and consumer risk

Negligence in data collection can result in harm to companies and consumers

Data mining can be a positive thing on the whole, but it does pose some risks for consumers, particularly when it comes to who has access to their data and the information gleaned by data mining. When retailers use data mining to analyze customer behavior, they can learn things that customers would prefer to keep private. For instance, the retailer mentioned in the 2012 New York Times story used data to determine that a teenage girl was pregnant based on her purchase history. When the store sent the girl coupons for baby gear, her father, who did not know she was pregnant, saw the coupons and became upset.

In addition to companies collecting information that consumers might prefer to keep to themselves, there's also the risk of the mined data falling into the wrong hands. Companies often collect hundreds of pieces of information about consumers, from their birthdays to their preferred brands. If that information were stolen or misused, the consequences for the affected consumers could be disastrous.

Data mining privacy issues

Data mining can create multiple privacy issues, depending on the type of data collected. For example, if a medical practice were to mine patient data to help improve diagnosis and treatment, it would need to ensure that identifying patient information was not connected to the symptoms or conditions tracked.
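One common way to keep identities separate from tracked symptoms is pseudonymization. This means dropping direct identifiers and replacing the patient ID with a keyed hash, so that records from the same patient can still be linked without revealing who the patient is. This is a minimal sketch using Python's standard `hmac` and `hashlib` modules. The field names and the `pseudonymize` helper are hypothetical. Pseudonymization is weaker than full anonymization: combinations of the remaining fields, such as age and ZIP code, can still re-identify people. It is one safeguard rather than a complete solution.

```python
import hashlib
import hmac

# Direct identifiers to strip before analysis (a hypothetical field list).
DIRECT_IDENTIFIERS = {"name", "email", "phone", "address"}


def pseudonymize(record, secret_key, id_field="patient_id"):
    """Replace the record's ID with a keyed hash and drop direct identifiers.

    The same patient ID always maps to the same token, so visits can still be linked.
    Only someone holding `secret_key` can regenerate a token from a known ID.
    """
    message = str(record[id_field]).encode("utf-8")
    token = hmac.new(secret_key, message, hashlib.sha256).hexdigest()
    cleaned = {
        key: value
        for key, value in record.items()
        if key not in DIRECT_IDENTIFIERS and key != id_field
    }
    cleaned["pseudonym"] = token[:16]  # shortened for readability
    return cleaned


if __name__ == "__main__":
    key = b"example-key-keep-this-secret"  # in practice, load from a secrets manager
    visit = {
        "patient_id": 1001,
        "name": "A. Patient",
        "email": "a.patient@example.com",
        "age": 47,
        "symptoms": ["cough", "fever"],
        "diagnosis": "influenza",
    }
    print(pseudonymize(visit, key))
```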

Some consumers might find themselves wondering why companies need so much information about them. The more personal information a company has about people, the more attractive its data can be to bad actors. That increases the likelihood of a company being targeted for a data breach. 

There's also the question of what companies do with the data they collect and how much they know about consumers. When a person uses a credit card to buy something online, they might not realize how much information gets sent to the card company. The card issuer sees where the consumer purchased the item, the type of store they shopped at, and how much they spent. Some card issuers might use that information to target a consumer with ads or coupons without the consumer fully understanding what's going on.

As more everyday objects become connected to the "internet of things" and more objects collect data, there's the issue of what's being done with the data. There have been reports of fertility-tracking apps sharing confidential information with users' employers, for instance. Other apps might sell consumer data to third-party companies for marketing purposes. 

How to protect yourself from data mining risks

Companies should minimize data collection, and consumers can use restraint when sharing

The best way to protect your data and your customers' data from data mining risks is not to put the information out there in the first place. That can be easier said than done in a world that's become ever-more connected, though. Some consumers try to disconnect themselves from their data by not creating accounts at certain websites or with certain companies. 

As a consumer, if you do need to sign up for an account or provide personal information, one way to protect yourself is to provide as little detail as possible. If you don't need to reveal your birthday or income level, don't. The same is true for your contact information or details about your personal life. Even little things, like your pet's name or your favorite color, could be used by a bad actor to guess a password or gain unauthorized access to your account.

It's also worthwhile to familiarize yourself with any data protection laws that apply to you. The General Data Protection Regulation (GDPR) went into effect in 2018. It's one of the strictest privacy laws and protects people in the European Union. The GDPR dictates what companies that collect data about EU residents can do with that data, and it outlines the notifications people need to receive. Under the GDPR, companies need a lawful basis, such as the person's consent, before they process personal data. Any consent they rely on must be freely given, specific, informed, and unambiguous.

Finally, carefully review the privacy policy of any website you visit. Knowing what a company is doing and why it's collecting your data can help you decide if it's worth the potential invasion of privacy.

How Box can help keep data safe

While you'll need to exercise restraint as you share information online, you can rest assured that any content stored in Box stays secure. With malware deep scans, frictionless security, and all the benefits of Box Shield, your files are protected while you get work done.

What's more, if you're managing business in the Content Cloud, Box Shield integrates with your existing security portfolio and guards any personal or confidential information you collect about your customers. Classify files based on security settings, detect suspicious behavior, get alerts about malware attacks, and take action to protect the privacy of your users or customers.

Keep your information secure with Box Shield

Learn more about what Box has to offer

No matter the task at hand, your content is valuable and needs to be protected. Box Shield gives you the tools you need to keep information safe.

Contact us today and discover a single, secure platform built for the way you work.

While we maintain our steadfast commitment to offering products and services with best-in-class privacy, security, and compliance, the information provided in this blog post is not intended to constitute legal advice. We strongly encourage prospective and current customers to perform their own due diligence when assessing compliance with applicable laws.
