First-Party Data Collection: What You Can Track and What You Cannot

Every time someone visits your website, they leave traces. Pages viewed. Buttons clicked. Items added to a cart. This is first-party data — information you collect directly from your own visitors, on your own site. It’s the most valuable data you have, and unlike third-party tracking, it doesn’t require shady workarounds or invasive surveillance.

But first-party data collection has boundaries. Some things you can track freely. Others require explicit consent. And a few things you simply shouldn’t track at all, no matter how useful they seem. In this guide, I’ll walk you through what counts as first-party data, what you can legally collect, and where the lines are drawn.

First-Party vs Second-Party vs Third-Party Data

Before diving into what you can track, let’s clarify the three types of data. This distinction matters because privacy regulations treat them very differently.

Data Type	Who Collects It	How It’s Collected	Examples	Privacy Risk
First-party	You, on your website	Directly from your visitors	Pageviews, form submissions, purchase history	Low (if handled correctly)
Second-party	A trusted partner	Shared through a data agreement	Co-marketing data, partner audience insights	Medium
Third-party	External companies	Cross-site tracking, cookies, ad networks	Browsing history across sites, ad profiles	High

First-party data is the gold standard for privacy-first analytics. You collected it. You know the context. Your visitors gave it to you — either explicitly (by filling out a form) or implicitly (by browsing your site). Third-party data, by contrast, is collected by someone else and often aggregated without meaningful consent.

The shift away from third-party cookies is making first-party data even more important. As browsers block cross-site tracking and regulations tighten, businesses that rely on first-party data are better positioned. For more context on this shift, see our guide on how tracking works without cookies.

What Counts as First-Party Data?

First-party data is any information collected directly through your own channels. It falls into two categories: behavioural data (what people do on your site) and declared data (what people tell you).

Behavioural Data (Observed)

Pageviews — which pages were visited, in what order, and for how long (see why pageviews still matter)
Sessions — groups of interactions within a time window (learn about sessions in web analytics)
Click events — button clicks, link clicks, downloads
Scroll depth — how far down a page someone reads
Referral source — how the visitor arrived (search, social, direct)
Device and browser type — screen size, operating system, browser version
Approximate location — country or city based on IP (without storing the IP)
Cart actions — items added, removed, or abandoned

Declared Data (Given by the Visitor)

Email address — from newsletter sign-ups or account creation
Name and contact details — from forms, checkouts, or support requests
Preferences — language, notification settings, product interests
Survey responses — feedback, NPS scores, product reviews
Purchase history — what someone bought, when, and for how much

Key Takeaway: Behavioural data is collected passively as visitors browse. Declared data is given actively when visitors fill out forms, make purchases, or create accounts. Both are first-party data, but they have different consent requirements.

What You Can Track Without Consent

Under most privacy frameworks — including GDPR — you can collect certain data without asking for consent, provided you do it properly. The key requirement is that the data must be anonymous or aggregated, meaning it cannot be tied back to a specific individual.

Here’s what typically falls into the consent-free zone:

Data Point	Consent Needed?	Conditions
Aggregate pageview counts	No	No individual identification
Referral source (aggregated)	No	Not linked to individual sessions
Browser/device type (aggregated)	No	Not used to fingerprint individuals
Country-level location	No	Derived without storing IP addresses
Page performance metrics	No	Technical data, not personal
Anonymous event counts	No	E.g., “42 people clicked the signup button”

This is exactly how privacy-first analytics tools like Plausible, Umami, and Fathom operate. They collect aggregate statistics without identifying individual visitors. No cookies. No IP storage. No consent banner needed.
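To make "aggregate only" concrete, here is a minimal sketch of a counter that keeps daily totals per page and nothing else. It is an illustration under my own assumptions, not the code of any of the tools named above: `record_pageview` and the in-memory `pageview_counts` store are hypothetical names, and a real setup would write to a database.

```python
from collections import Counter
from datetime import date

# Hypothetical in-memory store: (day, path) -> total views.
# No visitor ID, IP address, or cookie value is kept anywhere.
pageview_counts = Counter()

def record_pageview(path, day):
    """Increment the daily total for a page and keep nothing else about the request."""
    pageview_counts[(day.isoformat(), path)] += 1

# Three requests come in; only the totals survive.
record_pageview("/pricing", date(2024, 5, 1))
record_pageview("/pricing", date(2024, 5, 1))
record_pageview("/blog", date(2024, 5, 1))
```

Because the only thing stored is a number per page per day, there is no row that could be traced back to one visitor.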

For a deeper look at whether you need a cookie banner, check our guide: Do you actually need a cookie banner?

What Requires Consent

The moment you start collecting data that can identify, or could be used to identify, an individual, you enter consent territory. Under GDPR, this is called personal data, and processing it needs a legal basis such as consent. Under other frameworks like the Australian Privacy Act, similar rules apply.

Data that requires consent includes:

Email addresses — even for analytics purposes
Full IP addresses — these are personal data under GDPR
User IDs or account identifiers — anything that links activity to a person
Cookies that track individuals, whether persistent cookies or session cookies used for tracking
Device fingerprints — combining browser, screen, and OS data to identify someone
Cross-site tracking data — following users between different websites
Location data (precise) — GPS or neighbourhood-level location

Warning: Full IP addresses count as personal data under GDPR, even if you only store them and never look at them. Truncate, hash, or discard IPs before storage. Most privacy-first tools do this automatically.
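If you handle raw requests yourself, truncation is the simplest of the three options to get right. The sketch below uses Python's standard `ipaddress` module; the function name `truncate_ip` and the prefix lengths (/24 for IPv4, /48 for IPv6) are my own assumptions, and whether a truncated address is anonymous enough for your situation is a question for your legal advice.

```python
import ipaddress

def truncate_ip(raw_ip):
    """Zero the host part of an address: last octet for IPv4, last 80 bits for IPv6."""
    ip = ipaddress.ip_address(raw_ip)
    prefix = 24 if ip.version == 4 else 48  # assumed prefix lengths, adjust as needed
    # strict=False lets us pass a host address and get its enclosing network back.
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)

print(truncate_ip("203.0.113.57"))                  # 203.0.113.0
print(truncate_ip("2001:db8:abcd:12:34:56:78:9a"))  # 2001:db8:abcd::
```

Run the truncation before anything is written to logs or a database, so the full address never reaches storage.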

What You Should Never Track

Some data is either illegal to collect, or so risky that no analytics benefit justifies it:

Health information — unless you’re a health provider with appropriate safeguards
Racial or ethnic origin
Political opinions or religious beliefs
Biometric data — facial recognition, fingerprint data
Data about children — under-13s (or under-16s in some jurisdictions) require parental consent
Passwords or payment card numbers — never log these in analytics events

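Several of these (health information, racial or ethnic origin, political opinions, religious beliefs, and biometric data used to identify someone) are special category data under GDPR Article 9 and have the strictest protections. Even accidental collection, like a free-text form field that captures health information, can create liability.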
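One way to reduce accidental collection is to scrub every analytics event before it leaves your code. The sketch below is a rough illustration, not a complete safeguard: `ALLOWED_KEYS`, `CARD_LIKE`, and `scrub_event` are hypothetical names, the allowlist is an example, and the pattern only catches card-like digit runs. It will not detect health details or other sensitive free text.

```python
import re

# Hypothetical allowlist: only these event properties ever reach analytics.
ALLOWED_KEYS = {"event", "path", "plan", "button_id"}

# Rough pattern for 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_LIKE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")

def scrub_event(props):
    """Drop properties that are not allowlisted and mask card-like digit runs."""
    clean = {}
    for key, value in props.items():
        if key not in ALLOWED_KEYS:
            continue  # passwords, free-text fields, etc. never get through
        if isinstance(value, str):
            value = CARD_LIKE.sub("[redacted]", value)
        clean[key] = value
    return clean

raw = {"event": "checkout", "path": "/pay", "password": "hunter2",
       "card": "4111 1111 1111 1111"}
print(scrub_event(raw))  # {'event': 'checkout', 'path': '/pay'}
```

An allowlist is the safer default here: a new form field stays out of analytics until someone deliberately adds it.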

How Privacy-First Tools Handle First-Party Data

Privacy-first analytics tools are designed to collect useful first-party data while staying within legal boundaries. Here’s how they do it:

Technique	What It Does	Used By
IP hashing/discarding	Converts or removes IP addresses before storage	Plausible, Fathom, Umami
No cookies	Avoids persistent identifiers entirely	Plausible, Fathom, Umami, GoatCounter
Session estimation	Uses hashed, rotating identifiers instead of cookies	Plausible, Fathom
Aggregation	Stores only totals, not individual records	GoatCounter
Data minimisation	Collects only what’s needed for the report	All privacy-first tools

The result is that you get meaningful analytics — traffic sources, top pages, visitor counts, and trends — without ever storing data that could identify a specific person. For a complete guide to setting this up properly, read our privacy-compliant tracking guide.

Practical Advice for Small Businesses

If you’re running a small business website, here’s a straightforward approach to first-party data collection:

Use a cookie-free analytics tool — Plausible, Fathom, or Umami will give you traffic stats without consent requirements
Keep form data separate from analytics — your email sign-up form and your analytics tool don’t need to share data
Don’t track what you won’t use — if you’ll never look at scroll depth data, don’t collect it
Write a clear privacy policy — explain what you collect and why, even if consent isn’t required
Set data retention limits — don’t keep analytics data forever. 12–24 months is usually enough
Review your forms — make sure you’re not accidentally collecting data you don’t need
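The retention limit from the list above is easy to automate. Here is a minimal sketch, assuming your analytics rows are dictionaries with a `day` field holding a `date`. `RETENTION_DAYS` and `purge_old_rows` are hypothetical names, and 730 days is just one way to express roughly 24 months.

```python
from datetime import date, timedelta

RETENTION_DAYS = 730  # roughly 24 months; an assumed policy, adjust to your own

def purge_old_rows(rows, today):
    """Return only the rows whose 'day' falls inside the retention window."""
    cutoff = today - timedelta(days=RETENTION_DAYS)
    return [row for row in rows if row["day"] >= cutoff]

rows = [
    {"day": date(2022, 1, 1), "path": "/pricing", "views": 120},
    {"day": date(2024, 4, 1), "path": "/pricing", "views": 95},
]
kept = purge_old_rows(rows, today=date(2024, 5, 1))
```

In this example only the April 2024 row is kept. With a real database you would run the equivalent delete query on a schedule instead of filtering a list.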

I worked with a small e-commerce client in Brisbane who was collecting 23 data points on their checkout form. After an audit, we reduced it to 8. Conversion rates went up, privacy risk went down, and they had cleaner data in their analytics. Less really is more.

The Future of First-Party Data

As third-party cookies disappear and privacy regulations expand, first-party data becomes your most reliable source of insight. Businesses that build strong first-party data practices now will have a significant advantage.

This doesn’t mean tracking more. It means tracking better. Collecting the right data, with clear consent where needed, and turning it into action. A small, clean dataset you actually use is worth more than a massive data warehouse you never open.

Frequently Asked Questions

Is first-party data always compliant with GDPR?

Not automatically. First-party data can include personal data (like email addresses), which requires a legal basis for processing under GDPR. Anonymous, aggregated first-party data — like pageview counts — doesn’t require consent. But the moment you can link data to an individual, GDPR applies.

Can I use first-party data for remarketing?

Yes, but only with proper consent. If someone gives you their email address via a form and you want to send them marketing emails, you need explicit opt-in consent. The data may be first-party, but using it for marketing beyond its original purpose requires additional permission.

What’s the difference between first-party data and first-party cookies?

First-party data is the information itself — pageviews, purchases, form submissions. First-party cookies are a mechanism for collecting some of that data. A first-party cookie is set by your domain (not a third party), but it can still track individuals and may require consent depending on its purpose.

Do privacy-first analytics tools collect first-party data?

Yes. Tools like Plausible, Fathom, and Umami collect first-party behavioural data — pageviews, referral sources, device types. However, they do it without cookies and without storing personal identifiers. The data they collect is anonymous by design, which is why they don’t require consent banners.

How long should I keep first-party data?

GDPR’s storage limitation principle says you should keep personal data only as long as necessary for the purpose you collected it for. For analytics, 12–24 months is a common retention period. For declared data like customer emails, keep it as long as the relationship is active plus any legal retention requirements. Set automatic deletion policies where possible.

The Bottom Line

First-party data collection is the foundation of ethical, effective analytics. You can learn a tremendous amount about your visitors — what they want, where they struggle, what makes them convert — without ever crossing a privacy line. The key is knowing the difference between anonymous behavioural data and personal identifiers, and treating each appropriately.

Start with a privacy-first tool. Track what matters. Ask for consent when you need personal data. And remember: the goal isn’t to know everything about your visitors. It’s to know enough to serve them better.