Understanding Your Results

Data Quality Dimensions Glossary

Definitions, scoring methodology, and industry context for each of the seven data quality dimensions MarketingSoda Refine evaluates.

Overview

MarketingSoda Refine™ evaluates your HubSpot data across seven quality dimensions. This glossary defines each dimension, explains how it is scored, and provides context for why each one matters for B2B marketing and sales operations.


The Seven Dimensions

1. Completeness

Definition: The degree to which all expected fields in a contact record are populated with values.

How it is scored: For each contact, the scoring engine checks 14 standard properties (first name, last name, email, phone, job title, company, city, state, country, LinkedIn URL, seniority, department, created date, last modified date). The completeness score reflects the percentage of these fields that contain non-empty values.
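
As a rough illustration of the calculation described above, the following Python sketch counts non-empty values across the 14 properties. The dictionary keys are placeholder names chosen for readability and are not necessarily HubSpot's internal property names; treating whitespace-only values as empty is also an assumption.

```python
# Placeholder keys for the 14 properties listed above (assumed names,
# not necessarily HubSpot's internal property names).
COMPLETENESS_FIELDS = [
    "first_name", "last_name", "email", "phone", "job_title", "company",
    "city", "state", "country", "linkedin_url", "seniority", "department",
    "created_date", "last_modified_date",
]


def completeness_score(contact: dict) -> float:
    """Return the percentage of the 14 fields that hold a non-empty value."""
    filled = sum(
        1
        for field in COMPLETENESS_FIELDS
        # Assumption: missing, None, and whitespace-only values count as empty.
        if str(contact.get(field) or "").strip()
    )
    return 100.0 * filled / len(COMPLETENESS_FIELDS)
```

A contact with 7 of the 14 fields populated would score 50 under this sketch.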

Why it matters: Incomplete records cannot be properly segmented for campaigns, personalized for outreach, or routed to the right sales rep. A contact missing a job title cannot be prioritized by seniority. A contact missing a company name cannot be matched to an account.

Industry context: Completeness is one of the six primary dimensions defined by the Data Management Association (DAMA) framework and the ISO 8000 international standard.


2. Accuracy

Definition: The degree to which data correctly represents the real-world entity it describes.

How it is scored: The scoring engine performs cross-reference checks on field values, including email format validation, internal consistency between related fields, and pattern detection for obviously incorrect values.

Why it matters: Inaccurate data leads to bounced emails, failed phone connections, incorrect company attribution, and flawed reporting. When sales reps act on inaccurate data, they waste time and damage credibility.

Industry context: Accuracy is universally recognized across DAMA, ISO 8000, ISO/IEC 25012, and Gartner frameworks as a fundamental quality dimension.


3. Freshness

Definition: How recently a contact record has been created or updated, indicating whether the data is likely still current.

How it is scored: The scoring engine evaluates the lastmodifieddate property. Records updated within the last 3 months score highest. Records not updated in 6+ months receive progressively lower scores. Very stale records (12+ months without updates) score near zero for freshness.
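
The exact scoring curve is not published here, so the sketch below is only one possible reading of the thresholds above. It assumes a full score up to 90 days, a score of zero from 365 days onward, and a straight-line decline in between; the function name and the linear shape are assumptions made for illustration.

```python
from datetime import datetime


def freshness_score(last_modified: datetime, now: datetime) -> float:
    """Map the age of the last modification to a 0-100 freshness score."""
    age_days = (now - last_modified).days
    if age_days <= 90:       # updated within roughly the last 3 months
        return 100.0
    if age_days >= 365:      # roughly 12+ months without updates
        return 0.0
    # Assumed linear decline between 90 and 365 days.
    return 100.0 * (365 - age_days) / (365 - 90)
```

Under these assumptions, a record last modified 200 days ago scores 60.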

Why it matters: B2B contact data decays at approximately 34% per year. Job titles change (65.8% annual change rate), people switch companies, phone numbers disconnect (42.9% annual change), and email addresses become invalid (37.3% annual change). Stale data means you are reaching out to the wrong people.

Industry context: Called "Timeliness" in DAMA and ISO 8000, and "Currentness" in ISO/IEC 25012.


4. Validity

Definition: The degree to which data values conform to expected formats, ranges, and patterns.

How it is scored: The scoring engine checks whether field values match expected formats. For example: does the email field contain a properly formatted email address? Does the phone field contain a plausible phone number? Are dates in recognizable formats?
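
Format checks of this kind can be sketched in a few lines of Python. The patterns and thresholds below (a loose email pattern, 7 to 15 digits for a phone number, ISO 8601 dates) are illustrative assumptions, not the rules the scoring engine actually applies.

```python
import re
from datetime import datetime

# Loose email shape: something@something.something, with no spaces or extra "@".
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
# Characters commonly seen in formatted phone numbers.
PHONE_CHARS_RE = re.compile(r"[\d\s()+.\-]+")


def is_valid_email(value: str) -> bool:
    return bool(EMAIL_RE.match(value.strip()))


def is_plausible_phone(value: str) -> bool:
    if not PHONE_CHARS_RE.fullmatch(value.strip()):
        return False
    digits = re.sub(r"\D", "", value)
    # Assumption: 7 to 15 digits is a plausible length.
    return 7 <= len(digits) <= 15


def is_recognizable_date(value: str) -> bool:
    # Assumption: only ISO 8601 strings such as "2024-03-15" are recognized.
    try:
        datetime.fromisoformat(value)
        return True
    except ValueError:
        return False
```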

Why it matters: Invalid data fails at the point of use. Malformed emails bounce. Invalid phone numbers waste calling time. Incorrectly formatted addresses cannot be geocoded or used for direct mail. Invalid data also breaks automation rules that depend on specific field formats.

Industry context: Validity is defined in DAMA and ISO 8000, and maps to "Compliance" in ISO/IEC 25012.


5. Consistency

Definition: The degree to which data values follow standardized patterns and formats across the entire database.

How it is scored: The scoring engine evaluates whether the same type of information is represented the same way across all contacts. For example: are state fields consistently abbreviated ("CA") or spelled out ("California")? Are country names always in the same format? Are phone numbers formatted consistently?
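
One simple way to quantify this, shown here for the state field only, is to classify each value by its style and report the share of values that use the most common style. This is a hypothetical sketch: the two style labels and the "share of the dominant style" formula are assumptions, not the engine's documented method.

```python
from collections import Counter


def state_style(value: str) -> str:
    """Classify a state value as a two-letter abbreviation or a spelled-out name."""
    v = value.strip()
    if len(v) == 2 and v.isalpha() and v.isupper():
        return "abbreviated"
    return "spelled_out"


def consistency_score(values: list[str]) -> float:
    """Percentage of non-empty values that follow the dominant style."""
    styles = Counter(state_style(v) for v in values if v.strip())
    if not styles:
        return 100.0  # Assumption: nothing to compare means nothing inconsistent.
    dominant_count = styles.most_common(1)[0][1]
    return 100.0 * dominant_count / sum(styles.values())
```

For the values "CA", "NY", "California", and "TX", three of four follow the abbreviated style, giving a score of 75.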

Why it matters: Inconsistent data breaks segmentation and reporting. If half your contacts use "United States" and the other half use "US" or "USA," any list filter or report on country will produce incomplete results. Inconsistency also causes duplicate detection to fail.

Industry context: Consistency is defined in all major data quality frameworks (DAMA, ISO 8000, ISO/IEC 25012) and maps to "Matching/Linking" in Gartner's framework.


6. Uniqueness

Definition: The degree to which each contact in the database represents a distinct individual, without duplicates.

How it is scored: The scoring engine compares email addresses across all contacts in the scanned sample. The uniqueness score equals (unique emails / total emails) * 100. Contacts sharing the same email address are counted as duplicates.
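
The formula above translates directly into code. In this minimal sketch, trimming whitespace, lowercasing addresses before comparison, and skipping contacts without an email are assumptions; the text does not say how the engine normalizes addresses.

```python
def uniqueness_score(emails: list[str]) -> float:
    """Return (unique emails / total emails) * 100 for the scanned sample."""
    # Assumption: compare addresses case-insensitively and ignore blanks.
    normalized = [e.strip().lower() for e in emails if e and e.strip()]
    if not normalized:
        return 100.0  # Assumption: an empty sample has no duplicates.
    return 100.0 * len(set(normalized)) / len(normalized)
```

Four addresses of which two differ only by letter case yield three unique values, so the score is 75.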

Why it matters: Duplicate contacts waste enrichment credits, create confusing experiences for sales reps (multiple reps contacting the same person), inflate contact count metrics, and cause incorrect reporting. Marketing emails sent to duplicates can trigger spam filters and damage sender reputation.

Industry context: Uniqueness is defined in DAMA and ISO 8000 frameworks. In Gartner's framework, it maps to "Matching/Merging" capabilities.

Note: The free audit checks for email-based duplicates only. MarketingSoda Refine's full platform (coming soon) uses a five-layer matching engine (fuzzy name matching, phonetic matching, nickname resolution, domain matching, and probabilistic scoring) to detect duplicates that email-only matching misses.


7. Enrichment Coverage

Definition: The degree to which contact records contain professional enrichment data beyond basic identity fields.

How it is scored: The scoring engine checks for the presence of enrichment-specific fields: LinkedIn URL, job title, company name, seniority level, and department. Higher coverage of these fields results in a higher score.
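
The text says only that higher coverage yields a higher score, so the sketch below assumes the simplest reading: the share of the five enrichment fields that are populated. The field keys are placeholder names, and returning the list of missing fields is an added convenience for illustration.

```python
# Placeholder keys for the five enrichment fields named above (assumed names).
ENRICHMENT_FIELDS = ["linkedin_url", "job_title", "company", "seniority", "department"]


def enrichment_coverage(contact: dict) -> tuple[float, list[str]]:
    """Return the coverage percentage and the enrichment fields still missing."""
    missing = [
        field
        for field in ENRICHMENT_FIELDS
        if not str(contact.get(field) or "").strip()
    ]
    present = len(ENRICHMENT_FIELDS) - len(missing)
    return 100.0 * present / len(ENRICHMENT_FIELDS), missing
```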

Why it matters: Enrichment data enables better targeting, segmentation, and personalization. Without job titles, you cannot segment by role. Without company names, you cannot do account-based marketing. Without seniority levels, you cannot prioritize executive contacts. Enrichment coverage directly affects the quality of your marketing campaigns and sales outreach.

Industry context: Enrichment Coverage is a novel dimension introduced by MarketingSoda Refine. It is not present in traditional data quality frameworks (DAMA, ISO 8000, ISO/IEC 25012) because those frameworks predate the modern B2B data enrichment ecosystem. It can be considered an extension of Completeness that specifically measures externally-sourced professional data.


Scoring Scale

All dimensions use the same 0-100 scoring scale:

Score Range | Letter Grade  | Meaning
90-100      | A (Excellent) | Dimension is in top shape
75-89       | B (Good)      | Minor gaps, generally healthy
60-74       | C (Fair)      | Notable issues affecting performance
40-59       | D (Poor)      | Significant problems requiring attention
0-39        | F (Failing)   | Critical issues, immediate action needed
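
The grade bands can be expressed as a small lookup function. This sketch assumes that each band's lower bound is inclusive and that fractional scores (for example 89.5) fall into the band below the next threshold; the published ranges list whole numbers only.

```python
def letter_grade(score: float) -> str:
    """Map a 0-100 score to its letter grade."""
    if score >= 90:
        return "A"
    if score >= 75:
        return "B"
    if score >= 60:
        return "C"
    if score >= 40:
        return "D"
    return "F"
```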

Severity Levels

When a dimension scores below 80, it is flagged as an issue with a severity level:

Severity | Score Range | What It Means
High     | Below 40    | This dimension is critically impacting your operations
Medium   | 40-59       | Noticeable negative effects on campaigns and reporting
Low      | 60-79       | Room for improvement but not causing major problems
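
A minimal sketch of the severity rule follows. It returns no severity for scores of 80 or above, since those dimensions are not flagged; the function name is hypothetical.

```python
from typing import Optional


def severity(score: float) -> Optional[str]:
    """Return the severity level for a dimension score, or None if not flagged."""
    if score >= 80:
        return None      # not flagged as an issue
    if score < 40:
        return "High"
    if score < 60:
        return "Medium"
    return "Low"         # 60-79
```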

Composite Score

Each contact receives its own composite score: the weighted average of its applicable dimension scores. The overall composite score is the mean of these per-contact scores across all scanned contacts.
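
The two-step calculation can be sketched as follows. The dimension weights are not published, so this example uses equal weights as a stand-in, and it treats a dimension whose score is None as "not applicable" to that contact; both choices are assumptions.

```python
from statistics import mean

# Hypothetical equal weights; the real weighting is not documented here.
WEIGHTS = {
    "completeness": 1.0, "accuracy": 1.0, "freshness": 1.0, "validity": 1.0,
    "consistency": 1.0, "uniqueness": 1.0, "enrichment_coverage": 1.0,
}


def contact_composite(dimension_scores: dict) -> float:
    """Weighted average of one contact's applicable dimension scores."""
    # Assumption: a score of None marks a dimension as not applicable.
    applicable = {d: s for d, s in dimension_scores.items() if s is not None}
    if not applicable:
        raise ValueError("contact has no applicable dimension scores")
    total_weight = sum(WEIGHTS[d] for d in applicable)
    return sum(WEIGHTS[d] * s for d, s in applicable.items()) / total_weight


def overall_composite(contacts: list) -> float:
    """Mean of the per-contact composite scores."""
    return mean(contact_composite(c) for c in contacts)
```

With these stand-in weights, two contacts scoring 70 and 95 produce an overall composite of 82.5.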


Additional Glossary Terms

Composite Score

The overall data quality score (0-100) representing the average quality across all scanned contacts and all applicable dimensions.

Letter Grade

A single-letter summary of the composite score: A (90+), B (75-89), C (60-74), D (40-59), F (below 40).

Grade Distribution

A breakdown showing how many contacts received each letter grade. Displayed as a bar chart in the report.

Top Issues

The dimensions with the lowest scores (below 80), ranked from worst to best. Up to five are highlighted in the report.
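
The selection described above amounts to a filter, a sort, and a cut-off. The sketch below is illustrative; the function name and the dictionary input format are assumptions.

```python
def top_issues(dimension_scores: dict, threshold: float = 80, limit: int = 5) -> list:
    """Return up to `limit` (dimension, score) pairs below the threshold, worst first."""
    flagged = [
        (name, score)
        for name, score in dimension_scores.items()
        if score < threshold
    ]
    return sorted(flagged, key=lambda item: item[1])[:limit]
```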

Recommendations

Specific, actionable steps to improve each flagged dimension, prioritized by expected impact.

Records Scanned

The total number of HubSpot contacts analyzed during the scan (up to 500 for the free audit).

Report ID

A unique identifier (UUID) for your report, used in the report URL. Report IDs are random and non-sequential to prevent guessing.

Report Expiry

Reports are available for 12 months from the scan date. After expiry, the report shows an "expired" message with an option to run a new scan.