🔒

Your data never leaves your device. This tool runs entirely in your browser. No files are uploaded, transmitted, or stored on any server — not even ours. You can verify this by opening your browser’s Network tab while processing a file: zero data requests are made.

Use offline anytime

For added peace of mind, download a standalone version of this tool — one file, no internet required, opens in any browser.

⬇ Download

Replace sensitive columns with realistic substitute values

Everything runs in your browser — no data is uploaded or sent anywhere.

1 · Upload

Drop any CSV, TSV, or Excel file. Multi-sheet files let you choose which tab to use.

2 · Choose columns

Sensitive columns are auto-flagged. Toggle each on or off, and choose what replacement type to use for each column.

3 · Download

Get the anonymized file, plus an optional mapping reference that links anonymized values back to originals.
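The idea of replacing a sensitive column while keeping a mapping back to the originals can be sketched in a few lines. This is only an illustration under stated assumptions, not the tool's actual implementation: the function name `anonymize_column`, the sample names, and the substitute pool are all hypothetical.

```python
import random

def anonymize_column(rows, column, substitutes, seed=0):
    """Replace each distinct value in `column` with a substitute.

    Returns new rows plus a mapping of original -> substitute.
    The same original always receives the same substitute.
    """
    rng = random.Random(seed)
    pool = list(substitutes)
    rng.shuffle(pool)
    mapping = {}
    out = []
    for row in rows:
        original = row[column]
        if original not in mapping:
            if not pool:
                raise ValueError("not enough substitute values")
            mapping[original] = pool.pop()
        # Copy the row so the input data is left untouched.
        out.append({**row, column: mapping[original]})
    return out, mapping

# Hypothetical sample data.
rows = [
    {"name": "Sarah Chen", "dept": "Ops"},
    {"name": "Bob Lee", "dept": "HR"},
    {"name": "Sarah Chen", "dept": "Ops"},
]
anonymized, mapping = anonymize_column(rows, "name", ["Alex Kim", "Jo Park", "Sam Roy"])

# The optional mapping reference: anonymized value -> original value.
reverse_mapping = {fake: real for real, fake in mapping.items()}
```

Keeping the reverse mapping in a separate file means the anonymized output can be shared on its own, while the mapping stays with whoever is allowed to see the originals.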

Upload Your File

Drop your file here or click to browse

Supports CSV, TSV, XLS, XLSX

Build a realistic dataset from scratch

No real data required — everything runs in your browser.

1 · Name your columns

Type column names or pick a preset template — data types are detected automatically.

2 · Set row count

Choose how many rows to generate. Preview a sample before downloading.

3 · Download

Export as CSV or Excel. Useful for demos, testing pipelines, or training data.

Quick Add from Column List

Type your column names separated by commas — the tool will detect the right data type for each one automatically.

or press Cmd/Ctrl + Enter

Try: sales transactions · support tickets · web analytics · patient records

Or Start with a Template (optional)

Load a pre-built column schema, then tweak as needed.

Refine Columns

Edit names, change types, or tweak options. Sample values update live.

Rows to generate:

No columns defined yet. Add a column below or choose a template above.

🔒 Full access required. This module is available in the full version of DataPHIX. Learn more →

What are you trying to do?

Choose the option that fits your situation.

Compare two files and surface every difference

Budget vs. actuals, this month vs. last, CRM vs. survey — nothing leaves your browser.

1 · Upload two files

Drop File A and File B — CSV, TSV, or Excel. They should share at least one common key column.

2 · Choose a match key

Pick the column that uniquely identifies each record. The tool scores all candidates and recommends the best one.

3 · Review & export

See matched, different, and unmatched rows with variance analysis. Export a full Excel report.
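As a rough sketch of what a key-based comparison involves, the snippet below sorts rows into matched, different, and unmatched groups. It assumes the key is unique within each file; the function name `reconcile` and the sample figures are hypothetical and do not describe the tool's internals.

```python
def reconcile(file_a, file_b, key):
    """Classify rows from two files by a shared key column."""
    a = {row[key]: row for row in file_a}
    b = {row[key]: row for row in file_b}
    result = {"matched": [], "different": [], "only_in_a": [], "only_in_b": []}
    for k, row_a in a.items():
        if k not in b:
            result["only_in_a"].append(k)
            continue
        row_b = b[k]
        # Compare only the columns both files share, excluding the key.
        shared = (set(row_a) & set(row_b)) - {key}
        diffs = {c: (row_a[c], row_b[c]) for c in sorted(shared) if row_a[c] != row_b[c]}
        if diffs:
            result["different"].append((k, diffs))
        else:
            result["matched"].append(k)
    result["only_in_b"] = [k for k in b if k not in a]
    return result

# Hypothetical budget vs. actuals data.
budget = [{"id": "P1", "amount": 100}, {"id": "P2", "amount": 250}, {"id": "P3", "amount": 75}]
actuals = [{"id": "P1", "amount": 100}, {"id": "P2", "amount": 300}, {"id": "P4", "amount": 40}]
report = reconcile(budget, actuals, "id")
```

Here `P1` is matched, `P2` differs on `amount` (250 vs. 300), `P3` exists only in File A, and `P4` exists only in File B.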

A File A

Your reference file — the one you consider authoritative.

Drop File A here or click to browse

Supports CSV, TSV, XLS, XLSX

B File B

The file you want to check against File A.

Drop File B here or click to browse

Supports CSV, TSV, XLS, XLSX

Upload

Choose Tool

Configure

Download

📐

This file looks like it might be in wide format — you may need to reshape it before merging or reconciling.

Wide format means each time period, category, or measurement is its own column (e.g. Jan_Hours, Feb_Hours, Mar_Hours). Most merge and reconcile tools expect long format, where there is one row per observation and a single column for the value.

📊 Wide (before)

Name	Jan_Hrs	Feb_Hrs	Mar_Hrs
Alice	120	130	110
Bob	90	100	95

→
Reshape:
Wide→Long

📋 Long (after)

Name	Month	Hours
Alice	Jan_Hrs	120
Alice	Feb_Hrs	130
Alice	Mar_Hrs	110
Bob	Jan_Hrs	90
Bob	Feb_Hrs	100
Bob	Mar_Hrs	95

💡 Use the Reshape: Wide → Long tool below to fix this automatically. Then download and re-upload before merging.

What do you want to do?

Pick a tool — after previewing the result you can apply it and run another tool on the same data.

Fix Header Row

Choose which row is the real header and remove title rows or metadata rows above it. Optionally merge two-row headers into single column names.

e.g. rows 1–2 are a report title, row 3 is the actual column header

Remove Duplicates

Find and remove repeated rows. Choose which columns define a "duplicate" or check all columns at once.

e.g. same email submitted twice

Standardize Values

Auto-scans your data and flags issues — trim spaces, fix capitalization, normalize dates, clear null variants, and more.

e.g. "N/A", "n/a", "NA" → blank

Filter Rows

Keep only the rows that match a condition. Exports a clean filtered file — original is unchanged.

e.g. only rows where Region = "West"

Reshape: Wide → Long

Turn multiple column headers into rows. Useful when each column represents a time period, question, or category.

e.g. Jan / Feb / Mar columns → one row each

Combine Columns

Merge two or more columns into one. Choose a separator and optionally apply title case to the result.

e.g. First Name + Last Name → Full Name

Add Computed Column

Create a new column from two existing columns using maths: add, subtract, multiply, divide, or percentage.

e.g. Actuals ÷ Forecast × 100 → Delivery%

Group & Aggregate

Roll up rows by grouping columns and summing, averaging, or counting numeric columns. Converts weekly rows to monthly totals, for example.

e.g. Weekly timesheets → Monthly hours by Person + Project

Standardize Values

Your data is scanned automatically. Review detected issues, queue fixes, and apply them all in one pass.

📅 Target date format:

Issues detected

Add fix for any column

Column

Fix type

Fix plan

No fixes queued yet.
Add fixes from the issue list or manually.

Reshape: Wide → Long (Unpivot)

Convert columns that represent time periods, categories, or repeated measurements into individual rows — making the data ready for merging, filtering, or charting.

📖 What does Wide → Long mean?

WIDE FORMAT (before)

Each time period or category is its own column. Easy to read but hard to filter or join to other data.

Name	Jan_Hrs	Feb_Hrs	Mar_Hrs
Alice	120	130	110
Bob	90	100	95

→

LONG FORMAT (after)

One row per observation. Filter by Month, join to other data, build charts — all easy now.

Name	Month	Hours
Alice	Jan_Hrs	120
Alice	Feb_Hrs	130
Alice	Mar_Hrs	110
Bob	Jan_Hrs	90
Bob	Feb_Hrs	100
Bob	Mar_Hrs	95

✅ After reshaping: you can filter by Month = "Jan_Hrs", or join to a forecast file that also has a Month column.

How to use it: Check the ID Columns that stay fixed (e.g. Name, Department) and the Value Columns to unpivot (e.g. Jan_Hrs, Feb_Hrs, Mar_Hrs). Each value column becomes a separate row with a variable name and its value.
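For readers who want to see the unpivot logic spelled out, here is a minimal sketch using the Alice and Bob sample data. The function name `wide_to_long` and its parameter names are assumptions made for illustration; the tool itself may work differently.

```python
def wide_to_long(rows, id_cols, value_cols, var_name="Month", value_name="Hours"):
    """Unpivot: emit one output row per (input row, value column) pair."""
    long_rows = []
    for row in rows:
        for col in value_cols:
            out = {c: row[c] for c in id_cols}  # ID columns repeat on every row
            out[var_name] = col                 # former column header
            out[value_name] = row[col]          # the cell's value
            long_rows.append(out)
    return long_rows

wide = [
    {"Name": "Alice", "Jan_Hrs": 120, "Feb_Hrs": 130, "Mar_Hrs": 110},
    {"Name": "Bob", "Jan_Hrs": 90, "Feb_Hrs": 100, "Mar_Hrs": 95},
]
long_rows = wide_to_long(wide, ["Name"], ["Jan_Hrs", "Feb_Hrs", "Mar_Hrs"])
```

Two input rows with three value columns produce six output rows, each with `Name`, `Month`, and `Hours`.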

ID Columns (stay as columns)

These repeat on every output row

Value Columns (become rows)

Each becomes one row per original record

Variable column name

Value column name

Result

Each row below represents one duplicate group where conflicting values were resolved. For each conflicting field the chosen value and its source row are shown.

These duplicate groups had no conflicting values — all copies agreed or only differed by blank vs. filled. The identifier values listed here are rows you could safely combine in your source system.

▶ Prepare for export

Include or exclude columns, rename them, or reorder them using ↑↓. Changes apply only at download time; your working data is unchanged.

👆 Click a workflow tab above to get started!

Common Data Concerns

Common data quality issues and how to resolve them before merging or reconciling.

Formatting Issues

Trailing and leading whitespace

"Sarah Chen" and "Sarah Chen " are not equal to a computer. Extra spaces are invisible on screen but break every exact match — the most common cause of silently dropped rows in a merge.

Fix: Clean & Shape → Standardize → Trim whitespace

Inconsistent capitalisation

"OPERATIONS", "Operations", and "operations" are three different values in an exact match. Common when data comes from multiple input systems or manual entry.

Fix: Clean & Shape → Standardize → Fix capitalisation
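The whitespace and capitalisation problems above both come down to comparing raw strings. One possible approach, sketched here with a hypothetical helper `normalize_text`, is to build a normalised comparison key and leave the displayed value alone. Note that this sketch also collapses repeated internal spaces, which goes slightly beyond trimming.

```python
def normalize_text(value):
    """Build a comparison key: trim, collapse inner whitespace, ignore case."""
    return " ".join(value.split()).casefold()

# Raw comparison fails; normalised comparison succeeds.
raw_equal = "Sarah Chen" == "Sarah Chen "
normalized_equal = normalize_text("Sarah Chen") == normalize_text("Sarah Chen ")

departments = {normalize_text(d) for d in ["OPERATIONS", "Operations", "operations"]}
```

After normalisation the three department spellings collapse to a single value.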

Mixed date formats

One system exports 01/15/2024, another exports 2024-01-15. When dates are stored as text, sorting and comparison follow character order rather than calendar order, so the results are wrong and no error is raised.

Fix: Clean & Shape → Standardize → Normalise dates to ISO (YYYY-MM-DD)
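A small sketch of date normalisation, assuming the only formats present are ISO and US-style month/day/year (the `KNOWN_FORMATS` tuple and the `to_iso` name are illustrative assumptions). Ambiguous inputs such as 03/04/2024 would need a decision about which convention the source system uses.

```python
from datetime import datetime

# Assumption: slash-separated dates are month/day/year.
KNOWN_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")

def to_iso(text):
    """Return the date as YYYY-MM-DD, or raise if no known format fits."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date format: {text!r}")

# Text sorting puts January 2024 before December 2023.
as_text = sorted(["12/01/2023", "01/15/2024"])
# ISO strings sort in calendar order.
as_iso = sorted(to_iso(d) for d in ["12/01/2023", "01/15/2024"])
```

Raising on unknown formats, rather than guessing, keeps bad dates from slipping through unnoticed.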

Numbers stored as text

A column containing "1,200" or "$450.00" looks numeric but is a string. Sums may return zero or skip the values, comparisons fail, and the column sorts alphabetically instead of numerically.

Fix: Clean & Shape → Standardize → Strip currency/comma symbols, or use a Computed Column to cast values
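To illustrate the cast, here is a minimal sketch that assumes a dollar sign as the currency symbol and a comma as the thousands separator. The `parse_amount` name is hypothetical, and locales that use a comma as the decimal mark would need different handling.

```python
from decimal import Decimal

def parse_amount(text):
    """Strip '$' and thousands commas, then convert to an exact decimal."""
    cleaned = text.strip().replace(",", "").replace("$", "")
    return Decimal(cleaned)

values = ["90", "1,200", "$450.00"]

text_order = sorted(values)                       # character order
numeric_order = sorted(values, key=parse_amount)  # numeric order
total = sum(parse_amount(v) for v in values)
```

`Decimal` is used instead of `float` so that currency amounts keep their exact cents.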

Structure Issues

Wide format (pivoted data)

Data where time periods or categories are spread across columns, e.g. Jan, Feb, Mar as separate column headers. This format is easy to read but hard to filter, group, or join, because the period or category lives in the header rather than in the data. Most BI tools and merge operations require long format.

Wide (before)

Name	Jan	Feb
Alice	120	130

→

Long (after)

Name	Month	Hours
Alice	Jan	120
Alice	Feb	130

Fix: Clean & Shape → Reshape (Unpivot)

Header row is not row 1

Some exports include report titles, export dates, or blank rows above the actual column headers. When the header row is row 3 instead of row 1, many tools import the columns with generic names such as Column1, Column2, etc.

Fix: Clean & Shape → Fix Header Row
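Skipping title rows can be sketched with the standard `csv` module. The helper `read_with_header` and the sample export text are hypothetical; the row number is 1-based, as in a spreadsheet.

```python
import csv
import io

def read_with_header(text, header_row):
    """Parse CSV text whose real header sits on `header_row` (1-based)."""
    rows = list(csv.reader(io.StringIO(text)))
    header = rows[header_row - 1]
    # Rows above the header are discarded; fully blank rows are skipped.
    return [
        dict(zip(header, r))
        for r in rows[header_row:]
        if any(cell.strip() for cell in r)
    ]

export = (
    "Quarterly Report\n"
    "Exported 2024-01-15\n"
    "Name,Region,Hours\n"
    "Alice,West,120\n"
    "\n"
    "Bob,East,90\n"
)
records = read_with_header(export, header_row=3)
```

The two title lines are dropped and the remaining rows are keyed by the real column names.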

Mismatched identifiers across systems

HR uses EMP-001. Workday uses W-001. Workfront uses the employee's full name. When systems use different ID schemes for the same entity, joins silently drop rows — no error, just missing data.

The permanent fix is a cross-reference lookup table. Short of that:

Fix: Compare & Reconcile → Enable fuzzy matching, or Merge → Enrich with a lookup table
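A cross-reference lookup can be as simple as a dictionary from one ID scheme to the other. The sketch below uses made-up IDs and column names (`emp_id`, `wd_id`) and a hypothetical `join_via_crosswalk` helper. Its main point is that unmatched rows are returned explicitly instead of being dropped.

```python
# Hypothetical crosswalk: Workday ID -> HR ID.
crosswalk = {"W-001": "EMP-001", "W-002": "EMP-002"}

hr = [{"emp_id": "EMP-001", "name": "Alice"}, {"emp_id": "EMP-002", "name": "Bob"}]
workday = [{"wd_id": "W-001", "hours": 120}, {"wd_id": "W-003", "hours": 40}]

def join_via_crosswalk(left, right, crosswalk):
    """Join `right` rows onto `left` rows by translating IDs through the crosswalk."""
    left_by_id = {row["emp_id"]: row for row in left}
    joined, unmatched = [], []
    for row in right:
        emp_id = crosswalk.get(row["wd_id"])  # None if the ID is unknown
        if emp_id in left_by_id:
            joined.append({**left_by_id[emp_id], **row})
        else:
            unmatched.append(row)
    return joined, unmatched

joined, unmatched = join_via_crosswalk(hr, workday, crosswalk)
```

Reviewing `unmatched` after every join is a cheap way to catch IDs that are missing from the lookup table.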

Data Quality Issues

Blank vs. null vs. zero

An empty cell, the text "NULL", and the number 0 are three distinct values. Aggregating a column with blanks skews averages. Joining on a null key matches nothing. Most systems export nulls inconsistently.

Fix: Clean & Shape → Filter to inspect blank rows; Standardize to replace null patterns
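The distinction between blank, "NULL", and zero can be made explicit in code. This sketch assumes a particular set of null spellings (`NULL_VARIANTS`, which is an illustrative list, not the tool's) and shows how the average changes once nulls are excluded while genuine zeros are kept.

```python
# Assumption: these spellings all mean "no value".
NULL_VARIANTS = {"", "n/a", "na", "null", "none"}

def clean_null(value):
    """Map null-like text to None; leave real values, including 0, unchanged."""
    if value is None:
        return None
    if isinstance(value, str) and value.strip().casefold() in NULL_VARIANTS:
        return None
    return value

def mean_ignoring_nulls(values):
    """Average the values that remain after null cleanup."""
    present = [v for v in (clean_null(v) for v in values) if v is not None]
    return sum(present) / len(present) if present else None

average = mean_ignoring_nulls([10, "NULL", 0, ""])
```

Only 10 and 0 count toward the average, so the result is 5.0 rather than 2.5.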

Duplicate rows

Duplicate records inflate counts and sums. They are often introduced by exports that include sub-total rows, by multi-sheet copies of the same data, or by incomplete deduplication upstream.

Fix: Clean & Shape → Remove Duplicates
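A minimal deduplication sketch, keeping the first occurrence of each duplicate. The `remove_duplicates` name, the `key_cols` parameter, and the sample addresses are assumptions for illustration. Passing no columns compares every column.

```python
def remove_duplicates(rows, key_cols=None):
    """Keep the first row for each distinct combination of key column values."""
    seen = set()
    kept = []
    for row in rows:
        cols = key_cols if key_cols is not None else sorted(row)
        signature = tuple((c, row[c]) for c in cols)
        if signature not in seen:
            seen.add(signature)
            kept.append(row)
    return kept

rows = [
    {"email": "ann@example.com", "name": "Ann"},
    {"email": "bo@example.com", "name": "Bo"},
    {"email": "ann@example.com", "name": "Ann B."},
]
by_email = remove_duplicates(rows, ["email"])
by_all_columns = remove_duplicates(rows)
```

Deduplicating on `email` drops the third row, while comparing all columns keeps it because the name differs.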

Missing derived columns

A merge target file needs a column that doesn't exist in the source — for example, a full name built from first and last, or a variance computed from budget and actuals. Attempting to join on a column that doesn't exist fails silently or throws an error.

Fix: Clean & Shape → Combine Columns or Computed Column
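Both kinds of derived column mentioned above can be sketched briefly. The helper names, column names, and sample values below are hypothetical.

```python
def combine_columns(row, cols, new_name, sep=" "):
    """Join several columns into one new text column."""
    return {**row, new_name: sep.join(str(row[c]).strip() for c in cols)}

def add_variance(row, budget_col="Budget", actual_col="Actuals", new_name="Variance"):
    """Add a column holding actuals minus budget."""
    return {**row, new_name: row[actual_col] - row[budget_col]}

row = {"First Name": "Alice ", "Last Name": "Ng", "Budget": 1000, "Actuals": 1150}
row = combine_columns(row, ["First Name", "Last Name"], "Full Name")
row = add_variance(row)
```

Each helper returns a new dictionary, so the original row is never modified in place.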

Pre-Merge Checklist

Consider these items before running a merge or reconciliation.

Leading and trailing spaces removed: "trimming" means stripping invisible whitespace that clings to the start or end of a value (e.g. " Smith" vs "Smith"). Two values that look identical on screen can fail to match because one has a hidden space. Use Clean & Shape → Standardize → Clean up spaces to fix this.
Capitalisation is consistent: "new york" and "New York" will not match unless normalised.
Dates are in a consistent format (preferably ISO: YYYY-MM-DD)
Numeric columns do not contain currency symbols or commas
Wide-format data has been reshaped to long format
Column headers are in row 1, not row 2 or 3
The join key exists in both files under the same column name
The join key is unique in at least one of the two files
Blank or null key values have been removed or filled
Duplicate rows have been removed from both files
Any required computed or combined columns have been added
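The three join-key items in this checklist (present in both files, unique in at least one, no blanks) lend themselves to an automated check. The sketch below is one possible way to do it; `check_join_key` and its message wording are assumptions, not part of the tool.

```python
from collections import Counter

def _is_blank(value):
    return value is None or str(value).strip() == ""

def check_join_key(rows_a, rows_b, key):
    """Return a list of problems with `key` as a join column (empty if none)."""
    problems = []
    files = (("A", rows_a), ("B", rows_b))
    for label, rows in files:
        if any(key not in row for row in rows):
            problems.append(f"File {label}: column {key!r} is missing")
    if problems:
        return problems  # further checks need the column to exist
    for label, rows in files:
        blanks = sum(_is_blank(row[key]) for row in rows)
        if blanks:
            problems.append(f"File {label}: {blanks} blank key value(s)")

    def is_unique(rows):
        counts = Counter(row[key] for row in rows if not _is_blank(row[key]))
        return all(n == 1 for n in counts.values())

    if not (is_unique(rows_a) or is_unique(rows_b)):
        problems.append(f"{key!r} is duplicated in both files")
    return problems

file_a = [{"id": "1"}, {"id": "2"}]
file_b = [{"id": "1"}, {"id": "1"}, {"id": ""}]
issues = check_join_key(file_a, file_b, "id")
```

In this example the key is unique in File A, so the only reported issue is the blank key in File B.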

Session Log

All operations performed in this session

#	Time	Tab	Operation	Rows in	Rows out	Elapsed
No operations logged yet

DataPHIX
