DATAPHIX
Loading tools…

DataPHIX

Anonymize sensitive data, generate synthetic datasets, clean & shape files, merge multiple sources, and compare files — all in your browser

Demo Version Learn More →
Anonymize Data
🔒
Your data never leaves your device. This tool runs entirely in your browser. No files are uploaded, transmitted, or stored on any server — not even ours. You can verify this by opening your browser’s Network tab while processing a file: zero data requests are made.
Use offline anytime
For added peace of mind, download a standalone version of this tool — one file, no internet required, opens in any browser.
⬇ Download

Replace sensitive columns with realistic substitute values

Everything runs in your browser — no data is uploaded or sent anywhere.

1 · Upload
Drop any CSV, TSV, or Excel file. Multi-sheet files let you choose which tab to use.
2 · Choose columns
Sensitive columns are auto-flagged. Toggle each on or off, and choose what replacement type to use for each column.
3 · Download
Get the anonymized file, plus an optional mapping reference that links anonymized values back to originals.

Upload Your File

Drop your file here or click to browse

Supports CSV, TSV, XLS, XLSX

Create New Data Set

Build a realistic dataset from scratch

No real data required — everything runs in your browser.

1 · Name your columns
Type column names or pick a preset template — data types are detected automatically.
2 · Set row count
Choose how many rows to generate. Preview a sample before downloading.
3 · Download
Export as CSV or Excel. Useful for demos, testing pipelines, or training data.

Quick Add from Column List

Type your column names separated by commas — the tool will detect the right data type for each one automatically.

or press Cmd/Ctrl + Enter
Try: sales transactions  ·  support tickets  ·  web analytics  ·  patient records

Or Start with a Template (optional)

Load a pre-built column schema, then tweak as needed.

Refine Columns

Edit names, change types, or tweak options. Sample values update live.

No columns defined yet. Add a column below or choose a template above.
Merge Files
🔒 Full access required. This module is available in the full version of DataPHIX. Learn more →

What are you trying to do?

Choose the option that fits your situation.

Compare & Reconcile
🔒 Full access required. This module is available in the full version of DataPHIX. Learn more →

Compare two files and surface every difference

Budget vs. actuals, this month vs. last, CRM vs. survey — nothing leaves your browser.

1 · Upload two files
Drop File A and File B — CSV, TSV, or Excel. They should share at least one common key column.
2 · Choose a match key
Pick the column that uniquely identifies each record. The tool scores all candidates and recommends the best one.
3 · Review & export
See matched, different, and unmatched rows with variance analysis. Export a full Excel report.

A File A

Your reference file — the one you consider authoritative.

Drop File A here or click to browse

Supports CSV, TSV, XLS, XLSX

B File B

The file you want to check against File A.

Drop File B here or click to browse

Supports CSV, TSV, XLS, XLSX

Clean & Shape
1
Upload
2
Choose Tool
3
Configure
4
Download

What do you want to do?

Pick a tool — after previewing the result you can apply it and run another tool on the same data.

Fix Header Row
Choose which row is the real header and remove title rows or metadata rows above it. Optionally merge two-row headers into single column names.
e.g. rows 1–2 are a report title, row 3 is the actual column header
Remove Duplicates
Find and remove repeated rows. Choose which columns define a "duplicate" or check all columns at once.
e.g. same email submitted twice
Standardize Values
Auto-scans your data and flags issues — trim spaces, fix capitalization, normalize dates, clear null variants, and more.
e.g. "N/A", "n/a", "NA" → blank
Filter Rows
Keep only the rows that match a condition. Exports a clean filtered file — original is unchanged.
e.g. only rows where Region = "West"
Reshape: Wide → Long
Turn multiple column headers into rows. Useful when each column represents a time period, question, or category.
e.g. Jan / Feb / Mar columns → one row each
Combine Columns
Merge two or more columns into one. Choose a separator and optionally apply title case to the result.
e.g. First Name + Last Name → Full Name
Add Computed Column
Create a new column from two existing columns using maths: add, subtract, multiply, divide, or percentage.
e.g. Actuals ÷ Forecast × 100 → Delivery%
Group & Aggregate
Roll up rows by grouping columns and summing, averaging, or counting numeric columns. Converts weekly rows to monthly totals, for example.
e.g. Weekly timesheets → Monthly hours by Person + Project
👆 Click a workflow tab above to get started!

Common Data Concerns

Common data quality issues and how to resolve them before merging or reconciling.

Formatting Issues

Trailing and leading whitespace

"Sarah Chen" and "Sarah Chen " are not equal to a computer. Extra spaces are invisible on screen but break every exact match — the most common cause of silently dropped rows in a merge.

Fix: Clean & Shape → Standardize → Trim whitespace
Inconsistent capitalisation

"OPERATIONS", "Operations", and "operations" are three different values in an exact match. Common when data comes from multiple input systems or manual entry.

Fix: Clean & Shape → Standardize → Fix capitalisation
Mixed date formats

One system exports 01/15/2024, another exports 2024-01-15. When sorted or compared as text, dates produce wrong results. Numeric comparisons on text dates fail silently.

Fix: Clean & Shape → Standardize → Normalise dates to ISO (YYYY-MM-DD)
Numbers stored as text

A column containing "1,200" or "$450.00" looks numeric but is a string. Sums return zero, comparisons fail, and the column sorts alphabetically instead of numerically.

Fix: Clean & Shape → Standardize → Strip currency/comma symbols, or use a Computed Column to cast values

Structure Issues

Wide format (pivoted data)

Data where time periods or categories are spread across columns — e.g., Jan, Feb, Mar as separate column headers. This format is readable but cannot be filtered, grouped, or joined. Most BI tools and merge operations require long format.

Wide (before)
NameJanFeb
Alice120130
Long (after)
NameMonthHours
AliceJan120
AliceFeb130
Fix: Clean & Shape → Reshape (Unpivot)
Header row is not row 1

Some exports include report titles, export dates, or blank rows above the actual column headers. When the header row is row 3 instead of row 1, every column name imports as Column1, Column2, etc.

Fix: Clean & Shape → Fix Header Row
Mismatched identifiers across systems

HR uses EMP-001. Workday uses W-001. Workfront uses the employee's full name. When systems use different ID schemes for the same entity, joins silently drop rows — no error, just missing data.

The permanent fix is a cross-reference lookup table. Short of that:
Fix: Compare & Reconcile → Enable fuzzy matching, or Merge → Enrich with a lookup table

Data Quality Issues

Blank vs. null vs. zero

An empty cell, the text "NULL", and the number 0 are three distinct values. Aggregating a column with blanks skews averages. Joining on a null key matches nothing. Most systems export nulls inconsistently.

Fix: Clean & Shape → Filter to inspect blank rows; Standardize to replace null patterns
Duplicate rows

Duplicate records inflate counts and sums. They are often introduced by exports that include sub-total rows, by multi-sheet copies of the same data, or by incomplete deduplication upstream.

Fix: Clean & Shape → Deduplicate
Missing derived columns

A merge target file needs a column that doesn't exist in the source — for example, a full name built from first and last, or a variance computed from budget and actuals. Attempting to join on a column that doesn't exist fails silently or throws an error.

Fix: Clean & Shape → Combine Columns or Computed Column

Pre-Merge Checklist

Consider these items before running a merge or reconciliation.

  1. Leading and trailing spaces removed — "trimming" means stripping invisible whitespace that clings to the start or end of a value (e.g. " Smith" vs "Smith"). Two values that look identical on screen can fail to match because one has a hidden space. Use Clean & Shape → Standardize → Clean up spaces to fix this. Capitalisation should also be consistent — "new york" and "New York" will not match unless normalised.
  2. Dates are in a consistent format (preferably ISO: YYYY-MM-DD)
  3. Numeric columns do not contain currency symbols or commas
  4. Wide-format data has been reshaped to long format
  5. Column headers are in row 1, not row 2 or 3
  6. The join key exists in both files under the same column name
  7. The join key is unique in at least one of the two files
  8. Blank or null key values have been removed or filled
  9. Duplicate rows have been removed from both files
  10. Any required computed or combined columns have been added