Package: irelink 0.0.1

Christopher T. Kenny

irelink: Fast Probabilistic Record Linkage

Performs fast, scalable probabilistic record linkage and deduplication using the Fellegi-Sunter model. Records lacking a shared unique identifier are compared across configurable dimensions using exact, fuzzy, and distance-based comparisons, with model parameters estimated via unsupervised Expectation-Maximization. Multiple SQL backends are supported through 'DBI', enabling execution from laptop-scale ('DuckDB') through to distributed engines. This package is a translation of the Python 'splink' library by Linacre et al. into idiomatic R.
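
As a rough illustration of the Fellegi-Sunter scoring mentioned above, the base R sketch below turns per-field m and u probabilities into match weights and a match probability for a single record pair. It does not call irelink; the field names, the m and u values, and the prior are all assumed for illustration, and the fields are treated as conditionally independent.

```r
# Illustrative m and u probabilities (assumed values, not estimates from irelink).
# m = P(field agrees | records match), u = P(field agrees | records do not match)
m <- c(first_name = 0.90, dob = 0.95)
u <- c(first_name = 0.01, dob = 0.001)

# Assumed prior probability that a random candidate pair is a match
prior <- 0.001

# Observed agreement pattern for one pair: first name agrees, date of birth does not
agree <- c(first_name = TRUE, dob = FALSE)

# Bayes factor per field: m / u on agreement, (1 - m) / (1 - u) on disagreement
bayes_factor <- ifelse(agree, m / u, (1 - m) / (1 - u))

# Match weights are the log2 Bayes factors; they add across fields
match_weight <- log2(bayes_factor)

# Combine with the prior odds, assuming conditional independence of the fields
posterior_odds <- prior / (1 - prior) * prod(bayes_factor)
match_probability <- posterior_odds / (1 + posterior_odds)
```

In this sketch, agreement on a field contributes a positive weight and disagreement a negative one. In practice the m and u values would be estimated from the data (for example by Expectation-Maximization) instead of being fixed by hand.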

Authors: Christopher T. Kenny [aut, cre, cph], Robin Linacre [cph], Sam Lindsay [cph], Theodore Manassis [cph], Tom Hepworth [cph], Andy Bond [cph], Ross Kennedy [cph], UK Ministry of Justice [cph]

irelink_0.0.1.tar.gz
irelink_0.0.1.zip (r-4.7) | irelink_0.0.1.zip (r-4.6) | irelink_0.0.1.zip (r-4.5)
irelink_0.0.1.tgz (r-4.6-any) | irelink_0.0.1.tgz (r-4.5-any)
irelink_0.0.1.tar.gz (r-4.7-any) | irelink_0.0.1.tar.gz (r-4.6-any)
irelink_0.0.1.tgz (r-4.6-emscripten)
manual.pdf | manual.html
card.svg | card.png
irelink/json (API)
NEWS

# Install 'irelink' in R:
install.packages('irelink', repos = c('https://christopherkenny.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/christopherkenny/irelink/issues

Pkgdown/docs site: https://christophertkenny.com

Datasets:
  • fake_1000 - Splink Fake 1000: Deduplication Benchmark
  • fake_1000_labels - Splink Fake 1000: Clerical Pairwise Labels
  • fake_20 - Fake 20: Minimal Deduplication Example
  • febrl4a - FEBRL 4a: Record Linkage Original Records
  • febrl4b - FEBRL 4b: Record Linkage Duplicate Records

5.46 score | 4 stars | 20 scripts | 106 exports | 26 dependencies

Last updated from: 51c3d60282. Checks: 9 OK. Indexed: yes.

Target | Result | Time
linux-devel-x86_64 | OK | 254
source / vignettes | OK | 586
linux-release-x86_64 | OK | 257
macos-release-arm64 | OK | 1304
macos-oldrel-arm64 | OK | 1340
windows-devel | OK | 2829
windows-release | OK | 2949
windows-oldrel | OK | 2963
wasm-release | OK | 115

Exports: block_from_labels, block_on, cl_and, cl_array_intersect, cl_array_min_distance, cl_array_subset, cl_columns_reversed, cl_cosine, cl_custom, cl_damerau_levenshtein, cl_date_diff, cl_dob, cl_else, cl_email, cl_exact, cl_first_last_name, cl_forename_surname, cl_geo_distance, cl_jaccard, cl_jaro, cl_jaro_winkler, cl_levels, cl_levenshtein, cl_literal, cl_name, cl_not, cl_null, cl_numeric_diff, cl_or, cl_pct_diff, cl_postcode, cl_soundex, cl_time_diff, cl_zip_code, days, hours, il_accuracy, il_array_element, il_attach, il_block_on, il_cast_to_string, il_cleanup, il_cleanup_all, il_cluster, il_cluster_confusion_matrix, il_comparator_score, il_comparator_threshold_chart, il_compare, il_compare_records, il_comparison_vectors, il_completeness, il_confusion_matrix, il_constrain_m, il_constraints, il_count_pairs, il_deterministic_link, il_dmetaphone, il_errors, il_estimate_em, il_estimate_m_from_column, il_estimate_m_from_labels, il_estimate_prior, il_estimate_u, il_find_blocking_below, il_find_matches, il_graph_metrics, il_largest_blocks, il_load, il_metaphone, il_model, il_nullif, il_parameters, il_phonetic_chart, il_precision_recall, il_prior_m, il_prior_prevalence, il_priors, il_profile, il_regex_extract, il_register_tf, il_roc, il_save, il_score_missing_edges, il_score_patterns, il_soundex, il_spec, il_string_similarity, il_substr, il_suggest_blocking, il_tf_chart, il_training_history, il_transform, il_try_parse_date, il_try_parse_timestamp, il_unlinkables, il_waterfall, il_weights, is_il_model, is_il_spec, km, labels_from_column, mi, minutes, months, seconds, years

Dependencies: cli, cpp11, DBI, duckdb, farver, ggplot2, glue, gtable, isoband, labeling, lifecycle, magrittr, pillar, pkgconfig, R6, RColorBrewer, rlang, S7, scales, stringdist, tibble, tidyselect, utf8, vctrs, viridisLite, withr

Advanced Workflows

Rendered from advanced.Rmd using knitr::rmarkdown on Jun 19 2026.

Last update: 2026-05-20
Started: 2026-04-10

Deduplicating 50k Synthetic Records

Rendered from deduplicate-50k.Rmd using knitr::rmarkdown on Jun 19 2026.

Last update: 2026-05-20
Started: 2026-04-10

Deduplication with Evaluation

Rendered from deduplication.Rmd using knitr::rmarkdown on Jun 19 2026.

Last update: 2026-05-20
Started: 2026-03-28

Getting Started

Rendered from irelink.Rmd using knitr::rmarkdown on Jun 19 2026.

Last update: 2026-05-20
Started: 2026-03-27

Linking Banking Transactions

Rendered from transactions.Rmd using knitr::rmarkdown on Jun 19 2026.

Last update: 2026-05-20
Started: 2026-04-10

Record Linkage Across Datasets

Rendered from record-linkage.Rmd using knitr::rmarkdown on Jun 19 2026.

Last update: 2026-05-20
Started: 2026-03-28

Translating from fastLink

Rendered from from_fastLink.Rmd using knitr::rmarkdown on Jun 19 2026.

Last update: 2026-05-20
Started: 2026-05-20

Translating from Splink

Rendered from from_splink.Rmd using knitr::rmarkdown on Jun 19 2026.

Last update: 2026-05-20
Started: 2026-03-27

Readme and manuals

Help Manual

Help page | Topics
Plot Accuracy Metrics Across Thresholds | autoplot.il_accuracy
Plot Batch Comparator Scores | autoplot.il_comparator_score
Quick Plot for Scored Pairs | autoplot.il_compared
Plot Comparison Vector Distribution | autoplot.il_comparison_vectors
Plot Column Completeness | autoplot.il_completeness
Plot Blocking Rule Pair Counts | autoplot.il_count_pairs
Quick Match-Weights Plot for a Model | autoplot.il_model
Plot Precision–Recall Curve | autoplot.il_precision_recall
Plot Column Value Profiles | autoplot.il_profile
Plot ROC Curve | autoplot.il_roc
Comparator Score Bar Chart | autoplot.il_string_similarity
Plot EM Training History | autoplot.il_training_history
Plot Unlinkables Curve | autoplot.il_unlinkables
Derive Blocking Rules from Labeled Pairs | block_from_labels
Create a Training-Time Blocking Rule | block_on
Combine Comparison Conditions with AND | cl_and
Array Intersection Comparison | cl_array_intersect
Pairwise Array Minimum Distance Comparison | cl_array_min_distance
Array Subset Comparison | cl_array_subset
Swap Detection for Two Columns | cl_columns_reversed
Cosine Similarity Comparison | cl_cosine
Custom SQL Comparison | cl_custom
Damerau-Levenshtein Edit-Distance Comparison | cl_damerau_levenshtein
Date Difference Comparison | cl_date_diff
Date of Birth Comparison | cl_dob
Catch-All Else Level | cl_else
Email Address Comparison | cl_email
Exact Equality Comparison | cl_exact
First Name and Last Name Comparison with Swap Detection | cl_first_last_name
Forename and Surname Comparison with Swap Detection | cl_forename_surname
Geographic Distance Comparison | cl_geo_distance
Jaccard Set Similarity Comparison | cl_jaccard
Jaro String Similarity Comparison | cl_jaro
Jaro-Winkler String Similarity Comparison | cl_jaro_winkler
Compose Custom Comparison Levels | cl_levels
Levenshtein Edit-Distance Comparison | cl_levenshtein
Literal Value Comparison | cl_literal
Personal Name Comparison | cl_name
Negate a Comparison Condition | cl_not
Null / Missing Value Level | cl_null
Numeric Absolute Difference Comparison | cl_numeric_diff
Combine Comparison Conditions with OR | cl_or
Numeric Percentage Difference Comparison | cl_pct_diff
Postcode Comparison | cl_postcode
Soundex Phonetic Comparison | cl_soundex
Time Difference Comparison | cl_time_diff
ZIP Code Comparison | cl_zip_code
Create a Duration in Days | days
Splink Fake 1000: Deduplication Benchmark | fake_1000
Splink Fake 1000: Clerical Pairwise Labels | fake_1000_labels
Fake 20: Minimal Deduplication Example | fake_20
FEBRL 4a: Record Linkage Original Records | febrl4a
FEBRL 4b: Record Linkage Duplicate Records | febrl4b
Create a Duration in Hours | hours
Accuracy Metrics Across Thresholds | il_accuracy
Array Element Column Transform | il_array_element
Attach a Saved Model to Fresh Data | il_attach
Add a Prediction Blocking Rule | il_block_on
Cast to String Column Transform | il_cast_to_string
Remove Model-Owned Temporary Tables from Database | il_cleanup
Remove All irelink Temporary Tables from a Database | il_cleanup_all
Cluster Scored Pairs into Entities | il_cluster
Cluster-Level Confusion Matrix for Deduplication | il_cluster_confusion_matrix
Batch String Similarity Scores | il_comparator_score
Comparator Score Threshold Chart | il_comparator_threshold_chart
Add a Comparison Layer to a Specification | il_compare
Compare Two Individual Records | il_compare_records
Comparison Vector Distribution | il_comparison_vectors
Column Completeness Across Datasets | il_completeness
Confusion Matrix at a Threshold | il_confusion_matrix
Add a Fixed Matched-Class Constraint | il_constrain_m
Inspect Model Constraints | il_constraints
Count Candidate Pairs Under Blocking Rules | il_count_pairs
Deterministic Record Linkage | il_deterministic_link
Identify Prediction Errors | il_errors
Train Parameters via Expectation-Maximization | il_estimate_em
Estimate Match (m) Parameters from a Label Column | il_estimate_m_from_column
Estimate Match (m) Parameters from Labeled Data | il_estimate_m_from_labels
Estimate the Prior Match Probability | il_estimate_prior
Estimate Non-Match (u) Parameters | il_estimate_u
Find Blocking Rules Below a Pair-Count Threshold | il_find_blocking_below
Find Matches for New Records | il_find_matches
Compute Graph Metrics for Clusters | il_graph_metrics
Identify the Largest Blocking Bins | il_largest_blocks
Load a Saved Model | il_load
Create a Linkage Model | il_model
Replace a Value with NA Column Transform | il_nullif
Extract Model Parameters | il_parameters
Phonetic Match Chart | il_phonetic_chart
Compute Precision-Recall Curve Data | il_precision_recall
Add a Matched-Class Comparison Prior | il_prior_m
Add a Prevalence Prior | il_prior_prevalence
Inspect Model Priors | il_priors
Profile Column Value Distributions | il_profile
Regex Extraction Column Transform | il_regex_extract
Register Pre-Computed Term Frequency Tables | il_register_tf
Compute ROC Curve Data | il_roc
Save a Model to Disk | il_save
Score Missing Edges Within Clusters | il_score_missing_edges
Score Comparison Patterns | il_score_patterns
Create an Empty Linkage Specification | il_spec
Compute String Similarity Scores | il_string_similarity
Extract a Substring Column Transform | il_substr
Suggest Blocking Rules | il_suggest_blocking
Term Frequency Adjustment Chart | il_tf_chart
Extract EM Training History | il_training_history
Compose Multiple Transforms into a Chain | il_transform
Try-Parse Date Column Transform | il_try_parse_date
Try-Parse Timestamp Column Transform | il_try_parse_timestamp
Compute Unlinkable Records | il_unlinkables
Extract Waterfall Data for a Single Pair | il_waterfall
Extract Match Weights by Comparison Level | il_weights
Test if an Object is an irelink Model | is_il_model
Test if an Object is an irelink Specification | is_il_spec
Create a Distance in Kilometres | km
Derive Pairwise Labels from a Ground-Truth Column | labels_from_column
Create a Distance in Miles | mi
Create a Duration in Minutes | minutes
Create a Duration in Months | months
Phonetic Transform Functions | il_dmetaphone il_metaphone il_soundex phonetic
Score Record Pairs from a Trained Model | predict.il_model
Print an irelink Model | print.il_model
Print an irelink Specification | print.il_spec
Create a Duration in Seconds | seconds
Summarize an irelink Model | summary.il_model
Create a Duration in Years | years