Initial development release, translating Python's Splink probabilistic record linkage engine into idiomatic R.
- il_spec(), il_compare(), and il_block_on() define the linkage model declaratively: which fields to compare, how to compare them, and which blocking rules to apply.
- il_model() binds a spec to one or two datasets and a DBI connection for dedupe, link, or link_and_dedupe, and accepts in-memory data frames, dbplyr::tbl_lazy references, or existing table-name strings.
- predict() scores all candidate pairs above a match-probability threshold (or an evidence-only match-weight threshold via threshold_match_weight).
- il_cluster() resolves scored pairs into entity clusters via connected components (using igraph) or single-best-link (with source_dataset for cross-source filtering).
- cl_exact(), cl_levenshtein(), cl_damerau_levenshtein(), cl_jaro(), cl_jaro_winkler(), cl_jaccard(), cl_cosine().
- cl_numeric_diff(), cl_pct_diff(), cl_geo_distance().
- cl_date_diff() for date proximity with days(), months(), and years() helpers, and cl_time_diff() for sub-day precision with seconds(), minutes(), and hours() helpers.
- cl_array_intersect(), cl_array_subset(), cl_array_min_distance().
- cl_name(), cl_first_last_name() / cl_forename_surname() (both accept a companion column for the surname and handle first/last swap detection), cl_dob(), cl_email(), cl_domain(), cl_soundex(), cl_zip_code() (exact, 5-digit ZIP+4, and 3-digit Sectional Center Facility prefix levels), cl_postcode().
- cl_levels(), cl_and(), cl_or(), cl_not(), cl_null(), cl_else(), cl_literal(), cl_custom(), cl_columns_reversed().
- term_frequency = TRUE for Fellegi-Sunter term-frequency adjustments.
- il_transform() composes multiple R functions into a chainable transform with SQL-side nesting (e.g. TRIM(LOWER(col))).
- il_substr(), il_regex_extract(), il_nullif(), il_cast_to_string(), il_try_parse_date(), il_array_element().
- tolower, toupper, trimws.
- il_soundex(), il_metaphone(), il_dmetaphone() (usable as R functions and SQL macros).
- il_block_on() and block_on() for equality-based and custom SQL blocking rules, with per-column transform support via formula syntax (col ~ transform, e.g. first_name ~ il_substr(1, 3)) or a named-list .transform for programmatic construction.
- .explode parameter for array-valued blocking columns (generates UNNEST subqueries for DuckDB/PostgreSQL).
- il_count_pairs() estimates candidate-pair counts, including cumulative totals and percent-of-cartesian summaries across rule combinations.
- il_suggest_blocking() ranks candidate blocking rules by pair-reduction, coverage, and balanced score.
- il_find_blocking_below() finds blocking rule combinations below a pair count ceiling.
- block_from_labels() measures per-column recall from labeled pairs.
- il_largest_blocks() identifies the blocking keys that generate the most records and pairs, respecting blocking transforms.
- il_estimate_u() estimates non-match probabilities by sampling random pairs, with optional chunked estimation through chunk_size and early stopping through min_count_per_level.
- il_estimate_em() runs the Fellegi-Sunter EM algorithm with configurable max_iterations, convergence, fix_u, fix_m, fix_prior, derive_prior, estimate_without_tf, and estimator_mode parameters.
- estimator_mode = "dependency-aware" fits log-linear matched and unmatched comparison-pattern distributions over aggregated gamma counts, preserving missing comparison states as explicit pattern levels.
- il_estimate_prior() sets the prior match probability from deterministic matching rules, counting unique blocked pairs across overlapping rules.
- il_prior_prevalence() and il_prior_m() add regularizing custom priors for EM, il_constrain_m() adds explicit fixed matched-class constraints, and il_priors() / il_constraints() expose the stored metadata.
- il_estimate_m_from_labels() and il_estimate_m_from_column() initialize parameters from ground-truth labels.
- predict() supports both threshold (match probability) and threshold_match_weight (evidence-only log2 Bayes factor) filtering.
- match_weight, prior-inclusive total_match_weight, and posterior match_probability.
- predict(type = "weights") returns match weights on the log2 Bayes-factor scale, and greedy = TRUE adds deterministic one-to-one post-processing for link models.
- include_fields = TRUE joins all source columns into the scored output.
- collect = FALSE returns an il_compared_lazy object backed by a model-scoped in-database table.
- il_score_missing_edges() enumerates and scores unscored within-cluster pairs.
- il_score_patterns() scores compatible comparison-pattern tables, including dependency-aware pattern tables larger than the table used for fitting.
- il_deterministic_link() performs single-table exact-match deduplication without training.
- il_find_matches() scores a set of probe records against existing data.
- profile_sql = TRUE on predict() attaches lightweight SQL timing metadata to collected predictions or lazy prediction objects.
- il_parameters() and il_weights() expose the learned m/u parameters.
- il_waterfall() decomposes a pair's match weight into per-comparison contributions.
- il_training_history() tracks parameter convergence across EM iterations.
- il_completeness() and il_profile() summarize data quality, and il_profile() accepts raw SQL expressions as column definitions (e.g. "city || left(first_name, 1)").
- il_unlinkables() identifies records that cannot be linked under any blocking rule.
- il_accuracy(), il_precision_recall(), and il_roc() evaluate performance against labeled data.
- il_errors() surfaces false positives and false negatives.
- il_graph_metrics() computes node degree, node centrality, cluster density, cluster centralization, and bridge detection.
- il_comparison_vectors() returns the gamma pattern distribution from a trained model.
- il_compare_records() scores one explicit record pair against a spec without fitting a full model, and il_string_similarity() computes five string similarity metrics for a single pair.
- il_comparator_score() computes batch string similarity across a data frame with SQL-side scoring on DuckDB/PostgreSQL.
- il_comparator_threshold_chart() visualizes match rates at multiple similarity thresholds.
- il_phonetic_chart() produces a Soundex agreement heatmap.
- il_tf_chart() visualizes model-specific term frequency distributions with labeled most/least common values.
- il_register_tf() registers pre-computed term frequency tables in the database and returns the updated model.
- autoplot() methods for il_model, il_compared, il_training_history, il_accuracy, il_roc, il_precision_recall, il_unlinkables, il_completeness, il_count_pairs, il_profile, il_string_similarity, il_comparator_score, and il_comparison_vectors.
- fake_1000: 1,000 records (250 entities) for deduplication.
- fake_1000_labels: 3,176 pairwise labels for evaluation.
- fake_20: minimal 20-record example.
- febrl4a / febrl4b: 5,000-record cross-table linkage benchmark from FEBRL.
- dbplyr::tbl_lazy references and existing table names, in addition to in-memory data frames.
- il_save() and il_load() support both RDS files and Splink settings JSON.
- il_attach() reattaches a saved model to different data or connections.
- il_cleanup() removes temporary tables owned by a single model, making it safe for shared DBI connections with multiple live models.
- il_cleanup_all() removes all package-owned temporary tables from a connection for exploratory sessions and failed runs.
- stringdist.
- profile_sql = TRUE on il_estimate_u(), il_estimate_prior(), and predict() records lightweight SQL timing metadata for performance investigation.
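
For readers comparing the three scores mentioned in the predict() entries (match_weight, total_match_weight, and match_probability), the relationship can be sketched with the standard Fellegi-Sunter formulation. This is an illustration under that assumption, not a statement of the package's internals. The symbols are introduced here for the sketch: m_k and u_k are the match and non-match probabilities of the level observed on comparison k, and \lambda is the prior match probability. Term-frequency adjustments are ignored.

```latex
\begin{aligned}
w &= \sum_{k} \log_2 \frac{m_k}{u_k}
  && \text{(evidence-only, as in match\_weight)} \\
W &= \log_2 \frac{\lambda}{1 - \lambda} + w
  && \text{(prior-inclusive, as in total\_match\_weight)} \\
p &= \frac{2^{W}}{1 + 2^{W}}
  && \text{(posterior, as in match\_probability)}
\end{aligned}
```

As a worked case, with \lambda = 0.001 and w = 12 the posterior odds are roughly 0.001 \times 2^{12} \approx 4.1, giving p \approx 0.80. This is why a threshold on the evidence-only weight and a threshold on match probability can select different sets of pairs when the prior is small.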