| Title: | XOR Pattern Detection and Visualization |
|---|---|
| Description: | Provides tools for detecting XOR-like patterns in variable pairs in two-class data sets. Includes visualizations for pattern exploration and reporting capabilities with both text and HTML output formats. |
| Authors: | Jorn Lotsch [aut, cre] (ORCID: <https://orcid.org/0000-0002-5818-6958>), Alfred Ultsch [aut] |
| Maintainer: | Jorn Lotsch |
| License: | GPL-3 |
| Version: | 0.1.0 |
| Built: | 2026-06-07 09:38:13 UTC |
| Source: | https://github.com/jornlotsch/detectxor |
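Core Features:
- Statistical detection using chi-squared tests, Kendall's tau, and Wilcoxon tests
- Spaghetti plots and XY (scatter) plots for pattern visualization
- Text (console) and HTML reports

Main Functions:
- detect_xor: core detection algorithm
- generate_spaghetti_plot_from_results: spaghetti (connected line) plots
- generate_xy_plot_from_results: scatter plots with reference lines
- generate_xor_reportConsole and generate_xor_reportHTML: text and HTML reports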
Author: Jorn Lotsch
Methodological foundations:
- Pattern detection in machine learning
- Statistical dependency measures (Kendall's tau)
Useful links:
- Report bugs at https://github.com/JornLotsch/detectXOR/issues
```r
# Basic workflow with the included dataset
data(XOR_data)

# Detect XOR patterns
results <- detect_xor(XOR_data, class_col = "class")

# Generate visualizations
generate_spaghetti_plot_from_results(
  results$results_df, XOR_data, class_col = "class"
)
generate_xy_plot_from_results(
  results$results_df, XOR_data, class_col = "class"
)
```
detect_xor: Identifies XOR-shaped relationships between pairs of variables in two-class data using statistical tests.
```r
detect_xor(
  data,
  class_col = "class",
  check_tau = TRUE,
  compute_axes_parallel_significance = TRUE,
  p_threshold = 0.05,
  tau_threshold = 0.3,
  abs_diff_threshold = 20,
  split_method = "quantile",
  max_cores = 1,
  extreme_handling = "winsorize",
  winsor_limits = c(0.05, 0.95),
  scale_data = TRUE,
  use_complete = TRUE
)
```
| Argument | Description |
|---|---|
| data | Data frame containing the features and the class column |
| class_col | Name of the class column (default: "class") |
| check_tau | Logical; compute classwise Kendall tau coefficients (default: TRUE) |
| compute_axes_parallel_significance | Logical; compute group-wise Wilcoxon tests of axis-parallel class differences (default: TRUE) |
| p_threshold | Significance threshold (default: 0.05) |
| tau_threshold | Kendall tau coefficient threshold (default: 0.3) |
| abs_diff_threshold | Absolute difference threshold for patterns (default: 20) |
| split_method | Method for splitting the data, "quantile" or "range" (default: "quantile") |
| max_cores | Maximum number of cores for parallel processing (default: 1) |
| extreme_handling | Method for handling extreme values, "winsorize" or "none" (default: "winsorize") |
| winsor_limits | Numeric vector of length 2 giving the lower and upper quantiles for winsorization (default: c(0.05, 0.95)) |
| scale_data | Logical; whether to scale (standardize) the data before analysis (default: TRUE) |
| use_complete | Logical; whether to use only complete cases (default: TRUE) |
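The preprocessing arguments (extreme_handling, winsor_limits, scale_data, use_complete) can be illustrated with a minimal base-R sketch. It rests on two assumptions: that winsorization clamps each feature at the quantiles given by winsor_limits, and that scaling means z-standardization. The helper names are hypothetical, and the code is not the package's internal implementation.

```r
# Hypothetical helpers illustrating the preprocessing options of detect_xor();
# not the package's internal implementation.
winsorize_vec <- function(x, limits = c(0.05, 0.95)) {
  # Clamp values below the lower and above the upper quantile
  q <- stats::quantile(x, probs = limits, na.rm = TRUE, names = FALSE)
  pmin(pmax(x, q[1]), q[2])
}

preprocess_sketch <- function(data, class_col = "class",
                              extreme_handling = "winsorize",
                              winsor_limits = c(0.05, 0.95),
                              scale_data = TRUE, use_complete = TRUE) {
  # Keep only rows without missing values
  if (use_complete) data <- data[stats::complete.cases(data), , drop = FALSE]
  for (v in setdiff(names(data), class_col)) {
    x <- data[[v]]
    if (identical(extreme_handling, "winsorize")) x <- winsorize_vec(x, winsor_limits)
    # z-standardize: mean 0, standard deviation 1
    if (scale_data) x <- as.numeric(scale(x))
    data[[v]] <- x
  }
  data
}

# Usage on toy data with one outlier and one missing value
set.seed(1)
toy <- data.frame(class = rep(1:2, each = 50),
                  a = c(rnorm(99), 100),
                  b = runif(100))
toy$b[3] <- NA
out <- preprocess_sketch(toy)
```

Winsorizing before scaling keeps a single extreme value (here 100) from inflating the standard deviation that the z-scores are based on.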
This function detects XOR-like patterns in pairwise variable relationships within two-class data sets. The analysis pipeline includes:
- Data preprocessing (winsorization, scaling, complete-case selection)
- Tile pattern analysis using chi-squared tests
- Classwise Kendall tau correlation analysis
- Group-wise Wilcoxon significance tests
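The tile analysis step can be sketched under stated assumptions. Each variable of a pair is cut at its 1/3 and 2/3 quantiles, mirroring the default quantile_lines of the plotting functions. The two cuts are then crossed into a 3 x 3 grid of tiles, and a chi-squared test checks whether the class distribution differs across tiles. The tiling scheme and the helper name are illustrative assumptions, not the package's documented algorithm.

```r
# Hypothetical sketch of a tile-based chi-squared test for one variable pair
tile_chisq_sketch <- function(x, y, cls, probs = c(1/3, 2/3)) {
  cut_at_quantiles <- function(v) {
    breaks <- c(-Inf, stats::quantile(v, probs, names = FALSE), Inf)
    cut(v, breaks = breaks, labels = FALSE)  # bin index 1..3
  }
  # Each observation falls into one of 3 x 3 tiles
  tile <- interaction(cut_at_quantiles(x), cut_at_quantiles(y), drop = TRUE)
  # Contingency table of tile by class, tested for independence
  stats::chisq.test(table(tile, cls))
}

# Simulated XOR pair: class 1 near (3, 3) and (10, 10),
# class 2 near (3, 10) and (10, 3)
set.seed(42)
n <- 100
cls <- rep(1:2, each = 2 * n)
x <- rnorm(4 * n, mean = rep(c(3, 10, 3, 10), each = n))
y <- rnorm(4 * n, mean = rep(c(3, 10, 10, 3), each = n))
res <- tile_chisq_sketch(x, y, cls)
res$p.value
```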
The function automatically handles parallel processing when multiple cores are available and returns both a summary data frame and detailed results for further analysis.
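The remaining two steps, classwise Kendall tau and group-wise Wilcoxon tests, can be illustrated with the base-R functions stats::cor(method = "kendall") and stats::wilcox.test(). The rationale sketched here is an assumption drawn from the pipeline description. In an XOR pair the two variables correlate in opposite directions within the two classes, while neither variable alone separates the classes along its axis. The helper names are hypothetical.

```r
# Hypothetical helpers; not the package's internal code
classwise_tau_sketch <- function(x, y, cls) {
  # Kendall's tau between x and y, computed separately within each class
  sapply(split(seq_along(x), cls), function(idx)
    stats::cor(x[idx], y[idx], method = "kendall"))
}

axis_wilcox_sketch <- function(x, y, cls) {
  g <- factor(cls)
  # Two-sample Wilcoxon test of each single variable between the classes
  c(x = stats::wilcox.test(x ~ g)$p.value,
    y = stats::wilcox.test(y ~ g)$p.value)
}

# Simulated XOR pair: class 1 near (3, 3) and (10, 10),
# class 2 near (3, 10) and (10, 3)
set.seed(42)
n <- 100
cls <- rep(1:2, each = 2 * n)
x <- rnorm(4 * n, mean = rep(c(3, 10, 3, 10), each = n))
y <- rnorm(4 * n, mean = rep(c(3, 10, 10, 3), each = n))

tau <- classwise_tau_sketch(x, y, cls)
pvals <- axis_wilcox_sketch(x, y, cls)
tau    # roughly +0.5 in class 1 and -0.5 in class 2
pvals  # marginal tests; typically not significant for an XOR pair
```

Opposite-signed classwise taus combined with non-significant marginal tests is the signature that distinguishes an XOR pair from a pair with ordinary axis-parallel class separation.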
Returns a list containing:

| Component | Description |
|---|---|
| results_df | Data frame with detection results for all variable pairs |
| pair_list | Detailed analysis results for each variable pair |
See also: generate_spaghetti_plot_from_results() for spaghetti plots, generate_xy_plot_from_results() for scatter plots, generate_xor_reportConsole() for console reports, generate_xor_reportHTML() for HTML reports, and XOR_data for the example dataset.
```r
# Load example data
data(XOR_data)

# Run XOR detection
results <- detect_xor(data = XOR_data, class_col = "class")

# View summary of detected patterns
print(results$results_df["xor_shape_detected"])

# Generate visualizations
spaghetti_plot <- generate_spaghetti_plot_from_results(
  results = results, data = XOR_data, class_col = "class"
)
print(spaghetti_plot)

xy_plot <- generate_xy_plot_from_results(
  results = results, data = XOR_data, class_col = "class"
)
print(xy_plot)

# Generate console report (does not write files)
generate_xor_reportConsole(results, XOR_data, "class", show_plots = FALSE)

# View detailed results for detected pairs (%in% TRUE also drops NA flags)
detected_pairs <- results$results_df[results$results_df$xor_shape_detected %in% TRUE, ]
print(detected_pairs)
```
generate_spaghetti_plot_from_results: Creates connected line (spaghetti) plots for variable pairs showing XOR patterns.
```r
generate_spaghetti_plot_from_results(results, data, class_col, scale_data = TRUE)
```
| Argument | Description |
|---|---|
| results | Either the results_df data frame from detect_xor() or the full list returned by detect_xor() |
| data | Original dataset containing variables and classes |
| class_col | Character string specifying the name of the class column |
| scale_data | Logical indicating whether to scale variables before plotting (default: TRUE) |
This function creates spaghetti plots (connected line plots) for variable pairs that have been flagged as showing XOR patterns by detect_xor(). The function automatically handles both original and rotated XOR patterns, applying the appropriate coordinate transformation when necessary.
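One plausible reading of a spaghetti plot for a variable pair is sketched below as an assumption. Each observation contributes one line joining its value on the first variable to its value on the second, coloured by class. For an XOR pair, one class then shows roughly parallel lines and the other roughly crossing lines. The helper name and the long-format layout are hypothetical, and the plot is drawn only if ggplot2 is installed.

```r
# Hypothetical helper: reshape one variable pair into long format,
# one row per observation and variable
spaghetti_long_sketch <- function(data, var1, var2, class_col = "class") {
  n <- nrow(data)
  data.frame(
    id = rep(seq_len(n), times = 2),
    variable = factor(rep(c(var1, var2), each = n), levels = c(var1, var2)),
    value = c(data[[var1]], data[[var2]]),
    class = factor(rep(data[[class_col]], times = 2))
  )
}

# Toy XOR pair (hypothetical column names X1, X2)
set.seed(7)
n <- 50
toy <- data.frame(
  class = rep(1:2, each = 2 * n),
  X1 = rnorm(4 * n, mean = rep(c(3, 10, 3, 10), each = n)),
  X2 = rnorm(4 * n, mean = rep(c(3, 10, 10, 3), each = n))
)
long <- spaghetti_long_sketch(toy, "X1", "X2")

if (requireNamespace("ggplot2", quietly = TRUE)) {
  # One line per observation (group = id), coloured by class
  p <- ggplot2::ggplot(long, ggplot2::aes(x = variable, y = value,
                                           group = id, colour = class)) +
    ggplot2::geom_line(alpha = 0.4)
  print(p)
}
```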
The function accepts either the full results object returned by detect_xor() or just the results_df component extracted from it. Variable pairs are separated using "||" as the delimiter in plot labels.
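Two small base-R helpers can sketch how either input form might be accepted and how a pair label might be split. Both helpers are hypothetical illustrations. The assumed label format "X1||X2" comes only from the "||" delimiter mentioned in this section. Note that strsplit() needs fixed = TRUE here, because | is a regular-expression metacharacter.

```r
# Hypothetical helper: return the results data frame whether the caller
# passes the full detect_xor() list or its results_df component
as_results_df_sketch <- function(results) {
  if (is.data.frame(results)) return(results)
  if (is.list(results) && is.data.frame(results$results_df)) {
    return(results$results_df)
  }
  stop("`results` must be a data frame or a list with a `results_df` element")
}

# Hypothetical helper: split a label such as "X1||X2" into its two names.
# fixed = TRUE treats "||" literally instead of as a regex alternation.
split_pair_label <- function(label) strsplit(label, "||", fixed = TRUE)[[1]]

# Toy inputs; the "pair" column name is hypothetical
df <- data.frame(pair = "X1||X2", xor_shape_detected = TRUE)
full <- list(results_df = df, pair_list = list())
split_pair_label(df$pair)
```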
If no XOR patterns are detected, an empty plot with an appropriate message is returned.
To save the plot, use ggplot2::ggsave() or other standard R plotting save methods.
Returns a ggplot object. No files are saved automatically.
See also: detect_xor() for XOR pattern detection; generate_xy_plot_from_results() for scatter plots.
```r
# Using the full results object (recommended)
data(XOR_data)
results <- detect_xor(data = XOR_data, class_col = "class")
spaghetti_plot <- generate_spaghetti_plot_from_results(
  results = results, data = XOR_data, class_col = "class"
)

# Display the plot
print(spaghetti_plot)

# Save the plot if needed
# ggplot2::ggsave("my_spaghetti_plot.png", spaghetti_plot)

# Using the extracted results_df (also works)
spaghetti_plot_df <- generate_spaghetti_plot_from_results(
  results = results$results_df, data = XOR_data, class_col = "class"
)
```
generate_xor_reportConsole: Creates a console (text) report with a formatted table and plots for XOR pattern detection results.
```r
generate_xor_reportConsole(
  results,
  data,
  class_col,
  scale_data = TRUE,
  show_plots = TRUE,
  quantile_lines = c(1/3, 2/3),
  line_method = "quantile"
)
```
| Argument | Description |
|---|---|
| results | Either the results_df data frame from detect_xor() or the full list returned by detect_xor(). |
| data | Original dataset containing variables and classes. |
| class_col | Character string specifying the class column name. |
| scale_data | Logical indicating whether to scale variables in plots. Default: TRUE. |
| show_plots | Logical indicating whether to display plots. Default: TRUE. |
| quantile_lines | Numeric vector of quantiles for reference lines in XY plots. Default: c(1/3, 2/3). |
| line_method | Method for boundary calculation ("quantile" or "range"). Default: "quantile". |
Invisibly returns a list containing the formatted table and plots (if generated).
See also: detect_xor() for XOR pattern detection; generate_xor_reportHTML() for HTML report generation.
generate_xor_reportHTML: Creates an HTML report with a formatted table and plots for XOR pattern detection results.
```r
generate_xor_reportHTML(
  results,
  data,
  class_col,
  output_file = "xor_detection_report.html",
  open_browser = TRUE,
  scale_data = TRUE,
  quantile_lines = c(1/3, 2/3),
  line_method = "quantile"
)
```
| Argument | Description |
|---|---|
| results | Either the results_df data frame from detect_xor() or the full list returned by detect_xor(). |
| data | Original dataset containing variables and classes. |
| class_col | Character string specifying the class column name. |
| output_file | Character string specifying the output HTML file name. Default: "xor_detection_report.html". |
| open_browser | Logical indicating whether to open the report in the browser automatically. Default: TRUE. |
| scale_data | Logical indicating whether to scale variables in plots. Default: TRUE. |
| quantile_lines | Numeric vector of quantiles for reference lines in XY plots. Default: c(1/3, 2/3). |
| line_method | Method for boundary calculation ("quantile" or "range"). Default: "quantile". |
Invisibly returns the file path of the generated HTML report.
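The file-handling behaviour described here can be mimicked in base R: write an HTML file, optionally open it, and return the path invisibly. The sketch below is hypothetical and much simplified. It renders only a table and does not reproduce the package's report layout or plots. utils::browseURL() is the standard base-R way to open a file in the browser.

```r
# Hypothetical, simplified HTML report writer (table only, no plots)
write_html_report_sketch <- function(df,
                                     output_file = tempfile(fileext = ".html"),
                                     open_browser = FALSE) {
  esc <- function(s) {
    # Escape characters that have special meaning in HTML
    s <- gsub("&", "&amp;", s, fixed = TRUE)
    s <- gsub("<", "&lt;", s, fixed = TRUE)
    gsub(">", "&gt;", s, fixed = TRUE)
  }
  cells <- function(values, tag) {
    # One table row with one <th> or <td> cell per value
    paste0("<tr>", paste0("<", tag, ">", esc(values), "</", tag, ">",
                          collapse = ""), "</tr>")
  }
  body <- apply(as.matrix(format(df)), 1, cells, tag = "td")
  html <- c("<!DOCTYPE html>", "<html><body>",
            "<h1>XOR detection report</h1>",
            "<table>", cells(names(df), "th"), body, "</table>",
            "</body></html>")
  writeLines(html, output_file)
  if (open_browser) utils::browseURL(output_file)
  invisible(output_file)  # path is returned but not auto-printed
}

# Toy input; the "pair" column name is hypothetical
path <- write_html_report_sketch(
  data.frame(pair = "X1||X2", xor_shape_detected = TRUE)
)
```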
See also: detect_xor() for XOR pattern detection; generate_xor_reportConsole() for text-based report generation.
generate_xy_plot_from_results: Creates scatter plots with decision-boundary reference lines for variable pairs showing XOR patterns.
```r
generate_xy_plot_from_results(
  results,
  data,
  class_col,
  scale_data = TRUE,
  quantile_lines = c(1/3, 2/3),
  line_method = "quantile"
)
```
| Argument | Description |
|---|---|
| results | Either the results_df data frame from detect_xor() or the full list returned by detect_xor() |
| data | Original dataset containing variables and classes |
| class_col | Character string specifying the name of the class column |
| scale_data | Logical indicating whether to scale variables before plotting (default: TRUE) |
| quantile_lines | Numeric vector of length 2 specifying quantiles for reference lines (default: c(1/3, 2/3)) |
| line_method | Character string specifying the boundary calculation method, either "quantile" or "range" (default: "quantile") |
This function creates scatter plots for variable pairs that have been flagged as showing XOR patterns by detect_xor(). The plots include dashed reference lines that help visualize the decision boundaries used in XOR pattern detection.
The function automatically handles both original and rotated XOR patterns, applying the appropriate coordinate transformation when necessary. Variable pairs are separated using "||" as the delimiter in plot labels.
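The coordinate transformation for rotated XOR patterns is not specified here. A common choice, assumed in the sketch below, is a 45-degree rotation u = (x + y)/sqrt(2), v = (y - x)/sqrt(2). This rotation turns a diamond-shaped (diagonal) checkerboard into an axis-aligned XOR whose classes sit in opposite quadrants. The helper name is hypothetical.

```r
# Hypothetical helper: rotate a variable pair by 45 degrees
rotate45_sketch <- function(x, y) {
  data.frame(u = (x + y) / sqrt(2),   # projection on the diagonal
             v = (y - x) / sqrt(2))   # projection on the anti-diagonal
}

# A rotated XOR: class 1 on the y axis, class 2 on the x axis
x <- c(0, 0, 5, -5)
y <- c(5, -5, 0, 0)
cls <- c(1, 1, 2, 2)

rot <- rotate45_sketch(x, y)
# After rotation, class 1 lies in quadrants I/III (u * v > 0)
# and class 2 in quadrants II/IV (u * v < 0)
sign(rot$u * rot$v)  # -> 1 1 -1 -1
```

Because the rotation is orthogonal, distances between points are preserved, so only the orientation of the pattern changes.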
The line_method parameter controls how the reference lines are calculated:
- "quantile": lines are placed at the specified quantiles (quantile_lines) of the data distribution
- "range": lines divide the data range into three equal parts
If no XOR patterns are detected, an empty plot with an appropriate message is returned.
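The two line_method options can be compared with a small base-R sketch. It assumes that reference lines are computed per axis from the plotted values. The helper name is hypothetical. stats::quantile() with its default type 7 is used for the "quantile" option.

```r
# Hypothetical helper: positions of the two reference lines on one axis
reference_lines_sketch <- function(x, line_method = c("quantile", "range"),
                                   quantile_lines = c(1/3, 2/3)) {
  line_method <- match.arg(line_method)
  if (line_method == "quantile") {
    # Lines at the requested quantiles of the observed values
    stats::quantile(x, probs = quantile_lines, names = FALSE)
  } else {
    # Lines splitting [min, max] into three equal parts
    r <- range(x)
    r[1] + diff(r) * c(1/3, 2/3)
  }
}

skewed <- c(rep(0, 9), 9)
reference_lines_sketch(skewed, "quantile")  # both lines at 0
reference_lines_sketch(skewed, "range")     # lines at 3 and 6
```

For evenly spread data both methods agree, while for skewed data the "quantile" lines follow the bulk of the observations and the "range" lines follow the extremes.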
To save the plot, use ggplot2::ggsave() or other standard R plotting save methods.
Returns a ggplot object. No files are saved automatically.
See also: detect_xor() for XOR pattern detection; generate_spaghetti_plot_from_results() for spaghetti plots.
```r
# Using the full results object (recommended)
data(XOR_data)
results <- detect_xor(data = XOR_data, class_col = "class")
xy_plot <- generate_xy_plot_from_results(
  results = results, data = XOR_data, class_col = "class"
)

# Display the plot
print(xy_plot)

# Using the "range" boundary method
xy_plot_range <- generate_xy_plot_from_results(
  results = results, data = XOR_data, class_col = "class",
  line_method = "range"
)

# Save the plot if needed
# ggplot2::ggsave("my_xy_plot.png", xy_plot)

# Using the extracted results_df (also works)
xy_plot_df <- generate_xy_plot_from_results(
  results = results$results_df, data = XOR_data, class_col = "class"
)
```
XOR_data: Simulated classification dataset with 400 observations of 5 features plus a class column, demonstrating XOR patterns, linear class differences, and random noise.
```r
data("XOR_data")
```
A data frame with 400 rows and 6 variables:
- class: binary class labels (1 or 2)
- Normally distributed variable with a subtle class difference (Δμ = 0.25)
- High-variance normal variable (σ = 3) with moderate class separation (Δμ = -0.7)
- XOR pattern component 1 (μ = 3 vs. 10 between classes)
- XOR pattern component 2 (μ = 3 vs. 10 between classes)
- Uniform noise on [1, 10]
Source: synthetic data generated with rnorm() and runif().
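A dataset with the same layout can be simulated with rnorm() and runif(). Several choices in the sketch below are assumptions, and it is not the script that generated XOR_data:
- the column names
- the unit standard deviations
- the 200/200 class split
- the sign convention of the class differences
- the four-cluster arrangement of the XOR components (class 1 near (3, 3) and (10, 10), class 2 near (3, 10) and (10, 3))

```r
set.seed(123)
n_per_class <- 200
cls <- rep(1:2, each = n_per_class)
# Within each class, half the cases belong to the "high" sub-cluster
high <- rep(rep(c(FALSE, TRUE), each = n_per_class / 2), times = 2)

sim <- data.frame(
  class = cls,
  # Hypothetical column names below
  LinearSubtle = rnorm(400, mean = ifelse(cls == 2, 0.25, 0)),
  LinearWide   = rnorm(400, mean = ifelse(cls == 2, -0.7, 0), sd = 3),
  XOR1         = rnorm(400, mean = ifelse(high, 10, 3)),
  # base::xor() flips the second component's cluster for class 2
  XOR2         = rnorm(400, mean = ifelse(xor(high, cls == 2), 10, 3)),
  Noise        = runif(400, min = 1, max = 10)
)
str(sim)
```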
```r
data(XOR_data)
str(XOR_data)
summary(XOR_data)
```