================== Grouped Data Fitting ================== Overview -------- nbragg supports analysis of grouped/gridded data, enabling you to: - Fit spatially-resolved measurements (e.g., imaging data on a 2D grid) - Analyze sequential measurements (e.g., scans along a line) - Process multiple samples or regions of interest simultaneously - Visualize parameter variations across your dataset All grouped fitting operations support parallel processing and provide intuitive visualization tools. Supported Data Structures -------------------------- Three types of grouped data are supported: **2D Grids** Regular spatial grids (e.g., detector pixels, scan positions) - Indexed by ``(row, col)`` tuples - Visualized as 2D heatmaps/colormeshes - Example: ``result.plot_parameter_map("thickness")`` shows a spatial map **1D Arrays** Linear sequences of measurements - Indexed by integers (0, 1, 2, ...) - Visualized as line plots - Example: parameter evolution along a scan direction **Named Groups** Arbitrary collections with custom identifiers - Indexed by descriptive strings ("sample_A", "roi_center", etc.) - Visualized as bar charts - Example: comparing different sample conditions Loading Grouped Data --------------------- Basic Usage ~~~~~~~~~~~ Load data using glob patterns:: import nbragg data = nbragg.Data.from_grouped( signal="path/to/signal_*.csv", openbeam="path/to/openbeam_*.csv", L=10, # sample-detector distance in meters tstep=10e-6 # time step in seconds ) The indices are automatically extracted from filenames. For 2D data, use naming like: - ``signal_x0_y0.csv``, ``signal_x0_y1.csv``, ... - ``signal_row0_col0.csv``, ``signal_row0_col1.csv``, ... For 1D data: - ``signal_0.csv``, ``signal_1.csv``, ... - ``signal_pixel_0.csv``, ``signal_pixel_1.csv``, ... Alternative Loading Methods ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ **From folders** (loads all CSV files):: data = nbragg.Data.from_grouped( signal="path/to/signal_folder/", openbeam="path/to/openbeam_folder/", L=10, tstep=10e-6 ) **From file lists**:: signal_files = ["sig1.csv", "sig2.csv", "sig3.csv"] openbeam_files = ["ob1.csv", "ob2.csv", "ob3.csv"] data = nbragg.Data.from_grouped( signal=signal_files, openbeam=openbeam_files, L=10, tstep=10e-6 ) Data Attributes ~~~~~~~~~~~~~~~ Grouped data objects have these attributes:: data.is_grouped # True for grouped data data.indices # List of string indices: ["(0,0)", "(0,1)", ...] data.group_shape # Shape tuple: (3, 3) for 3x3 grid, (5,) for 5-element array data.groups # Dict mapping indices to dataframes Fitting Grouped Data -------------------- Basic Fitting ~~~~~~~~~~~~~ Fitting works identically to single datasets, but processes all groups:: from nbragg import TransmissionModel, CrossSection, materials # Define model xs = CrossSection(iron=materials["Fe_sg229_Iron-alpha"]) model = TransmissionModel(xs, vary_basic=True) # Fit all groups (automatically parallelized) result = model.fit( data, n_jobs=4, # Number of parallel workers progress_bar=True, # Show progress wlmin=1.5, wlmax=5.0 ) Parallel Processing ~~~~~~~~~~~~~~~~~~~ Grouped fitting uses true multiprocessing to achieve significant speedup on multi-core systems. The ``n_jobs`` parameter controls parallelization: - ``n_jobs=1``: Sequential processing - ``n_jobs=4``: Use 4 parallel workers (good default) - ``n_jobs=8``: Use 8 parallel workers - ``n_jobs=-1``: Use all available CPUs **Typical speedup** (32 groups, 500 wavelength points): - 2 workers: ~1.7x faster - 4 workers: ~2.5x faster - 8 workers: ~3.3x faster The ``backend`` parameter selects the parallelization strategy: - ``backend="loky"`` (default): True multiprocessing with separate processes. Each worker reconstructs the model independently, enabling full CPU parallelism. - ``backend="threading"``: Thread-based parallelism. Limited by Python's GIL but lower overhead for very fast fits. - ``backend="sequential"``: No parallelism. Useful for debugging. Example with explicit backend:: # True multiprocessing (default, recommended) result = model.fit(data, n_jobs=4, backend="loky") # Sequential for debugging result = model.fit(data, backend="sequential") .. note:: The first batch of fits has initialization overhead (~1-3s total) as each worker loads NCrystal. Subsequent fits are fast. For small datasets (<10 groups), sequential processing may be faster due to this overhead. Result Structure ~~~~~~~~~~~~~~~~ The ``fit()`` method returns a ``GroupedFitResult`` object:: result.results # Dict mapping indices to individual ModelResult objects result.indices # List of group indices result.group_shape # Original data shape Accessing Results ----------------- Individual Group Results ~~~~~~~~~~~~~~~~~~~~~~~~ Access results using flexible indexing:: # Using tuples (for 2D grids) r = result[(1, 2)] # Using strings (spaces optional) r = result["(1,2)"] # No spaces r = result["(1, 2)"] # With spaces # Using integers (for 1D arrays) r = result[5] r = result["5"] # Using names (for named groups) r = result["sample_A"] Each individual result is a standard lmfit ``ModelResult`` with all normal methods:: r.params['thickness'].value # Parameter value r.params['thickness'].stderr # Parameter error r.success # Fit success flag r.redchi # Reduced chi-square r.plot() # Plot fit Summary Statistics ~~~~~~~~~~~~~~~~~~ Get a comprehensive summary of all fits:: summary_df = result.summary() This returns a pandas DataFrame with columns: - ``index``: Group identifier - ``success``: Fit success flag - ``redchi``, ``chisqr``: Fit statistics - ``nfev``, ``nvarys``: Evaluation count and parameter count - ````: Value for each fitted parameter - ``_err``: Error for each fitted parameter In Jupyter notebooks, display a formatted HTML report:: from IPython.display import HTML, display display(HTML(result.fit_report())) Plotting Individual Results ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Plot any individual group's fit:: import matplotlib.pyplot as plt fig, ax = plt.subplots() result.plot(index=(1, 1), ax=ax) plt.show() For multi-stage fits, view the stages progression:: stages_table = result.stages_summary(index=(1, 1)) print(stages_table) Plot the total cross-section:: result.plot_total_xs(index=(1, 1), plot_dspace=True) Visualizing Parameter Maps --------------------------- The ``plot_parameter_map()`` method is the primary visualization tool. It automatically detects the appropriate plot type based on your data structure. Basic Parameter Maps ~~~~~~~~~~~~~~~~~~~~ Plot any fitted parameter:: result.plot_parameter_map("thickness") For 2D grids, this creates a heatmap. For 1D arrays, a line plot. For named groups, a bar chart. Customizing Appearance ~~~~~~~~~~~~~~~~~~~~~~ Control the visualization with keyword arguments:: import matplotlib.pyplot as plt fig, ax = plt.subplots(figsize=(10, 8)) result.plot_parameter_map( "thickness", ax=ax, cmap="viridis", # Colormap (for 2D) plot_errors=False, # Show/hide error map vmin=0.9, # Color scale minimum vmax=1.1, # Color scale maximum ) plt.title("Sample thickness variation") plt.tight_layout() plt.show() Plotting Errors ~~~~~~~~~~~~~~~ Visualize parameter uncertainties:: fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 6)) # Values result.plot_parameter_map("thickness", ax=ax1) ax1.set_title("Thickness values") # Errors result.plot_parameter_map("thickness", plot_errors=True, ax=ax2) ax2.set_title("Thickness errors") plt.show() Filtering with Queries ~~~~~~~~~~~~~~~~~~~~~~ Use pandas query syntax to filter which groups to display:: # Show only successful fits result.plot_parameter_map( "thickness", query="success == True" ) # Show only good fits (reduced chi-square < 2) result.plot_parameter_map( "thickness", query="success == True and redchi < 2.0" ) # Complex queries result.plot_parameter_map( "thickness", query="redchi < 1.5 and thickness > 0.8 and thickness < 1.2" ) Explicit Plot Type Selection ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Override automatic detection if needed:: # Force 2D heatmap (if you have 2D data but want line plot, this won't work well) result.plot_parameter_map("thickness", kind="pcolormesh") # Force 1D line plot result.plot_parameter_map("thickness", kind="line") # Force bar chart result.plot_parameter_map("thickness", kind="bar") Saving and Loading ------------------ Saving Results ~~~~~~~~~~~~~~ Save all grouped fit results to a single file:: result.save("grouped_results.json") With optional model saving:: result.save( "grouped_results.json", model_filename="model.json" ) Compact format (faster but less readable):: result.save("grouped_results.json", compact=True) Loading Results ~~~~~~~~~~~~~~~ Load saved grouped results:: from nbragg.models import GroupedFitResult result = GroupedFitResult.load("grouped_results.json") Load with a model:: result = GroupedFitResult.load( "grouped_results.json", model_filename="model.json" ) Or pass a model instance:: result = GroupedFitResult.load( "grouped_results.json", model=my_model ) Combining Datasets ------------------ Add Multiple Measurements ~~~~~~~~~~~~~~~~~~~~~~~~~ Combine data from multiple measurement runs:: data1 = nbragg.Data.from_grouped( signal="run1/signal_*.csv", openbeam="run1/ob_*.csv", L=10, tstep=10e-6 ) data2 = nbragg.Data.from_grouped( signal="run2/signal_*.csv", openbeam="run2/ob_*.csv", L=10, tstep=10e-6 ) # Combine (adds counts, properly propagates errors) combined = data1 + data2 .. warning:: Both datasets must have identical indices and group structure. Mismatched indices raise a ``ValueError``. Advanced Features ----------------- Multi-Stage Fitting ~~~~~~~~~~~~~~~~~~~ Use Rietveld-type staged refinement:: # Define stages model.stages = { 'basic': ['norm', 'thickness'], 'background': 'background', 'all': 'all' } # Fit with stages result = model.fit(data, stages='all', n_jobs=4) # View stage progression for any group result.stages_summary(index=(1, 1)) See :doc:`advanced_fitting` for detailed information on multi-stage fitting. Custom Index Ordering ~~~~~~~~~~~~~~~~~~~~~ Control the order of indices during loading:: data = nbragg.Data.from_grouped( signal=signal_files, openbeam=ob_files, indices=["center", "edge", "corner"], # Custom order L=10, tstep=10e-6 ) Working with Subsets ~~~~~~~~~~~~~~~~~~~~ Extract and fit specific groups:: # Get subset of data (example - would need custom method) # subset_indices = ["(0,0)", "(1,1)", "(2,2)"] # subset_data = data.subset(subset_indices) # Fit only the subset # subset_result = model.fit(subset_data) Best Practices -------------- File Naming Conventions ~~~~~~~~~~~~~~~~~~~~~~~ For automatic index extraction: **2D grids** - Use clear row/column notation:: signal_x0_y0.csv, signal_x0_y1.csv, signal_x0_y2.csv signal_x1_y0.csv, signal_x1_y1.csv, signal_x1_y2.csv or:: signal_row0_col0.csv, signal_row0_col1.csv, ... **1D arrays** - Use sequential numbering:: signal_0.csv, signal_1.csv, signal_2.csv, ... or:: signal_pixel_0.csv, signal_pixel_1.csv, ... **Named groups** - Use descriptive names:: signal_sample_A.csv, signal_sample_B.csv signal_center.csv, signal_edge.csv, signal_corner.csv Performance Tips ~~~~~~~~~~~~~~~~ 1. **Use appropriate n_jobs**: Start with ``n_jobs=4`` for most systems, increase for large datasets 2. **Consider dataset size**: For <10 groups, ``backend="sequential"`` may be faster due to initialization overhead 3. **Limit wavelength range**: Use ``wlmin`` and ``wlmax`` to focus on relevant regions 4. **Pre-filter failed fits**: Use queries in visualizations to hide bad fits 5. **Save intermediate results**: Save after fitting to avoid recomputation 6. **Monitor speedup**: For best efficiency, ensure each fit takes >100ms; faster fits have more overhead Memory Management ~~~~~~~~~~~~~~~~~ For very large datasets: - Fit in batches if memory is limited - Use ``compact=True`` when saving to reduce file size - Clear unwanted results from memory:: del result # Free memory Common Patterns --------------- Imaging Analysis ~~~~~~~~~~~~~~~~ Typical workflow for spatially-resolved imaging data:: import nbragg import matplotlib.pyplot as plt # Load 2D grid data data = nbragg.Data.from_grouped( signal="imaging/signal_x*_y*.csv", openbeam="imaging/ob_x*_y*.csv", L=10, tstep=10e-6 ) # Setup model xs = nbragg.CrossSection(iron=nbragg.materials["Fe_sg229_Iron-alpha"]) model = nbragg.TransmissionModel(xs, vary_basic=True) # Fit all positions result = model.fit(data, n_jobs=4, progress_bar=True) # Create parameter maps fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(12, 10)) result.plot_parameter_map("thickness", ax=ax1) result.plot_parameter_map("thickness", plot_errors=True, ax=ax2) result.plot_parameter_map("norm", ax=ax3) result.plot_parameter_map("norm", plot_errors=True, ax=ax4) plt.tight_layout() plt.show() # Save results result.save("imaging_results.json") Line Scan Analysis ~~~~~~~~~~~~~~~~~~ For measurements along a single direction:: # Load 1D array data data = nbragg.Data.from_grouped( signal="scan/position_*.csv", openbeam="scan/ob_*.csv", L=10, tstep=10e-6 ) # Fit result = model.fit(data, n_jobs=4) # Plot parameter evolution fig, ax = plt.subplots(figsize=(10, 6)) result.plot_parameter_map("thickness", ax=ax) ax.set_xlabel("Position along scan") ax.set_ylabel("Thickness (cm)") plt.show() Multi-Sample Comparison ~~~~~~~~~~~~~~~~~~~~~~~ Comparing different samples or conditions:: # Load named groups data = nbragg.Data.from_grouped( signal="samples/*_signal.csv", openbeam="samples/*_ob.csv", L=10, tstep=10e-6 ) # Fit result = model.fit(data, n_jobs=-1) # Compare parameters fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 6)) result.plot_parameter_map("thickness", ax=ax1) ax1.set_title("Thickness comparison") result.plot_parameter_map("norm", ax=ax2) ax2.set_title("Normalization comparison") plt.show() # Get summary table summary = result.summary() print(summary[['index', 'thickness', 'thickness_err', 'redchi']]) See Also -------- - :doc:`basic_usage` - Fundamental concepts - :doc:`advanced_fitting` - Multi-stage fitting strategies - :doc:`save_load` - Saving and loading results - :doc:`model_parameters` - Parameter control For a complete worked example, see the `grouped fits tutorial notebook `_.