Technical Overview & Methodology

This document provides a technical reference for the Combs Powder Indexing software. It details the program's algorithms, search parameters, and methodologies for scientists and researchers familiar with powder X-ray diffraction techniques.

Core Methodology

The goal of ab initio powder indexing is to determine the unit cell parameters ($a, b, c, \alpha, \beta, \gamma$) from a list of observed diffraction peak positions ($2\theta$). This program implements a system-specific, exhaustive search algorithm to solve this problem.

The core principle is to assume that a small subset of the most intense, low-angle reflections must correspond to simple crystal planes with low-integer Miller indices $(hkl)$. The program solves the indexing equations by taking the exact number of unknown parameters for a given crystal system and solving a system of linear equations using that many observed peaks.

To linearize the indexing equations, all peak positions are first converted from $2\theta$ to Q-space, where $Q = 1/d^2$. The relationship between $Q$, the Miller indices, and the reciprocal cell parameters ($A, B, C, D, E, F$) is given by the general quadratic form: $$ Q_{hkl} = Ah^2 + Bk^2 + Cl^2 + Dkl + Ehl + Fhk $$ The software solves for these reciprocal parameters (or a subset of them, depending on the crystal system) and then converts them back to the real-space cell parameters for the final solution. Each trial cell is immediately refined and scored against the full peak list.

Quick Start Guide

Follow these steps for a standard indexing routine on a single-phase powder pattern.

Load Data File: Use the Select Data File button. Supported formats include .xy, .xrdml, .ras, etc.
Find Peaks: On the Peaks tab, adjust the Min peak (%), Radius (pts), and Points sliders to accurately capture your experimental peaks.
Review & Refine Peaks: Critically examine the peak list. Edit 2θ positions for accuracy, delete spurious peaks (noise, Kα2 shoulders if *not* stripping), and add any missed reflections using Ctrl + Click on the chart. A clean list of 15-20 peaks is ideal.
Set Parameters: On the Parameters tab:
- Select the appropriate X-ray Radiation Preset (e.g., Cu Ka).
- Decide whether to enable Strip K-alpha2. This choice updates the Ka1 Wavelength field automatically. The default is OFF, using the average Ka wavelength.
- Set a chemically sensible Max Volume (Å³) to constrain the search.
- Set the 2θ Error (°) appropriate for your instrument's resolution (e.g., 0.02° for synchrotron, 0.05° for lab data).
- The Refine Zero-Point Error checkbox is enabled by default and is highly recommended.
- Select the crystal systems to search. Start with higher symmetries first.
Start Indexing: Click Start Indexing.
Analyze Solutions: On the Solutions tab, review the results. Sort by M(20) and F(20) scores. Click on high-scoring solutions to visually compare the calculated (blue) and observed (red) peak markers on the chart. A good solution will also have plausible space group suggestions.

The User Interface

The application is divided into the Controls Panel (left) and the Results Area (right).

Controls Panel

This panel contains all inputs and controls, organized into three primary tabs.

1. Peaks Tab

Peak Finding Sliders: Control the background subtraction (Radius), data smoothing (Points), and peak detection threshold (Min peak, logarithmic scale).
2θ Range Sliders: Define the angular range for peak detection, useful for excluding noisy regions.
Peak Table: A list of all detected peaks. 2θ Obs (°) values can be manually edited for precision.

2. Parameters Tab

Radiation Preset: Select the X-ray source used (e.g., Cu Ka, Co Ka). This sets the appropriate wavelengths.
Ka1 Wavelength (Å): Displays the wavelength being used for calculations. If a preset is selected, this value automatically updates based on the Strip K-alpha2 setting (shows Ka1 if ON, average Ka if OFF). It is editable only when Custom preset is chosen.
Strip K-alpha2: If checked, applies a Rachinger algorithm to numerically remove the Kα2 component from the data before peak finding and analysis. This also sets the Ka1 Wavelength field to the pure Kα1 value. Default is OFF.
Max Volume (Å³): The upper limit for acceptable unit cell volume. A key parameter for constraining the search.
Impurity Peaks: Number of allowed un-indexed peaks among the first 20 when calculating M(20).
2θ Error (°): Tolerance for matching calculated to observed peaks.
Refine Zero-Point Error: Controls whether a full, unconstrained zero-point refinement is applied to the final reported solution.
Crystal Systems to Search: Checkboxes for Cubic, Tetragonal, Hexagonal, Orthorhombic, Monoclinic, and Triclinic systems.

3. Solutions Tab

Solutions Table: Lists all valid solutions found, with their system, parameters, volume, M(20) score, and F(20) score. Click a row to select it for visual analysis on the chart.
When the search is finished, you can filter the solutions based on symmetry by enabling/disabling crystal systems.

Results Area

Chart: Displays the diffraction pattern, observed peaks (red ticks), and calculated peaks for the selected solution (blue ticks). A good fit shows excellent alignment between red and blue ticks.
Chart Interaction: Zoom (mouse wheel), Pan (click-drag), Reset Zoom (right-click), Add Peak (Ctrl + Click).

Peak Finding in Detail

Accurate peak positions are the most critical input for successful indexing. The program uses a multi-step process to identify peaks from raw data.

The Algorithm Steps

Kα2 Stripping (Optional): If Strip K-alpha2 is checked, the Rachinger algorithm is applied to the raw intensity data first.
Background Subtraction: A "rolling ball" algorithm estimates and subtracts the background signal from the (potentially stripped) data. The Radius slider controls the size of the virtual ball.
Data Smoothing: A Savitzky-Golay filter is applied to the background-subtracted data to reduce noise while preserving peak shape. The Points slider controls the smoothing window size.
Peak Detection: The algorithm identifies local maxima in the smoothed data that are above the Min peak (%) threshold.
Position Refinement: To find a precise, sub-pixel peak position, the algorithm performs a 5-point least-squares quadratic fit (based on Savitzky-Golay coefficients) on the data points surrounding the detected maximum. This is more accurate than a simple 3-point parabola and robust against noise. If a peak is too close to the data's edge, it falls back to a 3-point fit.

Practical Advice

Start with default values and visually inspect the results.
If weak but clear peaks are missed, lower the Min peak (%). If picking up noise, increase it.
For patterns with a broad amorphous background, increase the Radius.
If data is very noisy, increase smoothing Points, but avoid over-smoothing, which can merge or shift peaks.
Always manually curate the final peak list. Remove artifacts and known impurity peaks. If not stripping Kα2, be sure to delete the Kα2 shoulders manually.
Kα2 stripping can simplify patterns but may introduce small artifacts. If indexing fails with stripping ON, try turning it OFF and manually removing the α2 peaks.

Indexing Algorithm and Search Parameters

The program employs a dedicated search routine for each crystal system. This is an "exhaustive" or "brute-force" trial method that iterates through combinations of low-angle peaks and low-integer Miller indices. The number of peaks required to generate a trial cell depends on the number of unknown lattice parameters.

System-by-System Search Logic

The search algorithm's goal is to solve a system of linear equations of the form $Q_{obs} = \sum P_i \cdot H_i$, where $Q_{obs}$ are the $1/d^2$ values from the observed peaks, $H_i$ are terms derived from the trial Miller indices (e.g., $h^2, k^2, l^2$), and $P_i$ are the reciprocal lattice parameters (e.g., $A=1/a^{*2}, B=1/b^{*2}, ...$) we want to find.

Cubic (1 parameter, $A=1/a^2$): Solves a 1x1 system: $$ Q_{obs, 1} = (h_1^2 + k_1^2 + l_1^2) \cdot A $$ The program iterates through the first 10 observed peaks, assigning each one a single trial $(hkl)$ vector (where $h,k,l$ are integers up to 8) to find a trial $a$-value.
Tetragonal & Hexagonal (2 parameters, $P_1, P_2$): Solves a 2x2 system of equations using pairs of peaks from the first 10: $$ Q_{obs, 1} = H_{1,a} \cdot P_1 + H_{1,c} \cdot P_2 $$ $$ Q_{obs, 2} = H_{2,a} \cdot P_1 + H_{2,c} \cdot P_2 $$ It assigns pairs of trial $(hkl)$ vectors (where $h,k,l$ are integers up to 5) to solve for $P_1$ and $P_2$, which are then converted to $a$ and $c$.
Orthorhombic (3 parameters, $A=1/a^2, B=1/b^2, C=1/c^2$): Solves a 3x3 system using triplets of peaks from the first 10 observed peaks ($C(10, 3) = 120$ combos): $$ Q_{obs, 1} = h_1^2 A + k_1^2 B + l_1^2 C $$ $$ Q_{obs, 2} = h_2^2 A + k_2^2 B + l_2^2 C $$ $$ Q_{obs, 3} = h_3^2 A + k_3^2 B + l_3^2 C $$ It systematically tests all combinations of 3 $(hkl)$ vectors drawn from the 50 simplest orthorhombic reflections ($C(40, 3) = 9,880$ combos).
Approx. 2.35 million trials.
Monoclinic (4 parameters, $A, B, C, D$): Solves a 4x4 system for the four reciprocal parameters.

It iterates through all combinations of 4 peaks from the first 10 (210 combos). For each peak quadruplet, it tests 100,000 randomly sampled HKL-quadruplets drawn from a basis set of 50 reflections. Finally, it tests all 24 permutations of assigning the peaks to the HKLs.

This defines a total search space of $\binom{10}{4} \times 100,000 \times 24 \approx \textbf{504 million combinations}$. This search covers approximately 43% of all possible HKL combinations from the 50-reflection basis set.
Parameters to configure in `indexMonoclinic`: MAX_PEAKS (10), MAX_BASIS (50), and HKL_SAMPLE_SIZE (100000).
Note: Search is constrained to cell edges between 2 and 50 Å, and a $\beta$ angle between 90° and 160°.
Triclinic (6 parameters, $A, B, C, D, E, F$): Solves the full 6x6 system for all reciprocal parameters. This is the most computationally demanding search.

It iterates through all combinations of 6 peaks from the first 10 (210 combos). For each peak sextuplet, it tests 500,000 randomly sampled HKL-sextuplets drawn from a basis set of 50 reflections. Finally, it tests all 720 (6!) permutations.

This defines a total search space of $\binom{10}{6} \times 500,000 \times 720 \approx \textbf{75.6 billion combinations}$. Due to the massive size of the 50-reflection basis space (>15 million combinations), this samples approximately 3.15% of the total space, relying on the high probability that correct indexing involves the simplest reflections.
Parameters to configure in `indexTriclinic`: MAX_PEAKS (10), MAX_BASIS (50), and HKL_COMBOS_LIMIT (500000).
Note: Search is constrained to cell edges between 2 and 50 Å, and all angles ($\alpha, \beta, \gamma$) between 60° and 150°.

A Note on Performance:
These search spaces are vast, for a full monoclinic search it can take about 10-20 minutes (and about one day for the triclinic !) but you can stop the program much earlier if you see a good solution. At the time of this writing (November 2025), the testing speed on a typical desktop machine, e.g. an Intel I7-12700, is about $10^5$ to $10^6$ trials per second for low symmetry systems. This is the case if the tab is visible, or active, otherwise the program is throttled by the browser.

Evaluating Solutions

The indexing search often produces multiple candidate solutions. During the search the solutions will be dynamically limited to best 50 solutions. Distinguishing the correct one requires a reliable figure of merit and careful refinement.

The de Wolff Figure of Merit: M(20)

Solutions are ranked by the de Wolff Figure of Merit, M(20), which assesses the fit's accuracy and completeness. It is calculated using the first 20 observed reflections. A high M(20) value is a strong indicator of a correct solution.

M(20) Value	Interpretation
> 20	The solution is very likely to be correct.
> 10	The solution is likely correct, especially if the volume is chemically sensible.
5 - 10	The solution is plausible and deserves further investigation.
< 5	The solution is likely spurious and should be viewed with skepticism.

The F(N) Figure of Merit

As a complementary metric to M(20), the program also calculates the F(N) Figure of Merit, typically for N=20 (F(20)). While M(20) focuses on the completeness and accuracy of the first 20 lines, F(N) provides a score based on the average positional accuracy of those lines.

It is defined by the formula: $$ F_N = \frac{N}{\langle |\Delta(2\theta)| \rangle \cdot N_{calc}} $$ Where:

$N$ is the number of observed lines being evaluated (e.g., 20).
$\langle |\Delta(2\theta)| \rangle$ is the average absolute discrepancy between observed and calculated $2\theta$ positions for those $N$ lines.
$N_{calc}$ is the number of *possible* theoretical reflections (including unobserved ones) up to the $2\theta$ position of the $N^{th}$ observed line.

A high F(N) value indicates a very high-quality fit with low average error. Together, a high M(20) (indicating a correct and complete cell) and a high F(N) (indicating high precision) give strong confidence in a solution.

Least-Squares and Zero-Point Refinement

Every promising trial cell is refined using a robust, two-stage process to find the best possible parameters and M(20) score.

Stage 1: Internal Zero-Point Correction

To generate the most accurate cell proposal, the program always performs an initial refinement that includes a zero-point error term. This internal correction is strictly limited (constrained to be no larger than the user-defined `2θ Error`) to account for minor instrument misalignments without overfitting. The resulting corrected peak positions are then used to refine a baseline set of lattice parameters. This makes the initial cell proposal much more robust against small experimental shifts.

Stage 2: Final (Optional) Zero-Point Refinement

If the cell from Stage 1 achieves a high M(20) score, and the user has the Refine Zero-Point Error checkbox enabled, a second, full and unconstrained refinement is performed. This final step solves for all parameters simultaneously to produce the final reported values and their estimated standard deviations. If this option is disabled, only the lattice parameters from Stage 1 are reported.

Space Group Analysis

After a high-scoring unit cell is found, the program provides an automated analysis to suggest the most probable space groups. This feature serves as a powerful guide for subsequent structure solution or Rietveld refinement.

Methodology

The analysis is a systematic process of elimination based on observed systematic absences:

Generate Unique Reflections: The program takes the refined unit cell and generates a complete theoretical reflection list, ensuring only crystallographically unique reflections (e.g., `(100)`, but not `(-100)`) are included to prevent self-ambiguity.
Index Observed Peaks: It then indexes all observed peaks from the user's list against this theoretical pattern.
Build High-Confidence Set: To avoid errors from accidental peak overlap (e.g., `(100)` and `(011)` having very similar $2\theta$ values), the algorithm filters the indexed list to find unambiguous reflections. A reflection is considered unambiguous only if no other theoretical $(hkl)$ line is calculated to be within the `2θ Error` tolerance of its position.
Determine Centering & Extinctions: This high-confidence set of observed $(hkl)$s is used for all further analysis. It is tested against the rules for lattice centerings and glide/screw axes. A violation is only counted if an observed unambiguous peak breaks a rule.
Rank Space Groups: The program filters its internal database for all space groups matching the solution's crystal system and plausible centering(s). Each candidate group is ranked by its number of violations.

How to Interpret the Results

0 Violations: This is the ideal result. It means that no observed unambiguous reflection violated any of the space group's extinction rules. These are the strongest candidates.
1-2 Violations: These are still plausible. A single violation could be caused by an experimental artifact or a weak, theoretically-forbidden reflection.
Ambiguous Peaks (in italics): In the PDF report's reflection table, any peak that was identified as ambiguous (and therefore excluded from the analysis) is printed in an italic font for easy identification.

Advanced Topics: Enhanced Search and Sieving

To improve the success rate, the program integrates several advanced search strategies after the initial indexing routine is complete. These "fishing" strategies test for non-obvious but crystallographically common relationships.

1. "Swap Fishing" for Ambiguity

For each promising solution, the program re-examines the indexing of the first four low-angle peaks. It identifies the two peaks that are closest to each other, as this is a likely point of mis-indexing due to ambiguity. It then creates a new hypothesis by swapping their HKL assignments and attempts to solve for a new trial cell. This new cell is then sent back through the full refinement and scoring process. If this "swapped" hypothesis is correct, it will often lead to a solution with a significantly higher M(20) score.

2. Matrix-based Cell Transformations

Solutions are transformed using crystallographic matrices to test for related primitive or higher-symmetry cells. For example, a body-centered cell ($I$) is tested for a valid primitive ($P$) equivalent.

3. HKL Divisor Analysis

The program examines the list of indexed Miller indices. If all indices in one direction (e.g., all $h$ values) share a common divisor, it tests a new cell with the corresponding axis halved (a sub-cell).

4. Orthorhombic to Hexagonal Check

A hexagonal lattice can sometimes be indexed as a C-centered orthorhombic cell where $b/a \approx \sqrt{3}$. The program specifically checks all orthorhombic solutions for this condition and generates the equivalent hexagonal cell for evaluation.

5. Niggli Cell Standardization

After a potential solution is found, the program calculates its Niggli reduced cell. This is a fundamental and unique, standardized representation of the crystal lattice, also known as the "reduced primitive cell".

The calculation first transforms the conventional (centered) cell found by the program—using the detected centering (e.g., 'I', 'F', 'C')—into its primitive form. It then applies a mathematical reduction algorithm to find the basis vectors ($a, b, c$) and angles ($\alpha, \beta, \gamma$) that are the most "compact" and obey a specific set of geometric rules.

This Niggli cell is extremely useful for:

Identifying a structure: It provides a canonical form for searching crystal databases (like the ICDD or CSD), as two different conventional cells (e.g., a C-centered monoclinic and a primitive monoclinic) can reduce to the same Niggli cell if they describe the same lattice.
Symmetry determination: The specific conditions of the Niggli cell parameters directly correlate with the 14 Bravais lattices and 7 crystal systems.
Standardization: It removes ambiguity in how a unit cell is reported.

The calculated Niggli cell for each high-scoring solution is reported in the detailed section of the PDF report to aid in further analysis and database comparison.

6. Final Sieving

After all searches, a final sieving process is applied to remove redundant solutions. If two solutions have very similar volumes (within 1%), the one with higher symmetry is preferred.

If the symmetries are also equal (e.g., two different monoclinic cells), the program compares their M(20) scores. Crucially, it treats M(20) scores as "effectively equal" if they are within a small tolerance (e.g., $\Delta M(20) < 0.05$). In this tie-breaker scenario, it invokes crystallographic conventions:

For Monoclinic Cells: A special filter is applied. If two monoclinic solutions have volumes within 2% of each other and have effectively equal M(20) scores, the program will prefer the solution whose $\beta$ angle is closer to 90° (e.g., 106° will be preferred over 147°).
For Other Systems: If M(20) scores are tied, the first solution found is retained.

Troubleshooting & FAQ

Why were no solutions found?

Poor Peak List: The most common cause. Ensure the peak list is accurate, with impurity peaks removed and positions refined. A minimum of 15-20 clean peaks is recommended.
Incorrect Parameters: Verify the Wavelength and ensure the Max Volume is chemically reasonable and sufficiently large.
High Zero-Point Error: While the program corrects for small zero errors internally, a very large instrument misalignment may still cause failure.
Sample is a Mixture: Indexing requires a peak list from a single crystalline phase.

Why is M(20) low for a visually good fit?

Incorrect Cell (Sub-multiple/Super-multiple): The found cell may be a multiple or sub-multiple of the true cell. It indexes some lines but is penalized by M(20) for being too "empty" or "dense" with calculated reflections.
High Error / Low Resolution: The 2θ Error might be set too tight for broad peaks. Try slightly increasing this value.
Spurious Solution: A random solution can sometimes fit a few lines by chance. Visual inspection is key; if major observed peaks are missed by the calculated pattern, the solution is incorrect regardless of the M(20) value.

Test files

In the github repository you will find several test files
Monoclinic_test_1.xy This a synchrotron XRD file for a monoclinic sample, measured with a monochromatic radiation of 0.79764 A. The values of the lattice parameters are 19.877, 8.196, 11.243 and beta 106.08.
C61Br2_079764.XY A dibromo-methano fullerene measured at ESRF with a wavelength of 0.79764 A. It contains an impurity peak at about 16.24 deg 2 theta. There are many solutions, a most likely is a cubic I, with a=18.92 A.
SPDDRR1_sample2_0692.xy Data taken from a round robin, measured at Daresbury with 0.692 A. The probable cell is orthorhombic with a=10.983, b=12.852, c=15.740 A.
SPDDRR1_sample2_Cu.xy Same sample as the previous listing, measured with a Cu Ka laboratory instrument.
SPDDRR1_zhu1_Cu.xy Sample 1 from the same Round Robin, measured with a Cu Ka X-ray tube; likely a monoclinic with a=7.672, b=9.624, c=7.076, beta=106.24
PbSO4.xra and FAP.xra Laboratory cu Ka samples: orthorhombic and hexagonal, respectively.

References

This program was developed by Nita Dragoe at Université Paris-Saclay (2024-2025) as a successor to a prior software, Powder, originally created by the same author in 1999-2000. If you use this program you can cite this reference https://doi.org/10.13140/RG.2.2.13443.57126.
For a deeper understanding of the methodology, consulting the original scientific papers is highly recommended.

The M(20) Figure of Merit:
de Wolff, P. M. (1968). "A Simplified Criterion for the Reliability of a Powder Pattern Indexing." Journal of Applied Crystallography, 1, 108-113.
The F(N) Figure of Merit:
Smith, G. S. & Snyder, R. L. (1979). "F(N): A Criterion for Rating Powder Diffraction Patterns and Evaluating the Reliability of Powder-Pattern Indexing." Journal of Applied Crystallography, 12, 60-65.
General Powder Diffraction Text:
Klug, H. P. & Alexander, L. E. (1974). X-Ray Diffraction Procedures for Polycrystalline and Amorphous Materials, 2nd ed. New York: Wiley-Interscience.
Kα2 Stripping Algorithm:
Rachinger, W. A. (1948). "A Correction for the α1α2 Doublet in the Measurement of Widths of X-Ray Diffraction Lines." Journal of Scientific Instruments, 25, 254-255.
Alternative Indexing Methods:
Ito, T. (1949). "A General Powder X-ray Photography."Nature, 164, 755-756.
Werner, P.-E., Eriksson, L., & Westdahl, M. (1985). "TREOR, a semi-exhaustive trial-and-error powder indexing program for all symmetries." Journal of Applied Crystallography, 18, 367-370.
Visser, J. W. (1969). "A fully automatic program for finding the unit cell from powder data." Journal of Applied Crystallography, 2, 89-95.
Le Bail, A. (2004). "Monte Carlo Indexing with McMaille". Powder Diffraction, 19(3), 249-254.
Boultif, A. & Louër, D. (2004). "Powder pattern indexing with the dichotomy method." Journal of Applied Crystallography, 37, 724-731.
Previous version:
Dragoe, N. (2001). "PowderV2: a suite of applications for powder X-ray diffraction calculations." Journal of Applied Crystallography, 34, 535.