Introduction: The Patterson Function

In crystallography, the "phase problem" prevents a direct calculation of the crystal structure (electron density, $\rho(x,y,z)$) from the measured diffraction intensities ($I_{hkl}$). The Patterson function, or Patterson map, is a powerful tool to overcome this, especially when a heavy atom is present in the structure (the "heavy-atom method").

A Patterson map is a Fourier transform of the intensities ($I_{hkl} \propto |F_{hkl}|^2$) rather than the structure factors ($F_{hkl}$), which contain the unknown phases. The formula is:

$$ P(u, v, w) = \frac{1}{V} \sum_{h} \sum_{k} \sum_{l} |F_{hkl}|^2 \cos(2\pi(hu + kv + lw)) $$

What do Patterson peaks mean?

The peaks in a Patterson map do not represent atom positions. Instead, they represent interatomic vectors.

If you have two atoms in the unit cell, atom $i$ at $(x_i, y_i, z_i)$ and atom $j$ at $(x_j, y_j, z_j)$, the Patterson map will show a peak at:

$$ (u, v, w) = (x_i - x_j, \;\; y_i - y_j, \;\; z_i - z_j) $$

The height of this peak is proportional to the product of the atomic numbers of the two atoms: $Z_i \cdot Z_j$.

Key Concept: The Heavy Atom Method
Because peak height is proportional to $Z_i \cdot Z_j$, vectors between heavy atoms (e.g., U-U, $Z=92$) will be vastly stronger than all other vectors (e.g., U-C or C-C). The Patterson map of a heavy-atom compound will be dominated by these few, very strong "heavy-atom vector" peaks.

If we can find the positions of these peaks, we can deduce the positions of the heavy atoms. This solves the phase problem and allows for the rest of the structure to be found.

Origin Peak: The map always has a very large peak at $(0, 0, 0)$. This represents all the "self-vectors" where $i=j$. It contains no useful information and is ignored.
Centrosymmetry: The map is always centrosymmetric, even if the crystal structure is not.

Harker Sections Explained

If a unit cell contains only one heavy atom in the asymmetric unit, the space group's symmetry operations will generate a set of equivalent atoms. The vectors between these symmetrically-related atoms are special.

These "Harker vectors" are not randomly distributed throughout the 3D Patterson map. They are constrained to lie on specific planes or lines, which are known as Harker sections.

Finding strong peaks on these known sections allows us to solve for the coordinates of the original heavy atom.

Example: A 2-fold screw axis ($2_1$)

A $2_1$ axis parallel to $c$ has the symmetry operation: $(x, y, z) \rightarrow (-x, -y, z + 1/2)$.

The interatomic vector (Harker vector) between the original atom and its symmetric equivalent is:

$u = x - (-x) = 2x$
$v = y - (-y) = 2y$
$w = z - (z + 1/2) = -1/2 \equiv 1/2$ (in fractional coordinates)

This vector is $(2x, 2y, 1/2)$. Notice that the $w$ coordinate is fixed at 1/2, regardless of the original atom's coordinates. This means all such vectors will lie on the Harker plane at $w=1/2$.

If we find a strong peak on this plane at $(u=0.6, v=0.4, w=0.5)$, we can solve for the atom's $x$ and $y$ coordinates:

$2x = 0.6 \implies x = 0.3$
$2y = 0.4 \implies y = 0.2$

Just by finding this one peak, we have determined $(x=0.3, y=0.2, z=?)$. The $z$ coordinate is unknown.

Example: A glide plane ($c$)

A $c$-glide plane perpendicular to $b$ has the operation: $(x, y, z) \rightarrow (x, -y, z + 1/2)$.

The Harker vector is: $(x-x, \;\; y-(-y), \;\; z-(z+1/2)) = (0, 2y, -1/2) \equiv (0, 2y, 1/2)$.

These vectors lie on the Harker line at $u=0, w=1/2$. If we find a peak at $(0, 0.4, 0.5)$, we can solve for $y$:

$2y = 0.4 \implies y = 0.2$

This gives us the partial coordinates $(x=?, y=0.2, z=?)$.

Real-World Example: Interpreting the PbSO₄ (Pnma) Report

This is a classic real-world example. The compound Lead(II) Sulfate (PbSO₄) crystallizes in space group Pnma (No. 62). The heavy Lead (Pb) atom is on a special Wyckoff position (4c), which fixes its $y$-coordinate at 0.250.

The specific symmetry operations for the setting of Pnma defined in the program's space_groups_harker.json file create Harker sections on specific planes. An analysis report for this compound shows the program found partial solutions on two key planes:

From Plane (v=0.000): A peak was found that, via its solver, gave the partial site (x=0.420, y=?, z=0.960).
From Plane (w=0.500): Another peak was found that gave the partial site (x=0.440, y=0.250, z=?).

This is a perfect example of the Site Consolidation algorithm in action. The program compares these two partial sites with the tolerance set to 0.030:

Are $x$ values close?
- $|0.420 - 0.440| = 0.020$.
- $0.020$ is less than the $0.030$ tolerance, so this is a match.
Are $y$ values close?
- Site 1 is undefined (?), Site 2 is 0.250. This is a match.
Are $z$ values close?
- Site 1 is 0.960, Site 2 is undefined (?). This is a match.

The program then "merges" them into one complete site:

$x$ = $(0.420 + 0.440) / 2 = 0.430$ (it averages the matching values)
$y$ = $0.250$ (from Site 2)
$z$ = $0.960$ (from Site 1)

The final result is the single, complete site (x=0.430, y=0.250, z=0.960), which is exactly what appears in the Consolidated Sites table.

Note: The report may show *other* partial sites, like (x=0.060, y=0.250, ?). These are not "wrong"—they are often related to the correct site by a symmetry operation (e.g., $x \approx 0.5 - 0.440$). The consolidation algorithm is designed to find the correct *pair* that combines to form a single, consistent solution.

By combining information from multiple Harker sections, the program can find all three coordinates $(x, y, z)$. This is the primary goal of the Harko program.

Program Overview & UI

The Harko program is a specialized tool designed to automate the Harker method. It is not a full structure solution package. Its purpose is mostly for teaching courses in my classes at Université Paris-Saclay. This program can:

Calculate the 3D Patterson map from an input file of reflection intensities.
Automatically find peaks on the known Harker sections for the given space group.
Attempt to combine these partial solutions to find a single, complete heavy atom site.

The User Interface

The application is split into the Controls Panel (left) and the Results Area (right).

Controls Panel

Select Data File: Button to load your input file. This is the first and most important step.
Map Tab:
- Crystal & Refinement Data: Displays the cell, space group, and reflection count parsed from your file.
- Map Resolution: Controls the grid size of the 3D Patterson map (e.g., $50 \times 50 \times 50$). Higher values are more accurate but much slower to calculate.
- Calculate 3D Map & Analyze: Starts the full analysis pipeline.
Slices Tab:
- Becomes active after calculation.
- Slice Sliders (U, V, W): Allow you to manually move through the 3D Patterson map one slice at a time. The 2D slice is shown in the results area.
Peaks Tab:
- Becomes active after calculation.
- Combination Tolerance: A crucial slider for combining partial solutions. See the Analysis & Workflow section.
- Proposed Heavy Atom Sites (Partial): A table listing all partial solutions found from all Harker sections.

Results Area

Patterson Map Slice: A 2D false-color plot of the map at the position selected by the `Slices` tab sliders.
Found Patterson Peaks: A table of the 50 strongest peaks found anywhere in the 3D map, sorted by height. This is for general inspection.
Consolidated Heavy Atom Sites: The most important table. This shows the program's final, combined $(x, y, z)$ coordinates for the heavy atom. This is your answer.

Data Input Format

The program requires a plain text file (.dat, .txt) that is typically the output of a Pawley or Le Bail refinement from a program like Powder5. An example of such a file is given in the github repository for this program. Files obtained by GSAS, FullProf, etc. could also be read, if minor changes are made to the datafiles.

The parser is designed to find three specific pieces of information. Your file must contain:

1. Space Group

A line identifying the space group number and/or name. The parser looks for a line like:

Space Group:  14  - P21/c

2. Unit Cell Parameters

Lines defining the lattice parameters. The parser is flexible and looks for patterns like:

a (A)       10.1234(5)
b (A)       8.4567(3)
c (A)       12.5555(6)
alpha (deg) 90.0
beta (deg)  101.23(1)
gamma (deg) 90.0

3. Reflection List

This is the most critical part. The program needs a list of reflections with their corresponding intensities ($I_{hkl}$), which are used as the $|F_{hkl}|^2$ values for the Patterson calculation.

The parser expects a list where each line contains the $(h, k, l)$ index, a 2-theta value, and then the intensity. The parser is very specific about the format.

For example, given a Pawley refinement output that looks like this:

--- Reflections List (Integrated Intensities) ---
h,k,l           2th_corr (°)      I_extracted       ESD(I)           I_obs (Area)
-------------------------------------------------------------------------------------------
(1,0,1)       16.4493           57.9                           57.1
(0,1,1)       20.7851           1874.8                         2122.3
(2,0,0)       20.9150           859.0                          610.5

The program's parser is built to:

Read the (h,k,l) values (e.g., (1,0,1)).
Match and skip the next column of numbers (the 2th_corr, e.g., 16.4493).
Read and use the column immediately following it as the intensity (the I_extracted, e.g., 57.9).

It ignores all subsequent columns on the line. Therefore, it is the I_extracted (refined) intensity that is used for the Patterson map, not I_obs.

The `space_groups_harker.json` File

This program is not hard-coded with the Harker section definitions. It reads them from a file named space_groups_harker.json, which must be in the same directory.

This JSON file contains an entry for each space group, defining its Harker sections and, most importantly, the "solver" equations.

A sample entry for $P2_1$ (No. 4) might look like this:

"4": {
    "name": "P 21",
    "harker_sections": [
        {
            "type": "screw",
            "coordinate": "v",
            "value": 0.5,
            "solver": { "x": "u/2", "y": "?", "z": "w/2" }
        }
    ]
}

This tells the program: "For space group 4, look for peaks on the plane $v=0.5$. If you find one at $(u_0, 0.5, w_0)$, the partial atom site is $(x = u_0/2, y = ?, z = w_0/2)$."

Calculation Methodology

When you click Calculate 3D Map & Analyze, the program executes a precise sequence of operations.

1. 3D Patterson Map Calculation

The program allocates a 3D grid of size $N \times N \times N$, where $N$ is the Map Resolution. It then loops through every grid point $(i, j, k)$ and calculates the Patterson function $P(u, v, w)$ using the direct summation formula:

$$ P(u, v, w) = \frac{1}{V} \sum_{h,k,l} I_{hkl} \cdot \cos(2\pi(hu + kv + lw)) $$

Where:

$(u, v, w) = (i/N, j/N, k/N)$ are the fractional coordinates of the grid point.
$I_{hkl}$ is the intensity for the $(hkl)$ reflection from your input file.
$V$ is the unit cell volume.

This 3D array of floating-point numbers is the full Patterson map.

2. 3D Peak Search

Next, the program performs a simple 3D peak search on the calculated map.

It finds the minimum and maximum values in the entire 3D map.
It sets a detection threshold (e.g., 15% above the minimum).
It iterates through every point in the map. A point is considered a "peak" if it is a local maximum (i.e., its value is greater than all 26 of its immediate neighbors) and is above the threshold.
All found peaks are sorted by height, and the top 50 strongest peaks are saved. These are displayed in the `Found Patterson Peaks` table.

3. Harker Section Analysis

This is the core logic. The program now attempts to find the partial atomic coordinates.

It looks up the space group (e.g., "14") in the loaded space_groups_harker.json data.
It gets the list of all defined Harker sections for that space group.
It loops through each of the 50 peaks found in Step 2.
For each peak, it loops through each Harker section definition.
It checks if the peak's coordinate matches the section's definition within a small tolerance (typically $1.5 \times$ the grid spacing).
Example:
- A peak is at $(u=0.602, v=0.401, w=0.501)$.
- A Harker definition exists for a $2_1$ screw axis: { coordinate: "w", value: 0.5, ... }.
- The program sees that $w=0.501$ is very close to $0.5$. This is a match.
Solving:
- The program retrieves the "solver" for that matched section, e.g., { "x": "u/2", "y": "v/2", "z": "?" }.
- It executes these equations using the peak's coordinates:
  - $x = u/2 = 0.602 / 2 = 0.301$
  - $y = v/2 = 0.401 / 2 = 0.2005$
  - $z = ?$
- This partial solution (x=0.301, y=0.201, z=?) is added to the Proposed Heavy Atom Sites (Partial) table.

Analysis & Workflow: Site Consolidation

After the calculation, you will have a list of partial sites in the `Peaks` tab. The final step, Site Consolidation, is to automatically combine these partial solutions to find a single, complete $(x, y, z)$ coordinate.

The Consolidation Algorithm

The program iterates through all possible pairs of partial sites, looking for a logical combination.

Imagine this scenario:

The program found two partial sites for space group $P2_1/c$:

From the $2_1$ screw axis: (x=0.301, y=?, z=0.449)
From the $c$-glide plane: (x=0.302, y=0.125, z=?)

The algorithm compares these two sites. It checks if the defined coordinates are close to each other.

Is $x_1 \approx x_2$? Yes: $0.301 \approx 0.302$ (within tolerance).
Is $z_1 \approx z_2$? No: $z_1=0.449$ but $z_2$ is unknown.
Is $y_1 \approx y_2$? No: $y_1$ is unknown but $y_2=0.125$.

Because they share a common coordinate ($x$), the program "merges" or "consolidates" them into one complete site, taking the defined part from each:

Final Site: `(x=0.301, y=0.125, z=0.449)` (using the average for $x$)

This final, complete site is then added to the `Consolidated Heavy Atom Sites` table. This is the program's primary result.

Using the `Combination Tolerance` Slider

What if the partial sites were:

(x=0.301, y=?, z=0.449)
(x=0.330, y=0.125, z=?)

Here, $x_1$ and $x_2$ are not very close. The default tolerance (e.g., 0.03) might be too small, and the algorithm would fail to combine them.

By moving the Combination Tolerance slider up to, say, 0.04, you are telling the algorithm: "It's okay, 0.301 and 0.330 are close enough. Combine them."

This slider directly controls the final consolidation step, and moving it will re-run the analysis without re-calculating the 3D map.

Recommended Workflow

Load your data file.
Press Calculate 3D Map & Analyze.
Look first at the `Consolidated Heavy Atom Sites` table.
If sites are present: Congratulations. These are your most likely heavy atom coordinates.
If no sites are present:
- Go to the `Peaks` tab.
- Look at the `Proposed Heavy Atom Sites (Partial)` table. Are there entries here?
- If YES, slowly increase the Combination Tolerance slider and see if any consolidated sites appear.
- If NO (even the partial list is empty), go to Troubleshooting.
(Optional) Go to the `Slices` tab and use the sliders to manually explore the map. Look at the Harker sections (e.g., $w=0.5$) to visually confirm if strong peaks are present.

Troubleshooting & FAQ

Error on load: "Could not parse Space Group" or "Could not determine cell parameters".

Your input file is missing this information or it is in a format the program cannot read. Manually edit the .dat file to include lines like Space Group: 14 and a (A) 10.123.

No peaks found / Map is blank.

This most likely means the parser failed to read your reflection list. Check that your file has a list of $(hkl)$s and intensities.
It's also possible all your intensities are zero or the data is extremely poor.

"No matching Harker peaks found" or "Partial Sites" table is empty.

Most likely cause: The heavy atom is in a "special position" (e.g., on a center of symmetry, or on the rotation axis itself). In these cases, some or all Harker vectors become zero-vectors (piling up at the origin) and cannot be found.
The wrong space group was assigned.
The space_groups_harker.json file is missing, or it has no entries for your space group.

"Partial sites found, but no pairs combined" / "Consolidated Sites" table is empty.

This is the most common scenario. The program is finding partial solutions, but they don't agree.
Solution 1: Slowly increase the Combination Tolerance. The coordinates may be just outside the default tolerance.
Solution 2: Look at the partial sites manually. You may be able to spot the correct combination yourself. For example, if you see (x=0.2, y=?, z=0.4) and (x=?, y=0.1, z=0.9), they don't combine. But if you also see (x=0.2, y=0.1, z=?), you can manually combine all three to get (0.2, 0.1, 0.4) (since $z=0.9$ is equivalent to $z=-0.1$, which is different from $0.4$).
Solution 3: There may be two heavy atoms, which complicates the Patterson map beyond what this simple program can auto-solve.

References

Original Patterson Paper:
Patterson, A. L. (1934). "A Fourier Series Method for the Determination of the Components of Interatomic Distances in Crystals". Physical Review, 46, 372–376.
Original Harker Paper:
Harker, D. (1936). "The Application of the Three-Dimensional Patterson Method and the Crystal Structures of Proustite, Ag3AsS3, and Pyrargyrite, Ag3SbS3". Journal of Chemical Physics, 4, 381–390.
General Crystallography Text:
Giacovazzo, C. (ed.) (2011). Fundamentals of Crystallography. 3rd ed. Oxford: Oxford University Press.
International Tables for Crystallography:
Volume A: Space-Group Symmetry. (Contains the definitive list of symmetry operations).