API

SOM Network

class simpsom.network.SOMNet(net_height: int, net_width: int, data: ndarray, load_file: Optional[str] = None, metric: str = 'euclidean', topology: str = 'hexagonal', neighborhood_fun: str = 'gaussian', init: str = 'random', PBC: bool = False, GPU: bool = False, CUML: bool = False, random_seed: Optional[int] = None, debug: bool = False, output_path: str = './')

Bases: object

Kohonen SOM Network class.

Initialize the SOM network.

Parameters:

net_height (int) – Number of nodes along the first dimension.
net_width (int) – Number of nodes along the second dimension.
data (array) – N-dimensional dataset.
load_file (str) – Name of file to load containing information to initialize the network weights.
metric (string) – distance metric for the identification of best matching units. Accepted metrics are euclidean, manhattan, and cosine (default “euclidean”).
topology (str) – topology of the map tiling. Accepted shapes are hexagonal, and square (default “hexagonal”).
neighborhood_fun (str) – neighbours drop-off function for training, choose among gaussian, mexican_hat and bubble (default “gaussian”).
init (str or list[array, ...]) – Nodes initialization method, choose between random or PCA (default “random”).
PBC (boolean) – Activate/deactivate periodic boundary conditions. Warning: only the quality threshold clustering algorithm works with PBC (default False).
GPU (boolean) – Activate/deactivate GPU run with RAPIDS (requires CUDA, default False).
CUML (boolean) – Use CUML for clustering. If deactivated, use scikit-learn instead (requires CUDA, default False).
random_seed (int) – Seed for the random numbers generator (default None).
debug (bool) – Set logging level printed to screen as debug.
output_path (str) – Path to the folder where all data and plots will be saved (default, current folder).

CUML

GPU

PBC

__init__(net_height: int, net_width: int, data: ndarray, load_file: Optional[str] = None, metric: str = 'euclidean', topology: str = 'hexagonal', neighborhood_fun: str = 'gaussian', init: str = 'random', PBC: bool = False, GPU: bool = False, CUML: bool = False, random_seed: Optional[int] = None, debug: bool = False, output_path: str = './') → None

Initialize the SOM network.

_get(data: ndarray) → ndarray

Moves data from GPU to CPU. If already on CPU, it will be left as it is.

Parameters: data (array) – data to move from GPU to CPU.
Returns: the same data on CPU.
Return type: (array)

_get_n_process() → int

Count number of GPU or CPU processors.

Returns: the number of processors.
Return type: (int)

_randomize_dataset(data: ndarray, epochs: int) → ndarray

Generates a random list of datapoints indices for online training.

Parameters:

data (array or list) – N-dimensional dataset.
epochs (int) – Number of training iterations.

Returns:

array with randomized indices

Return type:

entries (array)
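
As a rough illustration of what a randomized list of datapoint indices for online training can look like, the sketch below builds `epochs` indices from shuffled passes over the dataset. This is an assumption about the general approach, not simpsom's implementation; `randomized_indices` is a hypothetical helper.

```python
import random

def randomized_indices(n_points, epochs, seed=None):
    """Return `epochs` datapoint indices, built from shuffled passes over the data."""
    if n_points <= 0:
        raise ValueError("n_points must be positive")
    rng = random.Random(seed)
    indices = []
    while len(indices) < epochs:
        one_pass = list(range(n_points))
        rng.shuffle(one_pass)  # each pass visits every point once, in random order
        indices.extend(one_pass)
    return indices[:epochs]  # online training consumes one index per epoch

entries = randomized_indices(n_points=5, epochs=12, seed=0)
```

With this scheme every datapoint is visited once before any is repeated, which keeps short online runs from over-sampling a few points.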

_set_weights(load_file: Optional[str] = None) → None

Set initial map weights values, either by loading them from file or with random/PCA.

Parameters: load_file (str) – Name of file to load containing information to initialize the network weights.

_update_learning_rate(n_iter: int) → None

Update the learning rate.

Parameters: n_iter (int) – Iteration number.

_update_sigma(n_iter: int) → None

Update the Gaussian sigma.

Parameters: n_iter (int) – Iteration number.
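
The decay schedule behind `_update_learning_rate` and `_update_sigma` is not stated in this reference. A common choice in SOM training is exponential decay, sketched below purely as an assumption. The names mirror the `start_sigma`, `start_learning_rate` and `tau` attributes listed for the class, but the values are made up and `decayed` is a hypothetical helper.

```python
import math

def decayed(start_value, n_iter, time_constant):
    """Exponentially decay `start_value` after `n_iter` iterations."""
    return start_value * math.exp(-n_iter / time_constant)

start_sigma = 5.0           # hypothetical initial neighbourhood width
start_learning_rate = 0.01  # same value as the default of train()
tau = 100.0                 # hypothetical decay time constant

sigma_0 = decayed(start_sigma, 0, tau)      # no decay at the first iteration
sigma_100 = decayed(start_sigma, 100, tau)  # reduced by a factor e after tau iterations
lr_50 = decayed(start_learning_rate, 50, tau)
```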

cluster(coor: ndarray, project: bool = True, algorithm: str = 'DBSCAN', file_name: str = './som_clusters.npy', **kwargs: str) → List[int]

Project data onto the map and find clusters with scikit-learn clustering algorithms.

Parameters:

coor (array) – An array containing datapoints to be mapped or pre-mapped if project False.
project (bool) – if True, project the points in coor onto the map.
algorithm (clustering obj or str) – The cluster identification algorithm. A scikit-like class can be provided (must have a fit method), or a string indicating one of the algorithms provided by the scikit-learn library.
file_name (str) – Name of the file to which the data will be saved if not None.
kwargs (dict) – Keyword arguments to the clustering algorithm.

Returns:

A list containing the clusters of the input array datapoints.

Return type:

(list of int)

cluster_algo

convergence

data

distance

epochs

find_bmu_ix(vecs: array) → SOMNode

Find the index of the best matching unit (BMU) for a given list of vectors.

Parameters: vecs (array or list[lists, ..]) – vectors whose distance from the network nodes will be calculated.
Returns: The best matching unit node index.
Return type: bmu (SOMNode)
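
The idea of a best matching unit lookup can be sketched in plain Python: for every input vector, compute the distance to each node's weight vector and keep the index of the smallest one. The sketch assumes the Euclidean metric; `find_bmu_indices` and the sample data are hypothetical and do not reproduce the library's internals.

```python
import math

def find_bmu_indices(vecs, weights):
    """For each input vector, return the index of the closest weight vector."""
    bmu_ix = []
    for vec in vecs:
        dists = [math.dist(vec, w) for w in weights]  # Euclidean distance to every node
        bmu_ix.append(dists.index(min(dists)))        # position of the smallest distance
    return bmu_ix

weights = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]  # one weight vector per node (made up)
vecs = [[0.1, 0.2], [1.9, 0.1], [0.9, 1.2]]
bmu_ix = find_bmu_indices(vecs, weights)
```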

get_nodes_difference() → None: Extracts the neighbouring nodes difference in weights and assigns it to each node object.

height

init

learning_rate

metric

neighborhood_fun

neighborhoods

nodes

nodes_list

output_path

pca(matrix: ndarray, n_pca: int) → ndarray

Get principal components to initialize network weights.

Parameters:

matrix (array) – N-dimensional dataset.
n_pca (int) – number of components to keep.

Returns:

Principal axes in feature space, representing the directions of maximum variance in the data.

Return type:

(array)

plot_clusters(coor: ndarray, clusters: list, color_val: Optional[ndarray] = None, project: bool = False, jitter: bool = False, show: bool = False, print_out: bool = True, **kwargs: Tuple[int]) → None

Project points onto the trained 2D map and plot the result.

Parameters:

coor (array) – An array containing datapoints to be mapped or pre-mapped if project False.
clusters (list) – Cluster assignment list.
color_val (array) – The feature value to use as color map, if None the map will be plotted as white.
project (bool) – if True, project the points in coor onto the map.
jitter (bool) – if True, add jitter to points coordinates to help with overlapping points.
show (bool) – Choose to display the plot.
print_out (bool) – Choose to save the plot to a file.
kwargs (dict) – Keyword arguments to format the plot, such as:
- figsize (tuple(int, int)), the figure size;
- title (str), figure title;
- cbar_label (str), colorbar label;
- labelsize (int), font size of label, the title 15% larger, ticks 15% smaller;
- cmap (ListedColormap), a custom cmap.

plot_convergence(show: bool = False, print_out: bool = True, **kwargs: Tuple[int]) → None

Plot the map training progress according to the chosen convergence criterion, when train_algo is batch.

Parameters:

show (bool) – Choose to display the plot.
print_out (bool) – Choose to save the plot to a file.
kwargs (dict) – Keyword arguments to format the plot, such as:
- figsize (tuple(int, int)), the figure size;
- title (str), figure title;
- xlabel (str), x-axis label;
- ylabel (str), y-axis label;
- logx (bool), if True set x-axis to logarithmic scale;
- logy (bool), if True set y-axis to logarithmic scale;
- labelsize (int), font size of label, the title 15% larger, ticks 15% smaller.

plot_map_by_difference(show: bool = False, print_out: bool = True, **kwargs: Tuple[int]) → None

Wrapper function to plot a trained 2D SOM map color-coded according to the neighbours' weights difference. It will automatically calculate the difference values if not already computed.

Parameters:

show (bool) – Choose to display the plot.
print_out (bool) – Choose to save the plot to a file.
kwargs (dict) – Keyword arguments to format the plot, such as:
- figsize (tuple(int, int)), the figure size;
- title (str), figure title;
- cbar_label (str), colorbar label;
- labelsize (int), font size of label, the title 15% larger, ticks 15% smaller;
- cmap (ListedColormap), a custom cmap.

plot_map_by_feature(feature_ix: int, show: bool = False, print_out: bool = True, **kwargs: Tuple[int]) → None

Wrapper function to plot a trained 2D SOM map color-coded according to a given feature.

Parameters:

feature_ix (int) – The feature index number to use as color map.
show (bool) – Choose to display the plot.
print_out (bool) – Choose to save the plot to a file.
kwargs (dict) – Keyword arguments to format the plot, such as:
- figsize (tuple(int, int)), the figure size;
- title (str), figure title;
- cbar_label (str), colorbar label;
- labelsize (int), font size of label, the title 15% larger, ticks 15% smaller;
- cmap (ListedColormap), a custom cmap.

plot_projected_points(coor: ndarray, color_val: Optional[ndarray] = None, project: bool = True, jitter: bool = True, show: bool = False, print_out: bool = True, **kwargs: Tuple[int]) → None

Project points onto the trained 2D map and plot the result.

Parameters:

coor (array) – An array containing datapoints to be mapped or pre-mapped if project False.
color_val (array) – The feature value to use as color map, if None the map will be plotted as white.
project (bool) – if True, project the points in coor onto the map.
jitter (bool) – if True, add jitter to points coordinates to help with overlapping points.
show (bool) – Choose to display the plot.
print_out (bool) – Choose to save the plot to a file.
kwargs (dict) – Keyword arguments to format the plot, such as:
- figsize (tuple(int, int)), the figure size;
- title (str), figure title;
- cbar_label (str), colorbar label;
- labelsize (int), font size of label, the title 15% larger, ticks 15% smaller;
- cmap (ListedColormap), a custom cmap.

polygons

project_onto_map(array: ndarray, file_name: str = './som_projected.npy') → list

Project the datapoints of a given array to the 2D space of the SOM by calculating the best matching units (BMUs).

Parameters:

array (array) – An array containing datapoints to be mapped.
file_name (str) – Name of the file to which the data will be saved if not None.

Returns:

bmu x,y position for each input array datapoint.

Return type:

(list)

save_map(file_name: str = 'trained_som.npy') → None

Saves the network dimensions, the PBC setting, and the node weights to a file.

Parameters: file_name (str) – Name of the file where the data will be saved.

sigma

start_learning_rate

start_sigma

tau

train(train_algo: str = 'batch', epochs: int = -1, start_learning_rate: float = 0.01, early_stop: Optional[str] = None, early_stop_patience: int = 3, early_stop_tolerance: float = 0.0001, batch_size: int = -1) → None

Train the SOM.

Parameters:

train_algo (str) – Training algorithm, choose between “online” or “batch” (default “batch”). Beware that the online algorithm will run one datapoint per epoch, while the batch algorithm runs all points at once for each epoch.
epochs (int) – Number of training iterations. If not selected (or -1), epochs is automatically set to 10 times the number of datapoints. Warning: for online training each epoch corresponds to 1 sample in the input dataset, for batch training it corresponds to one full dataset training.
start_learning_rate (float) – Initial learning rate, used only in online learning.
early_stop (str) – Early stopping method, for now only “mapdiff” (checks if the weights of nodes don't change) is available. If None, don't use early stopping (default None).
early_stop_patience (int) – Number of iterations without improvement before stopping the training, only available for batch training (default 3).
early_stop_tolerance (float) – Improvement tolerance, if the map does not improve beyond this threshold, the early stopping counter will be activated (it needs to be set appropriately depending on the used distance metric). Ignored if early stopping is off (default 1e-4).
batch_size (int) – Split the dataset into batches of this size when calculating the new weights. Works only when train_algo is “batch” and helps keep memory requirements down when working with large datasets. If -1, run the whole dataset at once.
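
How `early_stop_patience` and `early_stop_tolerance` interact can be illustrated with a small stand-alone sketch. It assumes one convergence value per epoch and counts an epoch as stalled when that value changes by less than the tolerance. This is an interpretation of the parameter descriptions above, not the library's code; `epochs_until_stop` and the sample history are hypothetical.

```python
def epochs_until_stop(convergence, patience=3, tolerance=1e-4):
    """Return the number of epochs run before early stopping triggers.

    `convergence` holds one value per epoch (e.g. a map-difference measure).
    An epoch counts as stalled when the value changed by less than
    `tolerance` with respect to the previous epoch.
    """
    stalled = 0
    for epoch in range(1, len(convergence)):
        change = abs(convergence[epoch - 1] - convergence[epoch])
        stalled = stalled + 1 if change < tolerance else 0  # reset on real improvement
        if stalled >= patience:
            return epoch + 1  # epochs are counted from 1
    return len(convergence)  # never triggered: all epochs were run

history = [1.0, 0.5, 0.2, 0.19999, 0.19998, 0.19997, 0.1]
stopped_after = epochs_until_stop(history, patience=3, tolerance=1e-4)
```

In this example the last value of `history` is never reached, because three consecutive epochs changed by less than the tolerance. This also shows why the tolerance has to be chosen on the scale of the distance metric in use.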

width

xp

class simpsom.network.SOMNode(x: int, y: int, num_weights: int, net_height: int, net_width: int, PBC: bool, polygons: Polygon, xp: module = numpy, init_vec: Optional[ndarray] = None, weights_array: Optional[ndarray] = None)

Bases: object

Single Kohonen SOM node class.

Initialize the SOM node.

Parameters:

x (int) – Position along the first network dimension.
y (int) – Position along the second network dimension.
num_weights (int) – Length of the weights vector.
net_height (int) – Network height, needed for periodic boundary conditions (PBC).
net_width (int) – Network width, needed for periodic boundary conditions (PBC).
PBC (bool) – Activate/deactivate periodic boundary conditions.
polygons (Polygon obj) – a polygon object with information on the map topology.
xp (numpy or cupy) – the numeric library to be used.
weight_bounds (array) – boundary values for the random initialization of the weights. Must be in the format [min_val, max_val]. They are overwritten by “init_vec”.
init_vec (array) – Array containing the two custom vectors (e.g. PCA) for the weights initialization.
weights_array (array) – Array containing the weights to give to the node if loaded from a file.

PBC

__init__(x: int, y: int, num_weights: int, net_height: int, net_width: int, PBC: bool, polygons: Polygon, xp: module = numpy, init_vec: Optional[ndarray] = None, weights_array: Optional[ndarray] = None) → None

Initialize the SOM node.

_set_difference(diff_value: Union[float, int]) → None

Set the neighbouring nodes weights difference.

Parameters: diff_value (float or int) – The neighbouring nodes weights difference value to set.

_update_weights(input_vec: ndarray, sigma: float, learning_rate: float, bmu: SOMNode) → None

Update the node weights.

Parameters:

input_vec (array) – A weights vector whose distance drives the direction of the update.
sigma (float) – The updated gaussian sigma.
learning_rate (float) – The updated learning rate.
bmu (SOMNode) – The best matching unit.

difference

get_node_distance(node: SOMNode) → float

Calculate the distance within the network between the current node and a second node.

Parameters: node (SOMNode) – The node from which the distance is calculated.
Returns: The distance between the two nodes.
Return type: (float)

height

polygons

pos

weights

width

xp

Distance functions

class simpsom.distances.Distance(xp: Optional[module] = None)

Bases: object

Container class for distance functions.

Instantiate the Distance class.

Parameters: xp (numpy or cupy) – the numeric library to use to calculate distances.

__init__(xp: Optional[module] = None) → None

Instantiate the Distance class.

_euclidean_squared_distance(x: ndarray, w: ndarray, w_flat_sq: Optional[ndarray] = None) → float

Calculate the full squared L2 distance.

Parameters:

x (array) – first array.
w (array) – second array.

Returns:

the full squared L2 distance between the two provided arrays.

Return type:

(float)

_euclidean_squared_distance_part(x: ndarray, w: ndarray, w_flat_sq: Optional[ndarray] = None) → float

Calculate the partial squared L2 distance.

Parameters:

x (array) – first array.
w (array) – second array.

Returns:

the partial squared L2 distance between the two provided arrays.

Return type:

(float)
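
The difference between the "full" and "partial" squared distances is not spelled out here. One plausible reading, stated as an assumption, is that the full value uses the expansion ||x - w||^2 = ||x||^2 - 2 x·w + ||w||^2, while the partial value drops the ||x||^2 term, which is constant for a fixed x and therefore does not change which node is closest. Under the same assumption, `w_flat_sq` would carry precomputed squared norms of `w`. The helper names below are hypothetical.

```python
def dot(a, b):
    """Dot product of two equally long sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

def squared_distance_full(x, w, w_sq=None):
    """||x - w||^2 via the expansion ||x||^2 - 2 x.w + ||w||^2."""
    w_sq = dot(w, w) if w_sq is None else w_sq  # reuse a precomputed ||w||^2 if given
    return dot(x, x) - 2 * dot(x, w) + w_sq

def squared_distance_part(x, w, w_sq=None):
    """Same expansion without the ||x||^2 term, which is constant for a fixed x."""
    w_sq = dot(w, w) if w_sq is None else w_sq
    return -2 * dot(x, w) + w_sq

x = [1.0, 2.0]
nodes = [[0.0, 0.0], [1.0, 3.0], [4.0, 4.0]]
full = [squared_distance_full(x, w) for w in nodes]
part = [squared_distance_part(x, w) for w in nodes]
```

Both lists rank the nodes identically, so the cheaper partial form is sufficient when only the closest node is needed.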

batchpairdist(x: ndarray, w: ndarray, sq: ndarray, metric: str) → ndarray

Calculates distances between points in batches. Two array-like objects must be provided; distances will be calculated between all points in the first array and all those in the second array.

Parameters:

x (array) – first array.
w (array) – second array.
metric (string) – distance metric. Accepted metrics are euclidean, manhattan, and cosine (default “euclidean”).

Returns:

the calculated distances.

Return type:

d (array or list)

cosine_distance(x: ndarray, w: ndarray, w_flat_sq: ndarray) → float

Calculate the cosine distance between two arrays.

Parameters:

x (array) – first array.
w (array) – second array.

Returns:

the cosine distance between the two provided arrays.

Return type:

(float)

euclidean_distance(x: ndarray, w: ndarray, w_flat_sq: ndarray) → float

Calculate the L2 distance between two arrays.

Parameters:

x (array) – first array.
w (array) – second array.

Returns:

the Euclidean distance between the two provided arrays.

Return type:

(float)

manhattan_distance(x: ndarray, w: ndarray) → float

Calculate Manhattan distance between two arrays.

Parameters:

x (array) – first array.
w (array) – second array.

Returns:

the Manhattan distance between the two provided arrays.

Return type:

(float)

pairdist(a: ndarray, b: ndarray, metric: str) → ndarray

Calculates distances between points. Two array-like objects must be provided; distances will be calculated between all points in the first array and all those in the second array.

Parameters:

a (array) – first array.
b (array) – second array.
metric (string) – distance metric. Accepted metrics are euclidean, manhattan, and cosine (default “euclidean”).

Returns:

the calculated distances.

Return type:

d (array or list)
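
To make the "all points in the first array against all points in the second" behaviour concrete, here is a minimal pure-Python sketch for the three accepted metrics. It uses the textbook definitions (cosine distance taken as 1 minus cosine similarity) and is not the library's vectorized implementation; `pairwise_distances` is a hypothetical name.

```python
import math

def pairwise_distances(a, b, metric="euclidean"):
    """Distance between every point in `a` and every point in `b`."""
    def euclidean(p, q):
        return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

    def manhattan(p, q):
        return sum(abs(pi - qi) for pi, qi in zip(p, q))

    def cosine(p, q):
        # 1 - cosine similarity; undefined for zero-length vectors
        dot = sum(pi * qi for pi, qi in zip(p, q))
        norm = math.sqrt(sum(pi * pi for pi in p)) * math.sqrt(sum(qi * qi for qi in q))
        return 1.0 - dot / norm

    funcs = {"euclidean": euclidean, "manhattan": manhattan, "cosine": cosine}
    if metric not in funcs:
        raise ValueError(f"Unknown metric: {metric}")
    dist = funcs[metric]
    return [[dist(p, q) for q in b] for p in a]  # shape: len(a) x len(b)

a = [[0.0, 3.0], [1.0, 0.0]]
b = [[4.0, 0.0], [0.0, 2.0], [1.0, 0.0]]
d_euclidean = pairwise_distances(a, b, "euclidean")
d_manhattan = pairwise_distances(a, b, "manhattan")
d_cosine = pairwise_distances(a, b, "cosine")
```

Each result has one row per point in `a` and one column per point in `b`.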

Neighborhood functions

class simpsom.neighborhoods.Neighborhoods(xp: module, xx: ndarray, yy: ndarray, pbc_func: Optional[Callable])

Bases: object

Container class with functions to calculate neighborhoods.

Instantiate the Neighborhoods class.

Parameters:

xp (numpy or cupy) – the numeric library to use to calculate distances.
xx (array) – x coordinates in the grid mesh.
yy (array) – y coordinates in the grid mesh.
pbc_func (Callable) – function to extend a distance function to account for PBC, as defined in polygons.

__init__(xp: module, xx: ndarray, yy: ndarray, pbc_func: Optional[Callable]) → None

Instantiate the Neighborhoods class.

bubble(c: ndarray, n: ndarray, threshold: float) → ndarray

Bubble neighborhood function.

Parameters:

c (np.ndarray) – center point.
n (np.ndarray) – matrix of nodes positions.
threshold (float) – the bubble threshold.

Returns: (np.ndarray): a matrix of distances.
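
A bubble neighborhood is usually a hard cut-off: nodes closer to the center than the threshold are selected, all others are not. The one-dimensional sketch below assumes that reading; the function body and sample positions are illustrative only and are not taken from the library.

```python
def bubble(center, positions, threshold):
    """Return 1.0 for positions closer than `threshold` to `center`, else 0.0."""
    return [1.0 if abs(p - center) < threshold else 0.0 for p in positions]

positions = [0, 1, 2, 3, 4, 5]  # node coordinates along one grid axis
mask = bubble(center=2, positions=positions, threshold=2)
```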

gaussian(c: ndarray, n: ndarray, denominator: float) → ndarray

Gaussian neighborhood function.

Parameters:

c (np.ndarray) – center point.
n (np.ndarray) – matrix of nodes positions.
denominator (float) – the 2 * sigma**2 value.

Returns: (np.ndarray): a matrix of distances.
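
Assuming the standard Gaussian form exp(-d^2 / (2 * sigma^2)), with `denominator` holding the 2 * sigma**2 value described above, a one-dimensional version can be sketched as follows. The exact expression used by the library is not given in this reference, so this is an illustration only.

```python
import math

def gaussian(center, positions, denominator):
    """Gaussian weight exp(-(p - center)^2 / denominator) for each position."""
    return [math.exp(-((p - center) ** 2) / denominator) for p in positions]

sigma = 1.0
denominator = 2 * sigma ** 2
weights = gaussian(center=2, positions=[0, 1, 2, 3, 4], denominator=denominator)
```

The weight is 1 at the center and falls off symmetrically on both sides, with `sigma` controlling how wide the neighborhood is.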

mexican_hat(c: ndarray, n: ndarray) → ndarray

Mexican hat neighborhood function.

Parameters:

c (np.ndarray) – center point.
n (np.ndarray) – matrix of nodes positions.

Returns: (np.ndarray): a matrix of distances.

neighborhood_caller(neigh_func: str, center: Tuple[ndarray], sigma: float) → ndarray

Returns a neighborhood selection on any 2d topology.

Parameters:

center (Tuple[np.ndarray]) – index of the center point along the xx yy grid.
sigma (float) – standard deviation/size coefficient.
neigh_func (str) – neighborhood specific distance function name (choose among ‘gaussian’, ‘mexican_hat’ or ‘bubble’).

Returns:

the resulting neighborhood matrix.

Return type:

(array)

Map tiling

class simpsom.polygons.Hexagons

Bases: Polygon

Class to define a hexagonal tiling.

static _tile(coor: Tuple[float], color: Tuple[float], edgecolor: Optional[Tuple[float]] = None) → type

Set the hexagonal tile for plotting.

Parameters:

coor (tuple[float, float]) – position of the tile in the plot figure.
color (tuple[float,...]) – color tuple.
edgecolor (tuple[float,...]) – border color tuple.

Returns:

the tile to add to the plot.

Return type:

(matplotlib patch object)

static distance_pbc(node_a: ndarray, node_b: ndarray, net_shape: Tuple[float], distance_func: Callable, axis: Optional[int] = None, xp: module = numpy) → float

Manage distances with PBC based on the tiling.

Parameters:

node_a (np.ndarray) – the first node from which the distance will be calculated.
node_b (np.ndarray) – the second node from which the distance will be calculated.
net_shape (tuple[float, float]) – the sizes of the network.
distance_func (function) – the function to calculate distance between nodes.
axis (int) – axis along which the minimum distance across PBC will be calculated.
xp (numpy or cupy) – the numeric library to handle arrays.

Returns:

the distance adjusted by PBC.

Return type:

(float)

static neighborhood_pbc(center_node: Tuple[ndarray], nodes: Tuple[ndarray], net_shape: Tuple[float], distance_func: Callable, xp: module = numpy) → ndarray

Manage neighborhood with PBC based on the tiling, adapted for batch training neighborhood functions. Works along a single provided axis and calculates the distance of a single node (center_node) from all other nodes in the network (nodes).

Parameters:

center_node (Tuple[np.ndarray]) – position (index) of the first node along the provided axis. Shaped as (net_shape[1], 1, 1), for each axis.
nodes (Tuple[np.ndarray]) – the position of all nodes along a given axis as a matrix. Shaped as (1, net_shape[1], net_shape[0]), for each axis.
net_shape (tuple[float, float]) – the sizes of the network.
distance_func (function) – the function to calculate distance between nodes.
xp (numpy or cupy) – the numeric library to handle arrays.

Returns:

the distance from all nodes adjusted by PBC.

Return type:

(np.ndarray)

static to_tiles(coor: Tuple[float]) → ndarray

Convert 2D cartesian coordinates to tiling coordinates.

Parameters: coor (tuple[float,..]) – the Cartesian coordinates.
Returns: a 2d array containing the coordinates in the new space.
Return type: array
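
For a hexagonal tiling, a common convention is to shift every other row by half a tile and compress the row spacing by sqrt(3)/2, so that neighbouring tile centers end up one unit apart. The sketch below assumes that convention; the reference does not give the exact transform, and `to_hex_tiles` is a hypothetical name.

```python
import math

def to_hex_tiles(coor):
    """Map grid indices (x, y) to the centre of a hexagonal tile."""
    x, y = coor
    new_x = x + 0.5 if y % 2 else float(x)  # odd rows are shifted by half a tile
    new_y = y * math.sqrt(3) / 2            # rows are packed closer than columns
    return (new_x, new_y)

even_row = to_hex_tiles((2, 0))
odd_row = to_hex_tiles((2, 1))
```

With this mapping the tile at (2, 1) sits at unit distance from the tile at (2, 0), as expected for adjacent hexagons.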

topology = 'hexagonal'

class simpsom.polygons.Polygon

Bases: object

General class to define a custom polygonal tiling.

static _tile(coor: Tuple[float], color: Tuple[float], edgecolor: Optional[Tuple[float]] = None) → type

Set the tile shape for plotting.

Parameters:

coor (tuple[float, float]) – position of the tile in the plot figure.
color (tuple[float,...]) – color tuple.
edgecolor (tuple[float,...]) – border color tuple.

Returns:

the tile to add to the plot.

Return type:

(matplotlib patch object)

static distance_pbc(node_a: ndarray, node_b: ndarray, net_shape: Tuple[float], distance_func: Callable, axis: Optional[int] = None, xp: module = numpy) → float

Manage distances with PBC based on the tiling.

Parameters:

node_a (np.ndarray) – the first node from which the distance will be calculated.
node_b (np.ndarray) – the second node from which the distance will be calculated.
net_shape (tuple[float, float]) – the sizes of the network.
distance_func (function) – the function to calculate distance between nodes.
axis (int) – axis along which the minimum distance across PBC will be calculated.
xp (numpy or cupy) – the numeric library to handle arrays.

Returns:

the distance adjusted by PBC.

Return type:

(float)
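
The effect of periodic boundary conditions on a distance can be sketched for the simplest case, a square grid with the Euclidean metric: along each axis, the shorter of the direct path and the path that wraps around the edge is used. This is a generic minimum-image sketch under those assumptions, not the library's code. It ignores the row offsets a hexagonal tiling would need, and `distance_pbc_square` is a hypothetical name.

```python
import math

def distance_pbc_square(node_a, node_b, net_shape):
    """Euclidean distance on a square grid that wraps around at the edges."""
    total = 0.0
    for a, b, size in zip(node_a, node_b, net_shape):
        direct = abs(a - b)
        wrapped = size - direct  # going around through the opposite edge
        total += min(direct, wrapped) ** 2
    return math.sqrt(total)

net_shape = (10, 10)
d_wrapped = distance_pbc_square((0, 0), (9, 0), net_shape)  # neighbours across the edge
d_direct = distance_pbc_square((2, 3), (5, 7), net_shape)   # direct path is shorter here
```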

classmethod draw_map(fig: Figure, centers: Collection[float], feature: Collection[float], cmap: Optional[ListedColormap] = None) → Axes

Draw a grid based on the selected tiling, nodes positions and color the tiles according to a given feature.

Parameters:

fig (matplotlib figure object) – the figure on which the hexagonal grid will be plotted.
centers (list, float) – array containing pairs of coordinates for each cell to be plotted in the Hexagonal tiling space.
feature (list, float) – array containing information on the weights of each cell, to be plotted as colors.
cmap (ListedColormap) – a custom color map.

Returns:

the axis on which the hexagonal grid has been plotted.

Return type:

ax (matplotlib axis object)

get_topology() → None: Get information on the set topology.

static neighborhood_pbc(center_node: Tuple[ndarray], nodes: Tuple[ndarray], net_shape: Tuple[float], distance_func: Callable, xp: module = numpy) → ndarray

Manage neighborhood with PBC based on the tiling, adapted for batch training neighborhood functions. Works along a single provided axis and calculates the distance of a single node (center_node) from all other nodes in the network (nodes).

Parameters:

center_node (Tuple[np.ndarray]) – position (index) of the first node along the provided axis. Shaped as (net_shape[1], 1, 1), for each axis.
nodes (Tuple[np.ndarray]) – the position of all nodes along a given axis as a matrix. Shaped as (1, net_shape[1], net_shape[0]), for each axis.
net_shape (tuple[float, float]) – the sizes of the network.
distance_func (function) – the function to calculate distance between nodes.
xp (numpy or cupy) – the numeric library to handle arrays.

Returns:

the distance from all nodes adjusted by PBC.

Return type:

(np.ndarray)

static to_tiles(coor: Tuple[float]) → ndarray

Convert 2D cartesian coordinates to tiling coordinates.

Parameters: coor (tuple[float,..]) – the Cartesian coordinates.
Returns: a 2d array containing the coordinates in the new space.
Return type: array

topology = None

class simpsom.polygons.Squares

Bases: Polygon

Class to define a square tiling.

topology = 'square'

Plotting

simpsom.plots.line_plot(y_val: Union[ndarray, list], x_val: Optional[Union[ndarray, list]] = None, show: bool = True, print_out: bool = False, file_name: str = './line_plot.png', **kwargs: Tuple[int]) → Tuple[Figure, Axes]

A simple line plot with matplotlib.

Parameters:

y_val (array or list) – values along the y axis.
x_val (array or list) – values along the x axis, if None, these will be inferred from the shape of y_val.
show (bool) – Choose to display the plot.
print_out (bool) – Choose to save the plot to a file.
file_name (str) – Name of the file where the plot will be saved if print_out is active. Must include the output path.
kwargs (dict) – Keyword arguments to format the plot:
- figsize (tuple(int, int)): the figure size,
- title (str): figure title,
- xlabel (str): x-axis label,
- ylabel (str): y-axis label,
- logx (bool): if True set x-axis to logarithmic scale,
- logy (bool): if True set y-axis to logarithmic scale,
- fontsize (int): font size of label, title 15% larger, ticks 15% smaller.

Returns:

the produced figure object (fig) and the produced axis object (ax).

Return type:

(figure object, ax object)

simpsom.plots.plot_map(centers: Collection[ndarray], feature: Collection[ndarray], polygons_class: Polygon, show: bool = True, print_out: bool = False, file_name: str = './som_plot.png', **kwargs: Tuple[int]) → Tuple[Figure, Axes]

Plot a 2D SOM.

Parameters:

centers (list or array) – The list of SOM nodes center point coordinates (e.g. node.pos).
feature (list or array) – The SOM node feature defining the color map (e.g. node.weights, node.difference).
polygons_class (polygons) – The polygons class carrying information on the map topology.
show (bool) – Choose to display the plot.
print_out (bool) – Choose to save the plot to a file.
file_name (str) – Name of the file where the plot will be saved if print_out is active. Must include the output path.
kwargs (dict) – Keyword arguments to format the plot:
- figsize (tuple(int, int)): the figure size,
- title (str): figure title,
- cbar_label (str): colorbar label,
- fontsize (int): font size of label, title 15% larger, ticks 15% smaller,
- cmap (ListedColormap): a custom colormap.

Returns:

the produced figure object (fig) and the produced axis object (ax).

Return type:

(figure object, ax object)

simpsom.plots.scatter_on_map(datagroups: Collection[ndarray], centers: Collection[ndarray], polygons_class: Polygon, color_val: Optional[bool] = None, show: bool = True, print_out: bool = False, file_name: str = './som_scatter.png', **kwargs: Tuple[int]) → Tuple[Figure, Axes]

Scatter plot with points projected onto a 2D SOM.

Parameters:

datagroups (list[array,...]) – Coordinates of the projected points. This must be a nested list/array of arrays, where each element of the list is a group that will be plotted separately.
centers (list or array) – The list of SOM nodes center point coordinates (e.g. node.pos)
color_val (array) – The feature value to use as color map, if None the map will be plotted as white.
polygons_class (polygons) – The polygons class carrying information on the map topology.
show (bool) – Choose to display the plot.
print_out (bool) – Choose to save the plot to a file.
file_name (str) – Name of the file where the plot will be saved if print_out is active. Must include the output path.
kwargs (dict) – Keyword arguments to format the plot:
- figsize (tuple(int, int)): the figure size,
- title (str): figure title,
- cbar_label (str): colorbar label,
- fontsize (int): font size of label, title 15% larger, ticks 15% smaller,
- cmap (ListedColormap): a custom colormap.

Returns:

the produced figure object (fig) and the produced axis object (ax).

Return type:

(figure object, ax object)