
Mamdani Fuzzy Neural Network - Architecture & Learning Algorithm

A comprehensive guide to the Mamdani-based Fuzzy Neural Network (FNN). Learn its 5-layer architecture, step-by-step fuzzy inference process, and how to train it using the error backpropagation algorithm. Discover how this model blends the interpretability of fuzzy logic with the adaptive power of neural networks.

Introduction

The integration of fuzzy logic and neural networks seeks to create hybrid intelligent systems that leverage the strengths of both paradigms: the human-like, interpretable reasoning of fuzzy systems and the data-driven learning capabilities of neural networks.

The Mamdani Fuzzy Inference System

Before detailing the network architecture, it is essential to review the conventional Mamdani FIS on which it is based. We restrict the discussion to Multi-Input, Single-Output (MISO) Mamdani systems, because the fuzzy rules of a Multi-Input, Multi-Output (MIMO) system can always be decomposed into multiple MISO rule bases. A typical MISO Mamdani system processes information through four key stages:

  • Fuzzification: Crisp input values, such as $x = [x_1, x_2, …, x_n]^T$, are converted into fuzzy sets by calculating their membership degrees in predefined linguistic terms (e.g., “Low”, “Medium”, “High”). The membership degree $\mu_{A_i^j}(x_i)$ quantifies the extent to which input $x_i$ belongs to $A_i^j$, the $j$-th fuzzy set defined on that input.

  • Fuzzy Rule Inference: The system evaluates a set of “IF-THEN” rules. For each rule $R_i$, a firing strength (or activation strength) $\alpha_i$ is computed by applying a fuzzy AND operator (typically product or min) to the membership degrees of the rule’s antecedents:

\[\alpha_{i}=\mu_{A_1^{i}}(x_{1}) \cdot \mu_{A_2^{i}}(x_{2}) \cdots \mu_{A_n^{i}}(x_{n}) \quad \text{(product operator)}\]
\[\alpha_{i}=\mu_{A_1^{i}}(x_{1}) \wedge \mu_{A_2^{i}}(x_{2}) \wedge \cdots \wedge \mu_{A_n^{i}}(x_{n}) \quad \text{(min operator)}\]
  • Aggregation: The activation strength $\alpha_i$ is then used to determine the output fuzzy set $B_i$ for that rule. All individual rule outputs are aggregated into a single final fuzzy set $B$:
\[B = \bigcup^m_{i=1} B_i\]
  • Defuzzification: The final fuzzy output set $B$ is converted into a single crisp output value $y_0$. While the Center of Gravity (COG) method is common, a computationally simpler and widely used approximation is the weighted average method:
\[y_{0} = \frac{\sum_{i=1}^{m} y_{c_i} \alpha_i}{\sum_{i=1}^{m} \alpha_i} = \sum_{i=1}^{m} y_{c_i} \overline{\alpha_i}\]

Here, $y_{c_i}$ is the center of the output fuzzy set for the $i$-th rule, and $\overline{\alpha_i}$ is the normalized firing strength of that rule. This final formula is the blueprint for the network’s output calculation.
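
To make these four stages concrete, here is a minimal sketch of a MISO Mamdani system with Gaussian membership functions, the product operator, and weighted-average defuzzification. All function names, shapes, and parameter values are illustrative assumptions, not part of any standard library.

```python
import numpy as np

def gaussian_mf(x, c, sigma):
    """Membership degree of x in a Gaussian fuzzy set with center c and width sigma."""
    return np.exp(-((x - c) ** 2) / sigma ** 2)

def mamdani_miso(x, centers, sigmas, y_c):
    """
    x       : (n,)   crisp input vector
    centers : (m, n) antecedent centers, one row per rule
    sigmas  : (m, n) antecedent widths, one row per rule
    y_c     : (m,)   centers of the consequent fuzzy sets
    """
    mu = gaussian_mf(x, centers, sigmas)       # fuzzification: (m, n) membership degrees
    alpha = mu.prod(axis=1)                    # rule inference: (m,) firing strengths
    return (y_c * alpha).sum() / alpha.sum()   # weighted-average defuzzification

# Example with n = 2 inputs and m = 3 rules (arbitrary values)
x = np.array([0.3, 0.7])
centers = np.array([[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]])
sigmas = np.full((3, 2), 0.4)
y_c = np.array([-1.0, 0.0, 1.0])
print(mamdani_miso(x, centers, sigmas, y_c))
```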

Five-Layer Architecture

The Mamdani FIS can be functionally represented by a five-layer feedforward neural network. Each layer performs a distinct step of the fuzzy inference process.

Figure: the five-layer neural network structure of the Mamdani FNN.

  • Layer 1: Input Layer. This layer acts as a simple distributor.

    • Function: It receives the crisp input vector $x = [x_1, …, x_n]^T$ and passes it to the next layer.
    • Nodes: $n$ nodes, one for each input variable.
    • Output: $O_i^{(1)} = x_i$.
  • Layer 2: Fuzzification Layer. This layer calculates the membership degrees for each input.

    • Function: Each node represents a linguistic term (e.g., “Positive Small”) and computes the membership degree of an input variable to that term.
    • Nodes: A total of $N_2 = \sum_{i=1}^{n}m_i$ nodes, where $m_i$ is the number of fuzzy sets for the $i$-th input.
    • Output: The output is the membership degree, often calculated using a Gaussian function:
\[O_{ij}^{(2)} = \mu_{A_i^j}(x_i) = \exp\left(-\frac{(x_i - c_{ij})^2}{\sigma_{ij}^2}\right)\]

The parameters $c_{ij}$ (center) and $\sigma_{ij}$ (width) of the membership function are learnable parameters of the network.

  • Layer 3: Rule Layer. Each node in this layer corresponds to a single fuzzy rule.

    • Function: To compute the firing strength $\alpha_j$ of each fuzzy rule by combining the membership degrees from the previous layer. This is typically done using a product T-norm operator.
    • Nodes: $m$ nodes, where $m$ is the total number of fuzzy rules.
    • Output: $O_j^{(3)} = \alpha_j = \prod_{i} O_{i,j}^{(2)}$, where $O_{i,j}^{(2)}$ are the membership degrees forming the antecedent of the $j$-th rule.
  • Layer 4: Normalization Layer. This layer normalizes the activation strengths calculated in Layer 3.

    • Function: To compute the normalized activation strength $\overline{\alpha}_j$ for each rule.
    • Nodes: $m$ nodes, the same as in the rule layer.
    • Output:
\[O_j^{(4)} = \overline{\alpha}_j = \frac{\alpha_j}{\sum_{k=1}^{m}\alpha_k} = \frac{O_j^{(3)}}{\sum_{k=1}^{m}O_k^{(3)}}\]
  • Layer 5: Defuzzification Layer. This layer computes the final crisp output.

    • Function: To calculate the overall system output as the weighted sum of the normalized activation strengths.
    • Nodes: $r$ nodes, one for each output variable (for MISO, $r=1$).
    • Output:
\[y_i = O_i^{(5)} = \sum_{j=1}^{m} w_{ij} \overline{\alpha}_j = \sum_{j=1}^{m} w_{ij} O_j^{(4)}\]

The connection weights $w_{ij}$ are the final set of learnable parameters. Physically, each weight $w_{ij}$ corresponds to the center of the consequent fuzzy set ($y_{c_j}$) for the $j$-th rule in the original Mamdani model.
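
The full forward pass can be traced layer by layer in a few lines. The sketch below makes one simplifying assumption for readability: each rule owns its own Gaussian membership function per input, rather than drawing from a shared pool of $\sum_i m_i$ linguistic terms. All names and values are hypothetical.

```python
import numpy as np

def fnn_forward(x, c, sigma, w):
    """
    Forward pass of the five-layer Mamdani FNN (single sample).
    x     : (n,)   crisp inputs                 -> Layer 1
    c     : (m, n) membership centers per rule  -> Layer 2
    sigma : (m, n) membership widths per rule
    w     : (r, m) consequent centers (weights) -> Layer 5
    """
    o1 = x                                        # Layer 1: distribute inputs
    o2 = np.exp(-((o1 - c) ** 2) / sigma ** 2)    # Layer 2: Gaussian membership degrees
    alpha = o2.prod(axis=1)                       # Layer 3: firing strengths (product T-norm)
    alpha_bar = alpha / alpha.sum()               # Layer 4: normalized firing strengths
    y = w @ alpha_bar                             # Layer 5: weighted sum (defuzzification)
    return y, o2, alpha, alpha_bar

# Example: n = 2 inputs, m = 4 rules, r = 1 output (arbitrary values)
rng = np.random.default_rng(0)
x = np.array([0.2, -0.5])
c = rng.normal(size=(4, 2))
sigma = np.full((4, 2), 0.8)
w = rng.normal(size=(1, 4))
y, *_ = fnn_forward(x, c, sigma, w)
print(y)
```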


Learning Algorithm via Error Backpropagation

As a feedforward network, this FNN can be trained using a supervised learning algorithm based on gradient descent, analogous to the standard backpropagation algorithm. The goal is to adjust the network’s parameters to minimize an error cost function.

  • Objective Function: The error is typically defined as the sum of squared differences between the network’s actual output $y_i$ and the desired target output $t_i$.
\[E = \frac{1}{2}\sum_{i=1}^{r}(t_i - y_i)^2\]
  • Parameter Update: The learnable parameters—weights $w_{ij}$, centers $c_{ij}$, and widths $\sigma_{ij}$—are updated iteratively using the gradient descent method. The general update rule is:
\[\text{parameter}(k+1) = \text{parameter}(k) - \beta \frac{\partial E}{\partial \text{parameter}}\]

where $\beta > 0$ is the learning rate.

  • Gradient Calculation: The core of the algorithm is computing the partial derivatives of the error $E$ with respect to each parameter. This is achieved by propagating an error signal $\delta^{(q)}$ backwards from the output layer (layer $q=5$) to the hidden layers.

For weights $w_{ij}$ (Layer 5): The error signal at the output layer is $\delta_i^{(5)} = t_i - y_i$. The gradient is then:

\[\frac{\partial E}{\partial w_{ij}} = \frac{\partial E}{\partial f_i^{(5)}} \frac{\partial f_i^{(5)}}{\partial w_{ij}} = -(t_i - y_i)\overline{\alpha}_j\]

where $f_i^{(5)}$ is the net input to the $i$-th node in layer 5.
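
As a small illustration of this update (placeholder values; in practice `y` and `alpha_bar` would come from a forward pass such as the sketch above):

```python
import numpy as np

t = np.array([1.0])                         # target output, shape (r,)
y = np.array([0.6])                         # network output, shape (r,)
alpha_bar = np.array([0.1, 0.5, 0.3, 0.1])  # normalized firing strengths, shape (m,)
w = np.zeros((1, 4))                        # Layer-5 weights, shape (r, m)
beta = 0.05                                 # learning rate

delta5 = t - y                              # output-layer error signal
dE_dw = -np.outer(delta5, alpha_bar)        # dE/dw_ij = -(t_i - y_i) * alpha_bar_j
w -= beta * dE_dw                           # gradient-descent update
```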

For parameters $c_{ij}$ and $\sigma_{ij}$ (Layer 2): The error signal $\delta_i^{(5)}$ is propagated backward through Layer 4 and Layer 3 to obtain the error signal at Layer 2, $\delta_{ij}^{(2)}$. Once $\delta_{ij}^{(2)}$ is known, the gradients for the membership function parameters can be computed using the chain rule:

\[\frac{\partial E}{\partial c_{ij}} = \frac{\partial E}{\partial f_{ij}^{(2)}} \frac{\partial f_{ij}^{(2)}}{\partial c_{ij}} = -\delta_{ij}^{(2)} \frac{2(x_i - c_{ij})}{\sigma_{ij}^2}\]
\[\frac{\partial E}{\partial \sigma_{ij}} = \frac{\partial E}{\partial f_{ij}^{(2)}} \frac{\partial f_{ij}^{(2)}}{\partial \sigma_{ij}} = -\delta_{ij}^{(2)} \frac{2(x_i - c_{ij})^2}{\sigma_{ij}^3}\]

where $f_{ij}^{(2)}$ is the net input to the corresponding node in Layer 2.
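
The explicit form of $\delta_{ij}^{(2)}$ is not written out above; the sketch below works it out for the simplified per-rule parameterization used in the forward-pass example (one Gaussian per rule per input, product T-norm, single training sample). All intermediate names and values are illustrative assumptions.

```python
import numpy as np

# Forward pass (placeholder parameters)
x = np.array([0.2, -0.5])                                           # inputs, (n,)
c = np.array([[0.0, 0.0], [0.5, -0.5], [1.0, 0.5], [-1.0, 0.0]])    # centers, (m, n)
sigma = np.full((4, 2), 0.8)                                        # widths, (m, n)
w = np.array([[0.5, -0.2, 0.8, 0.1]])                               # weights, (r, m)
t = np.array([1.0])                                                 # target, (r,)
beta = 0.05                                                         # learning rate

mu = np.exp(-((x - c) ** 2) / sigma ** 2)     # Layer 2
alpha = mu.prod(axis=1)                       # Layer 3
alpha_bar = alpha / alpha.sum()               # Layer 4
y = w @ alpha_bar                             # Layer 5

# Backpropagate the output error through Layers 5, 4, and 3:
# dE/d(alpha_k) = -sum_l (t_l - y_l) * (w_lk - y_l) / sum_p(alpha_p)
dE_dalpha = -((t - y) @ (w - y[:, None])) / alpha.sum()   # (m,)
delta2 = -(dE_dalpha * alpha)[:, None]                    # error signal at Layer 2, (m, 1)

# Chain rule through the Gaussian membership functions (formulas above)
dE_dc = -delta2 * 2 * (x - c) / sigma ** 2                # (m, n)
dE_dsigma = -delta2 * 2 * (x - c) ** 2 / sigma ** 3       # (m, n)

c -= beta * dE_dc
sigma -= beta * dE_dsigma
```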

By applying these update rules iteratively with a set of training data, the network learns to approximate the desired input-output mapping. The resulting system is both a local approximation network, similar to an RBF network, and an interpretable model whose structure and parameters have clear physical meanings derived from fuzzy logic.

This post is licensed under CC BY 4.0 by the author.