**Introduction (In advance for those who contributed their intellectual effort)**

The power transformer is a key piece of equipment, and its normal operation is critical to the stability of a power system. It is frequently subjected to various kinds of overload caused by lightning strikes, switching operations, or system disturbances [1]. The randomness of electric energy from renewable power stations also has a negative impact on the service life of a power transformer [2]. Generally, abnormal overload conditions may cause corona, overheating, and arcing, which are the three main causes of insulation degradation in a transformer that has been in service for a long time, and the resulting byproduct gases dissolve in the insulating oil [3][4]. The fault-related gases are commonly hydrogen (H2), methane (CH4), acetylene (C2H2), ethylene (C2H4), and ethane (C2H6) [5], and their variations are often used as a reference index for detecting incipient faults in a transformer. Dissolved gas analysis (DGA) is a well-known method for incipient fault detection in oil-immersed power transformers [6][7]; it evaluates the condition of a transformer using combinations of gas ratios or the gas concentrations themselves.

To overcome the drawbacks of DGA interpretation methods, various computational techniques have been employed to improve fault diagnosis accuracy, such as expert systems [8] and fuzzy logic methods [9]. Recently, machine learning techniques have been utilised to construct forecasting models for dissolved gas prediction, such as the artificial neural network (ANN) [10], the grey model [11], and the support vector machine (SVM) [12]. Radial basis function (RBF) [13] and back-propagation (BP) [14] networks are two typical methods used in nonlinear forecasting; however, their limitations include the required training process, network parameter assignment, over-fitting, slow convergence, and a tendency to become trapped in local extrema [15].
The weakness of the ANN model that arises from the stochastic nature of dissolved gases cannot be compensated for solely by increasing the ANN size or repeating its training procedure. Owing to the complexity of the power transformer fault mechanism, the relationships between characteristic gases and potential fault types are uncertain. A modified grey model was applied to predict the trend of gases dissolved in power transformer oil from a small number of samples with high forecasting accuracy [16]. However, the grey model depicts only a process that slowly increases or decreases with time, and its exponential law cannot exactly forecast the development trend of the dissolved gas content [17]. SVM and the least squares support vector machine (LSSVM) have been widely applied in the condition assessment of power transformers, for example in transient phenomena identification [18] and gas value prediction [19][20]. However, the model is theoretically deterministic once its parameters are fixed, so forecasting a stochastic system with LSSVM is questionable. Empirically, the dissolved gas content often shows strongly nonlinear and chaotic behaviour. Thus, it is very difficult to capture the development pattern of the dissolved gas content directly through an LSSVM forecasting model. Fei proposed the use of particle swarm optimisation (PSO)-SVM to forecast the dissolved gas content in power transformer oil, with the SVM parameters optimised by the PSO algorithm [21]. Several challenges appear to be associated with applying SVM to highly nonlinear time series prediction, such as selection of the kernel function, selection of the free parameters, management of the trade space complexity, and the choice of optimization technique.
According to Duval [22], "Due to the economic and practical realities of laboratory DGA, the usual practice of laboratory chemists is to base their measurement accuracy estimates on the average error of only one or two measurements of gas-in-oil standards. Although accuracy estimates of this nature are found useful by most laboratory chemists, it must be recognized that when these accuracy figures are used for basic statistical inference, the statistical significance level is unknown."

IEC 60599 and IEC 60567 [23] also indicate that there is always some degree of inaccuracy in laboratory dissolved-gas measurements, especially at low gas concentrations. A large number of transformers are still monitored by off-line laboratory analysis, where gas leakage and air ingress easily occur during sample transportation and while measurement is delayed. As a gas detection method, gas chromatography itself consumes gas during the test, and the performance of the chromatographic column gradually changes; it therefore requires regular calibration with a standard gas [24]. Recent research results [25] show that online gas measurements generally agree with laboratory analysis to within a 30% deviation. The gas data records underlying current standards are obtained almost entirely from off-line laboratory analysis [26].

Although many computational methods have been proposed for gas value prediction, none of them attempts to predict gas values while accounting for DGA sampling errors. When the sampling errors are considered, the problem can be viewed as a prediction system with fuzzy inputs and fuzzy outputs. Fuzzy regression analysis is a powerful method for forecasting fuzzy outputs from uncertain inputs. According to Wang and Tsaur in their study entitled "Insight of A Fuzzy Regression Model," fuzzy regression can be quite useful in estimating the relationships among variables where the available data are very limited and imprecise, and variables are interacting in an uncertain, qualitative, and fuzzy way [27]. There are two types of fuzzy regression: fuzzy linear regression (FLR) and fuzzy nonlinear regression (FNR). FLR was first presented by Tanaka et al. [28] in 1982, who used a fuzzy linear function with crisp inputs, fuzzy outputs, and fuzzy coefficients to approximate an uncertain system.
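To make the idea of a fuzzy linear function with crisp inputs and fuzzy coefficients concrete, the following minimal sketch evaluates such a function for one input vector. It assumes that each coefficient is a symmetric triangular fuzzy number described by a center and a spread, which is a common textbook form of this kind of model and not necessarily the exact formulation of [28]. The names `SymTFN` and `flr_predict` and the coefficient values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class SymTFN:
    """Symmetric triangular fuzzy number (assumed form): peak at `center`,
    membership falling linearly to zero at center - spread and center + spread."""
    center: float
    spread: float

    def support(self) -> Tuple[float, float]:
        """Interval outside of which the membership is zero."""
        return (self.center - self.spread, self.center + self.spread)

    def membership(self, y: float) -> float:
        """Degree to which the crisp value y belongs to this fuzzy number."""
        if self.spread == 0.0:
            return 1.0 if y == self.center else 0.0
        return max(0.0, 1.0 - abs(y - self.center) / self.spread)


def flr_predict(coeffs: Sequence[SymTFN], x: Sequence[float]) -> SymTFN:
    """Fuzzy output of Y = A0 + A1*x1 + ... + An*xn for a crisp input x.

    Centers combine linearly; spreads combine through |x_i| because
    multiplying a fuzzy number by a negative crisp value cannot shrink it.
    """
    if len(coeffs) != len(x) + 1:
        raise ValueError("expected one coefficient per input plus an intercept")
    xs = [1.0, *x]  # leading 1.0 pairs with the intercept A0
    center = sum(a.center * xi for a, xi in zip(coeffs, xs))
    spread = sum(a.spread * abs(xi) for a, xi in zip(coeffs, xs))
    return SymTFN(center, spread)


# Hypothetical coefficients: A0 = (2.0, 0.5), A1 = (1.5, 0.25)
coeffs = [SymTFN(2.0, 0.5), SymTFN(1.5, 0.25)]
y_hat = flr_predict(coeffs, [4.0])
print(y_hat.center, y_hat.spread, y_hat.support())
```

In this form the uncertainty of the prediction grows with the magnitude of the crisp input, which is one reason the linear model can be too restrictive for systems whose inputs are themselves fuzzy.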
The FLR model was later extended to handle fuzzy regression tasks with fuzzy inputs and fuzzy outputs through improvements in two areas: constructing fuzzy linear functions with crisp coefficients [29, 30, 31, 32] and constructing fuzzy linear functions with fuzzy coefficients [33, 34, 35, 36]. Linear programming and least squares are widely used to solve for the crisp or fuzzy coefficients of FLR models. However, for a crisp-in fuzzy-out or fuzzy-in fuzzy-out system, the relationship between the inputs and outputs is nonlinear in many applications. Thus, a more sophisticated FNR approach needs to be considered. Owing to their better capacity for approximating nonlinear functions, feed-forward neural networks are usually employed for constructing complex FNR models.

FNR models using feed-forward neural networks fall into two categories. The first category comprises back-propagation (BP) network-based FNR models. In 1992, Ishibuchi and Tanaka [37] proposed an FNR model (FNRBP−I) that uses two BP networks to fit the upper and lower bounds of interval-valued fuzzy numbers. These two BP networks, which have crisp weights and biases, are trained using the standard BP algorithm [38]. Both the inputs and outputs of FNRBP−I are interval-valued fuzzy numbers. Ishibuchi and Tanaka proposed another BP network-based FNR model (FNRBP−II) [39] that handles the FNR problem using crisp inputs and fuzzy outputs. Unlike FNRBP−I, FNRBP−II contains only one BP network, whose weights and biases are interval-valued fuzzy numbers. In 1995, Ishibuchi et al. [40] proposed a BP network with triangular fuzzy number (TFN) weights to conduct fuzzy regression analysis (FNRBP−III), where the inputs and outputs are interval-valued fuzzy numbers. The second category comprises radial basis function (RBF) network-based FNR models. In 2001, Cheng and Lee [41] used an RBF network to design an FNR model (FNRRBF−I).
In FNRRBF−I, the inputs and the centers of the input layer nodes are crisp, whereas the outputs and output layer weights are triangular fuzzy numbers (TFNs). Compared with previous BP network-based FNR models, FNRRBF−I achieved a faster convergence rate. A fuzzified RBF network-based FNR model (FNRRBF−II) was described by Zhang et al. [42] in 2005. In FNRRBF−II, the inputs, outputs, centers, deviations, and weights are Left-Right (L-R) fuzzy numbers. FNRRBF−II can serve as a universal approximator for any continuous fuzzy function defined on a compact set.

The main problems with these FNR models based on BP and RBF networks are their high training complexity, local minima, and complex parameter tuning. Optimization of the neural network parameters, e.g., the weights, biases, node centers, and node deviations in FNRBP−I, FNRBP−II, FNRBP−III, FNRRBF−I, and FNRRBF−II, is based on gradient descent approaches, for which the learning speed is relatively slow. Moreover, gradient-based methods may converge to a local minimum. In addition, there are no widely accepted methods for determining the optimal learning rate, number of learning epochs, and stopping criteria for BP and RBF networks. Thus, trial-and-error methods are often applied to select these parameters, and they require long computation times to establish BP and RBF network-based FNR models.

We therefore develop a new FNR learning algorithm that trains faster and achieves high generalization performance. It also avoids many of the difficulties that affect traditional gradient-based FNR models. In contrast to the conventional training algorithms for BP and RBF networks, Random Weight Networks (RWNs) [? 45, 46] do not require iterative adjustment of the network weights, and no learning parameter needs to be determined. The training speed of RWNs can be thousands of times faster than that of traditional gradient descent algorithms.
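The non-iterative training idea behind RWNs can be illustrated with a short sketch on ordinary crisp data. This is a simplified illustration under stated assumptions, not the fuzzy model developed in this paper: the sigmoid activation, the uniform sampling range [-1, 1], the number of hidden nodes, and the function names `train_rwn` and `predict_rwn` are all choices made only for this example. The input weights and hidden biases are drawn once at random and left fixed, and the output weights are then obtained in a single least-squares solve.

```python
import numpy as np


def sigmoid(z):
    """Logistic activation applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))


def train_rwn(X, y, n_hidden=20, seed=0):
    """Fit a single-hidden-layer network without iterative weight updates.

    X: (n_samples, n_features) crisp inputs; y: (n_samples,) crisp targets.
    Returns the fixed random input weights W, the hidden biases b,
    and the least-squares output weights beta.
    """
    rng = np.random.default_rng(seed)
    # Random input-layer weights and hidden biases (assumed range [-1, 1]).
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = sigmoid(X @ W + b)  # hidden-layer output matrix
    # Output weights solved analytically as a linear least-squares problem.
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)
    return W, b, beta


def predict_rwn(X, W, b, beta):
    """Evaluate the trained network on new inputs."""
    return sigmoid(X @ W + b) @ beta


# Toy regression target, used only to exercise the functions.
X = np.linspace(0.0, 1.0, 50).reshape(-1, 1)
y = np.sin(2.0 * np.pi * X[:, 0])
W, b, beta = train_rwn(X, y)
mse = float(np.mean((predict_rwn(X, W, b, beta) - y) ** 2))
print(f"training MSE: {mse:.3e}")
```

Because the only fitted quantity is `beta`, training reduces to one linear solve, which is the source of the speed advantage over gradient descent; the quality of the fit still depends on the random draw and on the number of hidden nodes.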
In addition, the good generalization capacity of RWNs has been demonstrated in recent studies [44, 45, 46]. Therefore, an RWN-based FNR model called FNRRWN is proposed in this study. FNRRWN is a single hidden layer feed-forward neural network whose inputs and outputs are TFNs. The input layer weights and hidden layer biases of FNRRWN are selected randomly. To calculate the output layer weights analytically, we define a new computational paradigm that minimizes the integrated squared error between the α-cut sets of the predicted fuzzy outputs and those of the target fuzzy outputs. The simulation results show that FNRRWN has better generalization performance and requires less training time than FNRBP−III and FNRRBF−II. Overall, our results demonstrate that FNRRWN can effectively approximate a fuzzy-in fuzzy-out system for gas value prediction.

The remainder of this paper is organized as follows. In Section 2, we provide a detailed introduction to the generation mechanism of dissolved gases and to gas sampling errors. In Section 3, the RWN-based FNR model (FNRRWN) is presented. In Section 4, TFNs and their mathematical operations are discussed in detail. In Section 5, we report experimental comparisons that demonstrate the feasibility and effectiveness of FNRRWN. Finally, we give our conclusions and suggestions for further research in Section 6.