Solve the weather problem to predict the possibility of rain under known parameters (e.g., temperature, humidity, wind flow, and sunny or cloudy conditions) using Bayesian Learning.
Experiment-2
Objective: Solve the weather problem to predict the possibility of rain under known parameters (e.g., temperature, humidity, wind flow, and sunny or cloudy conditions) using Bayesian Learning.
Theory: The basic idea of Bayesian networks (BNs) is to reproduce the most important dependencies and independencies among a set of variables in a graphical form (a directed acyclic graph) that is easy to understand and interpret. Let us consider the subset of climatic stations shown in the graph in the figure, where the variables (rainfall at each station) are represented pictorially by a set of nodes, one node per variable (for clarity of exposition, the set of nodes is denoted {y1, …, yn}). These nodes are connected by arrows, which represent cause-and-effect relationships: if there is an arrow from node yi to node yj, we say that yi is a cause of yj, or equivalently, that yj is an effect of yi. Another popular terminology is to say that yi is a parent of yj, or that yj is a child of yi. For example, in the figure, the nodes Amieva and Proaza are children of Gijon and Rioseco (the set of parents of a node yi is denoted by πi). Directed graphs provide a simple definition of independence (d-separation) based on the existence or not of certain paths between the variables.
The dependency/independency structure displayed by a directed acyclic graph can also be expressed in terms of the joint probability distribution (JPD), factorized as a product of conditional distributions as follows:
P(y1, y2, …, yn) = ∏_{i=1}^{n} P(yi | πi)
Therefore, the independencies from the graph are easily translated to the probabilistic model in a sound form. For instance, the JPD of a BN defined by the graph given in the figure requires the specification of 100 conditional probability tables, one for each variable conditioned on its parent set. Hereafter we shall consider rainfall discretized into three different states (0 = "no rain", 1 = "weak rain", 2 = "heavy rain"), associated with the thresholds 0, 2, and 10 mm, respectively.
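To make the factorization concrete, here is a minimal sketch with a hypothetical three-station network and made-up conditional probability tables (the station names are borrowed from the figure, but the structure and numbers are illustrative only):

```python
# Minimal sketch: joint probability for a 3-node rainfall network via the
# factorization P(Gijon, Rioseco, Amieva) = P(Gijon) * P(Rioseco | Gijon) * P(Amieva | Gijon, Rioseco).
# States: 0 = "no rain", 1 = "weak rain", 2 = "heavy rain". All numbers are made up.

P_gijon = {0: 0.6, 1: 0.3, 2: 0.1}                       # P(Gijon)
P_rioseco_given_gijon = {                                # P(Rioseco | Gijon)
    0: {0: 0.7, 1: 0.2, 2: 0.1},
    1: {0: 0.3, 1: 0.5, 2: 0.2},
    2: {0: 0.1, 1: 0.4, 2: 0.5},
}
P_amieva_given_parents = {                               # P(Amieva | Gijon, Rioseco)
    (g, r): {0: 0.5, 1: 0.3, 2: 0.2} for g in range(3) for r in range(3)
}

def joint(g, r, a):
    """P(Gijon=g, Rioseco=r, Amieva=a) as the product of the conditionals."""
    return (P_gijon[g]
            * P_rioseco_given_gijon[g][r]
            * P_amieva_given_parents[(g, r)][a])

# Sanity check: the joint distribution sums to 1 over all 3**3 combinations.
print(sum(joint(g, r, a) for g in range(3) for r in range(3) for a in range(3)))
```

Only one table per node (conditioned on its parents) is needed, rather than one entry per combination of all the variables, which is what makes the factorization tractable.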
Procedure:
Learning Bayesian Networks from Data-
In addition to the graph structure, a BN requires that we specify the conditional probability of each node given its parents. However, in many practical problems we know neither the complete topology of the graph nor some of the required probabilities. For this reason, several methods have recently been introduced for learning the graphical structure (structure learning) and estimating the probabilities (parametric learning) from data. A learning algorithm consists of two parts:
- A quality measure, which is used for computing the quality of the candidate BNs. This is a global measure, since it measures both the quality of the graphical structure and the quality of the estimated parameters.
- A search algorithm, which is used to efficiently search the space of possible BNs and find the one with the highest quality. Note that the number of possible networks is enormous even for a small number of variables, and therefore the search space is huge (a simple hill-climbing search is sketched below).
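The sketch below shows one common search strategy, greedy hill-climbing over single-edge additions. This is an illustrative assumption, not necessarily the algorithm used in the original study; network_score stands in for a quality measure such as the Bayesian one described next.

```python
# Sketch of a greedy (hill-climbing) structure search: keep adding any single
# edge that improves the score and preserves acyclicity, until no addition helps.
# `network_score(dag, data)` is a placeholder for a quality measure.

def is_acyclic(dag):
    """dag: dict node -> set of parents. Check for cycles by repeatedly removing roots."""
    parents = {v: set(ps) for v, ps in dag.items()}
    while parents:
        roots = [v for v, ps in parents.items() if not ps]
        if not roots:
            return False                       # no parentless node left -> cycle
        for r in roots:
            del parents[r]
        for ps in parents.values():
            ps.difference_update(roots)
    return True

def hill_climb(nodes, data, network_score):
    dag = {v: set() for v in nodes}            # start from the empty graph
    current = network_score(dag, data)
    improved = True
    while improved:
        improved = False
        for u in nodes:
            for v in nodes:
                if u == v or u in dag[v]:
                    continue
                dag[v].add(u)                  # tentatively add edge u -> v
                if is_acyclic(dag):
                    candidate = network_score(dag, data)
                    if candidate > current:
                        current, improved = candidate, True
                        continue               # keep the edge
                dag[v].discard(u)              # otherwise undo the change
    return dag
```

Real implementations also consider edge deletions and reversals, and usually restrict the maximum number of parents per node so that the conditional tables stay small.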
Among the different quality measures proposed in the literature, the basic idea of Bayesian quality measures is to assign to every BN a quality value that is a function of the posterior probability of the BN (M, θ), with network structure M and the corresponding estimated probabilities θ, given the available data D = {y1, …, y100} (with the index running daily from 1979 to 1993). The posterior probability distribution p(M, θ | D) is calculated as follows:
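By Bayes' theorem, up to the normalizing constant p(D),

p(M, θ | D) ∝ p(D | M, θ) p(θ | M) p(M).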
Geiger and Heckerman consider multinomial networks and assume certain hypotheses about the prior distributions of the parameters, leading to the quality measure below.
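In its usual Bayesian Dirichlet form, reconstructed here from the definitions that follow rather than quoted verbatim, the measure reads

Q(M, D) = p(M) ∏_{i=1}^{n} ∏_{k=1}^{s_i} [ Γ(η_ik) / Γ(η_ik + N_ik) ] ∏_{j=1}^{r_i} [ Γ(η_ijk + N_ijk) / Γ(η_ijk) ],  with η_ik = Σ_{j=1}^{r_i} η_ijk,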
where n is the number of variables, ri is the cardinality (number of states) of the i-th variable, si is the number of realizations (configurations) of the parent set πi, ηijk are the "a priori" Dirichlet hyper-parameters for the conditional distribution of node i, Nijk is the number of realizations in the database consistent with yi = j and πi = k, Nik is the number of realizations in the database consistent with πi = k, and Γ is the gamma function.
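As a minimal sketch, the contribution of a single node to this measure can be evaluated in log space directly from the counts Nijk (uniform hyper-parameters ηijk = 1 are assumed here; the full network quality is the sum of these terms over all nodes plus log p(M)):

```python
import math
from collections import Counter, defaultdict

def node_log_score(data, node, parents, r_i, eta=1.0):
    """log of prod_k [Gamma(eta_ik)/Gamma(eta_ik + N_ik) * prod_j Gamma(eta + N_ijk)/Gamma(eta)]
    for one node; `data` is a list of dicts mapping station -> state (0, 1, 2)."""
    N_ijk = defaultdict(Counter)
    for row in data:
        k = tuple(row[p] for p in parents)     # parent configuration of this record
        N_ijk[k][row[node]] += 1
    score = 0.0
    for counts in N_ijk.values():              # unobserved configurations contribute 0
        N_ik = sum(counts.values())
        eta_ik = eta * r_i
        score += math.lgamma(eta_ik) - math.lgamma(eta_ik + N_ik)
        for j in range(r_i):
            score += math.lgamma(eta + counts[j]) - math.lgamma(eta)
    return score

# Toy usage with two stations and the three rainfall states:
data = [{"Gijon": 0, "Amieva": 0}, {"Gijon": 2, "Amieva": 1}, {"Gijon": 1, "Amieva": 1}]
print(node_log_score(data, "Amieva", parents=["Gijon"], r_i=3))
```

Working in log space via lgamma avoids overflowing the gamma function when the counts become large.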
- Inference- Once a model describing the relationships among the set of variables has been selected, it can then be used to answer queries when evidence becomes available.
- Validation of the Bayesian Network Forecast Model- To check the quality of the BN in a simple case, we shall apply this methodology to a nowcasting problem. Here we are given a forecast for a subset of stations and we need to infer a prediction for the remaining stations in the network. To this end, consider that we are given predictions at the five stations of the primary network. These predictions are plugged into the network as evidence, yielding probabilities for the remaining stations of the secondary network (a brute-force sketch of this evidence-propagation step is given after this list).
- Connecting With Numerical Atmospheric Models- Since we are interested in rainfall forecasts, we shall use the gridded forecasts of total precipitation given by the operational ECMWF model (these values are obtained by adding the convective and the large-scale precipitation outputs). The forecasts are obtained 24 hours ahead; therefore, they give a numerical estimate of the future precipitation pattern (one day ahead) on a coarse-resolution grid.
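The sketch below illustrates the evidence-propagation step by brute-force enumeration over a toy network, which is adequate for a handful of stations (operational systems would use a proper propagation algorithm). The network, tables, and evidence values are illustrative assumptions, not the experiment's actual model; in practice the evidence would come from the discretized station or ECMWF grid-point forecasts.

```python
from itertools import product

# cpts maps node -> (parents, table), where table[parent_states][state] is a
# conditional probability and every variable has the 3 rainfall states 0/1/2.

def query(cpts, target, evidence):
    """Posterior P(target | evidence) by summing the joint over all consistent worlds."""
    nodes = list(cpts)
    posterior = {s: 0.0 for s in range(3)}
    for assignment in product(range(3), repeat=len(nodes)):
        world = dict(zip(nodes, assignment))
        if any(world[v] != s for v, s in evidence.items()):
            continue                            # skip worlds inconsistent with the evidence
        p = 1.0
        for node, (parents, table) in cpts.items():
            p *= table[tuple(world[q] for q in parents)][world[node]]
        posterior[world[target]] += p
    total = sum(posterior.values())
    return {s: p / total for s, p in posterior.items()}

# Toy chain Gijon -> Amieva -> Proaza with made-up tables; the evidence plays the
# role of a primary-station forecast ("heavy rain at Gijon").
cpts = {
    "Gijon": ((), {(): {0: 0.6, 1: 0.3, 2: 0.1}}),
    "Amieva": (("Gijon",), {(0,): {0: 0.8, 1: 0.15, 2: 0.05},
                            (1,): {0: 0.4, 1: 0.4, 2: 0.2},
                            (2,): {0: 0.2, 1: 0.3, 2: 0.5}}),
    "Proaza": (("Amieva",), {(0,): {0: 0.7, 1: 0.2, 2: 0.1},
                             (1,): {0: 0.3, 1: 0.5, 2: 0.2},
                             (2,): {0: 0.1, 1: 0.3, 2: 0.6}}),
}
print(query(cpts, "Proaza", {"Gijon": 2}))      # probabilities of the 3 rainfall states at Proaza
```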
Output:
Bayesian network connecting the precipitation grid points with the local precipitation at the network of stations.
Conclusion: We have used Bayesian network learning and shown its applicability to local weather forecasting and downscaling. The preliminary results presented show how such models can be built and how they can be used for performing inference.