Can We Use Normal Features With an RNN in TensorFlow?
A recurrent neural network (RNN) is a class of artificial neural networks where connections between nodes form a directed or undirected graph along a temporal sequence. This allows it to exhibit temporal dynamic behavior. Derived from feedforward neural networks, RNNs can use their internal state (memory) to process variable length sequences of inputs.[1] [2] [3] This makes them applicable to tasks such as unsegmented, connected handwriting recognition[4] or speech recognition.[5] [6] Recurrent neural networks are theoretically Turing complete and can run arbitrary programs to process arbitrary sequences of inputs.[7]
The term "recurrent neural network" is used to refer to the class of networks with an infinite impulse response, whereas "convolutional neural network" refers to the class of finite impulse response. Both classes of networks exhibit temporal dynamic behavior.[8] A finite impulse recurrent network is a directed acyclic graph that can be unrolled and replaced with a strictly feedforward neural network, while an infinite impulse recurrent network is a directed cyclic graph that cannot be unrolled.
Both finite impulse and infinite impulse recurrent networks can have additional stored states, and the storage can be under direct control by the neural network. The storage can also be replaced by another network or graph if that incorporates time delays or has feedback loops. Such controlled states are referred to as gated state or gated memory, and are part of long short-term memory networks (LSTMs) and gated recurrent units. This is also called a Feedback Neural Network (FNN).
History
Recurrent neural networks were based on David Rumelhart's work in 1986.[9] Hopfield networks – a special kind of RNN – were (re-)discovered by John Hopfield in 1982. In 1993, a neural history compressor system solved a "Very Deep Learning" task that required more than 1000 subsequent layers in an RNN unfolded in time.[10]
LSTM
Long short-term memory (LSTM) networks were invented by Hochreiter and Schmidhuber in 1997 and set accuracy records in multiple application domains.[11]
Around 2007, LSTM started to revolutionize speech recognition, outperforming traditional models in certain speech applications.[12] In 2009, a Connectionist Temporal Classification (CTC)-trained LSTM network was the first RNN to win pattern recognition contests when it won several competitions in connected handwriting recognition.[13] [14] In 2014, the Chinese company Baidu used CTC-trained RNNs to break the 2000 Switchboard Hub5'00 speech recognition dataset[15] benchmark without using any traditional speech processing methods.[16]
LSTM also improved large-vocabulary speech recognition[5] [6] and text-to-speech synthesis[17] and was used in Google Android.[13] [18] In 2015, Google's speech recognition reportedly experienced a dramatic performance jump of 49%[citation needed] through CTC-trained LSTM.[19]
LSTM broke records for improved machine translation,[20] language modeling[21] and multilingual language processing.[22] LSTM combined with convolutional neural networks (CNNs) improved automatic image captioning.[23]
Architectures
RNNs come in many variants.
Fully recurrent
Compressed (left) and unfolded (right) basic recurrent neural network.
Fully recurrent neural networks (FRNN) connect the outputs of all neurons to the inputs of all neurons. This is the most general neural network topology because all other topologies can be represented by setting some connection weights to zero to simulate the lack of connections between those neurons. The illustration to the right may be misleading to many because practical neural network topologies are frequently organized in "layers" and the drawing gives that appearance. However, what appear to be layers are, in fact, different steps in time of the same fully recurrent neural network. The left-most item in the illustration shows the recurrent connections as the arc labeled 'v'. It is "unfolded" in time to produce the appearance of layers.
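A minimal NumPy sketch of this unfolding, with illustrative layer sizes and weight names (not taken from any particular source), shows how each apparent "layer" is just the same recurrent weights applied at successive time steps:

```python
import numpy as np

# Sketch of a fully recurrent network unrolled over T time steps.
# All names and sizes here are illustrative assumptions, not a reference implementation.
rng = np.random.default_rng(0)
n_in, n_hidden, T = 3, 5, 4

W_in = rng.normal(scale=0.1, size=(n_hidden, n_in))       # input-to-hidden weights
W_rec = rng.normal(scale=0.1, size=(n_hidden, n_hidden))  # recurrent weights (the arc 'v')
b = np.zeros(n_hidden)

h = np.zeros(n_hidden)                 # initial state
inputs = rng.normal(size=(T, n_in))

states = []
for x_t in inputs:                     # each iteration is one "layer" of the unfolded network
    h = np.tanh(W_in @ x_t + W_rec @ h + b)
    states.append(h)

print(np.stack(states).shape)          # (T, n_hidden): one hidden state per time step
```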
Elman networks and Jordan networks
An Elman network is a three-layer network (arranged horizontally as x, y, and z in the illustration) with the addition of a set of context units (u in the illustration). The middle (hidden) layer is connected to these context units fixed with a weight of one.[24] At each time step, the input is fed forward and a learning rule is applied. The fixed back-connections save a copy of the previous values of the hidden units in the context units (since they propagate over the connections before the learning rule is applied). Thus the network can maintain a sort of state, allowing it to perform tasks such as sequence-prediction that are beyond the power of a standard multilayer perceptron.
Jordan networks are similar to Elman networks. The context units are fed from the output layer instead of the hidden layer. The context units in a Jordan network are also referred to as the state layer. They have a recurrent connection to themselves.[24]
Elman and Jordan networks are also known as "Simple recurrent networks" (SRN).
- Elman network[25]
  h_t = σ_h(W_h x_t + U_h h_(t−1) + b_h)
  y_t = σ_y(W_y h_t + b_y)
- Jordan network[26]
  h_t = σ_h(W_h x_t + U_h y_(t−1) + b_h)
  y_t = σ_y(W_y h_t + b_y)
Variables and functions: x_t is the input vector, h_t the hidden layer vector and y_t the output vector; W, U and b are parameter matrices and vectors; σ_h and σ_y are activation functions.
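The following sketch implements one step of the equations above in NumPy; the tanh and identity activations, sizes and variable names are assumptions for illustration:

```python
import numpy as np

def elman_step(x_t, h_prev, W_h, U_h, b_h, W_y, b_y):
    """One step of an Elman (simple recurrent) network, following the equations above.
    A sketch that assumes tanh for sigma_h and the identity for sigma_y."""
    h_t = np.tanh(W_h @ x_t + U_h @ h_prev + b_h)   # hidden state, using the context h_{t-1}
    y_t = W_y @ h_t + b_y                           # output
    return h_t, y_t

# A Jordan network would feed the previous output y_{t-1} in place of h_prev.
rng = np.random.default_rng(1)
n_in, n_h, n_out = 4, 6, 2
params = (rng.normal(size=(n_h, n_in)), rng.normal(size=(n_h, n_h)), np.zeros(n_h),
          rng.normal(size=(n_out, n_h)), np.zeros(n_out))
h = np.zeros(n_h)
for x in rng.normal(size=(5, n_in)):
    h, y = elman_step(x, h, *params)
```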
Hopfield
The Hopfield network is an RNN in which all connections across layers are equally sized. It requires stationary inputs and is thus not a general RNN, as it does not process sequences of patterns. However, it guarantees that it will converge. If the connections are trained using Hebbian learning then the Hopfield network can perform as a robust content-addressable memory, resistant to connection alteration.
Bidirectional associative memory
Introduced by Bart Kosko,[27] a bidirectional associative memory (BAM) network is a variant of a Hopfield network that stores associative data as a vector. The bi-directionality comes from passing data through a matrix and its transpose. Typically, bipolar encoding is preferred to binary encoding of the associative pairs. Recently, stochastic BAM models using Markov stepping were optimized for increased network stability and relevance to real-world applications.[28]
A BAM network has two layers, either of which can be driven as an input to recall an association and produce an output on the other layer.[29]
Echo state
The echo state network (ESN) has a sparsely connected random hidden layer. The weights of output neurons are the only part of the network that can change (be trained). ESNs are good at reproducing certain time series.[30] A variant for spiking neurons is known as a liquid state machine.[31]
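A minimal echo state network sketch (reservoir size, spectral radius and ridge regularization chosen arbitrarily for illustration) makes the key point concrete: the recurrent reservoir is random and fixed, and only the linear readout is trained:

```python
import numpy as np

# Echo state network sketch: only the readout W_out is trained.
rng = np.random.default_rng(0)
n_res, n_in, T = 200, 1, 1000

W_in = rng.uniform(-0.5, 0.5, size=(n_res, n_in))
W = rng.uniform(-0.5, 0.5, size=(n_res, n_res)) * (rng.random((n_res, n_res)) < 0.1)  # sparse
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # spectral radius below 1 (echo state property)

u = np.sin(np.linspace(0, 20 * np.pi, T + 1))     # toy time series: predict the next value
states = np.zeros((T, n_res))
x = np.zeros(n_res)
for t in range(T):
    x = np.tanh(W_in @ u[t:t+1] + W @ x)          # reservoir update (never trained)
    states[t] = x

# Ridge-regression readout, the only trained part of the network
ridge = 1e-6
W_out = np.linalg.solve(states.T @ states + ridge * np.eye(n_res), states.T @ u[1:T+1])
print(float(np.mean((states @ W_out - u[1:T+1]) ** 2)))   # training error of the readout
```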
Independently RNN (IndRNN)
The independently recurrent neural network (IndRNN)[32] addresses the gradient vanishing and exploding problems in the traditional fully connected RNN. Each neuron in one layer only receives its own past state as context information (instead of full connectivity to all other neurons in this layer) and thus neurons are independent of each other's history. The gradient backpropagation can be regulated to avoid gradient vanishing and exploding in order to keep long- or short-term memory. The cross-neuron information is explored in the next layers. IndRNN can be robustly trained with non-saturated nonlinear functions such as ReLU. Using skip connections, deep networks can be trained.
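A sketch of a single IndRNN step, with assumed names, shows the elementwise (per-neuron) recurrence that replaces the full recurrent weight matrix:

```python
import numpy as np

def indrnn_step(x_t, h_prev, W, u, b):
    """One IndRNN step: each neuron sees only its own previous state (elementwise u * h_prev),
    so a ReLU nonlinearity can be used. Names and sizes are illustrative assumptions."""
    return np.maximum(0.0, W @ x_t + u * h_prev + b)   # '*' is elementwise, not a full matrix product

rng = np.random.default_rng(0)
W, u, b = rng.normal(size=(8, 3)), rng.uniform(0, 1, 8), np.zeros(8)
h = np.zeros(8)
for x in rng.normal(size=(10, 3)):
    h = indrnn_step(x, h, W, u, b)
```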
Recursive
A recursive neural network[33] is created by applying the same set of weights recursively over a differentiable graph-like structure by traversing the structure in topological order. Such networks are typically also trained by the reverse mode of automatic differentiation.[34] [35] They can process distributed representations of structure, such as logical terms. A special case of recursive neural networks is the RNN whose structure corresponds to a linear chain. Recursive neural networks have been applied to natural language processing.[36] The Recursive Neural Tensor Network uses a tensor-based composition function for all nodes in the tree.[37]
Neural history compressor
The neural history compressor is an unsupervised stack of RNNs.[38] At the input level, it learns to predict its next input from the previous inputs. Only unpredictable inputs of some RNN in the hierarchy become inputs to the next higher level RNN, which therefore recomputes its internal state only rarely. Each higher level RNN thus studies a compressed representation of the information in the RNN below. This is done such that the input sequence can be precisely reconstructed from the representation at the highest level.
The system effectively minimises the description length or the negative logarithm of the probability of the data.[39] Given a lot of learnable predictability in the incoming data sequence, the highest level RNN can use supervised learning to easily classify even deep sequences with long intervals between important events.
It is possible to distill the RNN hierarchy into two RNNs: the "conscious" chunker (higher level) and the "subconscious" automatizer (lower level).[38] Once the chunker has learned to predict and compress inputs that are unpredictable by the automatizer, the automatizer can be forced in the next learning phase to predict or imitate through additional units the hidden units of the more slowly changing chunker. This makes it easy for the automatizer to learn appropriate, rarely changing memories across long intervals. In turn, this helps the automatizer to make many of its once unpredictable inputs predictable, such that the chunker can focus on the remaining unpredictable events.[38]
A generative model partially overcame the vanishing gradient problem[40] of automatic differentiation or backpropagation in neural networks in 1992. In 1993, such a system solved a "Very Deep Learning" task that required more than 1000 subsequent layers in an RNN unfolded in time.[10]
Second order RNNs
Second order RNNs use higher order weights instead of the standard weights, and states can be a product. This allows a direct mapping to a finite-state machine both in training, stability, and representation.[41] [42] Long short-term memory is an example of this but has no such formal mappings or proof of stability.
Long short-term memory
Long short-term memory unit
Long short-term memory (LSTM) is a deep learning system that avoids the vanishing gradient problem. LSTM is normally augmented by recurrent gates called "forget gates".[43] LSTM prevents backpropagated errors from vanishing or exploding.[40] Instead, errors can flow backwards through unlimited numbers of virtual layers unfolded in space. That is, LSTM can learn tasks[13] that require memories of events that happened thousands or even millions of discrete time steps earlier. Problem-specific LSTM-like topologies can be evolved.[44] LSTM works even given long delays between significant events and can handle signals that mix low and high frequency components.
Many applications use stacks of LSTM RNNs[45] and train them by Connectionist Temporal Classification (CTC)[46] to find an RNN weight matrix that maximizes the probability of the label sequences in a training set, given the corresponding input sequences. CTC achieves both alignment and recognition.
LSTM can learn to recognize context-sensitive languages, unlike previous models based on hidden Markov models (HMM) and similar concepts.[47]
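The following NumPy sketch of one LSTM step shows the forget, input and output gates and the additive cell-state update; the packing of the four gate blocks into one matrix is an assumed convention, not a reference implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step with forget, input and output gates plus a candidate cell update.
    W has shape (4n, d), U has shape (4n, n), b has shape (4n,); this packing is an assumption."""
    n = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b          # all four gate pre-activations stacked
    f = sigmoid(z[0:n])                   # forget gate
    i = sigmoid(z[n:2*n])                 # input gate
    o = sigmoid(z[2*n:3*n])               # output gate
    g = np.tanh(z[3*n:4*n])               # candidate cell state
    c_t = f * c_prev + i * g              # additive cell path that limits vanishing gradients
    h_t = o * np.tanh(c_t)                # hidden state / output
    return h_t, c_t
```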
Gated recurrent unit
Gated recurrent units (GRUs) are a gating mechanism in recurrent neural networks introduced in 2014. They are used in the full form and several simplified variants.[48] [49] Their performance on polyphonic music modeling and speech signal modeling was found to be similar to that of long short-term memory.[50] They have fewer parameters than LSTM, as they lack an output gate.[51]
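Assuming a TensorFlow/Keras installation, a quick sketch comparing parameter counts illustrates the point about GRUs having fewer parameters (exact numbers depend on the Keras version and layer options):

```python
import tensorflow as tf  # assumes TensorFlow/Keras; sizes are illustrative

n_units, n_features = 32, 16

def seq_model(cell):
    inp = tf.keras.Input(shape=(None, n_features))   # variable-length input sequences
    return tf.keras.Model(inp, cell(inp))

lstm = seq_model(tf.keras.layers.LSTM(n_units))   # input, forget, output gates + cell candidate
gru = seq_model(tf.keras.layers.GRU(n_units))     # update and reset gates only

print("LSTM parameters:", lstm.count_params())
print("GRU parameters: ", gru.count_params())     # fewer, since there is no separate output gate
```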
Bi-directional
Bi-directional RNNs use a finite sequence to predict or label each element of the sequence based on the element's past and future contexts. This is done by concatenating the outputs of two RNNs, one processing the sequence from left to right, the other from right to left. The combined outputs are the predictions of the teacher-given target signals. This technique has been proven to be especially useful when combined with LSTM RNNs.[52] [53]
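As a hedged illustration using the Keras Bidirectional wrapper (layer sizes are arbitrary), the two directional RNNs are concatenated per time step so that every element is labeled from both its past and future context:

```python
import tensorflow as tf  # assumes TensorFlow/Keras; layer sizes are illustrative

model = tf.keras.Sequential([
    tf.keras.Input(shape=(None, 8)),                        # variable-length sequences of 8 features
    tf.keras.layers.Bidirectional(                          # one LSTM left-to-right, one right-to-left
        tf.keras.layers.LSTM(16, return_sequences=True)),   # outputs concatenated per time step
    tf.keras.layers.Dense(1),                               # per-element prediction from both contexts
])
model.summary()
```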
Continuous-time
A continuous-time recurrent neural network (CTRNN) uses a system of ordinary differential equations to model the effects on a neuron of the incoming inputs.
For a neuron i in the network with activation y_i, the rate of change of activation is given by:
τ_i dy_i/dt = −y_i + Σ_j w_{ji} σ(y_j − Θ_j) + I_i(t)
where τ_i is the time constant of the postsynaptic node, y_i is its activation, w_{ji} is the weight of the connection from the presynaptic node j to the postsynaptic node i, σ is a sigmoid, y_j is the activation of the presynaptic node, Θ_j is its bias, and I_i(t) is the external input (if any) to the node.
CTRNNs have been applied to evolutionary robotics where they have been used to address vision,[54] co-operation,[55] and minimal cognitive behaviour.[56]
Note that, by the Shannon sampling theorem, discrete time recurrent neural networks can be viewed as continuous-time recurrent neural networks where the differential equations have transformed into equivalent difference equations.[57] This transformation can be thought of as occurring after the post-synaptic node activation functions have been low-pass filtered but prior to sampling.
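A simple forward-Euler integration of the CTRNN equation above (all constants chosen arbitrarily for illustration) also shows how discretizing the ODE yields the difference-equation form mentioned in the sampling remark:

```python
import numpy as np

# Euler integration of the CTRNN equation above; all constants are illustrative assumptions.
rng = np.random.default_rng(0)
n, dt, steps = 4, 0.01, 500

tau = np.ones(n)                         # time constants tau_i
w = rng.normal(size=(n, n))              # w[i, j] = weight from neuron j to neuron i
theta = np.zeros(n)                      # biases Theta_j
y = np.zeros(n)                          # activations y_i

def sigma(x):
    return 1.0 / (1.0 + np.exp(-x))

for step in range(steps):
    I = np.zeros(n)                                  # external input (none in this toy run)
    dydt = (-y + w @ sigma(y - theta) + I) / tau     # right-hand side of the ODE
    y = y + dt * dydt                                # discrete-time update (cf. sampling remark above)

print(y)
```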
Hierarchical
Hierarchical RNNs connect their neurons in various ways to decompose hierarchical behavior into useful subprograms.[38] [58] Such hierarchical structures of cognition are present in theories of memory presented by philosopher Henri Bergson, whose philosophical views have inspired hierarchical models.[59]
Recurrent multilayer perceptron network
Generally, a recurrent multilayer perceptron (RMLP) network consists of cascaded subnetworks, each of which contains multiple layers of nodes. Each of these subnetworks is feed-forward except for the last layer, which can have feedback connections. Each of these subnets is connected only by feed-forward connections.[60]
Multiple timescales model
A multiple timescales recurrent neural network (MTRNN) is a neural-based computational model that can simulate the functional hierarchy of the brain through self-organisation that depends on spatial connection between neurons and on distinct types of neuron activities, each with distinct time properties.[61] [62] With such varied neuronal activities, continuous sequences of any set of behaviors are segmented into reusable primitives, which in turn are flexibly integrated into diverse sequential behaviors. The biological approval of such a type of hierarchy was discussed in the memory-prediction theory of brain function by Hawkins in his book On Intelligence.[citation needed] Such a hierarchy also agrees with theories of memory posited by philosopher Henri Bergson, which have been incorporated into an MTRNN model.[59] [63]
Neural Turing machines
Neural Turing machines (NTMs) are a method of extending recurrent neural networks by coupling them to external memory resources with which they can interact by attentional processes. The combined system is analogous to a Turing machine or von Neumann architecture but is differentiable end-to-end, allowing it to be efficiently trained with gradient descent.[64]
Differentiable neural computer
Differentiable neural computers (DNCs) are an extension of Neural Turing machines, allowing for the usage of fuzzy amounts of each memory address and a record of chronology.
Neural network pushdown automata
Neural network pushdown automata (NNPDA) are similar to NTMs, but tapes are replaced by analog stacks that are differentiable and that are trained. In this way, they are similar in complexity to recognizers of context-free grammars (CFGs).[65]
Memristive Networks
Greg Snider of HP Labs describes a system of cortical computing with memristive nanodevices.[66] The memristors (memory resistors) are implemented by thin film materials in which the resistance is electrically tuned via the transport of ions or oxygen vacancies within the film. DARPA's SyNAPSE project has funded IBM Research and HP Labs, in collaboration with the Boston University Department of Cognitive and Neural Systems (CNS), to develop neuromorphic architectures which may be based on memristive systems. Memristive networks are a particular type of physical neural network that have very similar properties to (Little-)Hopfield networks, as they have continuous dynamics, a limited memory capacity and they naturally relax via the minimization of a function which is asymptotic to the Ising model. In this sense, the dynamics of a memristive circuit have the advantage, compared to a Resistor-Capacitor network, of a more interesting non-linear behavior. From this point of view, engineering analog memristive networks accounts for a peculiar type of neuromorphic engineering in which the device behavior depends on the circuit wiring, or topology.[67] [68]
Training
Gradient descent
Gradient descent is a first-order iterative optimization algorithm for finding the minimum of a function. In neural networks, it can be used to minimize the error term by changing each weight in proportion to the derivative of the error with respect to that weight, provided the non-linear activation functions are differentiable. Various methods for doing so were developed in the 1980s and early 1990s by Werbos, Williams, Robinson, Schmidhuber, Hochreiter, Pearlmutter and others.
The standard method is called "backpropagation through time" or BPTT, and is a generalization of back-propagation for feed-forward networks.[69] [70] Like that method, it is an instance of automatic differentiation in the reverse accumulation mode of Pontryagin's minimum principle. A more computationally expensive online variant is called "Real-Time Recurrent Learning" or RTRL,[71] [72] which is an instance of automatic differentiation in the forward accumulation mode with stacked tangent vectors. Unlike BPTT, this algorithm is local in time but not local in space.
In this context, local in space means that a unit's weight vector can be updated using only information stored in the connected units and the unit itself such that the update complexity of a single unit is linear in the dimensionality of the weight vector. Local in time means that the updates take place continually (on-line) and depend only on the most recent time step rather than on multiple time steps within a given time horizon as in BPTT. Biological neural networks appear to be local with respect to both time and space.[73] [74]
For recursively computing the partial derivatives, RTRL has a time-complexity of O(number of hidden × number of weights) per time step for computing the Jacobian matrices, while BPTT only takes O(number of weights) per time step, at the cost of storing all forward activations within the given time horizon.[75] An online hybrid between BPTT and RTRL with intermediate complexity exists,[76] [77] along with variants for continuous time.[78]
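The storage cost mentioned above can be seen in a BPTT sketch for a vanilla tanh RNN: the forward pass keeps every hidden state so the backward pass can run through the unfolded network (biases and the weight update itself are omitted; all names are illustrative):

```python
import numpy as np

def bptt_grads(xs, ys, W_x, W_h, W_o, h0):
    """Backpropagation through time for a vanilla tanh RNN with a linear readout
    and squared error at every step. A sketch; biases are omitted for brevity."""
    T = len(xs)
    hs, h = [h0], h0
    for t in range(T):                                   # forward pass, storing all activations
        h = np.tanh(W_x @ xs[t] + W_h @ h)
        hs.append(h)
    dW_x, dW_h, dW_o = np.zeros_like(W_x), np.zeros_like(W_h), np.zeros_like(W_o)
    dh_next = np.zeros_like(h0)
    for t in reversed(range(T)):                         # backward pass through the unfolded net
        err = W_o @ hs[t + 1] - ys[t]                    # d(loss)/d(output) for squared error
        dW_o += np.outer(err, hs[t + 1])
        dh = W_o.T @ err + dh_next                       # gradient flowing into h_t
        dz = dh * (1.0 - hs[t + 1] ** 2)                 # through the tanh nonlinearity
        dW_x += np.outer(dz, xs[t])
        dW_h += np.outer(dz, hs[t])
        dh_next = W_h.T @ dz                             # pass the gradient to the previous step
    return dW_x, dW_h, dW_o
```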
A major problem with gradient descent for standard RNN architectures is that error gradients vanish exponentially quickly with the size of the time lag between important events.[40] [79] LSTM combined with a BPTT/RTRL hybrid learning method attempts to overcome these problems.[11] This problem is also solved in the independently recurrent neural network (IndRNN)[32] by reducing the context of a neuron to its own past state; the cross-neuron information can then be explored in the following layers. Memories of different ranges, including long-term memory, can be learned without the gradient vanishing and exploding problem.
The on-line algorithm called causal recursive backpropagation (CRBP) implements and combines BPTT and RTRL paradigms for locally recurrent networks.[80] It works with the most general locally recurrent networks. The CRBP algorithm can minimize the global error term. This fact improves the stability of the algorithm, providing a unifying view on gradient calculation techniques for recurrent networks with local feedback.
One approach to the computation of gradient information in RNNs with arbitrary architectures is based on signal-flow graphs diagrammatic derivation.[81] It uses the BPTT batch algorithm, based on Lee's theorem for network sensitivity calculations.[82] It was proposed by Wan and Beaufays, while its fast online version was proposed by Campolucci, Uncini and Piazza.[82]
Global optimization methods
Training the weights in a neural network can be modeled as a non-linear global optimization problem. A target function can be formed to evaluate the fitness or error of a particular weight vector as follows: First, the weights in the network are set according to the weight vector. Next, the network is evaluated against the training sequence. Typically, the sum-squared difference between the predictions and the target values specified in the training sequence is used to represent the error of the current weight vector. Arbitrary global optimization techniques may then be used to minimize this target function.
The most common global optimization method for training RNNs is genetic algorithms, especially in unstructured networks.[83] [84] [85]
Initially, the genetic algorithm is encoded with the neural network weights in a predefined manner where one gene in the chromosome represents one weight link. The whole network is represented as a single chromosome. The fitness function is evaluated as follows:
- Each weight encoded in the chromosome is assigned to the corresponding weight link of the network.
- The training set is presented to the network which propagates the input signals forwards.
- The mean-squared-error is returned to the fitness function.
- This office drives the genetic selection process.
Many chromosomes make up the population; therefore, many different neural networks are evolved until a stopping criterion is satisfied. A common stopping scheme is:
- When the neural network has learnt a certain percentage of the training data or
- When the minimum value of the mean-squared-error is satisfied or
- When the maximum number of training generations has been reached.
The stopping criterion is evaluated by the fitness function as it gets the reciprocal of the mean-squared-error from each network during training. Therefore, the goal of the genetic algorithm is to maximize the fitness function, reducing the mean-squared-error.
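A sketch of such a fitness function for a small vanilla RNN (the gene layout and network size are illustrative assumptions) could look like this:

```python
import numpy as np

def fitness(chromosome, seq_x, seq_y, n_in, n_h, n_out):
    """Decode a flat chromosome into the weights of a small vanilla RNN, run it over the
    training sequence, and return the reciprocal of the mean-squared-error as the fitness
    (cf. the stopping criteria above). The gene layout is an illustrative assumption."""
    i = 0
    W_x = chromosome[i:i + n_h * n_in].reshape(n_h, n_in); i += n_h * n_in
    W_h = chromosome[i:i + n_h * n_h].reshape(n_h, n_h);   i += n_h * n_h
    W_o = chromosome[i:i + n_out * n_h].reshape(n_out, n_h)
    h, sq_err = np.zeros(n_h), 0.0
    for x_t, y_t in zip(seq_x, seq_y):                 # forward propagation of the input signals
        h = np.tanh(W_x @ x_t + W_h @ h)
        sq_err += np.sum((W_o @ h - y_t) ** 2)
    mse = sq_err / len(seq_x)
    return 1.0 / (mse + 1e-12)                         # genetic selection maximizes this value
```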
Other global (and/or evolutionary) optimization techniques may be used to seek a good set of weights, such as simulated annealing or particle swarm optimization.
Related fields and models
RNNs may behave chaotically. In such cases, dynamical systems theory may be used for analysis.
They are in fact recursive neural networks with a particular structure: that of a linear chain. Whereas recursive neural networks operate on any hierarchical structure, combining child representations into parent representations, recurrent neural networks operate on the linear progression of time, combining the previous time step and a hidden representation into the representation for the current time step.
In particular, RNNs can appear as nonlinear versions of finite impulse response and infinite impulse response filters and also as a nonlinear autoregressive exogenous model (NARX).[86]
Libraries
- Apache Singa
- Caffe: Created by the Berkeley Vision and Learning Center (BVLC). It supports both CPU and GPU. Developed in C++, and has Python and MATLAB wrappers.
- Chainer: The first stable deep learning library that supports dynamic, define-by-run neural networks. Fully in Python, production support for CPU, GPU, distributed training.
- Deeplearning4j: Deep learning in Java and Scala on multi-GPU-enabled Spark. A general-purpose deep learning library for the JVM production stack running on a C++ scientific computing engine. Allows the creation of custom layers. Integrates with Hadoop and Kafka.
- Flux: includes interfaces for RNNs, including GRUs and LSTMs, written in Julia.
- Keras: High-level, easy to use API, providing a wrapper to many other deep learning libraries.
- Microsoft Cerebral Toolkit
- MXNet: a modern open-source deep learning framework used to train and deploy deep neural networks.
- PyTorch: Tensors and Dynamic neural networks in Python with strong GPU acceleration.
- TensorFlow: Apache 2.0-licensed Theano-like library with support for CPU, GPU, Google's proprietary TPU,[87] and mobile devices.
- Theano: The reference deep-learning library for Python with an API largely compatible with the popular NumPy library. Allows the user to write symbolic mathematical expressions, then automatically generates their derivatives, saving the user from having to code gradients or backpropagation. These symbolic expressions are automatically compiled to CUDA code for a fast, on-the-GPU implementation.
- Torch (www.torch.ch): A scientific computing framework with wide support for machine learning algorithms, written in C and Lua. The main author is Ronan Collobert, and it is now used at Facebook AI Research and Twitter.
Applications
Applications of recurrent neural networks include:
- Machine translation[20]
- Robot control[88]
- Time series prediction[89] [90] [91]
- Speech recognition[92] [93] [94]
- Speech synthesis[95]
- Brain–computer interfaces[96]
- Time series anomaly detection[97]
- Rhythm learning[98]
- Music composition[99]
- Grammar learning[100] [101] [102]
- Handwriting recognition[103] [104]
- Human action recognition[105]
- Protein homology detection[106]
- Predicting subcellular localization of proteins[53]
- Several prediction tasks in the area of business process management[107]
- Prediction in medical care pathways[108]
References
- ^ Dupond, Samuel (2019). "A thorough review on the current advance of neural network structures". Annual Reviews in Control. 14: 200–230.
- ^ Abiodun, Oludare Isaac; Jantan, Aman; Omolara, Abiodun Esther; Dada, Kemi Victoria; Mohamed, Nachaat Abdelatif; Arshad, Humaira (2018-11-01). "State-of-the-art in artificial neural network applications: A survey". Heliyon. 4 (11): e00938. doi:10.1016/j.heliyon.2018.e00938. ISSN 2405-8440. PMC 6260436. PMID 30519653.
- ^ Tealab, Ahmed (2018-12-01). "Time series forecasting using artificial neural networks methodologies: A systematic review". Future Computing and Informatics Journal. 3 (2): 334–340. doi:10.1016/j.fcij.2018.10.003. ISSN 2314-7288.
- ^ Graves, Alex; Liwicki, Marcus; Fernandez, Santiago; Bertolami, Roman; Bunke, Horst; Schmidhuber, Jürgen (2009). "A Novel Connectionist System for Improved Unconstrained Handwriting Recognition" (PDF). IEEE Transactions on Pattern Analysis and Machine Intelligence. 31 (5): 855–868. CiteSeerX 10.1.1.139.4502. doi:10.1109/tpami.2008.137. PMID 19299860. S2CID 14635907.
- ^ a b Sak, Haşim; Senior, Andrew; Beaufays, Françoise (2014). "Long Short-Term Memory recurrent neural network architectures for large scale acoustic modeling" (PDF).
- ^ a b Li, Xiangang; Wu, Xihong (2014-10-15). "Constructing Long Short-Term Memory based Deep Recurrent Neural Networks for Large Vocabulary Speech Recognition". arXiv:1410.4281 [cs.CL].
- ^ Hyötyniemi, Heikki (1996). "Turing machines are recurrent neural networks". Proceedings of STeP '96/Publications of the Finnish Artificial Intelligence Society: 13–24.
- ^ Miljanovic, Milos (Feb–Mar 2012). "Comparative analysis of Recurrent and Finite Impulse Response Neural Networks in Time Series Prediction" (PDF). Indian Journal of Computer and Engineering. 3 (1).
- ^ Williams, Ronald J.; Hinton, Geoffrey E.; Rumelhart, David E. (October 1986). "Learning representations by back-propagating errors". Nature. 323 (6088): 533–536. Bibcode:1986Natur.323..533R. doi:10.1038/323533a0. ISSN 1476-4687. S2CID 205001834.
- ^ a b Schmidhuber, Jürgen (1993). Habilitation thesis: System modeling and optimization (PDF). Page 150 ff demonstrates credit assignment across the equivalent of 1,200 layers in an unfolded RNN.
- ^ a b Hochreiter, Sepp; Schmidhuber, Jürgen (1997-11-01). "Long Short-Term Memory". Neural Computation. 9 (8): 1735–1780. doi:10.1162/neco.1997.9.8.1735. PMID 9377276. S2CID 1915014.
- ^ Fernández, Santiago; Graves, Alex; Schmidhuber, Jürgen (2007). An Application of Recurrent Neural Networks to Discriminative Keyword Spotting. Proceedings of the 17th International Conference on Artificial Neural Networks. ICANN'07. Berlin, Heidelberg: Springer-Verlag. pp. 220–229. ISBN 978-3-540-74693-5.
- ^ a b c Schmidhuber, Jürgen (January 2015). "Deep Learning in Neural Networks: An Overview". Neural Networks. 61: 85–117. arXiv:1404.7828. doi:10.1016/j.neunet.2014.09.003. PMID 25462637. S2CID 11715509.
- ^ Graves, Alex; Schmidhuber, Jürgen (2009). Bengio, Yoshua; Schuurmans, Dale; Lafferty, John; Williams, Chris K. I.; Culotta, Aron (eds.). "Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks". Neural Information Processing Systems (NIPS) Foundation: 545–552.
- ^ "2000 HUB5 English Evaluation Speech - Linguistic Data Consortium". catalog.ldc.upenn.edu.
- ^ Hannun, Awni; Case, Carl; Casper, Jared; Catanzaro, Bryan; Diamos, Greg; Elsen, Erich; Prenger, Ryan; Satheesh, Sanjeev; Sengupta, Shubho (2014-12-17). "Deep Speech: Scaling up end-to-end speech recognition". arXiv:1412.5567 [cs.CL].
- ^ Fan, Bo; Wang, Lijuan; Soong, Frank K.; Xie, Lei (2015) "Photo-Real Talking Head with Deep Bidirectional LSTM", in Proceedings of ICASSP 2015
- ^ Zen, Heiga; Sak, Haşim (2015). "Unidirectional Long Short-Term Memory Recurrent Neural Network with Recurrent Output Layer for Low-Latency Speech Synthesis" (PDF). Google.com. ICASSP. pp. 4470–4474.
- ^ Sak, Haşim; Senior, Andrew; Rao, Kanishka; Beaufays, Françoise; Schalkwyk, Johan (September 2015). "Google voice search: faster and more accurate".
- ^ a b Sutskever, Ilya; Vinyals, Oriol; Le, Quoc V. (2014). "Sequence to Sequence Learning with Neural Networks" (PDF). Electronic Proceedings of the Neural Information Processing Systems Conference. 27: 5346. arXiv:1409.3215. Bibcode:2014arXiv1409.3215S.
- ^ Jozefowicz, Rafal; Vinyals, Oriol; Schuster, Mike; Shazeer, Noam; Wu, Yonghui (2016-02-07). "Exploring the Limits of Linguistic communication Modeling". arXiv:1602.02410 [cs.CL].
- ^ Gillick, Dan; Brunk, Cliff; Vinyals, Oriol; Subramanya, Amarnag (2015-11-30). "Multilingual Language Processing From Bytes". arXiv:1512.00103 [cs.CL].
- ^ Vinyals, Oriol; Toshev, Alexander; Bengio, Samy; Erhan, Dumitru (2014-11-17). "Show and Tell: A Neural Image Caption Generator". arXiv:1411.4555 [cs.CV].
- ^ a b Cruse, Holk; Neural Networks as Cybernetic Systems, second and revised edition
- ^ Elman, Jeffrey L. (1990). "Finding Structure in Time". Cognitive Science. 14 (2): 179–211. doi:10.1016/0364-0213(90)90002-E.
- ^ Jordan, Michael I. (1997-01-01). "Serial Order: A Parallel Distributed Processing Approach". Neural-Network Models of Cognition - Biobehavioral Foundations. Advances in Psychology. Neural-Network Models of Cognition. Vol. 121. pp. 471–495. doi:10.1016/s0166-4115(97)80111-2. ISBN 9780444819314.
- ^ Kosko, Bart (1988). "Bidirectional associative memories". IEEE Transactions on Systems, Man, and Cybernetics. 18 (1): 49–60. doi:10.1109/21.87054. S2CID 59875735.
- ^ Rakkiyappan, Rajan; Chandrasekar, Arunachalam; Lakshmanan, Subramanian; Park, Ju H. (2 January 2015). "Exponential stability for markovian jumping stochastic BAM neural networks with mode-dependent probabilistic time-varying delays and impulse control". Complexity. 20 (3): 39–65. Bibcode:2015Cmplx..20c..39R. doi:10.1002/cplx.21503.
- ^ Rojas, Raúl (1996). Neural networks: a systematic introduction. Springer. p. 336. ISBN 978-3-540-60505-8.
- ^ Jaeger, Herbert; Haas, Harald (2004-04-02). "Harnessing Nonlinearity: Predicting Chaotic Systems and Saving Energy in Wireless Communication". Science. 304 (5667): 78–80. Bibcode:2004Sci...304...78J. CiteSeerX 10.1.1.719.2301. doi:10.1126/science.1091277. PMID 15064413. S2CID 2184251.
- ^ Maass, Wolfgang; Natschläger, Thomas; Markram, Henry (2002-08-20). "A fresh look at real-time computation in generic recurrent neural circuits". Technical report. Institute for Theoretical Computer Science, Technische Universität Graz.
- ^ a b Li, Shuai; Li, Wanqing; Cook, Chris; Zhu, Ce; Yanbo, Gao (2018). "Independently Recurrent Neural Network (IndRNN): Building a Longer and Deeper RNN". arXiv:1803.04831 [cs.CV].
- ^ Goller, Christoph; Küchler, Andreas (1996). Learning task-dependent distributed representations by backpropagation through structure. IEEE International Conference on Neural Networks. Vol. 1. p. 347. CiteSeerX 10.1.1.52.4759. doi:10.1109/ICNN.1996.548916. ISBN 978-0-7803-3210-2. S2CID 6536466.
- ^ Linnainmaa, Seppo (1970). The representation of the cumulative rounding error of an algorithm as a Taylor expansion of the local rounding errors. M.Sc. thesis (in Finnish), University of Helsinki.
- ^ Griewank, Andreas; Walther, Andrea (2008). Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation (2nd ed.). SIAM. ISBN 978-0-89871-776-1.
- ^ Socher, Richard; Lin, Cliff; Ng, Andrew Y.; Manning, Christopher D., "Parsing Natural Scenes and Natural Language with Recursive Neural Networks" (PDF), 28th International Conference on Machine Learning (ICML 2011)
- ^ Socher, Richard; Perelygin, Alex; Wu, Jean Y.; Chuang, Jason; Manning, Christopher D.; Ng, Andrew Y.; Potts, Christopher. "Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank" (PDF). Emnlp 2013.
- ^ a b c d Schmidhuber, Jürgen (1992). "Learning complex, extended sequences using the principle of history compression" (PDF). Neural Computation. 4 (2): 234–242. doi:10.1162/neco.1992.4.2.234. S2CID 18271205.
- ^ Schmidhuber, Jürgen (2015). "Deep Learning". Scholarpedia. 10 (11): 32832. Bibcode:2015SchpJ..1032832S. doi:10.4249/scholarpedia.32832.
- ^ a b c Hochreiter, Sepp (1991), Untersuchungen zu dynamischen neuronalen Netzen, Diploma thesis, Institut f. Informatik, Technische Univ. Munich, Advisor Jürgen Schmidhuber
- ^ Giles, C. Lee; Miller, Clifford B.; Chen, Dong; Chen, Hsing-Hen; Sun, Guo-Zheng; Lee, Yee-Chun (1992). "Learning and Extracting Finite State Automata with Second-Order Recurrent Neural Networks" (PDF). Neural Computation. 4 (3): 393–405. doi:10.1162/neco.1992.4.3.393. S2CID 19666035.
- ^ Omlin, Christian W.; Giles, C. Lee (1996). "Constructing Deterministic Finite-State Automata in Recurrent Neural Networks". Journal of the ACM. 45 (6): 937–972. CiteSeerX 10.1.1.32.2364. doi:10.1145/235809.235811. S2CID 228941.
- ^
- ^ Bayer, Justin; Wierstra, Daan; Togelius, Julian; Schmidhuber, Jürgen (2009-09-14). Evolving Memory Cell Structures for Sequence Learning (PDF). Artificial Neural Networks – ICANN 2009. Lecture Notes in Computer Science. Vol. 5769. Berlin, Heidelberg: Springer. pp. 755–764. doi:10.1007/978-3-642-04277-5_76. ISBN 978-3-642-04276-8.
- ^ Fernández, Santiago; Graves, Alex; Schmidhuber, Jürgen (2007). "Sequence labelling in structured domains with hierarchical recurrent neural networks". Proc. 20th International Joint Conference on Artificial Intelligence, IJCAI 2007: 774–779. CiteSeerX 10.1.1.79.1887.
- ^ Graves, Alex; Fernández, Santiago; Gomez, Faustino J. (2006). "Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks". Proceedings of the International Conference on Machine Learning: 369–376. CiteSeerX 10.1.1.75.6306.
- ^
- ^ Heck, Joel; Salem, Fathi M. (2017-01-12). "Simplified Minimal Gated Unit Variations for Recurrent Neural Networks". arXiv:1701.03452 [cs.NE].
- ^ Dey, Rahul; Salem, Fathi M. (2017-01-20). "Gate-Variants of Gated Recurrent Unit (GRU) Neural Networks". arXiv:1701.05923 [cs.NE].
- ^ Chung, Junyoung; Gulcehre, Caglar; Cho, KyungHyun; Bengio, Yoshua (2014). "Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling". arXiv:1412.3555 [cs.NE].
- ^ Britz, Denny (October 27, 2015). "Recurrent Neural Network Tutorial, Part 4 – Implementing a GRU/LSTM RNN with Python and Theano – WildML". Wildml.com. Retrieved May 18, 2016.
- ^ Graves, Alex; Schmidhuber, Jürgen (2005-07-01). "Framewise phoneme classification with bidirectional LSTM and other neural network architectures". Neural Networks. IJCNN 2005. 18 (5): 602–610. CiteSeerX 10.1.1.331.5800. doi:10.1016/j.neunet.2005.06.042. PMID 16112549.
- ^ a b Thireou, Trias; Reczko, Martin (July 2007). "Bidirectional Long Short-Term Memory Networks for Predicting the Subcellular Localization of Eukaryotic Proteins". IEEE/ACM Transactions on Computational Biology and Bioinformatics. 4 (3): 441–446. doi:10.1109/tcbb.2007.1015. PMID 17666763. S2CID 11787259.
- ^ Harvey, Inman; Husbands, Phil; Cliff, Dave (1994), "Seeing the light: Artificial evolution, real vision", third international conference on Simulation of adaptive behavior: from animals to animats 3, pp. 392–401
- ^ Quinn, Matthew (2001). "Evolving communication without dedicated communication channels". Advances in Artificial Life. Lecture Notes in Computer Science. Vol. 2159. pp. 357–366. CiteSeerX 10.1.1.28.5890. doi:10.1007/3-540-44811-X_38. ISBN 978-3-540-42567-0.
- ^ Beer, Randall D. (1997). "The dynamics of adaptive behavior: A research program". Robotics and Autonomous Systems. 20 (2–4): 257–289. doi:10.1016/S0921-8890(96)00063-2.
- ^ Sherstinsky, Alex (2018-12-07). Bloem-Reddy, Benjamin; Paige, Brooks; Kusner, Matt; Caruana, Rich; Rainforth, Tom; Teh, Yee Whye (eds.). Deriving the Recurrent Neural Network Definition and RNN Unrolling Using Signal Processing. Critiquing and Correcting Trends in Machine Learning Workshop at NeurIPS-2018.
- ^ Paine, Rainer W.; Tani, Jun (2005-09-01). "How Hierarchical Control Self-organizes in Artificial Adaptive Systems". Adaptive Behavior. 13 (3): 211–225. doi:10.1177/105971230501300303. S2CID 9932565.
- ^ a b "Burns, Benureau, Tani (2018) A Bergson-Inspired Adaptive Time Constant for the Multiple Timescales Recurrent Neural Network Model. JNNS".
- ^ Tutschku, Kurt (June 1995). Recurrent Multilayer Perceptrons for Identification and Control: The Road to Applications. Institute of Computer Science Research Report. Vol. 118. University of Würzburg Am Hubland. CiteSeerX 10.1.1.45.3527.
- ^ Yamashita, Yuichi; Tani, Jun (2008-11-07). "Emergence of Functional Hierarchy in a Multiple Timescale Neural Network Model: A Humanoid Robot Experiment". PLOS Computational Biology. 4 (11): e1000220. Bibcode:2008PLSCB...4E0220Y. doi:10.1371/journal.pcbi.1000220. PMC 2570613. PMID 18989398.
- ^ Alnajjar, Fady; Yamashita, Yuichi; Tani, Jun (2013). "The hierarchical and functional connectivity of higher-order cognitive mechanisms: neurorobotic model to investigate the stability and flexibility of working memory". Frontiers in Neurorobotics. 7: 2. doi:10.3389/fnbot.2013.00002. PMC 3575058. PMID 23423881.
- ^ "Proceedings of the 28th Annual Conference of the Japanese Neural Network Society (October, 2018)" (PDF).
- ^ Graves, Alex; Wayne, Greg; Danihelka, Ivo (2014). "Neural Turing Machines". arXiv:1410.5401 [cs.NE].
- ^ Sun, Guo-Zheng; Giles, C. Lee; Chen, Hsing-Hen (1998). "The Neural Network Pushdown Automaton: Architecture, Dynamics and Training". In Giles, C. Lee; Gori, Marco (eds.). Adaptive Processing of Sequences and Data Structures. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer. pp. 296–345. CiteSeerX 10.1.1.56.8723. doi:10.1007/bfb0054003. ISBN 9783540643418.
- ^ Snider, Greg (2008), "Cortical computing with memristive nanodevices", Sci-DAC Review, 10: 58–65
- ^ Caravelli, Francesco; Traversa, Fabio Lorenzo; Di Ventra, Massimiliano (2017). "The complex dynamics of memristive circuits: analytical results and universal slow relaxation". Physical Review E. 95 (ii): 022140. arXiv:1608.08651. Bibcode:2017PhRvE..95b2140C. doi:10.1103/PhysRevE.95.022140. PMID 28297937. S2CID 6758362.
- ^ Caravelli, Francesco (2019-11-07). "Asymptotic Behavior of Memristive Circuits". Entropy. 21 (8): 789. Bibcode:2019Entrp..21..789C. doi:10.3390/e21080789. PMID 33267502.
- ^ Werbos, Paul J. (1988). "Generalization of backpropagation with application to a recurrent gas market model". Neural Networks. 1 (4): 339–356. doi:10.1016/0893-6080(88)90007-X.
- ^ Rumelhart, David E. (1985). Learning Internal Representations by Error Propagation. San Diego (CA): Institute for Cognitive Science, University of California.
- ^ Robinson, Anthony J.; Fallside, Frank (1987). The Utility Driven Dynamic Error Propagation Network. Technical Report CUED/F-INFENG/TR.1. Department of Engineering, University of Cambridge.
- ^ Williams, Ronald J.; Zipser, D. (1 February 2013). "Gradient-based learning algorithms for recurrent networks and their computational complexity". In Chauvin, Yves; Rumelhart, David E. (eds.). Backpropagation: Theory, Architectures, and Applications. Psychology Press. ISBN 978-1-134-77581-1.
- ^ Schmidhuber, Jürgen (1989-01-01). "A Local Learning Algorithm for Dynamic Feedforward and Recurrent Networks". Connection Science. 1 (4): 403–412. doi:10.1080/09540098908915650. S2CID 18721007.
- ^ Príncipe, José C.; Euliano, Neil R.; Lefebvre, W. Curt (2000). Neural and adaptive systems: fundamentals through simulations. Wiley. ISBN 978-0-471-35167-2.
- ^ Yann, Ollivier; Tallec, Corentin; Charpiat, Guillaume (2015-07-28). "Training recurrent networks online without backtracking". arXiv:1507.07680 [cs.NE].
- ^ Schmidhuber, Jürgen (1992-03-01). "A Fixed Size Storage O(n3) Time Complexity Learning Algorithm for Fully Recurrent Continually Running Networks". Neural Computation. 4 (2): 243–248. doi:10.1162/neco.1992.4.2.243. S2CID 11761172.
- ^ Williams, Ronald J. (1989). "Complexity of exact gradient computation algorithms for recurrent neural networks". Technical Report NU-CCS-89-27. Boston (MA): Northeastern University, College of Computer Science.
- ^ Pearlmutter, Barak A. (1989-06-01). "Learning State Space Trajectories in Recurrent Neural Networks". Neural Computation. 1 (2): 263–269. doi:10.1162/neco.1989.1.2.263. S2CID 16813485.
- ^ Hochreiter, Sepp; et al. (15 January 2001). "Gradient flow in recurrent nets: the difficulty of learning long-term dependencies". In Kolen, John F.; Kremer, Stefan C. (eds.). A Field Guide to Dynamical Recurrent Networks. John Wiley & Sons. ISBN 978-0-7803-5369-5.
- ^ Campolucci, Paolo; Uncini, Aurelio; Piazza, Francesco; Rao, Bhaskar D. (1999). "On-Line Learning Algorithms for Locally Recurrent Neural Networks". IEEE Transactions on Neural Networks. 10 (2): 253–271. CiteSeerX 10.1.1.33.7550. doi:10.1109/72.750549. PMID 18252525.
- ^ Wan, Eric A.; Beaufays, Françoise (1996). "Diagrammatic derivation of gradient algorithms for neural networks". Neural Computation. 8: 182–201. doi:10.1162/neco.1996.8.1.182. S2CID 15512077.
- ^ a b Campolucci, Paolo; Uncini, Aurelio; Piazza, Francesco (2000). "A Signal-Flow-Graph Approach to On-line Gradient Calculation". Neural Computation. 12 (8): 1901–1927. CiteSeerX 10.1.1.212.5406. doi:10.1162/089976600300015196. PMID 10953244. S2CID 15090951.
- ^ Gomez, Faustino J.; Miikkulainen, Risto (1999), "Solving non-Markovian control tasks with neuroevolution" (PDF), IJCAI 99, Morgan Kaufmann, retrieved 5 August 2017
- ^ Syed, Omar (May 1995). "Applying Genetic Algorithms to Recurrent Neural Networks for Learning Network Parameters and Architecture". M.Sc. thesis, Department of Electrical Engineering, Case Western Reserve University, Advisor Yoshiyasu Takefuji.
- ^ Gomez, Faustino J.; Schmidhuber, Jürgen; Miikkulainen, Risto (June 2008). "Accelerated Neural Evolution Through Cooperatively Coevolved Synapses". Journal of Machine Learning Research. 9: 937–965.
- ^ Siegelmann, Hava T.; Horne, Bill G.; Giles, C. Lee (1995). "Computational Capabilities of Recurrent NARX Neural Networks". IEEE Transactions on Systems, Man and Cybernetics, Part B (Cybernetics). 27 (2): 208–15. CiteSeerX 10.1.1.48.7468. doi:10.1109/3477.558801. PMID 18255858.
- ^ Metz, Cade (May 18, 2016). "Google Built Its Very Own Chips to Power Its AI Bots". Wired.
- ^ Mayer, Hermann; Gomez, Faustino J.; Wierstra, Daan; Nagy, Istvan; Knoll, Alois; Schmidhuber, Jürgen (October 2006). A System for Robotic Heart Surgery that Learns to Tie Knots Using Recurrent Neural Networks. 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems. pp. 543–548. CiteSeerX 10.1.1.218.3399. doi:10.1109/IROS.2006.282190. ISBN 978-1-4244-0258-8. S2CID 12284900.
- ^ Wierstra, Daan; Schmidhuber, Jürgen; Gomez, Faustino J. (2005). "Evolino: Hybrid Neuroevolution/Optimal Linear Search for Sequence Learning". Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI), Edinburgh: 853–858.
- ^ Petneházi, Gábor (2019-01-01). "Recurrent neural networks for time series forecasting". arXiv:1901.00069 [cs.LG].
- ^ Hewamalage, Hansika; Bergmeir, Christoph; Bandara, Kasun (2020). "Recurrent Neural Networks for Time Series Forecasting: Current Status and Future Directions". International Journal of Forecasting. 37: 388–427. arXiv:1909.00590. doi:10.1016/j.ijforecast.2020.06.008. S2CID 202540863.
- ^ Graves, Alex; Schmidhuber, Jürgen (2005). "Framewise phoneme classification with bidirectional LSTM and other neural network architectures". Neural Networks. 18 (5–6): 602–610. CiteSeerX 10.1.1.331.5800. doi:10.1016/j.neunet.2005.06.042. PMID 16112549.
- ^ Fernández, Santiago; Graves, Alex; Schmidhuber, Jürgen (2007). An Application of Recurrent Neural Networks to Discriminative Keyword Spotting. Proceedings of the 17th International Conference on Artificial Neural Networks. ICANN'07. Berlin, Heidelberg: Springer-Verlag. pp. 220–229. ISBN978-3540746935.
- ^ Graves, Alex; Mohamed, Abdel-rahman; Hinton, Geoffrey E. (2013). "Speech Recognition with Deep Recurrent Neural Networks". Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on: 6645–6649. arXiv:1303.5778. Bibcode:2013arXiv1303.5778G. doi:10.1109/ICASSP.2013.6638947. ISBN 978-1-4799-0356-6. S2CID 206741496.
- ^ Chang, Edward F.; Chartier, Josh; Anumanchipalli, Gopala Thou. (24 April 2019). "Speech synthesis from neural decoding of spoken sentences". Nature. 568 (7753): 493–498. Bibcode:2019Natur.568..493A. doi:10.1038/s41586-019-1119-1. ISSN 1476-4687. PMID 31019317. S2CID 129946122.
- ^ Moses, David A., Sean L. Metzger, Jessie R. Liu, Gopala G. Anumanchipalli, Joseph G. Makin, Pengfei F. Sun, Josh Chartier, et al. "Neuroprosthesis for Decoding Speech in a Paralyzed Person with Anarthria." New England Journal of Medicine 385, no. 3 (July 15, 2021): 217–27. https://doi.org/10.1056/NEJMoa2027540.
- ^ Malhotra, Pankaj; Vig, Lovekesh; Shroff, Gautam; Agarwal, Puneet (April 2015). "Long Short Term Memory Networks for Anomaly Detection in Time Series" (PDF). European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning — ESANN 2015.
- ^
- ^ Eck, Douglas; Schmidhuber, Jürgen (2002-08-28). Learning the Long-Term Structure of the Blues. Artificial Neural Networks — ICANN 2002. Lecture Notes in Computer Science. Vol. 2415. Berlin, Heidelberg: Springer. pp. 284–289. CiteSeerX 10.1.1.116.3620. doi:10.1007/3-540-46084-5_47. ISBN 978-3540460848.
- ^ Schmidhuber, Jürgen; Gers, Felix A.; Eck, Douglas (2002). "Learning nonregular languages: A comparison of simple recurrent networks and LSTM". Neural Computation. 14 (9): 2039–2041. CiteSeerX 10.1.1.11.7369. doi:10.1162/089976602320263980. PMID 12184841. S2CID 30459046.
- ^
- ^ Pérez-Ortiz, Juan Antonio; Gers, Felix A.; Eck, Douglas; Schmidhuber, Jürgen (2003). "Kalman filters improve LSTM network performance in problems unsolvable by traditional recurrent nets". Neural Networks. 16 (2): 241–250. CiteSeerX 10.1.1.381.1992. doi:10.1016/s0893-6080(02)00219-8. PMID 12628609.
- ^ Graves, Alex; Schmidhuber, Jürgen (2009). "Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks". Advances in Neural Information Processing Systems 22, NIPS'22. Vancouver (BC): MIT Press: 545–552.
- ^ Graves, Alex; Fernández, Santiago; Liwicki, Marcus; Bunke, Horst; Schmidhuber, Jürgen (2007). Unconstrained Online Handwriting Recognition with Recurrent Neural Networks. Proceedings of the 20th International Conference on Neural Information Processing Systems. NIPS'07. Curran Associates Inc. pp. 577–584. ISBN 9781605603520.
- ^ Baccouche, Moez; Mamalet, Franck; Wolf, Christian; Garcia, Christophe; Baskurt, Atilla (2011). Salah, Albert Ali; Lepri, Bruno (eds.). "Sequential Deep Learning for Human Action Recognition". 2nd International Workshop on Human Behavior Understanding (HBU). Lecture Notes in Computer Science. Amsterdam, Netherlands: Springer. 7065: 29–39. doi:10.1007/978-3-642-25446-8_4. ISBN 978-3-642-25445-1.
- ^ Hochreiter, Sepp; Heusel, Martin; Obermayer, Klaus (2007). "Fast model-based protein homology detection without alignment". Bioinformatics. 23 (14): 1728–1736. doi:10.1093/bioinformatics/btm247. PMID 17488755.
- ^ Tax, Niek; Verenich, Ilya; La Rosa, Marcello; Dumas, Marlon (2017). Predictive Business Process Monitoring with LSTM neural networks. Proceedings of the International Conference on Advanced Information Systems Engineering (CAiSE). Lecture Notes in Computer Science. Vol. 10253. pp. 477–492. arXiv:1612.02130. doi:10.1007/978-3-319-59536-8_30. ISBN 978-3-319-59535-1. S2CID 2192354.
- ^ Choi, Edward; Bahadori, Mohammad Taha; Schuetz, Andy; Stewart, Walter F.; Sun, Jimeng (2016). "Doctor AI: Predicting Clinical Events via Recurrent Neural Networks". Proceedings of the 1st Machine Learning for Healthcare Conference. 56: 301–318. arXiv:1511.05942. Bibcode:2015arXiv151105942C. PMC 5341604. PMID 28286600.
Further reading
- Mandic, Danilo P. & Chambers, Jonathon A. (2001). Recurrent Neural Networks for Prediction: Learning Algorithms, Architectures and Stability. Wiley. ISBN978-0-471-49517-8.
External links
- Recurrent Neural Networks with over 60 RNN papers by Jürgen Schmidhuber's group at the Dalle Molle Institute for Artificial Intelligence Research
- Elman Neural Network implementation for WEKA
Source: https://en.wikipedia.org/wiki/Recurrent_neural_network