Monday, October 26, 2009

Artificial Larynx

1. An artificial larynx comprising a sound chamber having an air inlet, an outlet for air, an air discharge element communicating with said outlet, a tone-producing element inside said chamber, said tone-producing element having an edge that is vibratable against a receptor element and vibratable in the human voice range in response to air flow from said inlet, and spreader means for spreading said vibratable edge away from said receptor element.

2. The artificial larynx of claim 1 wherein said tone-producing element comprises a reed vibratable against a channeled receptor which communicates with said outlet for vibrating air, and said spreader means is adjustable for parting the vibrating portion of said reed from said receptor.

3. The artificial larynx of claim 2 wherein the spreader means is a reciprocatory means disposed to part said vibrating portion of the reed from said receptor in response to pressure delivered from said spreader means to said vibrating portion.

4. The artificial larynx of claim 3 wherein said spreader means comprises a manually depressible spring-loaded piston projecting through the outer periphery of said chamber for delivering parting pressure to the vibrating portion of said reed.

5. The artificial larynx of claim 1 which has means for adjusting pitch of the tone produced by said tone-producing element.

6. The artificial larynx of claim 5 wherein said tone-producing element comprises a reed vibratable against a channeled receptor which communicates with said outlet for air, and said means for adjusting pitch comprises manually-modulatable reed-pressing means disposed for gradually shortening and lengthening the vibrating portion of said reed, and the range of said reed-pressing means is adjustable along the longitudinal axis of said reed.

7. The artificial larynx of claim 6 wherein said reed-pressing means is a leaf spring projecting with and gradually and increasingly diverging from said reed, operating pressure on said spring is transmittable thereto from a plunger projecting through the outer periphery of said chamber, and the root of said leaf spring is adjustable along the longitudinal axis of said reed.

8. The artificial larynx of claim 7 wherein said spring, said reed, said channeled receptor, and a reed setter are held together as a disassemblable subassembly that is detachably plugged into a sleeve which constitutes part of the outer periphery of said sound chamber.

9. The artificial larynx of claim 8 wherein said spreader means comprises a manually-depressible, spring-loaded piston projecting through said sleeve and a parting pin inside said sleeve, said pin being disposed to pass through the channeled receptor and spread therefrom the vibrating portion of said reed in response to depression of said piston.

10. The artificial larynx of claim 1 wherein said sound chamber air inlet has a stoma cover projecting immediately therefrom.

Description:

This invention relates to an acoustical artificial larynx for restoring speech to a person who has had the natural larynx removed. Heretofore various devices of this sort have been proposed. Representative are the following U.S. Pat. No. 1,836,816; 1,840,112; 1,867,350; 1,910,966 and 2,405,850. Advantages of the instant invention over such prior proposals include a novel whispering control, balance and general handiness for one-hand operation, both modulatable and fixed pitch adjustment with the range of modulation adjustable, and particularly easy disassembly and reassembly for cleaning and sanitation. This invention lends itself especially well to light, compact construction.

BROAD STATEMENT OF THE INVENTION

The artificial larynx of this invention comprises a sound chamber having an air inlet, an outlet for air, an air discharge element communicating with said outlet, a tone-producing element inside said chamber, said tone-producing element having an edge that is vibratable against a receptor element and vibratable in the human voice range in response to air flow from said inlet, and spreader means for spreading said vibratable edge away from said receptor element.

BRIEF DESCRIPTION OF THE DRAWING

FIG. 1 is a side view of the invention;

FIG. 2 is a vertical cross-section through the center of FIG. 1;

FIG. 3 is an enlarged end view with portions removed; and

FIG. 4 is a three dimensional exploded view of the invention.

DETAILED DESCRIPTION OF THE INVENTION

FIG. 1 is a side view of a preferred embodiment of the instant artificial larynx. Item 2 is a stoma cover of plastic or, preferably, soft rubber that is adapted to fit snugly over the stoma area, i.e. the tracheal opening in the neck. The small end of stoma cover 2 fits tightly over one end of cylindrical metal sleeve 3, such sleeve being the lateral exterior wall of the sound chamber. Suitably sleeve 3 can be about 17 mm. in inside diameter, about 19 mm. in outside diameter, and about 4 1/2 cm. long. Into the other end of sleeve 3 is detachably plugged a subassembly of a reed (not visible in this view), a leaf spring (not visible in this view), channeled receptor 4, the exposed portion of which is depicted in this view, reed setter 6, the exposed portion of which is depicted in this view, and flexible mouth tube 7, the upstream end of which is held snugly between the exposed portion of receptor 4 and the root (fixed end) of said reed. Said upstream end of tube 7 acts as an outlet for vibrating air from the chamber. Tube 7 conducts such air into the mouth and there discharges it through one or more apertures near the end. Exhaled air passes between channeled receptor 4 and said reed, to vibrate the unsupported end of said reed. The vibrating air then passes through a channel in the receptor (said channel not depicted in this view), out flexible tube 7, and into the mouth where it is formed into speech in the speaker's mouth, nasal, and sinus cavities. Reed setter 6 fits into the exposed portion of said receptor to hold both said reed and said leaf spring firmly in operating position. The receptor and reed setter desirably are made of a hard plastic. Item 8 is a staple embedded into and projecting from reed setter 6 to assist in moving such setter inwardly relative to the chamber to shorten the vibrating length of said reed or draw it outwardly to lengthen the vibrating length of the reed, thus giving a fixed pitch adjustment to said reed as well as changing the range of modulated pitch adjustment. Plunger 9 operates with finger pressure against the leaf spring to give modulated pitch adjustment. Plunger 9 is journalled through guide 11 that projects from sleeve 3 to give easy in-and-out motion with insignificant air leakage. Piston 12, spring-loaded to remain extended outwardly at rest, is likewise journalled through a guide (not visible in this view). Inward pressure on piston 12 acts to spread the vibratable portion (tip) of the reed from the channeled receptor, thereby permitting the user to whisper as well as to disengage the reed tip from the receptor when it is lightly stuck thereto by condensation or the like.

FIG. 2 is a vertical cross section through the center of FIG. 1, except that only the upstream portion of flexible tube 7 is depicted instead of all of it. Items 2, 3, 4, 6, 7, 8, 9, 11, 12, and 14 are and act as described above in connection with FIG. 1. From FIG. 2 it can be seen that one end of channeled receptor 4 extends inwardly into cylinder sleeve 3, as does one end of reed setter 6. Also extending inwardly into sleeve 3 is reed 16, typically a flat, thin, plastic strip overlapping channel 21 of receptor 4 and forming a gradually widening air aperture between the vibrating portion of such reed and the inward extension of said receptor bounding channel 21. Typically channel 21 is about 3 cm. long, about 3 mm. wide, about 3 mm. deep where it abuts the end of tube 7 and about 2 mm. deep at the other end. (The arrangement is broadly analogous to a clarinet mouthpiece fitted with a reed). Also rooted together with reed 16 and extending into sleeve 3 is leaf spring 17. Reed setter 6 holds spring 17 and reed 16 tightly together at their roots with the root of reed 16 pressing against the upstream end of tube 7 and slightly flattening it. Said end is fitted into the exterior portion of receptor 4.

Reed 16 is a piece of flat, thin (about 0.25 mm. or a little less), flexible thermoplastic, typically about 5.3 cm. long and tapering in width from 1 cm. at the root to about 7 mm. at the vibrating tip. It is readily replaceable with other similar reeds and not adhesively united to any other part. On the other hand, leaf spring 17 advantageously is adhesively united (glued) to the inside surface of reed setter 6 and extends outwardly therefrom about 1.3 cm. It is about 0.2 mm. thick and of corrosion-resistant springy metal chamfered slightly towards the unsupported, depressible tip (the unsupported portion going from about 11 mm. wide at root end to about 9 mm. wide at the tip). Such unsupported portion bends away from the plane of reed 16 gradually. Pressure on plunger 9 pushes spring 17 against reed 16, and this shortens the vibrating portion of the reed gradually while raising its pitch also gradually.

There should be minimum leakage in the instrument of vibrating air that should flow to the mouth. Accordingly, items 4, 6, 16, 17, and the upstream end of flexible tube 7 in service should be fitted quite well and held tightly together in a substantially leakproof manner when plugged as a subassembly into sleeve 3.

A special feature of this invention, alluded to in FIG. 1, is shown in its preferred structure in FIG. 2. Thus spring-loaded piston 12, when pushed inwardly, depresses spring wire 18. The root of wire 18 is embedded into receptor 4. The free end of wire 18 is bent upwardly to form a parting pin passing through tiny hole 22 in receptor 4. At rest, this pin does not touch reed 16 or interfere with its vibration. However, when whispering is desired, piston 12 is pushed against element 18 to propel the pin end portion thereof into contact with reed 16, thereby spreading the vibrating tip more or less but very slightly away from the receptor against which it vibrates. This parting pin can be actuated to free the reed when it is stuck closed or slightly clogged.

FIGS. 1 and 2 are drawn approximately to scale whereas FIG. 3 is enlarged for clarity. It depicts the exterior end view of the subassembly fitted together, with tube 7 and staple 8 arbitrarily being cut off substantially flush with the exterior end of channeled receptor 4. Thus, reed setter 6 sits into a slot in the top of receptor 4. Leaf spring 17 is glued to the bottom of reed setter 6, and the spring root presses flush against the up side of the root of reed 16. The bottom of such reed root presses against that upstream portion of tube 7 that is fitted into receptor 4 and is otherwise surrounded by such receptor. Advantageously such engaged tube part merely fits snugly without leakage about the exposed end of receptor 4. Accordingly, it can be easily removed for cleansing or replacement. It also can be sealed, as with a cement, into said receptor end, at least temporarily when desired.

FIG. 4 is a three dimensional exploded view of the subassembly that is held together by being plugged into the end of sleeve 3, except that the discharge end of tube 7 into the mouth is not shown, but rather only the end which fits into receptor 4. Reading arbitrarily from left to right, the upstream end of tube 7 fits snugly into the outlet end of channeled receptor 4 that is exposed. Receptor 4 is of hard plastic with flat planes at interior level 4'. Extending from right to left is facing 4", which is essentially an extension of planes 4'. This facing starts from the left in essentially coplanar fashion with planes 4', but after second shoulder 4''' such facing gradually slopes away from the plane of reed 16 to form the lay for such reed. Item 21 is the channel in receptor 4 whence vibrating air is conducted to the outlet, through tube 7, and into the mouth.

Spring 18 is embedded at its left end to fix it into the external part of receptor 4; it is bent at the right hand end to form a pin which passes through hole 22 so that the tip of the pin at rest is just below being flush with the facing. The left hand end of plastic reed 16 rests on planes 4' and the left end of facing 4". The reed can be moved right or left somewhat to adjust its vibratable length. The left hand end of the reed can protrude from the exterior end of receptor 4 if necessary. Desirably the reed will cover all of channel 21 and extend to the right thereof. Leaf spring 17 desirably is glued fairly permanently onto the bottom of reed setter piece 6. Staple 8 projects from the rear of piece 6. Piece 6 and leaf spring 17 fit into the exterior portion of receptor 4, thus making such spring press against the root of reed 16, which in turn is pressed against the inserted part of tube 7 shown in FIG. 4 and also against the left end of facing 4". The spring and reed setter can be moved slightly to the right or left slidably over on planes 4' and the part of facing 4" with which the reed is in contact. This changes the vibrating length of the reed as the clamped-down portion (root) of leaf spring 17 moves along the longitudinal axis of the reed. Accordingly, the length of the reed from which pitch modulation is done by depressing the unsupported end of spring 17 against reed 16 can be adjusted. In other words, the range of such modulatable reed-pressing means thereby is made adjustable.

When these elements depicted in FIG. 4 are assembled for use, they are plugged in unitary manner into the end of sleeve 3, shown in fragmentary view at the far right. The end of sleeve 3 clamps the subassembly together and surrounds it almost up to the shoulder 6' of the reed setter piece (which is permitted a little right and left adjustment for helping to fix pitch and the modulatable range thereof as explained above).

To use this depicted embodiment of artificial larynx, the user places the free end of tube 7 into the mouth, preferably from the side and well back on the tongue. He holds bell-like stoma cover 2 over the stoma, then breathes the desired words while at the same time mouthing them. Breathing through the unit should cause the reed to buzz. When the unit has not been used for a long period of time it sometimes will fail to operate. Blowing hard through it often helps to correct this. If the reed is warped or too much moisture has collected inside, it advantageously is disassembled, the parts dried with tissue, and tube 7 cleansed with a small plug of tissue pushed through it with a wire. If the reed is warped away from the channel of the receptor, thus letting too much air through tube 7, or is warped downwardly toward the receptor to unnecessarily throttle such air, turning the reed over in either case may help out (or the reed can be replaced). The thicker and stiffer the vibrating part of the reed, generally the higher the pitch. The reed itself can be moved to extend it farther into the sleeve for lowering the pitch or drawn back to shorten it and raise the pitch in a non-modulated way. Reed setter 6 also can be moved inwardly to raise the pitch of the reed or outwardly to lower it for an unmodulated pitch adjustment.

The unit can be partly or almost completely disassembled for washing, scalding or otherwise sterilizing, e.g. with rubbing alcohol. Accordingly, all the materials of construction, plastic, rubber, metal or otherwise advantageously should be tolerant of and resistant to these treatments. Even well-chewed chewing gum or the like can be used temporarily to hold the reed in place when in use and to fill leaks, e.g. about the root area of the reed.

While the illustrated preferred embodiment of this invention has a tone-producing element that comprises a single reed vibratable against a channeled receptor, it should be evident that a double reed arrangement could be used with one reed vibrating against the other (analogously to a bagpipe reed). The channel therebetween then would conduct the vibrating air stream to a chamber outlet communicating with the mouth air discharge. Also possible would be a vibratory tone-producing element consisting essentially of a hollow, flat, soft rubber tube supported at its outlet end, with its unsupported end acting to vibrate. In fact, almost any tone-producing element having an edge that is vibratable in the human voice range is broadly suitable. Lower, masculine, tones are most favored for the instant instrument. Reeds preferably are of plastic, but also can be of cane or metal. The spreader means for separating the vibrating reed from its receptor (or opposed reed tips from each other if a double reed is used) can also take alternative forms. Thus, where double opposed reeds or a flattened, flexible tubing is used to provide a plurality of vibratable edges, these can be spread or parted, e.g. by a pin or a tapered element inserted at the tip end of such vibrating elements. The modulatable pitch-adjusting element need not be leaf spring 17 subject to pressure as shown, although this is believed to be a superior construction. Many mechanisms which can foreshorten or lengthen the vibrating portion of the reed gradually or in very tiny increments to modulate the pitch adjustment and thereby the pitch, would be adequate, e.g. hand-operated roller or slider devices readily movable up and down the reed at or near the base of the vibrating portion of the reed. 
While button-topped, pressure-activated pistons and plungers are the preferred vibration-modulating actuators for this instrument, mechanical equivalents for the purpose (such as spring loaded levers mounted to draw a pin or a leaf spring outwardly at rest and release it for inward contact with the vibratable part of the reed) also are feasible in place of such plungers and pistons.

In order to breathe in one simply can separate the stoma cover, e.g. item 2, slightly away from the neck. Where it is desired to have a longer term stoma connection, or to plug the air inlet directly into the stoma, it can be useful to have a breathing port cut into the sound chamber, e.g. into sleeve 3. Such port can be covered by a fingertip or a padded, levered cover that is spring-loaded to keep it normally closed. However, to utilize such port simultaneously with pitch and whisper modulator elements most likely would take three fingers, or a thumb and two fingers, advantageously using two hands. Having more than two-finger or a thumb-and-one-finger operation on the device can make it a bit complex to use, as with an ordinary telephone.

While the drawings depict the preferred embodiment of the invention in one orientation, it should be obvious that the plungers and pistons or other modulating actuators can be disposed about the periphery of the sound chamber in many different other orientations as necessary or desired, that the stoma cover need not be directly connected to the sound chamber, but rather the input air can be led from such connection through a tube, and further, the flexible outlet tube for vibrating air that goes into the mouth can be terminated to discharge air there differently and can be oriented differently relative to the modulating actuators, etc. without departure from the scope of this invention. Even the flexible tube for vibrating air can be more or less rigid without appreciable loss of utility.
Accordingly, the invention should be restricted only by the appended claims. Conventional moulded or extruded plastics and rubbers are particularly well suited for making most of the elements of the instant inventive construction, even some spring elements, and metals such as aluminum, stainless steels, copper, and brass obviously are useful for many of the elements also.

Speech Recognition System

Note: The images are in reverse sequence, i.e. ordered bottom to top.

Fundamentals of Speech Recognition

AUTOMATIC SPEECH RECOGNITION (ASR).

The concept of a machine that can recognize the human voice has long been an accepted feature in Science Fiction. From ‘Star Trek’ to George Orwell’s ‘1984’ - “Actually he was not used to writing by hand. Apart from very short notes, it was usual to dictate everything into the speakwriter.” - it has been commonly assumed that one day it will be possible to converse naturally with an advanced computer-based system. Indeed in his book ‘The Road Ahead’, Bill Gates (co-founder of Microsoft Corp.) hails ASR as one of the most important innovations for future computer operating systems.

From a technological perspective it is possible to distinguish between two broad types of ASR: ‘direct voice input’ (DVI) and ‘large vocabulary continuous speech recognition’ (LVCSR). DVI devices are primarily aimed at voice command-and-control, whereas LVCSR systems are used for form filling or voice-based document creation. In both cases the underlying technology is more or less the same. DVI systems are typically configured for small to medium sized vocabularies (up to several thousand words) and might employ word or phrase spotting techniques. Also, DVI systems are usually required to respond immediately to a voice command. LVCSR systems involve vocabularies of perhaps hundreds of thousands of words, and are typically configured to transcribe continuous speech. Also, LVCSR need not be performed in real-time - for example, at least one vendor has offered a telephone-based dictation service in which the transcribed document is e-mailed back to the user.

From an application viewpoint, the benefits of using ASR derive from providing an extra communication channel in hands-busy eyes-busy human-machine interaction (HMI), or simply from the fact that talking can be faster than typing. Also, whilst speaking to a machine cannot be described as natural, it can nevertheless be considered intuitive; as one ASR advertisement declared “you have been learning since birth the only skill needed to use our system”.

ASR products have existed in the marketplace since the 1970s. However, early systems were expensive hardware devices that could only recognize a few isolated words (i.e. words with pauses between them), and needed to be trained by users repeating each of the vocabulary words several times. The 1980s and 90s witnessed a substantial improvement in ASR algorithms and products, and the technology developed to the point where, in the late 1990s, software for desktop dictation became available ‘off-the-shelf’ for only a few tens of dollars. As a consequence, the markets for ASR systems have now grown to include:

· large vocabulary dictation - for RSI sufferers and quadriplegics, and for formal document preparation in legal or medical services

· interactive voice response - for callers who do not have tone pads, for the automation of call centers, and for access to information services such as stock market quotes

· telecom assistants - for repertory dialing and personal management systems

· process and factory management - for stocktaking, measurement and quality control

The progress in ASR has been fuelled by a number of key developments, not least the relentless increase in the power of desktop computing. Also R&D has been greatly stimulated by the introduction of competitive public system evaluations, particularly those sponsored by the US Defense Advanced Research Projects Agency (DARPA). However, scientifically, the key step has been the introduction of statistical techniques for modeling speech patterns coupled with the availability of vast quantities of recorded speech data for training the models.

The main breakthrough in ASR has been the discovery that recognition can be viewed as an integrated search process, and this first appeared in the 1970s with the introduction of a powerful mathematical search technique known as ‘dynamic programming’ (DP) or ‘Viterbi search’. Initially DP was used to implement non-linear time alignment in a whole-word template-based approach, and this became known as ‘dynamic time warping’ (DTW).
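The template-matching idea behind DTW can be illustrated with a short sketch. It is a simplification under stated assumptions: each frame is a single number rather than an acoustic vector, the local distance is an absolute difference, and the function names (`dtw_distance`, `recognize`) and the template data are hypothetical.

```python
def dtw_distance(template, utterance):
    """Minimum cumulative distance between two sequences under time warping."""
    inf = float("inf")
    n, m = len(template), len(utterance)
    # cost[i][j] = best cumulative cost of aligning template[:i] with utterance[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(template[i - 1] - utterance[j - 1])
            # Dynamic programming step: extend the cheapest of the three
            # allowed predecessors (stretch, compress, or advance both).
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]


def recognize(templates, utterance):
    """Pick the word whose stored template warps onto the utterance most cheaply."""
    return min(templates, key=lambda word: dtw_distance(templates[word], utterance))


# Hypothetical whole-word templates (one feature value per frame).
TEMPLATES = {"yes": [1, 3, 1], "no": [5, 5, 5]}
print(recognize(TEMPLATES, [1, 1, 3, 3, 1]))
```

The utterance here is a slowed-down version of the "yes" template, so the non-linear alignment absorbs the difference in speaking rate.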

DTW-based systems were quite successful, and could even be configured to recognize connected words. However another significant step came in the late 1980s when pattern matching was replaced by ‘hidden Markov modeling’. This not only allowed systems to be configured for large numbers of users – providing so-called ‘speaker independent’ systems – but ‘sub-word HMMs’ enabled the recognition of words that had not been encountered in the training material.

A hidden Markov model (HMM) is a stochastic generative process that is particularly well suited to modeling time-varying patterns such as speech. HMMs represent speech as a sequence of observation vectors derived from a probabilistic function of a first-order Markov chain. Model ‘states’ are identified with an output probability distribution that describes pronunciation variations, and states are connected by probabilistic ‘transitions’ that capture durational structure. An HMM can thus be used as a ‘maximum likelihood classifier’ to compute the probability of a sequence of words given a sequence of acoustic observations.

Figure 1 illustrates a contemporary ASR system. Incoming speech is subject to some form of front-end signal processing - usually ‘cepstral’ analysis – that outputs a sequence of acoustic vectors. Using Viterbi search, this sequence is compared with an integrated network of HMM states in order to find the path that corresponds to the most likely explanation of the observations. The path reveals the recognized sequence of words.
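The search step can be sketched on a toy model. The two states, the discrete "low"/"high" observations, and all probabilities below are assumptions chosen for illustration; a real recognizer would search a compiled network of sub-word HMM states with continuous acoustic vectors.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most likely state sequence for a list of observations."""
    # best[t][s] = log-probability of the best path that ends in state s at time t
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda p: best[t - 1][p] + log_trans[p][s])
            best[t][s] = (best[t - 1][prev] + log_trans[prev][s]
                          + log_emit[s][observations[t]])
            back[t][s] = prev
    # Trace the best path backwards from the most likely final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


def to_log(table):
    """Convert a (possibly nested) dict of probabilities to log-probabilities."""
    return {k: to_log(v) if isinstance(v, dict) else math.log(v)
            for k, v in table.items()}


# Hypothetical two-state model: silence versus speech, observed via frame energy.
STATES = ["silence", "speech"]
START = to_log({"silence": 0.8, "speech": 0.2})
TRANS = to_log({"silence": {"silence": 0.7, "speech": 0.3},
                "speech": {"silence": 0.2, "speech": 0.8}})
EMIT = to_log({"silence": {"low": 0.9, "high": 0.1},
               "speech": {"low": 0.2, "high": 0.8}})

print(viterbi(["low", "high", "high", "low"], STATES, START, TRANS, EMIT))
```

Working in log-probabilities turns the products along a path into sums, which avoids numerical underflow on long observation sequences.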

The key to this approach is the process for compiling the HMM network. Two sets of training corpora are involved; one consisting of many hours of annotated speech material, and another comprising several million words of text. The first is used to estimate the parameters of the ‘acoustic model’ – an inventory of context-sensitive sub-word HMMs such as ‘diphones’ or ‘triphones’ – and the second is used to estimate the parameters of an n-gram ‘language model’. Each word in the target vocabulary is then expressed in terms of a sequence of phonetic sub-word units, and compiled into a network together with the language model and non-speech HMMs (to accommodate noise).
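As a rough illustration of estimating an n-gram language model from text, the sketch below counts bigrams (n = 2) and uses plain relative frequencies. The three-sentence corpus and the function names are hypothetical, and the absence of smoothing is a simplification: unseen word pairs get probability zero here.

```python
from collections import Counter


def train_bigram(sentences):
    """Count word histories and word pairs, with sentence boundary markers."""
    histories = Counter()
    bigrams = Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        histories.update(words[:-1])              # every word that has a successor
        bigrams.update(zip(words[:-1], words[1:]))
    return histories, bigrams


def bigram_prob(histories, bigrams, prev, word):
    """Relative-frequency estimate of P(word | prev); 0.0 for unseen histories."""
    if histories[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / histories[prev]


# Hypothetical training text for a small command vocabulary.
CORPUS = ["call home", "call the office", "dial home"]
HISTORIES, BIGRAMS = train_bigram(CORPUS)
print(bigram_prob(HISTORIES, BIGRAMS, "call", "home"))
```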

This mainstream approach to ASR is not without its detractors. It is difficult to construct such a system to exhibit accurate discriminatory behavior. As a result, a handful of researchers have investigated ‘artificial neural networks’ (ANNs), particularly for sub-word modeling. However, such systems have not outperformed HMMs on benchmark tests. A more general criticism – primarily leveled at the dominance of the DARPA-sponsored evaluations – has been concerned with the inadvertent suppression of scientific diversity (Bourlard et al, 1996). Participation in such prestige activities not only commits a large research effort, thereby severely reducing the opportunity for lateral thinking, but also discourages any short-term risk that the resultant performance might be worse.

Finally, a comprehensive comparison between the accuracy of ASR and of human speech recognition (HSR) was performed in 1997, when Richard Lippmann presented comparative word error rates for a range of tasks and conditions (Lippmann, 1997). The results indicated that ASR performed about an order of magnitude worse than a human listener.
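Word error rate is conventionally computed as the number of word substitutions, deletions and insertions needed to turn the reference transcript into the recognizer output, divided by the number of reference words. A minimal sketch follows; the function name and the example sentences are illustrative and not taken from Lippmann's study.

```python
def word_error_rate(reference, hypothesis):
    """(substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # dist[i][j] = minimum edits needed to turn ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i                      # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j                      # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,            # deletion
                             dist[i][j - 1] + 1,            # insertion
                             dist[i - 1][j - 1] + substitution)
    return dist[-1][-1] / len(ref)


print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, the rate can exceed 1.0 when the hypothesis is much longer than the reference.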

Bibliography

Bourlard, H., Hermansky, H. & Morgan, N. (1996). Towards increasing speech recognition error rates, J. Speech Communication (Vol. 18, pp. 205-231). Elsevier.

Deller, J. R., Proakis, J. G. & Hansen, J. H. L. (2000). Discrete-time processing of speech signals, IEEE Press Classic Reissue, Piscataway, NJ, IEEE Press.

Gibbon, D., Moore, R. K. & Winski, R. eds. (1997). Handbook of Standards and Resources for Spoken Language Systems, Mouton de Gruyter.

Gold, B. & Morgan, N. (2000). Speech and Audio Processing, New York: John Wiley and sons.

Holmes, J.N. & Holmes, W.J. (2001). Speech Synthesis and Recognition (second edition), Taylor and Francis, London.

Jelinek, F. (1997). Statistical Methods for Speech Recognition, Cambridge, MA: MIT Press.

Lippmann, R. (1997). Speech recognition by machines and humans. J. Speech Communication (Vol. 22, pp. 1-15). Elsevier.

O'Shaughnessy, D. (2000). Speech Communications: human and machine, Second Edition, Piscataway, NJ: IEEE Press.

Rabiner, L. R. and Juang, B.-H. (1993). Fundamentals of Speech Recognition, Englewood Cliffs, NJ: Prentice Hall.

Young, S. J. (1996). A review of large-vocabulary continuous-speech recognition. IEEE Signal Processing Magazine (pp. 45–57).

Saturday, October 24, 2009

Artificial Neural Network

Artificial Neural Network Overview

An artificial neural network is a collection of connected model neurons. Taken one at a time, each neuron is rather simple. As a collection, however, a group of neurons is capable of producing complex results. In the following sections I will briefly summarize a mathematical model of a neuron, neuron layer, and neural network before discussing the types of behavior achievable from a neural network. Finally, I will conclude with a short description of the program included in this lesson so you can form networks that are tailored to your class.

Models

The models presented in this section appear fairly difficult mathematically. However, they eventually boil down to just multiplication and addition. The use of matrices and vectors simplifies the notation but is not absolutely required for this application.

Neuron Model

A model of a neuron has three basic parts: input weights, a summer, and an output function. The input weights scale values used as inputs to the neuron, the summer adds all the scaled values together, and the output function produces the final output of the neuron. Often, one additional input, known as the bias, is added to the system. If a bias is used, it can be represented by a weight with a constant input of one. This description is laid out visually below.

Here $I_1$, $I_2$, and $I_3$ are the inputs, $W_1$, $W_2$, and $W_3$ are the weights, $B$ is the bias, $x$ is an intermediate output, and $a$ is the final output. The equation for $a$ is given by

$$a = f(x) = f(W_1 I_1 + W_2 I_2 + W_3 I_3 + B)$$

where $f$ could be any function. Most often, $f$ is the sign of the argument (i.e. 1 if the argument is positive and -1 if the argument is negative), linear (i.e. the output is simply the input times some constant factor), or some complex curve used in function matching (not needed here). For this model we will use the first case, where $f$ is the sign of the argument, for two reasons: it closely matches the ‘all or nothing’ property seen in biological neurons and it is fairly easy to implement.

When artificial neurons are implemented, vectors are commonly used to represent the inputs and the weights, so the first of two brief reviews of linear algebra is appropriate here. The dot product of two vectors $\mathbf{u}$ and $\mathbf{v}$ is given by $\mathbf{u} \cdot \mathbf{v} = u_1 v_1 + u_2 v_2 + \dots + u_n v_n$. Using this notation the output is simplified to $a = f(\mathbf{W} \cdot \mathbf{I} + B)$, where all the inputs are contained in $\mathbf{I}$ and all the weights are contained in $\mathbf{W}$.
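The neuron model described above (weighted inputs, a summer, a bias, and a sign output function) can be sketched in a few lines. The function names are illustrative, and treating an argument of exactly zero as -1 is an assumption, since the text only defines the sign for positive and negative arguments.

```python
def sign(x):
    """Output function f: 1 for a positive argument, otherwise -1.

    Assumption: an argument of exactly zero is mapped to -1.
    """
    return 1 if x > 0 else -1


def neuron(inputs, weights, bias):
    """Scale each input by its weight, sum them with the bias, apply f."""
    x = sum(w * i for w, i in zip(weights, inputs)) + bias  # intermediate output
    return sign(x)                                          # final output a


print(neuron([1, 1, 1], [0.5, -0.2, 0.1], 0.0))
print(neuron([1, 0, 1], [1, 1, 1], -3))
```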

Neuron Layer

In a neuron layer each input is tied to every neuron and each neuron produces its own output. This can be represented mathematically by the following series of equations:

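$$\begin{aligned} a_1 &= f_1(\mathbf{W}_1 \cdot \mathbf{I} + B_1) \\ a_2 &= f_2(\mathbf{W}_2 \cdot \mathbf{I} + B_2) \\ &\;\;\vdots \\ a_n &= f_n(\mathbf{W}_n \cdot \mathbf{I} + B_n) \end{aligned}$$

Here $\mathbf{W}_k$ holds the weights of neuron $k$, $B_k$ is its bias, $f_k$ is its output function, and $\mathbf{I}$ is the input vector shared by all neurons in the layer.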

NOTE: In general these functions may be different; however, I will take them to be the sign of the argument from now on.

And we will take our second digression into linear algebra. We need to recall that to perform the operation of matrix multiplication you take each column of the second matrix and perform the dot product operation with each row of the first matrix to produce each element in the result. For example the dot product of the ith column of the second matrix and the jth row of the first matrix results in the (j,i) element of the result. If the second matrix is only one column, then the result is also one column.

Keeping matrix multiplication in mind, we stack the weights so that each row of a matrix $W$ holds the weights of one neuron. Now, representing the input vector and the biases as one-column matrices $\mathbf{I}$ and $\mathbf{B}$, we can simplify the above notation to:

$\mathbf{a} = f(W \mathbf{I} + \mathbf{B})$

where f is applied to each element of its argument. This is the final form of the mathematical representation of one layer of artificial neurons.
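The matrix form of a layer can be sketched in a few lines of Python. This is only an illustration: the weight matrix, biases, and input below are invented values, and zero is again mapped to -1 by assumption.

```python
def sign(x):
    """Return 1 for a positive argument, -1 otherwise (zero -> -1 is an assumption)."""
    return 1 if x > 0 else -1


def layer(weight_rows, biases, inputs):
    """One neuron layer: each row of weights is dotted with the inputs,
    the matching bias is added, and the sign function is applied."""
    outputs = []
    for row, b in zip(weight_rows, biases):
        x = sum(w * i for w, i in zip(row, inputs))
        outputs.append(sign(x + b))
    return outputs


# Hypothetical layer with two neurons and three inputs.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
B = [-0.5, -0.5]
print(layer(W, B, [1, 0, 1]))  # prints [1, -1]
```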

Neural Network

A neural network is simply a collection of neuron layers where the output of each layer becomes the input to the next layer. So, for example, the inputs to layer two are the outputs of layer one. In this exercise we are keeping it relatively simple by not having feedback (i.e. output from layer n being input for some previous layer). To mathematically represent the neural network we only have to chain together the layer equations. Writing $W_k$ and $\mathbf{B}_k$ for the weight matrix and bias column of layer k, the finished equation for the three-layer network in this exercise is given by:

$\mathbf{a}_3 = f(W_3 \, f(W_2 \, f(W_1 \mathbf{I} + \mathbf{B}_1) + \mathbf{B}_2) + \mathbf{B}_3)$
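Chaining layers can be sketched as a loop that feeds each layer's output into the next. The three layers below use invented weights and biases chosen only to make the arithmetic easy to follow; they are not the values from the lesson.

```python
def sign(x):
    """Return 1 for a positive argument, -1 otherwise (zero -> -1 is an assumption)."""
    return 1 if x > 0 else -1


def layer(weights, biases, inputs, f=sign):
    """Apply one layer: f(row . inputs + bias) for each neuron."""
    return [f(sum(w * i for w, i in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def network(layers, inputs, f=sign):
    """Feed-forward network without feedback: the output of each layer
    is the input of the next one."""
    signal = inputs
    for weights, biases in layers:
        signal = layer(weights, biases, signal, f)
    return signal


# Hypothetical three-layer network (3 inputs -> 2 -> 2 -> 1 neuron).
layers = [
    ([[1.0, -1.0, 0.0], [0.0, 1.0, -1.0]], [0.5, 0.5]),
    ([[1.0, 1.0], [-1.0, 1.0]], [0.5, -0.5]),
    ([[1.0, -1.0]], [-0.5]),
]
print(network(layers, [1, 0, 0]))  # prints [1]
print(network(layers, [0, 1, 0]))  # prints [-1]
```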

Neural Network Behavior

Although transistors now switch in as little as 0.000000000001 seconds and biological neurons take about 0.001 seconds to respond, we have not been able to approach the complexity or the overall speed of the brain, in part because of the large number of neurons (approximately 100,000,000,000) that are highly connected (approximately 10,000 connections per neuron). Although not as advanced as biological brains, artificial neural networks still perform many important functions in a wide range of applications including sensing, controls, pattern recognition, and categorization. Generally, networks (including our brains) are trained to achieve a desired result. The training mechanisms and rules are beyond the scope of this paper; however, it is worth mentioning that generally good behavior is rewarded while bad behavior is punished. That is to say, when a network performs well it is modified only slightly (if at all), and when it performs poorly larger modifications are made. As a final thought on neural network behavior, it is worth noting that if the output functions of the neurons are all linear, the network is reducible to a one-layer network. In other words, to have a useful network of more than one layer we must use a function like the sigmoid (an S-shaped curve), the sign function we used above, a linear function that saturates, or any other non-linear curve.
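The claim that an all-linear network reduces to one layer can be checked numerically. The sketch below assumes the simplest linear output function (the identity, i.e. a constant factor of 1) and uses made-up weights; the collapsed layer has weights W2·W1 and biases W2·B1 + B2.

```python
def matvec(W, v):
    """Matrix times column vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def matmul(A, B):
    """Matrix product: rows of A dotted with columns of B."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def add(u, v):
    """Element-wise vector sum."""
    return [a + b for a, b in zip(u, v)]


# Hypothetical two-layer linear network: 2 inputs -> 3 neurons -> 2 neurons.
W1 = [[1.0, 2.0], [0.0, -1.0], [3.0, 1.0]]
B1 = [1.0, 0.0, -2.0]
W2 = [[1.0, 0.0, 2.0], [-1.0, 1.0, 0.0]]
B2 = [0.5, -0.5]


def two_layers(x):
    """Linear layer 1 followed by linear layer 2."""
    return add(matvec(W2, add(matvec(W1, x), B1)), B2)


# Equivalent single layer.
W = matmul(W2, W1)           # [[7.0, 4.0], [-1.0, -3.0]]
B = add(matvec(W2, B1), B2)  # [-2.5, -1.5]


def one_layer(x):
    return add(matvec(W, x), B)


print(two_layers([2.0, -1.0]), one_layer([2.0, -1.0]))  # both [7.5, -0.5]
```

With a non-linear function such as the sign between the layers, no such single equivalent layer exists in general, which is why the extra layers become useful.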

Matlab Code

This section covers the parameters in my Matlab code that you might choose to modify if you decide to create a network with inputs and outputs other than those already documented in this lesson. Before using my code you should be aware that it was not written to solve general neural network problems, but rather to find a network by randomly trying values. This means that it could loop forever even if a solution for your inputs and outputs exists. If you do not get a good result after a few minutes, you may want to stop the execution and change your parameters. Finally, I do not claim to have worked all the bugs out of this program, so you should check your results carefully before using them in a classroom setting.

p1, p2, and p3 are input patterns for three different inputs. Each input pattern consists of three elements pertaining to different attributes of the input. For example in my lesson I used redness, roundness, and softness. Here, for instance, a one in the first position means that an object is red while a zero indicates that it is not red.

a1, a2, and a3 are output patterns. They need to be initialized to be incorrect (that way the program enters the loop rather than bypasses it). The second argument of the conditionals for the loop should be the desired results. In my case, I chose to have one neuron in the last layer be an indicator for each object. When that object was used as an input for the network, that neuron would end up being a one while the other neurons in the last layer would be negative one (if everybody did their math correctly). More explicitly, when the first element of a1 is not a positive one then it is wrong and I want to do the loop again. In a similar manner, when the second element of a1 is not a negative one it is wrong and I want to do the loop again. And the same for the rest of the outputs.

Note that there is one known bug involving the rounding of decimals that do not terminate in binary (0.1, for example, is non-terminating in binary). Because of this rounding, it is possible that a value displayed as 0.0000 is taken to be positive rather than zero.
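The random-search idea described in this section can be sketched as follows. This is not the Matlab program itself: it is a Python illustration that searches a single layer rather than the full network, uses invented input patterns for redness, roundness, and softness, and adds a cap on the number of tries (the `max_tries` parameter is an addition) so that it cannot loop forever.

```python
import random


def sign(x):
    """Return 1 for a positive argument, -1 otherwise (zero -> -1 is an assumption)."""
    return 1 if x > 0 else -1


def layer(weights, biases, inputs):
    """One layer of sign neurons."""
    return [sign(sum(w * i for w, i in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def random_search(patterns, targets, max_tries=200_000, seed=0):
    """Try random weights and biases until every pattern gives its target.

    Returns (weights, biases), or None if max_tries is exhausted.
    """
    rng = random.Random(seed)
    n_in, n_out = len(patterns[0]), len(targets[0])
    for _ in range(max_tries):
        weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
        biases = [rng.uniform(-1, 1) for _ in range(n_out)]
        if all(layer(weights, biases, p) == t for p, t in zip(patterns, targets)):
            return weights, biases
    return None


# Hypothetical patterns: [redness, roundness, softness] for three objects.
patterns = [[1, 1, 0], [0, 1, 1], [0, 0, 1]]
# One indicator neuron per object: +1 for "its" object, -1 for the others.
targets = [[1, -1, -1], [-1, 1, -1], [-1, -1, 1]]

result = random_search(patterns, targets)
if result is not None:
    weights, biases = result
    for p in patterns:
        print(p, "->", layer(weights, biases, p))
```

Comparing whole output lists against the targets, as done here, plays the role of the loop conditionals described above: any element that is not the desired +1 or -1 sends the search around again.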

Host Distance Estimation Using Artificial Neural Network

Machine Learning

Jianing Hu {hujn@cs.cmu.edu}

1. Introduction

It is an emerging trend that Internet content providers are using multiple hosts to provide the same content, in order to enhance availability and reliability. In such cases it is desirable for the user to know the distances, in terms of metrics such as latency or bandwidth, to other hosts so that he can choose the "nearest" host to access. For example, the user can select to access the nearest of multiple equal-content web servers to get the quickest response.

There are already several proposals on how to provide such a service based on real-world measurements, like those described in [1] and [2]. However, real-world measurements can sometimes be expensive to obtain. In this project I propose another approach that provides a host proximity service without measuring[1]. I will try to train a neural network to estimate the distance to a host instead of measuring it.

The training data comes from the "ping" utility. The input of the neural network will be the IP address of the destination host and the time-of-day and weekday. The output will be the estimated latency from a certain host (in this case my machine). Presumably, IP address is a good indicator of latency. For example it's pretty safe to guess that a host at 128.2.*.* is much closer than one at 168.160.*.*. Time-of-day and weekday are chosen as input because they often reflect different traffic patterns and they probably affect the latency.

Some early experiments showed that the function from the chosen parameters to latency is highly non-linear. I tried several ANN structures and input/output encoding schemes. Some of them turned out to be effective.

2. My approach

2.1 Data Collection

There is no ready-to-use data set for this project, so I had to compile my own. The first step was to get a list of hosts to measure. Because of the overwhelming number of WWW servers, and because one typical use of this service is to select among multiple equal-content WWW servers, I decided to collect data only for WWW servers. I wrote a helper program that collects WWW host names by following links from several major search engines. I collected 1000 host names for this project. This number was chosen so that pinging all hosts takes roughly an hour, which provides fine enough granularity for the time-of-day measurement. Finer granularity doesn't seem to be necessary.

A script was written that calls the ping utility on those hosts continuously. The output of ping, together with the time when ping was called, is stored in a file. This is how the raw data is collected.

To translate the raw data into a convenient format for ANN training, I wrote a preprocessor that parses the raw data and translates it into the following format:

<1st> ... <4th>

The weekday is represented by an integer from 0 to 6, corresponding to Sunday through Saturday, respectively, and the time (hour and minute) is the time when the measurement took place. No finer granularity (e.g., seconds) is used, since I believe that finer granularity of time doesn't affect the latency much and, even if it did, it would be very hard to capture its influence with the data I can get. The measured latency is the average of the four tries of one ping run. If a host fails all four tries, the raw data of that run is discarded. If some but not all tries fail, the failed tries are assigned a relatively large value (tentatively 1500 ms) and the average is taken as the measured latency.
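The averaging rule for the four ping tries can be sketched as below. The function name and the use of `None` to mark a failed try are choices made for this illustration; only the 1500 ms penalty and the discard rule come from the text.

```python
FAILED_TRY_MS = 1500.0  # value assigned to a failed try, as stated in the text


def measured_latency(tries):
    """Average the tries of one ping run.

    `tries` holds one latency in ms per try, with None for a failed try
    (the None convention is an assumption of this sketch).
    Returns None when every try failed, meaning the run is discarded.
    """
    if all(t is None for t in tries):
        return None
    filled = [FAILED_TRY_MS if t is None else t for t in tries]
    return sum(filled) / len(filled)


print(measured_latency([20.0, 30.0, None, 50.0]))  # prints 400.0
print(measured_latency([None, None, None, None]))  # prints None
```

The first example shows how strongly a single failed try pulls the average upward, which is worth keeping in mind when reading the latency distribution.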

2.2 Design of ANN

I chose to use a back-propagation network with sigmoid computing units. The back-propagation network is one of the most widely used structures, and the sigmoid is one of the most widely used computing units. The advantages of using them include existing code support and a clear understanding of their functions and properties. The basic structure of the ANN I used is a two-layer structure. I tried several different structures in experiments and compared their performance. Those structures differ in the number of hidden units and in their input/output encoding schemes. In this section I'll present six of the ANN structures I used and briefly mention some other structures I've tried. In the next section I'll present the experimental results. The six ANN structures presented below all have two layers. Each hidden unit is fed by all input units. There is one output unit in each structure, fed by all the hidden units.

The first structure is shown in figure one.

The weekday unit encodes the weekday information. It linearly scales the weekday from 0-6 to 0.0-1.0, so Sunday is encoded as 0.0, Monday as 0.166667, ..., and Saturday as 1.0. The time unit encodes time using the formula time = (hour * 60 + minute)/1439, which linearly scales time to 0.0-1.0. Thus 0:00 is encoded as 0.0 and 23:59 is encoded as 1.0. IP1 through IP4 encode the IP address, with each unit encoding one byte: IPn encodes the nth byte (counting from 1) of the IP address. These units also use linear scaling to 0.0-1.0, so a value of 0 is encoded as 0.0 and a value of 255 is encoded as 1.0.
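The six input units of the first structure could be computed along these lines. The function name and the dotted-string form of the IP address are assumptions of this sketch; the scaling formulas are the ones given above.

```python
def encode_inputs(weekday, hour, minute, ip):
    """Return the six inputs of structure 1, each scaled to 0.0-1.0.

    weekday: 0 (Sunday) to 6 (Saturday)
    hour, minute: time of the measurement
    ip: dotted-quad string such as "128.2.0.1" (input format is an assumption)
    """
    octets = [int(part) for part in ip.split(".")]
    weekday_unit = weekday / 6
    time_unit = (hour * 60 + minute) / 1439
    return [weekday_unit, time_unit] + [octet / 255 for octet in octets]


print(encode_inputs(6, 23, 59, "255.0.255.0"))  # prints [1.0, 1.0, 1.0, 0.0, 1.0, 0.0]
```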

The measured latency (the output) is also linearly scaled to 0.0-1.0. Since the maximum latency is chosen to be 1500 ms, a latency of 1500 ms is encoded as 1.0 and a latency of 0 ms is encoded as 0.0. Therefore an output of 0.0 corresponds to an estimated latency of 0 ms, an output of 1.0 corresponds to an estimated latency of 1500 ms, and the estimated latency for an output value between 0.0 and 1.0 can be computed by linear interpolation. In practice it is often desirable to encode the output not to the range 0.0-1.0 but to a smaller range like 0.1-0.9, to avoid long training times. However, in our case the measured latency will never be 0 ms or 1500 ms, so the encoded latency will never be 0.0 or 1.0. It is therefore safe to use the direct linear scaling described above, and an experiment with a revised linear scaling scheme didn't show much difference in performance.

The second structure is the same as the first one in unit layout. It differs from the first one only in the encoding of the measured latency (the output). The latency is encoded as log(latency)/log(1500). Note that the minimum latency measured is 10 ms, so this formula only gives positive results less than 1.0. I will explain why I used this encoding scheme in the next section, since it has to do with the evaluation criteria.
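The two output encodings, together with the inverse mappings needed to turn a network output back into milliseconds, can be sketched as follows. The function names are invented, and the decoders are simply the algebraic inverses of the formulas in the text.

```python
import math

MAX_LATENCY_MS = 1500.0  # maximum latency chosen in the text


def encode_linear(latency_ms):
    """Linear scaling used by structure 1: 0 ms -> 0.0, 1500 ms -> 1.0."""
    return latency_ms / MAX_LATENCY_MS


def decode_linear(output):
    """Inverse of encode_linear (linear interpolation)."""
    return output * MAX_LATENCY_MS


def encode_log(latency_ms):
    """Logarithmic scaling used by structure 2: log(latency)/log(1500)."""
    return math.log(latency_ms) / math.log(MAX_LATENCY_MS)


def decode_log(output):
    """Inverse of encode_log: 1500 raised to the network output."""
    return MAX_LATENCY_MS ** output


for ms in (10.0, 100.0, 750.0):
    print(ms, round(encode_linear(ms), 4), round(encode_log(ms), 4))
```

In the logarithmic encoding, doubling the latency always adds the same amount, log(2)/log(1500), to the encoded value, so equal encoded differences correspond to equal ratios rather than equal absolute differences.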

The third structure differs from the second one in that it uses 8 input units for each byte of the IP address, each unit corresponding to one bit of that byte. A unit is set to 1.0 if the corresponding bit is 1, and to 0.0 if the corresponding bit is 0. This change of encoding is based on the observation that the target function is so non-linear that a small difference in IP addresses can result in a large difference in the estimated latencies. I therefore tried to enlarge the differences between IP addresses by using this encoding scheme. The experimental results showed that this scheme is rather effective.
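A bit-per-unit encoding of the IP address might look like this. The function name, the dotted-string input, and the most-significant-bit-first ordering are assumptions of the sketch; the text does not specify the bit order.

```python
def encode_ip_bits(ip):
    """Encode a dotted-quad IP address as 32 inputs, one per bit.

    Bits are listed most significant first within each byte
    (the ordering is an assumption of this sketch).
    """
    bits = []
    for part in ip.split("."):
        octet = int(part)
        bits.extend(float((octet >> shift) & 1) for shift in range(7, -1, -1))
    return bits


a = encode_ip_bits("127.0.0.0")
b = encode_ip_bits("128.0.0.0")
# Number of input units that differ between the two addresses.
print(sum(abs(x - y) for x, y in zip(a, b)))  # prints 8.0
```

Under the byte-wise linear scaling the same two addresses differ by only 1/255 in a single input unit, which illustrates how the bit encoding enlarges the difference between nearby addresses.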

The fourth structure goes further in changing the input encoding scheme. It uses two input units to encode time, instead of the single unit used in the previous structures. The two input units encode the hour and the minute, respectively, using linear scaling.

The fifth structure differs from the fourth one in that it uses 8 hidden units. They are the same in all other respects.

The sixth structure differs from the fourth one only in output encoding. It uses the same linear scaling encoding as structure 1.

Other structures I tried include multiple output units, more than two layers, and partial input-to-hidden connections, in which not every input unit feeds every hidden unit. None of those structures gave outstanding performance, so I'll omit them from the following discussion.

3. Experimental results and discussion

In this section I present the experimental results of the six ANN structures described in the last section. The only performance metric I used is accuracy, and I use two kinds of error rates to measure it. As argued by the authors of [3], it often suffices to obtain accuracy within a factor of 2. Therefore an error is defined as an estimate that is more than a factor of 2 away from the real value. Namely, if the estimate is e and the real latency is r, then e is considered an error if e is not within [r/2, 2r]. I call the rate of this kind of error the value error rate, or VER. The other error rate is more closely related to the nature of the application of this service. Since this service is used to choose among otherwise "equal" hosts, sometimes it doesn't really matter whether we get the estimated distance to each host right. What really matters is that we get the order of the distances right. The second error rate, called the order error rate or OER, measures the rate of order errors. To compute the OER, each estimated latency is compared with all other estimated latencies. If an estimated latency e1 is greater (less) than another estimated latency e2 while its real latency r1 is less (greater) than the real latency r2 of the other, an order error is considered to occur. The total number of order errors of one estimated latency, divided by the number of estimated latencies, is the OER of that estimated latency. The mean of all the per-latency OERs in an experiment is the OER of that experiment.

The data set used in the experiments contains 5506 items, each item being the result of one execution of ping. The total number of data items collected is much greater than that. However, because of the limited time available to run the experiments, I used this relatively small data set. Of the 5506 items, 3660 were randomly chosen for the training set and the other 1846 form the testing set. For simplicity, no hold-out set is used.
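The two error rates defined above can be computed as in the following sketch. The function names and the sample numbers are invented; treating ties (equal estimates or equal real latencies) as "no order error" is an assumption, since the text does not say how ties are handled.

```python
def value_error_rate(estimates, actuals):
    """VER: fraction of estimates e outside [r/2, 2r] of the real latency r."""
    errors = sum(1 for e, r in zip(estimates, actuals)
                 if not (r / 2 <= e <= 2 * r))
    return errors / len(estimates)


def order_error_rate(estimates, actuals):
    """OER: mean over all items of (order errors of the item) / (number of items)."""
    n = len(estimates)
    per_item = []
    for i in range(n):
        errors = 0
        for j in range(n):
            if j == i:
                continue
            est_diff = estimates[i] - estimates[j]
            real_diff = actuals[i] - actuals[j]
            # Opposite signs mean the estimated order contradicts the real order.
            if est_diff * real_diff < 0:
                errors += 1
        per_item.append(errors / n)
    return sum(per_item) / n


# Hypothetical latencies in ms.
estimates = [100.0, 50.0, 400.0]
actuals = [80.0, 120.0, 300.0]
print(value_error_rate(estimates, actuals))  # 1 of 3 estimates is off by more than 2x
print(order_error_rate(estimates, actuals))  # only the first two hosts are swapped
```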

The following two charts show the error rates on the test data.

One obvious problem is that none of these structures converges. It's hard to tell where the problem lies; it is probably just due to the highly non-linear nature of the target function. A highly non-linear target function requires some weights to be very large, which is hard for a back-propagation network to learn. However, these experimental results are still useful for comparing the different ANN structures.

In both charts, structure 1 and structure 2 give the worst performance. That shows the effectiveness of the scheme that encodes each byte of the IP address with 8 input units. That scheme works because it enlarges the differences between IP addresses and therefore "smooths" the target function.

Another observation is that structure 6 gives significantly worse performance than structures 3, 4, and 5. This shows the importance of an appropriate encoding scheme. The linear scaling scheme doesn't work well for the output unit because our evaluation criteria are non-linear. The linear scaling scheme favors functions that minimize the absolute error. In our case, where the evaluation criteria are based on relative error, the logarithmic scaling works better.

An interesting question to ask is: given that this ANN doesn't converge, how well does it actually work? Does it do any good at all? To answer this question, let's compute the error rate we would get by guessing randomly. Suppose we pick our estimated latency from a uniform distribution over 0 ms to 1500 ms. The VER we would get is 50%. The OER we would get is 50%, too. So it seems that some of the "bad" structures are doing even worse than random guessing. However, the above VER is computed under a uniform distribution of latency. In fact the distribution of latency is not uniform, but favors smaller values. Therefore the VER of random guessing would be much higher. Another observation is that the OER is often much better than the VER. If the applications that use this service only care about the right order of the host distances, this estimator can still be used effectively.
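The 50% VER of random guessing can be checked with a short calculation. It assumes that both the guess e and the real latency r are drawn independently and uniformly from [0, T] with T = 1500 ms, which is the "even distribution" case discussed above; an error occurs when e < r/2 or e > 2r.

```latex
\begin{align*}
P(e < r/2) &= \int_0^{T} \frac{r/2}{T}\,\frac{dr}{T} = \frac{1}{4}, \\
P(e > 2r)  &= \int_0^{T/2} \frac{T - 2r}{T}\,\frac{dr}{T} = \frac{1}{4}, \\
\mathrm{VER}_{\text{random}} &= P(e < r/2) + P(e > 2r) = \frac{1}{2}.
\end{align*}
```

The second integral stops at T/2 because for larger r the bound 2r already exceeds T, so a guess can no longer be too high.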

4. Related work

The SONAR[1] and HOPS[2] projects provide architectures for host proximity services. [3] presents an architecture for host distance estimation, but that work has little AI support.

Back-propagation networks and sigmoid units are widely used and thoroughly studied. [4], pp. 126-127, gives a list of references on this topic. William Porter and Abdellatief Abouali worked on ANN designs and presented their own approaches, which differ from the back-propagation network [5][6][7]. Their work could be an alternative to the back-propagation structure used in this project.

5. Conclusions and future work

Estimating host distances is a hard problem because of the highly non-linear nature of the Internet. The results given by my estimator could be used effectively by applications that care only about the right order of the host distances. Since I'm not aware of any performance report for other host proximity services, I cannot give a performance comparison here. But I believe that distance estimation has its advantages over distance measuring and that future work can improve its performance.

The structure of an ANN can affect its performance greatly. The encoding scheme sometimes affects performance more than the unit layout does.

One important piece of future work is to study why the ANNs don't converge. If it is a shortcoming of back-propagation networks that they cannot learn highly non-linear functions, then what kind of network should be used to learn such functions? William Porter and Abdellatief Abouali's work[5][6][7] could be a good starting point.

Meanwhile, other ANN structures might help improve the performance. Also worth trying are different parameters (e.g., the base of the log) and bigger data sets. During the development of my ANN I noticed the lack of literature on ANN design, namely a general guide to the design of layouts and encoding schemes. I could only rely on intuition and experience (neither of which I have in abundance, by the way). An engineering book on ANN design is needed and could help widen the use of ANNs.

6. References

[1] K. Moore, J. Cox, and S. Green, "Sonar - a network proximity service"

[2] P. Francis, "Host proximity service (hops)"

[3] P. Francis et al., "An Architecture for a Global Internet Host Distance Estimation Service"

[4] Tom Mitchell, "Machine Learning"

[5] William Porter and Abdellatief Abouali, "On Neural Network Design Part I"

[6] William Porter and Abdellatief Abouali, "On Neural Network Design Part II"

[7] William Porter and Abdellatief Abouali, "Function Emulation Using MVQ Neural Networks"

Abstract

Applicability of Artificial Neural Network (ANN) and Decision Tree (DT) to Digital (predictive) Soil Mapping

Considering that land degradation caused by deforestation and mismanagement in sloping areas is steadily increasing, conservation-oriented studies in these areas have become vital. Fortunately, ample attention is paid to landslides and erosion as the two most common types of degradation. The demand for high-resolution soil mapping is growing steadily, particularly in land use planning projects.

The objective of this study is to apply several methods of digital soil mapping in inaccessible sloping areas that are susceptible to landslides and erosion, in order to select the most effective one for mapping soils in a quick, accurate, and inexpensive way. Artificial Neural Network (ANN) and Decision Tree (DT) methods were employed to meet this objective. The geopedologic approach was applied from the first stage, the visual image interpretation, through the fieldwork (the data collection phase). After the geoform map was produced, training areas could be selected, in which the Jenny equation and the SCORPAN model (recently derived from the Jenny equation) could be applied.

The major task, forming the scientific framework of this exercise, is the parameterization of the soil forming factors and their integration. Digital soil mapping was carried out in the study area, the Hoi Num Rin sub-watershed, which covers about 20 km². The ANN is based on a feedforward backpropagation learning algorithm with one hidden layer. The decision tree is based on the expert system concept. Both methods were applied to integrate the parameterized soil forming factors. The soil predictors used to train the ANN and to formulate the decision trees were 4 organism types, 7 relief-type units, 9 lithological units, 3 time series, 4 landscape units, and 8 landform units, extracted from the maps and databases. Results: the soil map derived from the ANN, with 10 soil class names, showed a training error (MSE) under 0.003, 98% training accuracy, and a learning time of 39 minutes.

The soil map produced with the decision tree took much more time: more than 2 days to learn the soil and its environment over the landscape and landform variables and to formalize and generalize 10 statements (formulas). Soil physical property maps were used in the ANN topology to predict 32 soil data from sampled areas to unsampled areas. For the validation of soil classes against observed data, the results show very high accuracy at the Order and Suborder levels, high accuracy at the Great group and Subgroup levels, and more than 90% agreement with the decision-tree-derived map. For the validation of the soil property maps, the soil bulk density, shear strength, and plasticity index maps show good accuracy, at 69%, 60%, and 70%, respectively. In summary, the geopedological approach is quite valuable for obtaining special soil information in inaccessible areas. Both ANN and DT can help produce a high-resolution map. The difference, however, is that ANN is faster and thus more recommendable in terms of time and cost savings.

Keywords: geopedologic approach to soil survey, predictive soil map, digital soil map, artificial neural network, decision tree, ANN, DT, landslide, erosion.

[1] I should say "without real-time measuring," since the training data comes from measurements after all.
