NAME

       Julian - Grammar-based continuous speech recognition parser.

SYNOPSIS

       julian [-C jconffile] [options ...]

DESCRIPTION

       Julian is a continuous speech recognition parser based on
       finite state grammars. High-precision recognition is
       achieved using a two-pass hierarchical search.

       Julian can perform recognition on microphone input, audio
       files, and feature parameter files. Because standard-format
       acoustic models and language models are used, the models
       can be swapped to perform recognition under various
       conditions.

       The maximum vocabulary is 65,535 words.

Model Usage

       Julian uses the following models.


       Acoustic Models
                 Acoustic HMMs (Hidden Markov Models) are used.
                 Phoneme models (monophone), context-dependent
                 phoneme models (triphone), tied-mixture and
                 phonetic tied-mixture models can be used. When
                 using context-dependent models, interword
                 context is taken into consideration. Files
                 written in HTK's HMM definition language can be
                 used.


       Language Model
                 For the task grammar, sentence structures are
                 written in a grammar file in BNF style, using
                 word categories as terminal symbols. A voca
                 file lists the pronunciation (phoneme sequence)
                 of every word within each category. These files
                 are converted with mkdfa.pl(1) to a
                 deterministic finite automaton file (.dfa) and a
                 dictionary file (.dict).

Speech Input

       It is possible to recognize live input from either a
       microphone A-D or a DatLink (NetAudio) system. Speech
       waveform files (16-bit uncompressed WAV, or RAW format)
       and feature parameter files (HTK format) can also be used.

       Warning: Julian can only extract MFCC_E_D_N_Z features
       internally. If it is necessary to use HMMs based on
       another type of feature extraction then microphone input
       and speech waveform files cannot be used. Use an external
       tool such as wav2mfcc to create the appropriate feature
       parameter files.

Search Algorithm

       Recognition in Julian uses a two-pass structure. In the
       first pass a high-speed, approximate search is performed
       using weaker constraints than the given grammar. Here
       a LA beam search using only inter-category constraints
       extracted from the grammar is performed.
       The second pass re-searches the results of the first pass
       using the original grammar rules, so that a high-precision
       result is obtained quickly. In the second pass
       the optimal solution is guaranteed using the A* search.

       When using a context dependent phoneme model (triphone),
       interword contexts are considered on both
       the first and second passes. For tied-mixture and phonetic
       tied-mixture models, high speed acoustic likelihood
       calculations using Gaussian pruning are performed.

OPTIONS

       The options below set the models used and the system
       parameters. You can set these options on the command line,
       but it is recommended that you collect them in a jconf
       settings file and load it with the "-C" option at
       run time.
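
       As a minimal sketch of these two styles, the Python helper
       below assembles a julian argument list either from individual
       options or from a jconf file passed with "-C". The helper name
       and the file names (sample.dfa, sample.dict, hmmdefs,
       sample.jconf) are hypothetical; only the option names are
       taken from this page.

```python
def build_julian_argv(options=None, jconf=None):
    """Build an argument list for julian.

    Illustrative helper only; it is not part of Julian itself.
    `options` maps option names (without the leading "-") to values;
    a value of None marks an option that takes no argument.
    """
    argv = ["julian"]
    if jconf is not None:
        argv += ["-C", jconf]
    for name, value in (options or {}).items():
        argv.append("-" + name)
        if value is not None:
            argv.append(str(value))
    return argv


# Hypothetical file names, for illustration only.
direct = build_julian_argv({"dfa": "sample.dfa", "v": "sample.dict", "h": "hmmdefs"})
via_jconf = build_julian_argv(jconf="sample.jconf")
print(" ".join(direct))
print(" ".join(via_jconf))
```

       The resulting list could be handed to a process launcher;
       keeping the settings in a jconf file keeps the command line
       itself short.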

       Below we give an explanation of each of the options.


   Speech Input
       -input {rawfile|mfcfile|mic|netaudio|adinserv}
              Select the speech wave data input source.
              (default: mfcfile)
              For information on file formats refer to the Julius
              documentation.


       -NA server:unit
              When using (-input netaudio) set the server name
              and unit ID of the DatLink unit to connect to.


       -filelist file
              With (-input rawfile|mfcfile) perform
              recognition on all files contained within the target
              filelist.


       -adport portnum
              With (-input adinserv), the A-D server port number.


   Speech Segmentation
       -pausesegment

       -nopausesegment
              Force speech segmentation (segment detection) ON / OFF.
              (For mic, adinnet default = ON. For files, default = OFF)

       -lv threslevel
              Amplitude threshold (0 - 32767). When the amplitude
              rises above this threshold, it is taken as the
              beginning of the speech segment; when it drops below
              this level, it is taken as the end of the segment.
              (default: 3000)


       -headmargin msec
              Margin at the start of the speech segment (msec).
              (default: 300)


       -tailmargin msec
              Margin at the end of the speech segment (msec).
              (default: 400)
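
              The interaction of -lv, -headmargin and -tailmargin
              can be pictured with the simplified sketch below. It
              is not Julian's actual detector (in particular it
              ignores the zero crossing count set by -zc); the
              function name and the 16 kHz rate are assumptions
              made for illustration.

```python
def segment_bounds(samples, level=3000, head_ms=300, tail_ms=400, rate=16000):
    """Return (start, end) sample indices of the detected speech segment.

    Simplified illustration: the segment runs from the first to the last
    sample whose absolute amplitude exceeds `level`, widened by the head
    and tail margins. Returns None when no sample exceeds the threshold.
    """
    above = [i for i, s in enumerate(samples) if abs(s) > level]
    if not above:
        return None
    head = head_ms * rate // 1000   # margin in samples before the segment
    tail = tail_ms * rate // 1000   # margin in samples after the segment
    start = max(0, above[0] - head)
    end = min(len(samples), above[-1] + 1 + tail)
    return start, end


# Synthetic input: silence, a loud burst of 100 samples, silence.
signal = [0] * 10000 + [5000] * 100 + [0] * 10000
print(segment_bounds(signal))
```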


       -zc zerocrossnum
              Zero crossing threshold. (default: 60)


       -nostrip
              Do not strip invalid "0" samples. Depending on the
              sound device, invalid "0" samples may appear at the
              start and end of a recording. The default is to
              remove them automatically.


   Acoustic Analysis
       -smpFreq frequency
              Sampling frequency (Hz).
              (default: 16000 Hz, equivalent to -smpPeriod 625)


       -smpPeriod period
              Sampling period, in units of 100 ns.
              (default: 625, equivalent to 16 kHz)


       -fsize sample
              Analysis window size (number of samples).
              (default: 400, i.e. 25 ms at 16 kHz)


       -fshift sample
              Frame shift (number of samples).
              (default: 160, i.e. 10 ms at 16 kHz)
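
              The defaults above are related by simple arithmetic,
              sketched below. The sketch assumes that the period
              value 625 is counted in units of 100 ns (which is what
              makes it correspond to 16 kHz) and that frames are
              taken without padding; the function names are
              illustrative only.

```python
def period_to_hz(period_100ns):
    """Convert a sampling period in 100 ns units to a frequency in Hz."""
    return 10_000_000 / period_100ns


def samples_to_ms(num_samples, hz):
    """Duration in milliseconds of `num_samples` samples at `hz`."""
    return 1000.0 * num_samples / hz


def frame_count(total_samples, fsize=400, fshift=160):
    """Number of full analysis windows that fit (assumes no padding)."""
    if total_samples < fsize:
        return 0
    return 1 + (total_samples - fsize) // fshift


hz = period_to_hz(625)
print(hz, samples_to_ms(400, hz), samples_to_ms(160, hz), frame_count(16000))
```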


       -hipass frequency
              Highpass filter cutoff frequency (Hz).
              (default: -1 = disable)


       -lopass frequency
              Lowpass filter cutoff frequency (Hz).
              (default: -1 = disable)


   Language Model (BNF-type Grammar)
       -dfa dfa_filename
              Select the finite state automaton grammar file
              (.dfa) to use. (Required)


       -penalty1 float
              First pass word insertion penalty. (default: 0.0)


       -penalty2 float
              Second pass word insertion penalty.
              (default: 0.0)


   Recognition Dictionary
       -v dictionary_file
              Recognition Dictionary File (Required).


       -silhead {WORD|WORD[OUTSYM]|#num}

       -siltail {WORD|WORD[OUTSYM]|#num}
              Sentence start and end silence as defined in the 
              word dictionary.
              (default: "<s>" / "</s>")

              These are dealt with specially during recognition to 
              hypothesise start and end points (margins). They can
              be defined as shown below.

                                      Example
           Word_name                    <s>
           Word_name[output_symbol]     <s>[silB]
           #Word_ID                     #14

            (Word_ID is the position of the word in the dictionary
             file, counting from 0.)
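
              As a rough sketch of how the three forms differ, the
              parser below splits a specification into its parts.
              It only illustrates the syntax shown in the table; the
              function name and the returned dictionary layout are
              assumptions, not Julian's internal representation.

```python
import re

# WORD optionally followed by [OUTSYM]; neither part may contain brackets.
_WORD_SPEC = re.compile(r"^(?P<word>[^\[\]]+?)(?:\[(?P<outsym>[^\[\]]*)\])?$")


def parse_word_spec(spec):
    """Parse a -silhead / -siltail style word specification."""
    if spec.startswith("#"):
        # "#num": the word is selected by its position in the dictionary.
        return {"word_id": int(spec[1:])}
    match = _WORD_SPEC.match(spec)
    if match is None:
        raise ValueError("malformed word specification: %r" % spec)
    return {"word": match.group("word"), "outsym": match.group("outsym")}


for example in ("<s>", "<s>[silB]", "#14"):
    print(example, "->", parse_word_spec(example))
```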


       -forcedict
              Disregard dictionary errors.
              (Skip word definitions with errors)


   Acoustic Model (HMM)
       -h hmmfilename
              The name of the HMM definition file to use.
              (Required)


       -hlist HMMlistfilename
              HMMList filename. Required when using triphone
              based HMMs. Details are contained in the Julius
              documentation.
              This file provides a mapping between the logical
              triphone names generated from the phonetic
              representation in the dictionary and the HMM
              definition names.
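
              The mapping can be pictured with the sketch below. It
              assumes the HTK-style "L-C+R" naming for logical
              triphones and an HMMList with one "logical physical"
              pair per line; both are assumptions made for
              illustration (see the Julius documentation for the
              actual format), and the sample phonemes and list
              contents are invented.

```python
def logical_triphones(phones):
    """Build L-C+R style logical triphone names for a phoneme sequence."""
    names = []
    for i, centre in enumerate(phones):
        name = centre
        if i > 0:
            name = phones[i - 1] + "-" + name      # left context
        if i < len(phones) - 1:
            name = name + "+" + phones[i + 1]      # right context
        names.append(name)
    return names


def parse_hmmlist(text):
    """Read 'logical physical' pairs; a lone name maps to itself."""
    mapping = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        mapping[fields[0]] = fields[1] if len(fields) > 1 else fields[0]
    return mapping


# Invented example data: "a-i" is tied to the definition named "a-e".
hmmlist = parse_hmmlist("k+a k+a\nk-a+i k-a+i\na-i a-e\n")
names = logical_triphones(["k", "a", "i"])
print([(n, hmmlist[n]) for n in names])
```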


       -force_ccd / -no_ccd
              When using a triphone acoustic model, these options
              control interword context dependency. If neither of
              these options is set, the use of interword
              context dependency is determined from the
              model's definition names.
              If the "-force_ccd" option is set when using
              something other than a triphone model, there is no
              guarantee that Julian will run.


       -notypecheck
              Do not check the input parameter type.
              (default: Perform the check)


       -iwcd1 {max|avg}
              When using a triphone acoustic model set the
              interword acoustic likelihood calculation method
              used in the first pass.
                  max: The maximum same context triphone value (default)
                  avg: The average same context triphone value
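
              The two settings amount to different ways of combining
              the likelihoods of the triphones that share the known
              context. The sketch below shows only that combination
              step; the function name and the sample scores are
              hypothetical.

```python
def interword_likelihood(scores, method="max"):
    """Combine log likelihoods of same-context triphones (illustration)."""
    if method == "max":
        return max(scores)
    if method == "avg":
        return sum(scores) / len(scores)
    raise ValueError("method must be 'max' or 'avg'")


candidates = [-10.0, -12.0, -14.0]  # invented log likelihoods
print(interword_likelihood(candidates, "max"))
print(interword_likelihood(candidates, "avg"))
```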


   Options for tied-mixture and PTM acoustic models
       -tmix K
              Gaussian pruning: calculate only the top
              K Gaussian densities per codebook. (default: 2)
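
              The idea of keeping only the top K densities can be
              sketched as follows. This shows the approximation
              itself, not how Julian selects the densities; the
              function name is hypothetical and the inputs are
              assumed to be weighted log densities of one codebook.

```python
import math


def topk_mixture_loglik(weighted_logs, k=2):
    """Approximate a mixture log likelihood from its K largest terms."""
    top = sorted(weighted_logs, reverse=True)[:k]
    peak = top[0]
    # log-sum-exp over the kept terms, shifted by the peak for stability
    return peak + math.log(sum(math.exp(v - peak) for v in top))


codebook = [-1.0, -50.0, -2.0, -60.0]  # invented values
print(topk_mixture_loglik(codebook, k=2))
print(topk_mixture_loglik(codebook, k=len(codebook)))
```

              With values like these the two results are nearly
              identical, which is why the small terms can be skipped.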


       -gprune {safe|heuristic|beam|none}
              Set the Gaussian pruning technique to use.
              (default: safe (standard), beam (high-speed))


       -gshmm hmmdefs
              Set the Gaussian Mixture Selection monophone
              model to use. A GMS monophone model is generated
              from an ordinary monophone HMM model using the
              attached program mkgshmm(1).
              (default : none (do not use GMS))

       -gsnum N
              When using GMS, only perform triphone calculations
              for the top N monophone states. (default: 24)


   Search Parameters (First Pass)
       -b beam_width
              Beam width (number of HMM nodes).
              As this value increases, precision also increases,
              but processing time and memory usage increase
              as well.

            default values: Model dependent,
                400 (monophone)
                800 (triphone,PTM)
               1000 (triphone,PTM,engine=v2.1)
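
              Conceptually, a beam of width N keeps only the N best
              scoring nodes at each frame. The sketch below shows
              that pruning step in isolation; it is a generic
              illustration with hypothetical names and invented
              scores, not Julian's search code.

```python
import heapq


def prune_to_beam(node_scores, beam_width):
    """Keep the `beam_width` highest scoring nodes of one frame."""
    kept = heapq.nlargest(beam_width, node_scores.items(), key=lambda item: item[1])
    return dict(kept)


frame = {"n1": -10.0, "n2": -25.0, "n3": -12.0, "n4": -40.0}
print(prune_to_beam(frame, 2))
```

              A wider beam discards fewer nodes, which is why both
              precision and cost grow with the value.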


       -1pass 
              Only perform the first pass search. This mode is
              automatically set when no 3-gram language model
              has been specified (-nlr).


       -realtime

       -norealtime
              Explicitly state whether real-time processing is
              used in the first pass. For file input the
              default is OFF (-norealtime); for microphone or
              NetAudio network input the default is ON
              (-realtime). This option relates to the way CMN is
              performed: when OFF, CMN is calculated for each
              input independently; when ON,
              the previous 5 seconds of input are always used.
              Refer to -progout.


   Search Parameters (Second Pass)
       -b2 hyponum
              Hypothesis envelope width. Only this number of
              hypotheses (sorted by length) are expanded; shorter
              hypotheses are not expanded. This prevents search
              failures. (default: 30)


       -n candidate_num
              The search continues until "candidate_num" sentence
              hypotheses have been found. These hypotheses are
              re-sorted by score and the final result is displayed.
              (Refer to the "-output" option). As Julian does not
              strictly guarantee an optimal second pass search,
              the maximum likelihood candidate is not always
              given first.

              As this value is increased the probability that the
              maximum likelihood hypothesis is returned increases,
              but as a prolonged search must be performed, the
              processing time also becomes large. (default: 1)   


            default value is dependent on the recognition engine
            settings ("--enable-setup= ").
              10  (standard)
               1  (fast,v2.1)


       -output N
              Used with the "-n" option above. Output the top N
              sentence hypotheses. (default: 1)
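
              The combined effect of "-n" and "-output" can be
              sketched as: collect hypotheses in the order the
              search finds them, re-sort them by score, and print
              the best few. The function name and the sample
              sentences and scores are hypothetical.

```python
def final_results(found, output_n=1):
    """Re-sort found (sentence, score) pairs and keep the best `output_n`."""
    ranked = sorted(found, key=lambda hyp: hyp[1], reverse=True)
    return ranked[:output_n]


# Invented hypotheses in the order found; the best one was not found first.
found = [("a b", -120.5), ("a c", -118.0), ("b c", -130.0)]
print(final_results(found, output_n=2))
```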


       -sb score
              Score envelope width. For each frame, do not scan
              areas that deviate from the highest score by more
              than this envelope. This directly affects the speed
              of the second pass acoustic likelihood calculations.
              (default: 80.0)
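
              In outline, the envelope acts as a per-frame filter
              relative to the best score. The sketch below shows
              only this filtering rule with invented scores; the
              names are hypothetical and it does not reproduce
              Julian's second pass.

```python
def within_envelope(frame_scores, envelope=80.0):
    """Keep entries whose score is within `envelope` of the frame's best."""
    best = max(frame_scores.values())
    return {key: score for key, score in frame_scores.items()
            if best - score <= envelope}


frame_scores = {"a": -100.0, "b": -150.0, "c": -181.0}
print(within_envelope(frame_scores))
```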


       -s stack_size
              The maximum number of hypotheses that can be stored
              on the stack during the search. A larger value gives more
              stable results, but increases the amount of memory
              required. (default: 500)


       -m overflow_pop_times
              Number of expanded hypotheses required to
              discontinue the search. If the number of expanded
              hypotheses exceeds this threshold, the search
              is discontinued at that point. The larger this
              value is, the longer the search will continue, but
              processing time for search failures will also
              increase. (default: 2000)


       -lookuprange nframe
              When performing word expansion, this option sets
              the number of frames before and after in which to consider
              word expansion. This prevents the omission of short
              words, but with a large value the number of hypotheses
              expanded increases and the system slows down. (default: 5)


   Forced alignment
       -walign
              Return the result of Viterbi alignment of the word
              units from the recognition results. 


       -palign
              Return the result of Viterbi alignment of the
              phoneme units from the recognition results. 


   Message Output              
       -quiet Omit phoneme sequence and score, only output
              the best word sequence hypothesis.


       -progout
              Gradually output the interim results from the
              first pass at regular intervals.
 

       -proginterval msec
              Set the -progout output time interval (msec).


       -demo  The same as "-progout -quiet".


   Other
       -debug  Display debug information.


       -C jconffile
              Load the jconf settings file. The runtime options
              written in this file are applied.


       -version
              Display program name, compile time, and compile    
              time options.             


       -help 
              Display a brief overview of options.

EXAMPLES

       For examples of system usage refer to the Julian documentation.

SEE ALSO

       mkbingram(1), adinrec(1), adintool(1), mkdfa(1),
       mkgshmm(1), wav2mfcc(1)

DIAGNOSTICS

       On exiting normally, Julian returns exit status 0.
       If an error is found, Julian exits abnormally and
       returns exit status 1.

       If an input file cannot be found or cannot be loaded for
       some reason then Julian will skip processing for that file.

BUGS

       There are a number of restrictions on the type and size of the
       models Julian can use. For a detailed explanation refer
       to the Julian and Julius documentation.

       For bug reports, inquiries and comments please contact
       julius@kuis.kyoto-u.ac.jp

AUTHORS

       Rev.1.0 (1998/07/20)
              Designed by Tatsuya Kawahara and Akinobu Lee
              (Kyoto University)

       Rev.2.0 (1999/02/20)

       Rev.2.1 (1999/04/20)

       Rev.2.2 (1999/10/04)

       Rev.3.1 (2000/05/11)
              Development by Akinobu Lee (Kyoto University)

       Rev.3.2 (2001/08/15)
              Development mainly by Akinobu Lee
                 (Nara Institute of Science and Technology)

THANKS TO

       Up to Rev.3.1 this program was released by the Speech
       Media Laboratory, Kyoto University (Doshita Lab). From
       Rev.3.2 Julian has been integrated with Julius and released
       under the "Information Processing Society, Continuous
       Speech Recognition Consortium".

       The Windows Microsoft Speech API compatible version was
       developed by Takashi Sumiyoshi (Kyoto University).

       I am very grateful to all those that provided me with
       timely advice, comments and guidance.

Last modified: 2001/11/16 07:27:14