NAME

       Julius  -  Japanese LVCSR engine

SYNOPSIS

       julius [-C jconffile] [options ...]

DESCRIPTION

       Julius is an open source speech recognition engine that can
       perform continuous speech recognition with a vocabulary in
       the tens of thousands of words. High precision recognition
       can be obtained using a 3-gram based two pass search
       technique.
       
       Julius can perform recognition on microphone input, audio
       files, and feature parameter files. Because standard
       format acoustic models and language models can be used,
       these models can be swapped to perform recognition under
       various conditions.

       The maximum vocabulary is 65,535 words.

Model Usage

       Julius uses the following models.     

       Acoustic Models
                 Acoustic HMMs (Hidden Markov Models) are used.
                 Phoneme models (monophone), context dependent
                 phoneme models (triphone), tied-mixture and
                 phonetic tied-mixture models can be used. When
                 using context dependent models, interword
                 context is taken into consideration. Files
                 written in HTK's HMM definition language can be
                 used.

       Language Model
                 The system uses 2-gram and reverse 3-gram
                 language models. Standard format ARPA files can
                 be loaded. Binary format N-gram models built
                 using the attached tool mkbingram can also be
                 used. 
                 

Speech Input

       It is possible to recognize live input from either a
       microphone A-D or a DatLink (NetAudio) system. Speech
       waveform files (16bit WAV (no compression), or RAW format)
       and feature parameter files (HTK format) can be used.

       Warning: Julius can only extract MFCC_E_D_N_Z features
       internally. If it is necessary to use HMMs based on
       another type of feature extraction then microphone input
       and speech waveform files cannot be used. Use an external
       tool such as wav2mfcc to create the appropriate feature
       parameter files.

Search Algorithm

       Julius recognition is based on a two pass strategy. On the
       first pass the entire input is processed and an interim
       result is displayed. The model used in this pass is a word
       2-gram and a word HMM tree structured network. Decoding
       is performed by a frame synchronous beam search.

       The second pass searches using a reverse 3-gram in an
       attempt to obtain a higher precision recognition result.
       Word unit stack decoding is performed using the
       restrictions from the interim results of the first pass
       and look-ahead information.

       When using context dependent phones (triphone), interword
       contexts are taken into consideration. For tied-mixture
       and phonetic tied-mixture models, high-speed acoustic
       likelihood calculation is possible using Gaussian pruning.

OPTIONS

       
       The options below allow you to select the models used and
       set system parameters. You can set these options on the
       command line; however, it is recommended that you combine
       them in a jconf settings file and use the "-C" option at
       run time.

       Below is an explanation of all the possible options.


   Speech Input
       -input {rawfile|mfcfile|mic|netaudio|adinserv}
              Select the speech wave data input source.
              (default: mfcfile)
              For information on file formats refer to the Julius
              documentation.              


       -NA server:unit
              When using (-input netaudio) set the server name
              and unit ID of the DatLink unit to connect to.


       -filelist file
              With (-input rawfile|mfcfile), perform
              recognition on all files listed in the target
              filelist.


       -adport portnum
              With (-input adinserv), the A-D server port number.


   Speech segmentation 
       -pausesegment

       -nopausesegment
              Force speech segmentation (segment detection) ON / OFF.
              (For mic, adinnet default = ON. For files, default = OFF)

       -lv threslevel
              Amplitude threshold (0 - 32767). When the amplitude
              rises above this threshold, it is treated as the
              beginning of the speech segment; when it drops below
              this level, it is treated as the end of the speech
              segment. (default: 3000)


       -headmargin msec
              Margin at the start of the speech segment (msec).
              (default: 300)


       -tailmargin msec
              Margin at the end of the speech segment (msec).
              (default: 400)
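
              The interaction of -lv, -headmargin and -tailmargin can
              be pictured with the following minimal sketch. It is
              not Julius's actual detector: it assumes 16 kHz input,
              ignores the zero-crossing test (-zc), and the function
              name find_speech_segment is hypothetical.

```python
def find_speech_segment(samples, threshold=3000, sample_rate=16000,
                        head_margin_ms=300, tail_margin_ms=400):
    """Return (start, end) sample indices of the segment, or None.

    Simplified illustration: only the amplitude threshold and the
    head/tail margins are modelled; zero-crossing counting is omitted.
    """
    # Indices of all samples whose amplitude exceeds the threshold.
    loud = [i for i, s in enumerate(samples) if abs(s) > threshold]
    if not loud:
        return None
    # Convert the margins from milliseconds to a number of samples.
    head = head_margin_ms * sample_rate // 1000
    tail = tail_margin_ms * sample_rate // 1000
    # Widen the segment by the margins, clamped to the input length.
    start = max(0, loud[0] - head)
    end = min(len(samples), loud[-1] + 1 + tail)
    return start, end
```

              With the default margins, 300 msec before the first
              loud sample and 400 msec after the last one are kept.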


       -zc zerocrossnum
              Zero-crossing threshold. (default: 60)


       -nostrip
              Do not remove invalid "0" samples. Depending on the
              sound device, such samples may be recorded at the
              start and end of recording. The default is to remove
              them automatically.


   Acoustic Analysis
       -smpFreq frequency
              Sampling frequency (Hz).
              (default: 16000 Hz, i.e. a period of 625 in units
              of 100 ns)


       -smpPeriod period
              Sampling period (in units of 100 ns).
              (default: 625, i.e. 16 kHz)


       -fsize sample
              Analysis window size (No. samples).
              (default: 400, i.e. 25 ms at 16 kHz)


       -fshift sample
              Frame shift (No. samples).
              (default: 160, i.e. 10 ms at 16 kHz)
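
              The default values above are related by simple
              arithmetic, sketched here. The sketch assumes the
              sampling period is counted in units of 100 ns, which is
              the reading under which 625 corresponds to 16 kHz; the
              function names are hypothetical.

```python
def period_from_frequency(freq_hz):
    """Sampling period in units of 100 ns for a frequency in Hz."""
    # One second is 10,000,000 units of 100 ns.
    return 10_000_000 / freq_hz


def samples_to_ms(num_samples, freq_hz):
    """Duration in milliseconds of num_samples at freq_hz."""
    return 1000.0 * num_samples / freq_hz


period = period_from_frequency(16000)   # 625.0
window_ms = samples_to_ms(400, 16000)   # 25.0 (default -fsize)
shift_ms = samples_to_ms(160, 16000)    # 10.0 (default -fshift)
```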


       -hipass frequency
              Highpass filter cutoff frequency (Hz).
              (default: -1 = disable)


       -lopass frequency
              Lowpass filter cutoff frequency (Hz)
              (default: -1 = disable)


   Language Model(N-gram)
       -nlr 2gram_filename
              2-gram language model filename (standard ARPA format)


       -nrl rev_3gram_filename
              Reverse 3-gram language model filename. This is 
              required for the second search pass. If this is
              not defined then only the first pass will take 
              place.


       -d bingram_filename
              Use a binary language model as built using
              mkbingram(1). This is used in place of the "-nlr"
              and "-nrl" options above, and allows Julius to
              perform initialization quickly.


       -lmp lm_weight lm_penalty
       -lmp2 lm_weight2 lm_penalty2
              Language model score weights and word insertion
              penalties for the first and second passes respectively.

              The hypothesis language scores are scaled as shown below:

            lm_score1 = lm_weight * 2-gram_score + lm_penalty
            lm_score2 = lm_weight2 * 3-gram_score + lm_penalty2

              Here 2-gram_score and 3-gram_score are the N-gram
              log-likelihoods of the hypothesis word; the score
              actually used is the result of scaling them with the
              factors above.

            The default values (weight/penalty) depend on the
            acoustic model:
               First-Pass | Second-Pass
               --------------------------
                5.0/-1.0  |  6.0/0.0     (monophone)
                8.0/-2.0  |  8.0/-2.0    (triphone,PTM)
                9.0/8.0   | 11.0/-2.0    (triphone,PTM,engine=v2.1)
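
              As a worked example of the scaling formula, the sketch
              below applies the triphone defaults (weight 8.0,
              penalty -2.0) to two made-up log-likelihoods. The
              input values and the function name are assumptions
              chosen for illustration only.

```python
def scaled_lm_score(ngram_log_likelihood, lm_weight, lm_penalty):
    """Apply lm_score = lm_weight * ngram_score + lm_penalty."""
    return lm_weight * ngram_log_likelihood + lm_penalty


# Hypothetical N-gram log-likelihoods for one word.
first_pass = scaled_lm_score(-2.5, 8.0, -2.0)    # 8.0 * -2.5 - 2.0 = -22.0
second_pass = scaled_lm_score(-2.0, 8.0, -2.0)   # 8.0 * -2.0 - 2.0 = -18.0
```

              A larger weight makes the language model count for
              more in the hypothesis score; the penalty is added
              once per word.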


       -transp float
              Insertion penalty for [transparent words].
              (default: 0.0)


   Word Dictionary
       -v dictionary_file
              Word Dictionary File (Required)


       -silhead {WORD|WORD[OUTSYM]|#num}

       -siltail {WORD|WORD[OUTSYM]|#num}
              Sentence start and end silence as defined in the 
              word dictionary.
              (default: "<s>" / "</s>")

              These words are treated specially during recognition
              as the start and end points (margins) of hypotheses.
              They can be specified as shown below.

                                      Example
           Word_name                    <s>
           Word_name[output_symbol]     <s>[silB]
           #Word_ID                     #14

            (Word_ID is the word position in the dictionary file
             (order) starting from 0) 
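
              The three accepted forms can be told apart
              mechanically. The following sketch shows one way to
              do so; it is an illustration of the notation only, not
              Julius's own parser, and parse_sil_word is a
              hypothetical name.

```python
import re


def parse_sil_word(spec):
    """Parse WORD, WORD[OUTSYM] or #num into a small dict."""
    if spec.startswith("#"):
        # "#14" refers to the word by its position in the dictionary.
        return {"word_id": int(spec[1:])}
    # "WORD[OUTSYM]": a word name followed by a bracketed output symbol.
    match = re.fullmatch(r"([^\[\]]+)\[([^\[\]]*)\]", spec)
    if match:
        return {"word": match.group(1), "output_symbol": match.group(2)}
    # Otherwise the whole string is the word name.
    return {"word": spec}
```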


       -forcedict
              Disregard dictionary errors.
              (Skip word definitions with errors)


   Acoustic Model(HMM)
       -h hmmfilename
              The name of the HMM definition file to use.
              (Required)


       -hlist HMMlistfilename
              HMMList filename. Required when using triphone
              based HMMs. Details are contained in the Julius
              documentation.
              This file provides a mapping between the logical
              triphone names generated from the phonetic
              representation in the dictionary and the HMM
              definition names.


       -force_ccd / -no_ccd
              When using a triphone acoustic model these options
              control interword context dependency. If neither of
              these options is set then the use of interword
              context dependency will be determined from the
              model's definition names.
              If the "-force_ccd" option is set when using
              something other than a triphone model, there is no
              guarantee that Julius will run.


       -notypecheck
              Do not check the input parameter type.
              (default: Perform the check)


       -iwcd1 {max|avg}
              When using a triphone acoustic model set the
              interword acoustic likelihood calculation method
              used in the first pass.
                  max: The maximum, identical context triphone value (default)
                  avg: The average, identical context triphone value


   Options for tied-mixture and PTM acoustic models
       -tmix K
              When performing Gaussian pruning, only calculate the
              top K Gaussian densities per codebook. (default: 2)


       -gprune {safe|heuristic|beam|none}
              Set the Gaussian pruning technique to use.
              (default: safe (standard), beam (high-speed))


       -gshmm hmmdefs
              Set the Gaussian Mixture Selection monophone acoustic
              model to use. A GMS monophone model is generated
              from an ordinary monophone HMM model using the
              attached program mkgshmm(1).
              (default : none (do not use GMS))

       -gsnum N
              When using GMS, only perform triphone calculations
              for the top N monophone states. (default: 24)


   Short pause segmentation
       -spdur Set the sp threshold length for use in the first
              pass (number of frames). If the number of frames in
              which the sp "unit" has the maximum likelihood is
              greater than this threshold, the first pass is
              interrupted and the second pass is started.
              (default: 10)

       By default short pause segmentation is not used. To enable
       it, use the "--enable-sp-segment" option at configuration
       time.
       (For details refer to the Julius documentation)


   Search Parameters (First Pass)
       -b beam_width
              Beam width (Number of HMM nodes).
              As this value increases the precision also increases;
              however, processing time and memory usage also
              increase.

            default values: Model dependent,
                400 (monophone)
                800 (triphone,PTM)
               1000 (triphone,PTM,engine=v2.1)
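
              The effect of the beam width can be sketched as a
              ranking cut applied at every frame: only the
              best-scoring nodes survive. This is a simplified
              illustration under the assumption that a higher score
              is better; the data layout and the name prune_to_beam
              are hypothetical and do not reflect Julius internals.

```python
import heapq


def prune_to_beam(node_scores, beam_width):
    """Keep only the beam_width best-scoring HMM nodes of one frame.

    node_scores maps a node identifier to its log score.
    """
    best = heapq.nlargest(beam_width, node_scores.items(),
                          key=lambda item: item[1])
    return dict(best)
```

              A wider beam keeps more nodes alive, which is why both
              precision and cost grow with this value.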


       -sepnum N
              (Used with the configure option "--enable-lowmem2")
              Number of high frequency words to separate from the
              dictionary tree. (default: 150)


       -1pass 
              Only perform the first pass search. This mode is
              automatically set when no 3-gram language model
              has been specified (-nrl).


       -realtime

       -norealtime
              Explicitly state whether real time processing will be
              used in the first pass or not. For file input the 
              default is OFF (-norealtime), for microphone, or 
              NetAudio network input the default is ON
              (-realtime). This option relates to the way CMN is
              performed: when OFF CMN is calculated for each
              input independently, when the realtime option is ON
              the previous 5 seconds of input are always used.
              Refer to -progout.
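
              For the OFF case, where CMN (cepstral mean
              normalization) is calculated for each input
              independently, the operation amounts to subtracting
              the per-dimension mean of the whole input from every
              frame. A minimal sketch, assuming features are given
              as a list of equal-length frames; the function name
              is hypothetical and the realtime variant is not shown.

```python
def cmn_per_input(frames):
    """Subtract the per-dimension mean over all frames of one input."""
    if not frames:
        return []
    dims = len(frames[0])
    # Mean of each feature dimension across the whole input.
    means = [sum(frame[d] for frame in frames) / len(frames)
             for d in range(dims)]
    # Normalized copy: every frame shifted by the mean vector.
    return [[frame[d] - means[d] for d in range(dims)] for frame in frames]
```

              The whole input must be available before the mean can
              be computed, which is why this form does not suit
              real time processing.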


   Search Parameters (Second Pass)
       -b2 hyponum
              Hypothesis envelope width. Up to this number of
              hypotheses are expanded (sorted by length); shorter
              hypotheses are not expanded. This prevents search
              failures. (default: 30)


       -n candidate_num
              The search continues until "candidate_num" sentence
              hypotheses have been found. These hypotheses are
              re-sorted by score and the final result is displayed.
              (Refer to the "-output" option). As Julius does not
              strictly guarantee an optimal second pass search,
              the maximum likelihood candidate is not always
              given first.

              As this value is increased the probability that the
              maximum likelihood hypothesis is returned increases,
              but as a longer search must be performed, the
              processing time also increases.

            The default value depends on the recognition engine
            settings ("--enable-setup= "):
              10  (standard)
               1  (fast,v2.1)


       -output N
              Used with the "-n" option above. Output the top N
              sentence hypotheses. (default: 1)
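
              The way "-n" and "-output" work together can be
              sketched as follows: the hypotheses found by the
              search are re-sorted by score and the best N are
              printed. The sentences, scores and the name
              select_output are made up for illustration.

```python
def select_output(found, output_n):
    """Re-sort found hypotheses by score and keep the top output_n.

    found is a list of (sentence, score) pairs in the order in which
    the search produced them; a higher score is better.
    """
    ranked = sorted(found, key=lambda hyp: hyp[1], reverse=True)
    return ranked[:output_n]


# Hypothetical result of a search run with "-n 3".
found = [("a b", -120.0), ("a c", -115.5), ("b c", -130.0)]
top_two = select_output(found, 2)
```

              Here the hypothesis found second has the best score,
              which shows why searching for more candidates can
              change the final result.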


       -sb score
              Score envelope width. For each frame, do not scan
              areas that deviate from the highest score by more
              than this envelope. This directly relates to the speed
              of the second pass acoustic likelihood calculations.
              (default: 80.0)


       -s stack_size
              The maximum number of hypotheses that can be stored
              on the stack during the search. A larger value gives more
              stable results, but increases the amount of memory
              required. (default: 500)


       -m overflow_pop_times
              Number of expanded hypotheses required to
              discontinue the search. If the number of expanded
              hypotheses is greater than this threshold, the search
              is discontinued at that point. The larger this
              value is, the longer the search will continue, but
              processing time for search failures will also
              increase. (default: 2000)


       -lookuprange nframe
              When performing word expansion, this option sets
              the number of frames before and after in which to consider
              word expansion. This prevents the omission of short
              words but, with a large value, the number of hypotheses
              expanded increases and the system slows down.
              (default: 5)


   Forced alignment
       -walign
              Return the result of Viterbi alignment of the word
              units from the recognition results.


       -palign
              Return the result of Viterbi alignment of the
              phoneme units from the recognition results.


   Message Output
       -separatescore
              Output the language and acoustic scores separately.


       -quiet Omit phoneme sequence and score, only output
              the best word sequence hypothesis.


       -progout
              Gradually output the interim results from the
              first pass at regular intervals.
 

       -proginterval msec
              Set the -progout output time interval (msec).


       -demo  The same as "-progout -quiet".


   Other
       -debug  Display debug information.


       -C jconffile
              Load the jconf settings file. The runtime options
              set in this file are applied.


       -version
              Display program name, compile time, and compile    
              time options.             


       -help 
              Display a brief overview of options.

EXAMPLES

       For examples of system usage refer to the Julius documentation.
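
       As a small illustration of the SYNOPSIS form, the sketch
       below assembles a command line from options described on
       this page. It only builds the argument list and does not
       run Julius. The file name sample.jconf and the helper name
       build_julius_args are hypothetical, and options taking two
       values (such as -lmp) are not handled.

```python
def build_julius_args(jconf=None, **options):
    """Build an argument list of the form julius [-C jconffile] [options ...].

    Options set to True are emitted as bare flags; any other value is
    emitted as the option's single argument.
    """
    args = ["julius"]
    if jconf is not None:
        args += ["-C", jconf]
    for name, value in options.items():
        args.append("-" + name)
        if value is not True:
            args.append(str(value))
    return args


# Hypothetical invocation: settings file plus two command line options.
command = build_julius_args(jconf="sample.jconf", input="rawfile", quiet=True)
```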

SEE ALSO

       mkbingram(1),    adinrec(1),    adintool(1),     mkdfa(1),
       mkgshmm(1), wav2mfcc(1)

DIAGNOSTICS

       On exiting normally, Julius will return the exit status
       0. If an error is found, Julius exits abnormally and
       returns the exit status 1.

       If an input file cannot be found or cannot be loaded for
       some reason then Julius will skip processing for that file.

BUGS

       There are some restrictions to the type and size of the
       models Julius can use. For a detailed explanation refer
       to the Julius documentation.       

       For bug reports, inquiries and comments please contact
       Julius@kuis.kyoto-u.ac.jp

AUTHORS

       Rev.1.0 (1998/02/20)
              Designed by Tatsuya Kawahara and Akinobu Lee
              (Kyoto University)

              Development by Akinobu Lee (Kyoto University)

       Rev.1.1 (1998/04/14)

       Rev.1.2 (1998/10/31)

       Rev.2.0 (1999/02/20)

       Rev.2.1 (1999/04/20)

       Rev.2.2 (1999/10/04)

       Rev.3.0 (2000/02/14)

       Rev.3.1 (2000/05/11)
              Development by Akinobu Lee (Kyoto University)

       Rev.3.2 (2001/08/15)
              Development mainly by Akinobu Lee
                 (Nara Institute of Science and Technology)

THANKS TO

       From Rev.3.2 Julius is released by the "Information 
       Processing Society, Continuous Speech Consortium"       

       The Windows DLL version was developed and released by
       Hideki Banno (Nagoya University)

       The Windows Microsoft Speech API compatible version was
       developed by Takashi Sumiyoshi (Kyoto University)
  
       I am very grateful to all those that provided me with
       timely advice, comments and guidance.

Last modified: 2001/11/15 07:27:14