NCSA Portfolio

ncsa.util
Class ReaderTokenizer

java.lang.Object
  |
  +--ncsa.util.ReaderTokenizer

public class ReaderTokenizer
extends java.lang.Object

This object takes an incoming text stream from a Reader and attempts to parse tokens from it.

By default:

- Characters from 0 through ' ' (space) and commas are considered whitespace characters.

- The quote character is " (double quote) and is used to delimit multi-word strings, which are returned in sval with ttype set to TT_WORD. The value of nval is indeterminate.

- The characters {}[]/ are returned in ttype as their character value.

- The newline character is \n.

- The default comment character is #. The # and all characters following it to the end of line are ignored.

- Character strings are collected and returned in sval, with ttype set to TT_WORD. The value of nval is indeterminate.

- Numbers in whole, decimal, hexadecimal, or exponential notation are recognized, parsed, and returned in nval as a double, with ttype set to TT_NUMBER. The value of sval is indeterminate.

- When the end of the Reader's input is reached, all calls to nextToken() will return TT_EOF in ttype.

These defaults can be changed by calling methods within this class. See the method descriptions for more information.

This class was written because java.io.StreamTokenizer can't be subclassed effectively (try it, you can't), and we needed to be able to parse hexadecimal and exponential numbers.
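The hexadecimal and exponential forms mentioned above can be illustrated with a small standalone helper that turns a numeric token's text into the double stored in nval. This is a sketch, not ReaderTokenizer's actual parser: the helper name parseNumberToken is hypothetical, and the 0x/0X prefix for hexadecimal is an assumption, since this page does not say how hexadecimal numbers are written.

```java
class NumberTokenSketch {
    /**
     * Hypothetical helper: converts the text of a numeric token to a double.
     * Assumes hexadecimal is written with a 0x/0X prefix (optionally after a
     * leading '-'). Everything else goes to Double.parseDouble, which already
     * accepts whole, decimal, and exponential forms such as "1.5e3".
     */
    static double parseNumberToken(String text) {
        boolean negative = text.startsWith("-");
        String body = negative ? text.substring(1) : text;
        if (body.startsWith("0x") || body.startsWith("0X")) {
            // Hexadecimal integer: parse the digits after the prefix in base 16.
            double value = Long.parseLong(body.substring(2), 16);
            return negative ? -value : value;
        }
        // Whole, decimal, or exponential notation.
        return Double.parseDouble(text);
    }

    public static void main(String[] args) {
        System.out.println(parseNumberToken("42"));     // 42.0
        System.out.println(parseNumberToken("-3.25"));  // -3.25
        System.out.println(parseNumberToken("0x1F"));   // 31.0
        System.out.println(parseNumberToken("6.02e23"));
    }
}
```

The explicit hexadecimal branch is needed because Double.parseDouble rejects a plain "0x1F"; it only accepts hexadecimal floating-point literals that carry a binary exponent such as "0x1Fp0".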


Field Summary
static int COMMENT
           
 boolean eol
          true if end of line was hit during last nextToken() call
static int ESCAPE
           
 int lineno
           
 int MAXSIZE
           
static int NEWLINE
           
static int NUMBER
           
 double nval
          When ttype is set to TT_NUMBER, nval contains a double.
static int QUOTE
           
 java.lang.String sval
          When ttype is set to TT_WORD, sval contains a string.
static int TOKEN
           
static int TT_EOF
           
static int TT_NUMBER
           
static int TT_WORD
           
 int ttype
          Contains a constant indicating what type of token has been returned from calls to nextToken().
static int WHITESPACE
           
static int WORDCHAR
           
 
Constructor Summary
ReaderTokenizer(java.io.Reader r)
          Calling this constructor sets the Tokenizer to use the default parsing described above.
 
Method Summary
 int charType(int val)
          Returns the type of the specified character.
Valid values are: WHITESPACE, WORDCHAR, NUMBER, TOKEN, QUOTE, COMMENT, NEWLINE
 void commentChar(int c)
          Tags the value c as a comment character.
 boolean eol()
          Returns true if the end of a line has been reached.
 void escapeChar(int c)
          Tags the value c as an escape character.
 int lineno()
          Returns the current line number of the file being parsed.
 void newlineChar(int c)
          Tags the value c as an end-of-line character.
 int nextToken()
          Retrieve the next recognized token.
 void ordinaryChar(int c)
          Tags the value c as a single character token.
 void parseNumbers()
          Tags the digits 0 through 9, the period, and the dash as number characters, because these characters are legal in numbers.
 void parseNumbersAsWords()
          Tags the digits 0 through 9, the period, and the dash as word characters.
 void pushBack()
           
 void putback(char c)
          Pushes the character c back onto the stream so that it can be parsed by the next call to nextToken().
 void quoteChar(int c)
          Tags the value c as a quote character.
 char read()
          Read a single character
 void resetSyntax()
          Reset all parsing rules to the default
 void setType(int val, int type)
          Sets the internal type of a character, which determines how the character is treated during parsing.
 void whitespaceChar(int c)
          Tags the value c as a whitespace character.
 void whitespaceChars(int start, int end)
          Tags all values from start to end as whitespace characters.
 void wordChars(int start, int end)
          Tags all values from start to end as characters that can appear in a TT_WORD token.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

TT_WORD

public static final int TT_WORD

TT_NUMBER

public static final int TT_NUMBER

TT_EOF

public static final int TT_EOF

WHITESPACE

public static final int WHITESPACE

WORDCHAR

public static final int WORDCHAR

NUMBER

public static final int NUMBER

TOKEN

public static final int TOKEN

QUOTE

public static final int QUOTE

COMMENT

public static final int COMMENT

NEWLINE

public static final int NEWLINE

ESCAPE

public static final int ESCAPE

MAXSIZE

public final int MAXSIZE

ttype

public int ttype
Contains a constant indicating what type of token was returned by the last call to nextToken(). Current valid values are TT_WORD for strings, TT_NUMBER for numbers, TT_EOF at the end of input, and the character value itself if that character was specified as an ordinaryChar().

sval

public java.lang.String sval
When ttype is set to TT_WORD, sval contains a string.

nval

public double nval
When ttype is set to TT_NUMBER, nval contains a double.

eol

public boolean eol
true if end of line was hit during last nextToken() call

lineno

public int lineno

Constructor Detail

ReaderTokenizer

public ReaderTokenizer(java.io.Reader r)
Calling this constructor sets the Tokenizer to use the default parsing described above.

Method Detail

resetSyntax

public void resetSyntax()
Reset all parsing rules to the default

parseNumbers

public void parseNumbers()
Tags the digits 0 through 9, the period, and the dash as number characters, because these characters are legal in numbers.

parseNumbersAsWords

public void parseNumbersAsWords()
Tags the digits 0 through 9, the period, and the dash as word characters.

wordChars

public void wordChars(int start,
                      int end)
Tags all values from start to end as characters that can appear in a TT_WORD token.

whitespaceChars

public void whitespaceChars(int start,
                            int end)
Tags all values from start to end as whitespace characters. These characters are ignored during parsing.
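ReaderTokenizer's default of treating commas as whitespace can be reproduced on the JDK's java.io.StreamTokenizer, which has a whitespaceChars(int low, int hi) method of the same shape. The example below uses StreamTokenizer only because it runs without the NCSA library; we assume, without having checked, that the equivalent ReaderTokenizer call looks the same. The class name CommaWhitespaceDemo is illustrative.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class CommaWhitespaceDemo {
    /** Returns every token in the input as a string, with ',' acting as whitespace. */
    static List<String> tokens(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        // Treat ',' as whitespace: it separates tokens but is never returned.
        st.whitespaceChars(',', ',');
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    out.add(String.valueOf(st.nval));
                } else if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                } else {
                    out.add(String.valueOf((char) st.ttype));  // single-character token
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("1, 2,3"));  // [1.0, 2.0, 3.0]
    }
}
```

Without the whitespaceChars call, StreamTokenizer would return each comma as its own single-character token.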

whitespaceChar

public void whitespaceChar(int c)
Tags the value c as a whitespace character. This character will be ignored during parsing.

newlineChar

public void newlineChar(int c)
Tags the value c as an end-of-line character. When this character is encountered, eol is set to true.

quoteChar

public void quoteChar(int c)
Tags the value c as a quote character. When a pair of these characters is encountered, all characters between them are returned as a single token in sval, with ttype set to TT_WORD.

escapeChar

public void escapeChar(int c)
Tags the value c as an escape character. The character following the escape character is treated as an ordinary character.
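The interaction between quoteChar() and escapeChar() can be sketched as a small scanner that reads up to the matching unescaped quote and takes any escaped character literally. This is an illustration under assumptions, not ReaderTokenizer's code: the class and method names are hypothetical, and we do not know whether ReaderTokenizer honors escapes outside quoted strings or translates sequences such as \n.

```java
class QuotedStringSketch {
    /**
     * Hypothetical sketch: text.charAt(start) must be the opening quote.
     * Returns the characters up to the matching unescaped quote. An escape
     * character makes the following character literal, so an escaped quote
     * is kept in the result instead of ending the string.
     */
    static String readQuoted(String text, int start, char quote, char escape) {
        if (text.charAt(start) != quote) {
            throw new IllegalArgumentException("no opening quote at index " + start);
        }
        StringBuilder sb = new StringBuilder();
        for (int i = start + 1; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == escape && i + 1 < text.length()) {
                sb.append(text.charAt(++i));  // take the next character literally
            } else if (c == quote) {
                return sb.toString();         // closing quote ends the string
            } else {
                sb.append(c);
            }
        }
        throw new IllegalArgumentException("unterminated quoted string");
    }

    public static void main(String[] args) {
        String input = "\"say \\\"hi\\\" twice\" rest";
        System.out.println(readQuoted(input, 0, '"', '\\'));  // say "hi" twice
    }
}
```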

commentChar

public void commentChar(int c)
Tags the value c as a comment character. When this character is encountered, all characters following it to the end of line are ignored.
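For comparison, java.io.StreamTokenizer uses '/' as its default comment character, whereas ReaderTokenizer returns '/' as a single-character token and uses '#' for comments. The JDK example below shows the two calls needed to match ReaderTokenizer's defaults; it uses StreamTokenizer only because it is available everywhere, and the class name CommentCharDemo is illustrative.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class CommentCharDemo {
    /** Returns word and single-character tokens, skipping '#' comments. */
    static List<String> tokens(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.ordinaryChar('/');  // StreamTokenizer's default comment char becomes a token
        st.commentChar('#');   // '#' through end of line is now ignored
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                // This demo input has no numbers, so only words and characters appear.
                out.add(st.ttype == StreamTokenizer.TT_WORD
                        ? st.sval
                        : String.valueOf((char) st.ttype));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("a / b # not a token\nc"));  // [a, /, b, c]
    }
}
```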

ordinaryChar

public void ordinaryChar(int c)
Tags the value c as a single character token.

putback

public void putback(char c)
Pushes the character c back onto the stream so that it can be parsed by the next call to nextToken(). Only one character may be pushed back.
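The one-character pushback described above behaves like java.io.PushbackReader created with a buffer size of 1. We do not know whether ReaderTokenizer is built on PushbackReader, so treat this as an analogy; the class and method names below are illustrative.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class PutbackDemo {
    /** Reads 'a', pushes 'x' back, then reads twice more. */
    static String readWithPutback() {
        try (PushbackReader in = new PushbackReader(new StringReader("ab"), 1)) {
            StringBuilder sb = new StringBuilder();
            sb.append((char) in.read());  // 'a'
            in.unread('x');               // push one character back
            sb.append((char) in.read());  // the pushed-back 'x' is read first
            sb.append((char) in.read());  // then the stream continues with 'b'
            return sb.toString();         // "axb"
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns true if a second pushback overflows the one-character buffer. */
    static boolean secondPutbackFails() {
        try (PushbackReader in = new PushbackReader(new StringReader(""), 1)) {
            in.unread('a');
            try {
                in.unread('b');
                return false;
            } catch (IOException overflow) {
                return true;  // the buffer holds only one character
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```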

pushBack

public void pushBack()

read

public char read()
Read a single character

lineno

public int lineno()
Returns the current line number of the file being parsed.

nextToken

public int nextToken()
Retrieve the next recognized token.
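A typical caller loops on nextToken() until TT_EOF and dispatches on ttype, reading sval or nval as appropriate. ReaderTokenizer uses the same field names as java.io.StreamTokenizer, so the loop below is written against the JDK class so it runs as-is; we assume the same loop shape applies to ReaderTokenizer. Two documented differences: StreamTokenizer reports a quoted string with ttype equal to the quote character (ReaderTokenizer uses TT_WORD), and StreamTokenizer does not parse hexadecimal numbers. The class name TokenLoop is illustrative.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class TokenLoop {
    /** Labels each token in the input by its ttype. */
    static List<String> describe(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                switch (st.ttype) {
                    case StreamTokenizer.TT_WORD:
                        out.add("word:" + st.sval);
                        break;
                    case StreamTokenizer.TT_NUMBER:
                        out.add("number:" + st.nval);
                        break;
                    case '"':  // StreamTokenizer-specific: quoted text arrives here
                        out.add("quoted:" + st.sval);
                        break;
                    default:   // ordinary single-character token
                        out.add("char:" + (char) st.ttype);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // [word:width, number:42.0, quoted:two words, char:{]
        System.out.println(describe("width 42 \"two words\" {"));
    }
}
```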

charType

public int charType(int val)
Returns the type of the specified character.
Valid values are: WHITESPACE, WORDCHAR, NUMBER, TOKEN, QUOTE, COMMENT, NEWLINE

setType

public void setType(int val,
                    int type)
Sets the internal type of a character, which determines how the character is treated during parsing.
Parameters:
val - the character
type - the internal type. Valid values are: WHITESPACE, WORDCHAR, NUMBER, TOKEN, QUOTE, COMMENT, NEWLINE
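setType() and charType() suggest a per-character lookup table. The sketch below shows that idea with a 256-entry array preloaded to resemble the documented defaults. Everything concrete here is an assumption: the class name CharTypeTable, the numeric values of the type constants, the table size, treating unlisted characters as single-character tokens, and using the letters a-z and A-Z as word characters, since this page does not list the default word characters.

```java
import java.util.Arrays;

class CharTypeTable {
    // Hypothetical type codes; ReaderTokenizer's real constant values are not documented here.
    static final int WHITESPACE = 0, WORDCHAR = 1, NUMBER = 2, TOKEN = 3,
                     QUOTE = 4, COMMENT = 5, NEWLINE = 6;

    // One entry per 8-bit character; characters above 255 are not handled in this sketch.
    private final int[] types = new int[256];

    CharTypeTable() {
        Arrays.fill(types, TOKEN);  // assumption: unlisted characters are single-character tokens
    }

    void setType(int val, int type) { types[val] = type; }

    int charType(int val) { return types[val]; }

    void wordChars(int start, int end) {
        for (int c = start; c <= end; c++) setType(c, WORDCHAR);
    }

    /** Builds a table resembling the documented defaults. */
    static CharTypeTable defaults() {
        CharTypeTable t = new CharTypeTable();
        for (int c = 0; c <= ' '; c++) t.setType(c, WHITESPACE);
        t.setType(',', WHITESPACE);
        t.setType('\n', NEWLINE);  // set after the whitespace range so it overrides it
        t.setType('"', QUOTE);
        t.setType('#', COMMENT);
        t.wordChars('a', 'z');
        t.wordChars('A', 'Z');
        for (int c = '0'; c <= '9'; c++) t.setType(c, NUMBER);
        t.setType('.', NUMBER);
        t.setType('-', NUMBER);
        return t;  // {}[]/ keep the TOKEN default
    }
}
```

Because each lookup is a single array index, reclassifying a character with setType() takes effect on the very next character the tokenizer reads.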

eol

public boolean eol()
Returns true if the end of a line has been reached.


NCSA Portfolio, Copyright 1997-1999, National Center for Supercomputing Applications, University of Illinois Urbana-Champaign, All Rights Reserved