Class ReaderTokenizer

NCSA Portfolio

ncsa.util
Class ReaderTokenizer

java.lang.Object
  |
  +--ncsa.util.ReaderTokenizer

public class ReaderTokenizer
extends java.lang.Object

This object takes an incoming text stream from a Reader and attempts to parse tokens from it.

By default:

- Characters from 0 through ' ' (space) and commas are considered whitespace characters.

- The quote character is " (double quote). It delimits multi-word strings, which are returned in sval with ttype set to TT_WORD. The value of nval is indeterminate.

- The characters {}[]/ are returned in ttype as their character value.

- The newline character is set to \n.

- The default comment character is #. The # and all characters following it to the end of line are ignored.

- Character strings are collected and returned in sval, with ttype set to TT_WORD. The value of nval is indeterminate.

- Numbers are recognized and parsed as whole, decimal, hexadecimal, or exponential values and returned in nval as a double, with ttype set to TT_NUMBER. The value of sval is indeterminate.

- When the end of the Reader's input is reached, all calls to nextToken() will return TT_EOF in ttype.

These defaults can be changed by calling methods within this class. See the method descriptions for more information.
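
The defaults listed above can be approximated with the JDK's java.io.StreamTokenizer, which has similarly named configuration methods. The sketch below is only an analogue and does not use ReaderTokenizer itself. The class name DefaultsSketch, the choice of word characters, and the sample input are assumptions made for illustration. Note one difference: StreamTokenizer reports a quoted string with ttype set to the quote character, whereas the description above says ReaderTokenizer uses TT_WORD.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class DefaultsSketch {
    /** Tokenizes text with a StreamTokenizer configured to resemble the defaults above. */
    static List<String> tokenize(String text) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        st.resetSyntax();                 // StreamTokenizer: every character becomes ordinary
        st.whitespaceChars(0, ' ');       // 0 through space are whitespace
        st.whitespaceChars(',', ',');     // commas are whitespace as well
        st.wordChars('a', 'z');           // assumption: letters and '_' form words
        st.wordChars('A', 'Z');
        st.wordChars('_', '_');
        st.parseNumbers();                // digits, period, and dash are number characters
        st.quoteChar('"');
        st.commentChar('#');
        for (char c : "{}[]/".toCharArray()) {
            st.ordinaryChar(c);           // returned in ttype as their character value
        }

        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                switch (st.ttype) {
                    case StreamTokenizer.TT_WORD:
                        out.add("WORD " + st.sval);
                        break;
                    case StreamTokenizer.TT_NUMBER:
                        out.add("NUMBER " + st.nval);
                        break;
                    case '"':             // StreamTokenizer-specific: ttype is the quote character
                        out.add("QUOTED " + st.sval);
                        break;
                    default:
                        out.add("CHAR " + (char) st.ttype);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        for (String token : tokenize("name \"two words\" 42 { x, y } # trailing comment")) {
            System.out.println(token);
        }
    }
}
```

With the sample line, the comma and the text after # produce no tokens, and the braces come back as single-character tokens.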

This was written because StreamTokenizer can't be subclassed effectively (try it, you can't), and we needed to be able to parse hexadecimal and exponential text strings.
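
To make the motivation concrete, the sketch below shows how java.io.StreamTokenizer, with its own default settings, splits hexadecimal and exponential text into a number followed by a word. It then shows one way such text could be converted to a double. The class name NumberFormsSketch, the helper parseNumber, and the 0x prefix for hexadecimal are assumptions; this page does not say which hexadecimal syntax ReaderTokenizer accepts or how it parses numbers internally.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class NumberFormsSketch {
    /** Describes the tokens that a default java.io.StreamTokenizer produces for the text. */
    static String streamTokenizerView(String text) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        StringBuilder sb = new StringBuilder();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    sb.append("NUMBER(").append(st.nval).append(')');
                } else if (st.ttype == StreamTokenizer.TT_WORD) {
                    sb.append("WORD(").append(st.sval).append(')');
                } else {
                    sb.append("CHAR(").append((char) st.ttype).append(')');
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    /**
     * Hypothetical helper: converts whole, decimal, hexadecimal, or exponential
     * text to a double. Assumes hexadecimal text carries a 0x or 0X prefix.
     */
    static double parseNumber(String text) {
        if (text.startsWith("0x") || text.startsWith("0X")) {
            return (double) Long.parseLong(text.substring(2), 16);
        }
        return Double.parseDouble(text);   // handles whole, decimal, and exponential forms
    }

    public static void main(String[] args) {
        System.out.println(streamTokenizerView("1e3"));   // number 1.0, then the word e3
        System.out.println(streamTokenizerView("0x1F"));  // number 0.0, then the word x1F
        System.out.println(parseNumber("1e3"));
        System.out.println(parseNumber("0x1F"));
    }
}
```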

Field Summary

static int COMMENT


boolean eol
          True if the end of a line was reached during the last nextToken() call.

static int ESCAPE


int lineno


int MAXSIZE


static int NEWLINE


static int NUMBER


double nval
          When ttype is set to TT_NUMBER, nval contains a double.

static int QUOTE


java.lang.String sval
          When ttype is set to TT_WORD, sval contains a string.

static int TOKEN


static int TT_EOF


static int TT_NUMBER


static int TT_WORD


int ttype
          Contains a constant indicating what type of token has been returned from calls to nextToken().

static int WHITESPACE


static int WORDCHAR


Constructor Summary

ReaderTokenizer(java.io.Reader r)
          Calling this constructor sets the Tokenizer to use the default parsing described above.

Method Summary

int charType(int val)
          Returns the type of the specified character.
Valid values are: WHITESPACE, WORDCHAR, NUMBER, TOKEN, QUOTE, COMMENT, NEWLINE

void commentChar(int c)
          Tags the value c as a comment character.

boolean eol()
          Returns true if the end of a line has been reached.

void escapeChar(int c)
          Tags the value c as an escape character.

int lineno()
          Returns the current line number of the file being parsed.

void newlineChar(int c)
          Tags the value c as an end-of-line character.

int nextToken()
          Retrieve the next recognized token.

void ordinaryChar(int c)
          Tags the value c as a single character token.

void parseNumbers()
          Tags the digits 0 through 9, the period, and the dash as number characters, because these characters are legal in numbers.

void parseNumbersAsWords()
          Tags the digits 0 through 9, the period, and the dash as word characters.

void pushBack()


void putback(char c)
          Pushes the character c back onto the stream so that it can be parsed by the next call to nextToken().

void quoteChar(int c)
          Tags the value c as a quote character.

char read()
          Read a single character

void resetSyntax()
          Reset all parsing rules to the default

void setType(int val, int type)
          Specifies the internal representation (character type) used for a character.

void whitespaceChar(int c)
          Tags the value c as a whitespace character.

void whitespaceChars(int start, int end)
          Tags all values from start to end as whitespace characters.

void wordChars(int start, int end)
          Tags all values from start to end as characters that can appear in a TT_WORD token.

Methods inherited from class java.lang.Object

equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Detail

TT_WORD

public static final int TT_WORD

TT_NUMBER

public static final int TT_NUMBER

TT_EOF

public static final int TT_EOF

WHITESPACE

public static final int WHITESPACE

WORDCHAR

public static final int WORDCHAR

NUMBER

public static final int NUMBER

TOKEN

public static final int TOKEN

QUOTE

public static final int QUOTE

COMMENT

public static final int COMMENT

NEWLINE

public static final int NEWLINE

ESCAPE

public static final int ESCAPE

MAXSIZE

public final int MAXSIZE

ttype

public int ttype

Contains a constant indicating the type of token returned by the last call to nextToken(). Valid values are TT_WORD for strings, TT_NUMBER for numbers, TT_EOF once the end of the input has been reached, and the character value itself for a character tagged with ordinaryChar().

sval

public java.lang.String sval

When ttype is set to TT_WORD, sval contains a string.

nval

public double nval

When ttype is set to TT_NUMBER, nval contains a double.

eol

public boolean eol

True if the end of a line was reached during the last nextToken() call.

lineno

public int lineno

Constructor Detail

ReaderTokenizer

public ReaderTokenizer(java.io.Reader r)

Calling this constructor sets the Tokenizer to use the default parsing described above.

Method Detail

resetSyntax

public void resetSyntax()

Resets all parsing rules to the defaults.

parseNumbers

public void parseNumbers()

Tags the digits 0 through 9, the period, and the dash as number characters, because these characters are legal in numbers.

parseNumbersAsWords

public void parseNumbersAsWords()

Tags the digits 0 through 9, the period, and the dash as word characters.

wordChars

public void wordChars(int start,
                      int end)

Tags all values from start to end as characters that can appear in a TT_WORD token.

whitespaceChars

public void whitespaceChars(int start,
                            int end)

Tags all values from start to end as whitespace characters. These characters are ignored during parsing.

whitespaceChar

public void whitespaceChar(int c)

Tags the value c as a whitespace character. This character will be ignored during parsing.

newlineChar

public void newlineChar(int c)

Tags the value c as an end-of-line character. When this character is encountered, eol is set to true.

quoteChar

public void quoteChar(int c)

Tags the value c as a quote character. When a pair of these characters is encountered, all characters between them are returned in sval.

escapeChar

public void escapeChar(int c)

Tags the value c as an escape character. The character following the escape character is treated as an ordinary character.

commentChar

public void commentChar(int c)

Tags the value c as a comment character. When this character is encountered, it and all characters following it to the end of the line are ignored.

ordinaryChar

public void ordinaryChar(int c)

Tags the value c as a single character token.

putback

public void putback(char c)

Pushes the character c back onto the stream so that it can be parsed by the next call to nextToken(). Only one character may be pushed back at a time.
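
The one-character pushback described here resembles what java.io.PushbackReader offers in the JDK, so the idea can be sketched with that class. This is an analogue and not ReaderTokenizer's implementation. The class name PutbackSketch and the method readDigitsThenNext are hypothetical names used only for this illustration.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class PutbackSketch {
    /**
     * Reads a leading run of digits, pushes the first non-digit back,
     * and then reads again to show that the pushed-back character is seen next.
     */
    static String readDigitsThenNext(String text) {
        // The single-argument constructor gives a pushback buffer of one character.
        try (PushbackReader in = new PushbackReader(new StringReader(text))) {
            StringBuilder digits = new StringBuilder();
            int c;
            while ((c = in.read()) != -1 && Character.isDigit(c)) {
                digits.append((char) c);
            }
            if (c != -1) {
                in.unread(c);             // give the lookahead character back to the stream
            }
            int next = in.read();         // returns the pushed-back character, or -1 at end of input
            return digits + "|" + (next == -1 ? "EOF" : String.valueOf((char) next));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readDigitsThenNext("123+4"));
        System.out.println(readDigitsThenNext("77"));
    }
}
```

A tokenizer typically needs this kind of lookahead because it only learns that a number or word has ended by reading the character that follows it.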

pushBack

public void pushBack()

read

public char read()

Read a single character

lineno

public int lineno()

Returns the current line number of the file being parsed.

nextToken

public int nextToken()

Retrieve the next recognized token.

charType

public int charType(int val)

Returns the type of the specified character.
Valid values are: WHITESPACE, WORDCHAR, NUMBER, TOKEN, QUOTE, COMMENT, NEWLINE

setType

public void setType(int val,
                    int type)

Specifies the internal representation (character type) used for a character.

Parameters:
val - the character
type - the internal representation. Valid values are: WHITESPACE, WORDCHAR, NUMBER, TOKEN, QUOTE, COMMENT, NEWLINE
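
One plausible way to picture setType and charType is as a lookup table indexed by character value. The sketch below is a guess at the general idea and not the actual implementation. The class name CharTableSketch, the numeric values of the constants, the 256-entry table size, and the setRange helper are all assumptions, since this page gives none of those details.

```java
import java.util.Arrays;

final class CharTableSketch {
    // Hypothetical values: the documentation names these constants but not their values.
    static final int WHITESPACE = 0, WORDCHAR = 1, NUMBER = 2, TOKEN = 3,
                     QUOTE = 4, COMMENT = 5, NEWLINE = 6;

    // Assumption: only character values 0 through 255 are tracked.
    private final int[] types = new int[256];

    CharTableSketch() {
        Arrays.fill(types, TOKEN);        // assumption: untagged characters are single-character tokens
        setRange(0, ' ', WHITESPACE);     // 0 through space
        setType(',', WHITESPACE);
        setRange('a', 'z', WORDCHAR);
        setRange('A', 'Z', WORDCHAR);
        setRange('0', '9', NUMBER);
        setType('.', NUMBER);
        setType('-', NUMBER);
        setType('"', QUOTE);
        setType('#', COMMENT);
        setType('\n', NEWLINE);           // overrides the whitespace tag set above
    }

    /** Records the type for one character; values outside the table are ignored. */
    void setType(int val, int type) {
        if (val >= 0 && val < types.length) {
            types[val] = type;
        }
    }

    /** Records the same type for every character from start to end, inclusive. */
    void setRange(int start, int end, int type) {
        for (int c = Math.max(start, 0); c <= end && c < types.length; c++) {
            types[c] = type;
        }
    }

    /** Looks up the type of a character; values outside the table count as TOKEN. */
    int charType(int val) {
        return (val >= 0 && val < types.length) ? types[val] : TOKEN;
    }

    public static void main(String[] args) {
        CharTableSketch table = new CharTableSketch();
        table.setType(';', COMMENT);      // retag ';' as a comment character
        System.out.println(table.charType(';') == COMMENT);
        System.out.println(table.charType('{') == TOKEN);
    }
}
```

Under this reading, methods such as whitespaceChar, quoteChar, and commentChar would be thin wrappers that call setType with the matching constant.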

eol

public boolean eol()

Returns true if the end of a line has been reached.
