Skip to content

ChrisKatsaras/SGML-JFlex-Parsing

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

A1 CIS*4650
Christopher Katsaras

The assignment focuses on parsing an SGML file using JFLEX. After tokenizing then file, output matching specification is sent to output.txt. If there are errors in the file (misaligned tags), disregard output in output.txt and look at error.txt to see errors.

*********
Build/Run
*********

type $ make or make all to build project

type $ make clean to remove .class files and Lexer.java

To run, type $java Scanner < nameofinputfile

************************
Assumptions/ limitations
************************

1.Tags are assumed to be completed. What is meant by this is, there are not tags partially written like <Doc  or /Doc>

2.If there is an error in the file (due to misalignment) the user must look in the error.txt file to see which close tags did not match

3.If an open tag is has attribute value paris then it must be written like so <name attribute = "value"> 
<name attribut = "value value"> will not work, as this is something Professor Song did not specify.

4.The JFLEX library is located at /usr/local/Cellar/jflex/1.6.1/bin/jflex . I have tested this on linux.socs.uoguelph.ca to
confirm but if for any reason this is not the case on the machine you are running, then this path must be changed.

5.It is also important to note that I used the base code provided by Fei Song to start this project. 

************
Testing plan
************

This test file will show a success case where we output valid tags:
This test file also tests all tokens specified for the assignment (open,close,word,number,hyphen,apostrophy,punctuation)

<DOC>
123ABC
mp3
-0.3
+111
+2.34
john's O'Reiley world'cup
father-in-law's
data-base
father-in-law
Hey-Dave-how-is-it-going
---
/ + = ''

</DOC>

<chris align = "chris">
this stuff will not get printed to output
</chris>

<DATE 

>
fei song
</dAtE>


and produces this output in output.txt:

OPEN-DOC
WORD(123ABC)
WORD(mp3)
NUMBER(-0.3)
NUMBER(+111)
NUMBER(+2.34)
APOSTROPHIZED(john's)
APOSTROPHIZED(O'Reiley)
APOSTROPHIZED(world'cup)
APOSTROPHIZED(father-in-law's)
HYPHENATED(data-base)
HYPHENATED(father-in-law)
HYPHENATED(Hey-Dave-how-is-it-going)
PUNCTUATION(-)
PUNCTUATION(-)
PUNCTUATION(-)
PUNCTUATION(/)
PUNCTUATION(+)
PUNCTUATION(=)
PUNCTUATION(')
PUNCTUATION(')
CLOSE-DOC
OPEN-DATE
WORD(fei)
WORD(song)
CLOSE-DATE


An invalid test case like this:

<DOC>
<POP>
</DOC>
</POP>

produces this output in error.txt:

ERROR! Close tag DOC does not match open tag!

and produces this output in stdout:

Open tag does not match closing tag named DOC
Unmatched open tags left are: [DOC]


Releases

No releases published

Packages

No packages published