The hardest part is the schema grammar. A schema grammar for a code file would have to look something like:
For a declaration statement:
DeclStmt
|- Decl
   |- Specifier (*A)
   |- Type
   |- Name
   |- Init
      |- Expr
         ...expression
But for some reason, at point *A, I get nothing from my automated Java XML parser. I can find the “DeclStmt” declaration, but not the “Decl” inside it.
This bug cost me 70 minutes of confusion. I may end up changing the namespace of every entity in the files just to see whether that fixes this one problem.
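One way to test the namespace theory before renaming anything is to ask the parser the same question twice, once ignoring namespaces and once naming one. The sketch below is contrived: the namespace URI, the “unit” wrapper element, and the class and method names are all invented for illustration, and nothing here confirms that a namespace is the actual cause of the missing “Decl”. It only shows that, in a namespace-aware DOM, an unqualified XPath such as //decl matches nothing when the element sits in a namespace, while its namespace-free parent is still found.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class NamespaceLookupDemo {
    // Hypothetical namespace URI, invented for this example.
    static final String NS = "http://example.com/src";

    // Contrived input: decl_stmt has no namespace, decl and its children are in NS.
    static final String XML =
        "<unit><decl_stmt><decl xmlns=\"" + NS + "\">"
        + "<name>x</name></decl></decl_stmt></unit>";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // namespace awareness is off by default
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Unqualified XPath: "//decl" only matches elements named decl in NO namespace.
    static int countUnqualified(Document doc, String localName) {
        try {
            NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//" + localName, doc, XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    // Namespace-qualified lookup: matches elements named localName inside NS.
    static int countQualified(Document doc, String localName) {
        return doc.getElementsByTagNameNS(NS, localName).getLength();
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        System.out.println("//decl_stmt -> " + countUnqualified(doc, "decl_stmt")); // 1
        System.out.println("//decl      -> " + countUnqualified(doc, "decl"));      // 0
        System.out.println("NS decl     -> " + countQualified(doc, "decl"));        // 1
    }
}
```

If the qualified lookup finds the element and the unqualified one does not, the namespace is worth chasing. If both come back empty, the problem is somewhere else.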
Fixing this will involve looping over a few steps: generating new XML reader code, finding surprises when viewing the results, returning to step 1. XML reader code is generated from an XML Schema file, in which the author takes meticulous steps to describe how an XML file is supposed to look. It may look like this:
<schema>
  <complexType name="document">
    <sequence><element name="decl_stmt" type="DeclStmtType"/></sequence>
  </complexType>
  <complexType name="DeclStmtType">
    <sequence><element name="decl" type="DeclType"/></sequence>
  </complexType>
  <complexType name="DeclType">
    <sequence>
      <element name="specifier" type="string"/>
      <element name="type" type="string"/>
      <element name="name" type="string"/>
      <element name="init" type="InitType"/>
    </sequence>
  </complexType>
</schema>
Reading this in from Java or C++ means getting a library that can translate a schema like this into Java or C++ classes that help you work with the data inside “DeclStmtType”s and “DeclType”s. These libraries:
- take the information in the schema
- create Java or C++ classes named after the value stated in each “name” (the words ending in “Type” in the example above)
- turn the elements nested inside each named type into fields or accessor methods on that class.
Once the library has read other files that “are based on” the schema, the programmer can use these accessor methods to reach whatever is left in memory after the parse is over. The library does the legwork of expressing in memory the content of other XML files that follow your schema.
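To make the idea of generated classes concrete, here is a hand-written sketch of roughly what such classes and their accessors could look like for “DeclStmtType” and “DeclType”. This is not the output of any particular binding library. The outer class name, the read method, and the DOM-walking helpers are assumptions made for illustration, the input is assumed to have a “document” root element wrapping “decl_stmt”, and the “init” element is left out because “InitType” is not spelled out in the schema above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class DeclBindingSketch {
    // Stand-in for a generated class: one accessor per child element of DeclType.
    static final class DeclType {
        private final String specifier, type, name;
        DeclType(String specifier, String type, String name) {
            this.specifier = specifier;
            this.type = type;
            this.name = name;
        }
        String getSpecifier() { return specifier; }
        String getType() { return type; }
        String getName() { return name; }
    }

    // Stand-in for the generated DeclStmtType, which nests a DeclType.
    static final class DeclStmtType {
        private final DeclType decl;
        DeclStmtType(DeclType decl) { this.decl = decl; }
        DeclType getDecl() { return decl; } // null if no <decl> child was found
    }

    // First direct child element with the given tag name, or null.
    private static Element child(Element parent, String tag) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && tag.equals(n.getNodeName())) {
                return (Element) n;
            }
        }
        return null;
    }

    // Trimmed text of the named child element, or null when it is absent.
    private static String text(Element parent, String tag) {
        Element e = child(parent, tag);
        return e == null ? null : e.getTextContent().trim();
    }

    // Reads <document><decl_stmt><decl>...</decl></decl_stmt></document>.
    static DeclStmtType read(String xml) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        Element stmt = child(root, "decl_stmt");
        if (stmt == null) {
            throw new IllegalArgumentException("no <decl_stmt> under the root element");
        }
        Element decl = child(stmt, "decl");
        return new DeclStmtType(decl == null ? null
            : new DeclType(text(decl, "specifier"), text(decl, "type"), text(decl, "name")));
    }
}
```

With classes like these, a call such as DeclBindingSketch.read(xml).getDecl().getName() reaches the declared name directly. A getDecl() that comes back null is the in-memory form of the “I can find DeclStmt but not Decl” symptom.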
Provided the schema file was written correctly, the XML library should do its job and correctly represent the source code file from the earlier example as XML in memory, so I can do things like analyze it.
It Takes Skill
Sometimes things do not go so easily. Having to wait until the entire file has been read, without knowing whether it matches the intent of the schema one-to-one, can hide errors, either in my design of the file or in my design of the program reading it from memory.
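One way to surface such mismatches earlier is to validate the XML against the schema directly and collect every complaint with its line number, instead of discovering a missing element later as an empty field. The sketch below uses the JDK's javax.xml.validation API. The embedded XSD is a simplified, made-up stand-in for the schema above (it uses the xs: prefix and drops “init”), and the class and method names are assumptions for illustration only.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

final class SchemaCheckSketch {
    // Simplified stand-in schema: decl_stmt must contain exactly one decl.
    static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xs:element name=\"decl_stmt\"><xs:complexType><xs:sequence>"
        + "<xs:element name=\"decl\"><xs:complexType><xs:sequence>"
        + "<xs:element name=\"specifier\" type=\"xs:string\"/>"
        + "<xs:element name=\"type\" type=\"xs:string\"/>"
        + "<xs:element name=\"name\" type=\"xs:string\"/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    private static String format(SAXParseException e) {
        return "line " + e.getLineNumber() + ": " + e.getMessage();
    }

    // Returns every validation problem found; an empty list means the XML matched.
    static List<String> validate(String xml) {
        List<String> problems = new ArrayList<>();
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            validator.setErrorHandler(new ErrorHandler() {
                // Record recoverable problems and keep going, so all of them are reported.
                @Override public void warning(SAXParseException e) { problems.add(format(e)); }
                @Override public void error(SAXParseException e) { problems.add(format(e)); }
                // Fatal errors (e.g. malformed XML) cannot be recovered from.
                @Override public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            validator.validate(new StreamSource(new StringReader(xml)));
        } catch (SAXException | IOException e) {
            problems.add("fatal: " + e.getMessage());
        }
        return problems;
    }

    public static void main(String[] args) {
        // A decl_stmt with no decl inside it: the validator should complain.
        validate("<decl_stmt/>").forEach(System.out::println);
    }
}
```

Validation will not fix a schema that fails to express my intent, but it does separate “the file does not match the schema” from “my reading code looked in the wrong place”.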
It takes skill to write schema XSD (XSD is the file type of a schema) that precisely matches your intent. If your intent does not align with what the schema says, your generated code will not pick up the right details. I want more control over errors and over what problems can come up, but I also want a different paradigm of processing called “implicit adoption”. I’ll talk about this more in my next post, but in short: whenever a certain depth of the program is reached, I want to stop processing deeper and pick up again at certain points that I specify.