It was quite simple, to begin with. Instead of creating a standard recursive descent parser, I decided to take advantage of BASIC's 'line number, command list' structure and create a simple regex to parse each line. I added a couple of checks to ensure that the command separator (the colon) wasn't misinterpreted when it appeared within quoted strings.
CodeLine.parse = function(str) {
  // Capture the line number, then the rest of the line (the command list).
  var split = str.match(/^\s*(\d+)\s+(.*?)\s*$/m);
  if (!split) {
    return null;
  }
  var line = new CodeLine(split[1]);
  var colon = split[2].split(':');
  for (var instr = 0; instr < colon.length; ++instr) {
    var cmd = colon[instr];
    // An odd number of quotes means we split on a colon inside a string,
    // so re-join the following pieces until the quotes balance again.
    while ((cmd.split('"').length - 1) % 2 === 1 && instr + 1 < colon.length) {
      cmd += ':' + colon[++instr];
    }
    line.addInstruction(CodeInstruction.parse(cmd)).lineNumber = line.lineNumber;
  }
  return line;
};
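To see the colon handling in action, here is a minimal, self-contained sketch of the same idea. The CodeLine and CodeInstruction bodies below are hypothetical stand-ins (the real classes do far more), and I'm assuming addInstruction returns the instruction it was given. This variant keeps re-joining pieces until the quotes balance, so a string containing several colons survives intact.

```javascript
// Hypothetical stand-ins for the real classes, just enough to run the parser.
function CodeInstruction(text) {
  this.text = text;
  this.lineNumber = null;
}
CodeInstruction.parse = function (str) {
  return new CodeInstruction(str.trim());
};

function CodeLine(lineNumber) {
  this.lineNumber = parseInt(lineNumber, 10);
  this.instructions = [];
}
// Assumption: returns the instruction so the caller can tag its line number.
CodeLine.prototype.addInstruction = function (instr) {
  this.instructions.push(instr);
  return instr;
};

CodeLine.parse = function (str) {
  var split = str.match(/^\s*(\d+)\s+(.*?)\s*$/);
  if (!split) {
    return null;
  }
  var line = new CodeLine(split[1]);
  var pieces = split[2].split(':');
  for (var i = 0; i < pieces.length; ++i) {
    var cmd = pieces[i];
    // Odd quote count: the colon was inside a string, so re-join pieces.
    while ((cmd.split('"').length - 1) % 2 === 1 && i + 1 < pieces.length) {
      cmd += ':' + pieces[++i];
    }
    line.addInstruction(CodeInstruction.parse(cmd)).lineNumber = line.lineNumber;
  }
  return line;
};

// The quoted string contains two colons; the third colon separates commands.
var parsed = CodeLine.parse('10 PRINT "TIME: 12:30" : GOTO 20');
var texts = parsed.instructions.map(function (instr) { return instr.text; });
console.log(parsed.lineNumber, texts);
```

The line comes back as two instructions, PRINT "TIME: 12:30" and GOTO 20, both tagged with line number 10. Counting quotes is still a heuristic rather than a tokenizer, but it is enough for this line-at-a-time approach.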
From here, I studied the syntax of each BASIC command to create one, or sometimes two, regexes to differentiate between them.
if (args = str.match(/^goto\s+(\d+)/i)) {
  this.method = new cbGoto(args[1]);
...
} else if (args = str.match(/^for\s+(\w+)\s*=(.*)\s*to\s*(.*)\s*step\s+(.*)/i)) {
  // Tested before the plain FOR form, which would otherwise match first
  // and swallow "step n" into the end expression.
  this.method = new cbForLoop(args[1], args[2], args[3], args[4]);
} else if (args = str.match(/^for\s+(\w+)\s*=(.*)\s*to\s*(.*)/i)) {
  this.method = new cbForLoop(args[1], args[2], args[3], 1);
...
When combined, these steps build up a program parse tree, ready for evaluation. As each command is encountered, a callback function is invoked with the current program state and the operation is performed. To handle the GOSUB library, I simply added a system-level callback at the appropriate line number to invoke the HTML5-specific code.
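As a rough illustration of that evaluation step, here is a minimal sketch of a callback-driven run loop. Everything in it is an assumption made for the example: the Program class, the shape of the state object, the make* helpers, and line 1000 as the home of a library routine. It also simplifies GOSUB so that RETURN resumes at the following line rather than the following instruction.

```javascript
// Hypothetical program store: line number -> array of callbacks.
function Program() {
  this.lines = {};
}

Program.prototype.addLine = function (lineNumber, callbacks) {
  this.lines[lineNumber] = callbacks;
};

Program.prototype.run = function () {
  var numbers = Object.keys(this.lines).map(Number).sort(function (a, b) {
    return a - b;
  });
  var state = { output: [], returnStack: [], jumpTo: null, halted: false };
  var index = 0;
  while (index < numbers.length && !state.halted) {
    var callbacks = this.lines[numbers[index]];
    // Line to resume at after a RETURN (simplified: whole-line granularity).
    state.nextLine = numbers[index + 1];
    state.jumpTo = null;
    // Each callback receives the program state; a jump ends the current line.
    for (var i = 0; i < callbacks.length && state.jumpTo === null && !state.halted; ++i) {
      callbacks[i](state);
    }
    index = state.jumpTo === null ? index + 1 : numbers.indexOf(state.jumpTo);
    if (index < 0) {
      throw new Error('Undefined line number: ' + state.jumpTo);
    }
  }
  return state;
};

// Hypothetical callback factories, one per command.
function makePrint(text) {
  return function (state) { state.output.push(text); };
}
function makeGosub(target) {
  return function (state) {
    state.returnStack.push(state.nextLine);
    state.jumpTo = target;
  };
}
function makeReturn() {
  return function (state) {
    var target = state.returnStack.pop();
    if (target === undefined) {
      state.halted = true; // nothing to return to
    } else {
      state.jumpTo = target;
    }
  };
}
function makeEnd() {
  return function (state) { state.halted = true; };
}

var program = new Program();
program.addLine(10, [makePrint('START'), makeGosub(1000)]);
program.addLine(20, [makePrint('BACK'), makeEnd()]);
// System-level callback standing in for a library routine at line 1000.
program.addLine(1000, [
  function (state) { state.output.push('[native routine]'); },
  makeReturn()
]);

var result = program.run();
console.log(result.output.join('\n'));
```

The BASIC program never needs to know that line 1000 is native code: GOSUB 1000 jumps there like any other line, the system-level callback does its work, and RETURN hands control back.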
Et voilà!
Well, nearly! I still need to properly parse the expressions... And that's where I am now. Awaiting some time to write/discover/incorporate a parser.