New scriptformatter.js fix and a GIT repo

A while ago I posted about a user script for beautifying JS code. The point of having a user script is to make it possible to debug all-on-one-line obfuscated/compressed code with Opera Dragonfly.

Parsing and formatting JavaScript is not terribly hard, but there are some language quirks that cause trouble. My goal was always to implement a "real" parser that deals with the input character by character – but keep it as simple as possible and only do what was required to output reasonably formatted and syntactically valid JavaScript.

I've never written a parser before, so I don't know how solid this one really is. I also know there are other scripts out there that do similar things, and they might be better or worse than mine. However, since that first blog post I've been using the script (and its PHP equivalent) for two more years on the web's most complex JavaScripts, fixing bugs whenever I noticed it breaking anything. The bottom line is that I think it's pretty reliable, and it's one of the tools that is most useful to me.

To explain a little bit how it works: it grabs one character at a time from the input script, and keeps a record of the context – whether it is inside a string, inside a comment and so on:

var CODE = 0;  /* normal JS code */
var STRING_DBL = 1;  /* double quoted string */
var STRING_SGL = 2;  /* single quoted string */
var REGEXP = 3;  /* regexp literal */
var ESCAPE = 4;  /* some escape char (backslash) */
var MULTI_LINE_COMMENT = 5;
var SINGLE_LINE_COMMENT = 6;

For every character, it considers whether this character will change the context from one mode to another. For example, if we're in normal code and see a " character, we switch to "double quoted string" mode – and vice versa, if we're already in double quoted mode we switch back to "normal code" mode. If, however, we're in "escape" mode (typically after a backslash inside a string) and see a " character, the script knows that this double quote does not terminate the string.
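
The mode switching described above can be sketched roughly as follows. This is a minimal illustration, not the actual code from scriptformatter.js: the function name findStringLiterals and the resumeMode variable are invented for the example, and it only handles the string and escape modes (comments and regexps are left out).

```javascript
// Hypothetical sketch of the character-by-character mode tracking.
// Only the constant names and values come from the post.
var CODE = 0;        /* normal JS code */
var STRING_DBL = 1;  /* double quoted string */
var STRING_SGL = 2;  /* single quoted string */
var ESCAPE = 4;      /* some escape char (backslash) */

// Returns every string literal found in `src`, quotes included.
function findStringLiterals(src) {
  var mode = CODE;
  var resumeMode = CODE; // assumed helper: the mode to return to after an escape
  var start = -1;
  var found = [];
  for (var i = 0; i < src.length; i++) {
    var ch = src.charAt(i);
    if (mode === ESCAPE) {
      // The character after a backslash never changes the context.
      mode = resumeMode;
    } else if (mode === CODE) {
      if (ch === '"' || ch === "'") {
        mode = (ch === '"') ? STRING_DBL : STRING_SGL;
        start = i;
      }
    } else if (ch === '\\') {
      // Inside a string: remember where we were and enter escape mode.
      resumeMode = mode;
      mode = ESCAPE;
    } else if ((mode === STRING_DBL && ch === '"') ||
               (mode === STRING_SGL && ch === "'")) {
      // The matching quote terminates the string.
      found.push(src.slice(start, i + 1));
      mode = CODE;
    }
  }
  return found;
}
```

Given the input a="x\"y";b='z', this sketch reports the two literals "x\"y" and 'z': the escaped double quote does not end the first string.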

So far, so good. There is one ambiguous character in JavaScript that causes us trouble: the forward slash. In the expression

var b=a/5;

it is the division operator, while in the expression

var b=/a5/;

it marks the boundary of a regular expression. Hence, if we are in code mode and see a forward slash, it takes some extra thinking to tell if we should enter "regular expression" mode or not.
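
One common way to make that decision is to look back at the last non-whitespace character before the slash. The sketch below illustrates that general heuristic and is not necessarily what scriptformatter.js does. The function name slashStartsRegExp and the exact set of "regexp may follow" characters are assumptions made for the example.

```javascript
// Hypothetical heuristic: a slash starts a regexp literal if the previous
// significant character cannot end an expression (or if there is none).
function slashStartsRegExp(src, slashIndex) {
  var i = slashIndex - 1;
  // Skip whitespace before the slash.
  while (i >= 0 && /\s/.test(src.charAt(i))) {
    i--;
  }
  if (i < 0) {
    return true; // slash is the first significant character
  }
  // After one of these characters a value is expected, so treat "/" as a regexp start.
  return '(,=:[!&|?{};'.indexOf(src.charAt(i)) !== -1;
}
```

For var b=a/5; the character before the slash is a, so the slash is treated as division. For var b=/a5/; it is =, so the slash opens a regexp. A check based on a single character is deliberately simple and will misjudge some cases, for example a regexp that directly follows a keyword such as return.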

Last week, the forward slash was causing me trouble again. Looking at a problem here, I noticed that the formatting got all messed up after this statement:

var C=unescape(B.replace(/.*?/([^/?]+)?.*/,"$1"));

Did you find the usage of an un-escaped forward slash in the regexp character class weird? Apparently,

/[/]/

is a valid regular expression. 😮
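
Handling this means that, once in regexp mode, the scanner has to remember whether it is inside a [...] character class, because a slash there does not end the literal. A minimal sketch of that idea follows. The function name findRegExpEnd is invented for the example and the code is an illustration, not the fix that went into scriptformatter.js.

```javascript
// Hypothetical sketch: given the index of the opening slash of a regexp
// literal, return the index of its closing slash, or -1 if there is none.
function findRegExpEnd(src, start) {
  var inClass = false; // true while between "[" and "]"
  for (var i = start + 1; i < src.length; i++) {
    var ch = src.charAt(i);
    if (ch === '\\') {
      i++; // skip the escaped character, whatever it is
    } else if (ch === '[') {
      inClass = true;
    } else if (ch === ']') {
      inClass = false;
    } else if (ch === '/' && !inClass) {
      return i; // an unescaped slash outside a class ends the literal
    }
  }
  return -1;
}
```

With this, findRegExpEnd('/[/]/', 0) returns 4, so the slash inside the character class is no longer mistaken for the end of the regexp.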

Having fixed that, I thought it was about time to do what I should have done ages ago: put the script formatter under version control. Hence, the scripts (both the JS and PHP versions) have a new home at http://github.com/hallvors/javascript-formatter. Feel free to use it and/or contribute changes.


6 thoughts on “New scriptformatter.js fix and a GIT repo”

  1. Looking excellent :) Seems to cope with the odd situations I can come up with, but just in case, my favourites are // and /* comments in positions where you might expect a regex, such as during an assignment. These can also be used to throw an off character in where you might otherwise insert an indent:

    var baz = true, c = false;
    var foo = //bar
    /*o*/(/*o*/function/*o*/p/*o*/()/*o*/{var/*o*/a/*o*/=/*b*/c/*o*/,/*o*/b/*o*/=/*o*//b/g/*o*/;/*o*/}/*o*/)/*o*/(/*o*/baz/*o*/)/*o*/;

    The script copes with it, but it might give you ideas for more corner cases.

    However, it does show that the script inserts additional pointless linebreaks after terminating ";" characters. Would it be possible to avoid adding those linebreaks if they already exist?

  2. Originally posted by tarquinwj:

    Would it be possible to avoid adding those linebreaks if they already exist?

    It would certainly be trivial to insert some forward-looking statement checking the next character and omitting the newline. The only reason I've chosen not to is that I'm rarely bothered by this behaviour, since I only use scriptformatter.js as a user script while debugging one-liners 🙂 Hence, I have so far preferred not to slow down the script with an extra check for existing newlines. Since it's now out there on a version tracking repo, anyone preferring a different behaviour should feel free to add it 😉

  3. Comparison tests would be really cool – both performance and correctness would be interesting to look at. A cursory first glance at that JS file at least shows that the script is considerably more ambitious on user options and output aesthetics 🙂 I have not used the jsbeautifier.org site much. I did, however, try to use the Fiddler plugin and was disappointed because it broke too many scripts. If it has improved on the correctness and corner case handling fronts I'd like to try it again.
