It's been almost three years since my Web Framework Manifesto. In those
three years, I've founded the Lift Web Framework project, we've shipped
Lift 1.0, I've seen a fair number of projects built with Lift, and I've
had the honor to interact with a fair number of excellent engineers at
the largest social networking sites.
While Lift has pushed the envelope and allowed developers to write
high-performance, highly interactive web sites, there's a lot more that
needs to be done. Specifically, there is an impedance mismatch between
browser-resident data, server-resident data, and the persistence layer.
Additionally, the current crop of persistence mechanisms,
relational databases, is ill-suited to tasks including social
networking and interactive gaming.
A Lift retrospective
Lift
1.0 is out the door. The Lift community has grown beyond
1,000
members. I've built or been part of more than a dozen
Lift-related projects. I'm very happy with how Lift has
evolved.
But, what is Lift?
What is Lift?
When I started the
Lift project, I envisioned Lift being all things, from soup to nuts,
that a web developer would need to build web apps. That
included
an OR mapper, interfaces to Facebook, PayPal, OpenID, etc. As
Lift evolved and I heard from the community and other folks started
contributing to the Lift code base, it became clear that Lift-style
abstractions on top of all the things that people would want to do with
web apps were not the right answer. While there are a bunch of
Lift modules, the core of Lift is Lift Utils and Lift WebKit.
These two modules provide developers a way to securely
abstract
away the HTTP request/response cycle and focus on the business logic of
their applications. In the form of SiteMap as well as pattern
matching and extractors, security and access control is primarily
handled by declarative code. Developers can trust that by the
time a request gets to the view, the access control rules have been
enforced.
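
To make the declarative idea concrete, here is a minimal sketch in plain Scala. It is not Lift's SiteMap API: `Page`, `Rule`, `site`, `menu`, and `serve` are hypothetical names invented for illustration. It only shows how one table of rules can drive both menu generation and request handling.

```scala
// Hypothetical model, not Lift's SiteMap API: Page, Rule and the helpers
// below are invented here to illustrate declarative access control.
object AccessSketch {
  final case class User(name: String, isAdmin: Boolean)

  // A rule is a predicate over the (optional) current user.
  type Rule = Option[User] => Boolean
  val anyone: Rule = _ => true
  val loggedIn: Rule = _.isDefined
  val adminOnly: Rule = _.exists(_.isAdmin)

  final case class Page(path: String, title: String, rule: Rule)

  // The whole access policy lives in one auditable table.
  val site: List[Page] = List(
    Page("/", "Home", anyone),
    Page("/profile", "Profile", loggedIn),
    Page("/admin", "Admin", adminOnly)
  )

  // Menu generation only lists pages whose rule passes.
  def menu(user: Option[User]): List[String] =
    site.filter(_.rule(user)).map(_.title)

  // Request handling consults the same rules before any view code runs.
  def serve(path: String, user: Option[User]): Either[Int, String] =
    site.find(_.path == path) match {
      case Some(p) if p.rule(user) => Right(p.title)
      case Some(_)                 => Left(403)
      case None                    => Left(404)
    }
}
```

Because the rules sit in one list, a reviewer can check who may reach which page without reading any view code.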
Lift's support for Ajax and Comet (a.k.a. Ajax Push or long
polling) is unparalleled.
There is no other framework that allows you to build a real
time
chat app in less than 30 lines of code. Lift's abstractions
of
long polling combined with Scala's Actors provide an amazingly simple
way to build web sites that update automatically and scale very
effectively, and without deadlocks and other hallmarks of
multi-threaded code. Lift's pattern of binding HTML elements
to
functions
provides a secure, unified, and extremely concise mechanism for
developers to write standard HTML as well as Ajax apps.
Lift offers unparalleled security
Lift apps are secure by default. Here is a partial
list of Lift's security features:
- The Java Virtual Machine is impervious to buffer overflow
attacks
- Scala
is a statically typed language so you will always know the type of the
parameter passed to your method (e.g., you will not get a List[String]
when you were expecting a String)
- Request parameters are
treated as UTF-8 and all Strings are URL decoded before they reach
application level methods (unless the developer actively requests
earlier access to the data)
- Lift builds all web pages as well
formed XHTML rather than a String or a stream of characters or bytes.
This means that the developer doesn't have to think about
escaping strings to HTML, it's done correctly and automatically.
If the developer wants to include non-escaped characters in
the
output, the developer must affirmatively use the Unparsed directive.
This means that the developer must actively do something to
open
a cross site scripting hole in the application.
- Lift apps are
built with Lift's mapper or JPA. In both persistence
frameworks,
SQL query parameters are properly escaped. The developer must
actively choose to send a raw query String to the database before the
app is vulnerable to an SQL injection attack.
- Lift's forms
mechanism associates randomly generated GUIDs with form elements and
functions. When forms are submitted containing the GUID, the
function is invoked. This means that replay attacks are not
possible with Lift apps and it means that only the form fields
presented to the user can be submitted back and cause code to execute
on the server. An attacker cannot add fields to a POST and
have
those fields cause code to be executed on the server.
Further,
Lift's default select and multi-select generators will reject any
options that were not originally presented when the form was generated.
All this means that parameter tampering attacks are very
difficult against Lift apps.
- Lift's SiteMap provides unified
menu generation and access control. All pages have access
control
rules and if the rules do not evaluate to true, links to the page will
not be presented. When a page is requested, the access
control
rules are evaluated and if they do not succeed, the request will not be
serviced. The access control rules are defined declaratively
and
can be easily audited in a code review.
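
The forms mechanism described in the list above can be pictured with a small sketch. This is not Lift's implementation: `FormSession`, `bind`, and `submit` are hypothetical names, and the sketch assumes a single session-scoped table that maps randomly generated field names to server-side functions.

```scala
import java.util.UUID
import scala.collection.mutable

// Hypothetical sketch, not Lift's implementation: FormSession, bind and
// submit are invented names.
final class FormSession {
  private val handlers = mutable.Map.empty[String, String => Unit]

  // Register a server-side function and return the opaque field name
  // that would be rendered into the form.
  def bind(handler: String => Unit): String = {
    val name = "F" + UUID.randomUUID().toString.replace("-", "")
    handlers(name) = handler
    name
  }

  // Run handlers only for field names this session issued; any other
  // field in the POST is ignored. Returns the names that ran.
  def submit(params: Map[String, String]): Set[String] =
    params.collect {
      case (name, value) if handlers.contains(name) =>
        handlers(name)(value)
        name
    }.toSet
}
```

In this sketch, a POST that carries an extra field such as `isAdmin=true` has no effect, because no function was ever registered under that name.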
All of the above
means that Lift apps are secure. I've been through a number
of
penetration tests with Lift apps that I've put into production.
The pen testers have never found a material security
vulnerability in any of the apps they tested. There's not one
single vulnerability in the OWASP Top 10 that's crept into a Lift app
I've worked on. I've heard similar reports from other folks
who
have gone through pen tests with their Lift apps.
The other stuff
The
other parts of Lift are valuable and helpful, but they are not core to
Lift. Having built-in support for XMPP and AMQP as well as
PayPal
and OpenID makes building apps nice. There is the Lift
Widgets
module for stuff like calendars, etc. This all makes building
rich Lift apps easy because so many of the pieces are there.
Additionally, the Lift WebKit has no dependencies on the
other
Lift modules. This means that the other Lift modules could
have
been written by external parties. This modularity is nice if
you
want to use your own persistence layer, your own JavaScript libraries
(rather than jQuery or YUI, the Lift defaults), etc.
Share nothing is fail
One
thing that has become very clear after spending lots of time with Lift
and watching other folks adopt Lift: share nothing is fail.
The
share nothing architectures are thin layers on top of relational
databases. The amount of work that it takes to build secure,
complex apps
in share nothing is amazingly high. Share nothing apps must
either be primarily stateless or they must put the state someplace.
Both choices lead to seriously suboptimal results if the goal is
to build secure or interactive applications.
The
primarily stateless model
of share nothing apps is that there's a cookie that represents the
current session. The information in the session itself is
usually
the primary key of the current user. All the other
information
regarding state is based on the URL and any parameters that are passed
as part of the request. The difficulty in building
multi-form
wizards or shopping carts or other things with state is huge.
Is
the state kept in cookies or in hidden fields? If so, the
state
is subject to parameter tampering. Putting aside the security
issues, the developer effort required to marshal and unmarshal state on
each request is significant. Even if the developer gets it
right,
making any changes to the code becomes a seriously difficult issue.
The
other option for share nothing is to push state into memcached or the
RDBMS. This is a more secure method because the state cannot
be
altered by changing parameter or cookie values. However, the
issues
related to marshaling remain. Further, there is pressure put
on
the RDBMS unless memcached is used to store state, but memcached is a
cache and subject to cache misses which means that state can go away.
Sure it works to marshal via the RDBMS, but then you have a
single point of pressure in your application and when the RDBMS hits
the wall, your application will come to a grinding halt... think fail
whale here.
There are a number of other successful web frameworks, including
Wicket, WebObjects, and Seaside, that are highly stateful.
Many high volume sites (including Apple's store) run on these
systems and have a very good uptime record.
Frameworks still evolving
At the end of the day, web frameworks are still evolving.
Rails'
awesome leap forward in developer productivity provided the catalyst
for other folks to critically evaluate the web development process.
While I believe Lift has a bunch of really good concepts
built
into it, I think there are other better ways to describe web
applications. I expect web frameworks as a whole to be a
growing,
evolving category over the next 10 years until we've been able to
capture the semantics of web development and refined and reduced the
semantics into composable structures.
Models
Sometimes, software abstractions grow out of the definitions
of the underlying computing systems. For example, C
abstracted assembly language. C++ abstracted C.
Java abstracted C++ (more or less and in the case of
templates, a whole lot less.) Sometimes, software abstracts
the needs of the application developer. SQL and COBOL are
good examples. But as the application needs change, the
tools and frameworks must change to support the application demands.
SQL is an example of
an excellent response to the real world needs of business applications.
SQL is great for ERP. SQL in the abstract is good
for more recently
popular applications such as social networking. In practice,
SQL
databases are not good at dealing with social graphs once the social
graphs exceed a certain size. It has become clear that
SQL does not solve all the problems on the web. The rise of
CouchDB, Google App Engine/BigTable, Amazon's SimpleDB, etc. indicates
that for the web, the relational model is not the right fit.
Just as Rails signaled a shift in web frameworks, the spate of
new persistence technologies is signaling that it's time for us to
reexamine the way we store data and the way that we reflect the flow of
data from the edge of the network, typically the browser, through the
business logic (increasingly browser resident), through the
server-application into persistence and/or other systems that will
trigger the flow of data to other edge devices. That's a long
way of saying, "moving this data is hard and we don't have a clue how
to do it right."
The problem is further amplified by the rise of new client
technologies: DHTML (the real stuff) and Flash/Flex/Air.
Interactive client apps are pretty darned hard to build and
even harder to keep up to date with servers that have different
business logic models and different persistence models (not to mention
different object models.) While Adobe is doing a good job of
providing data synchronization tools for Air apps, I don't think
proprietary systems will have the reach that open systems built on open
standards do.
There needs to exist a unified model for building real-time
interactive web applications from the persistence layer, through the
messaging layer, through the business logic layer, out to the client.
Lift is part of this equation. Goat Rodeo will
become the other part of the equation.
Goat Rodeo will provide:
- A compiler-checked, Scala based unified model for
describing data structures that can be consumed by Scala, JavaScript
and anything else that speaks JSON... these items are called Qs (quanta
of information)
- A distributed transaction model based on the ZooKeeper
project. This will support 20K transactions per second.
- A scalable persistence layer, most likely built on Cassandra
- A transaction model based on Software
Transactional Memory that is exposed to the developer using
Scala's for comprehension
- References that support the storage of Qs, doubly linked
lists of Qs or maps of Qs with persistence definitions of local,
ZooKeeper, and long term
- Association of methods with Qs such that the methods can be
emitted in native Scala or JavaScript... the code is a subset of Scala
and can be pushed to environments (e.g., the browser)
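
As a rough picture of what a transaction model exposed through Scala's for comprehension could look like, here is a minimal sketch. It is not Goat Rodeo's API: `Tx`, `Store`, `read`, `write`, `atomically`, and `transfer` are hypothetical names. It also swaps in a much simpler technique than a real Software Transactional Memory: the whole store is one immutable map, and a commit is a single compare-and-set that retries on conflict, rather than per-reference read and write tracking.

```scala
import java.util.concurrent.atomic.AtomicReference

// Hypothetical names throughout: Tx, Store, read, write and atomically
// are illustrative only and are not Goat Rodeo's API.
object TxSketch {
  type Store = Map[String, Int]

  // A transaction is a pure function from a snapshot to (new snapshot, result).
  // Defining map and flatMap is what lets it be used in a for comprehension.
  final case class Tx[A](run: Store => (Store, A)) {
    def map[B](f: A => B): Tx[B] =
      Tx { s => val (s2, a) = run(s); (s2, f(a)) }
    def flatMap[B](f: A => Tx[B]): Tx[B] =
      Tx { s => val (s2, a) = run(s); f(a).run(s2) }
  }

  def read(key: String): Tx[Int] = Tx(s => (s, s.getOrElse(key, 0)))
  def write(key: String, value: Int): Tx[Unit] =
    Tx(s => (s.updated(key, value), ()))

  private val store =
    new AtomicReference[Store](Map("alice" -> 100, "bob" -> 20))

  // Optimistic commit: run against a snapshot, publish with compare-and-set,
  // and retry from a fresh snapshot if another thread committed first.
  @annotation.tailrec
  def atomically[A](tx: Tx[A]): A = {
    val before = store.get()
    val (after, result) = tx.run(before)
    if (store.compareAndSet(before, after)) result else atomically(tx)
  }

  // Both writes become visible together or not at all.
  def transfer(from: String, to: String, amount: Int): Tx[Int] =
    for {
      a <- read(from)
      b <- read(to)
      _ <- write(from, a - amount)
      _ <- write(to, b + amount)
    } yield a - amount

  def snapshot: Store = store.get()
}
```

Calling `TxSketch.atomically(TxSketch.transfer("alice", "bob", 30))` runs the four steps against one snapshot and publishes the result in a single step, so no other thread can observe the debit without the credit.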
With the above building blocks, you get:
- Scalable persistence
- Distributed actors
- Synchronization of data and data model from browser through
long term storage (yes, I'll be working to make sure this works with
Lee's jsync.)
- Unified interprocess communications, even across
heterogeneous processes
- What I believe will be a very scalable system for social
networks and other social, interactive web apps to build on top of
Right now, Goat Rodeo is in the noodling and building phase.
I'm working on the Scala compiler plugin that does the
enforcement of Q types as well as the generation of serialization and
deserialization. I'm also wiring up ZooKeeper and Cassandra
such that they play with transactional niceness with each other.
I am not developing Goat Rodeo in a vacuum. Goat
Rodeo will be a back end for Lift's new Record system (along with JDBC
and JPA). I am working on a real time information sharing
system so that people can annotate and link pictures and other
information from the Scala Lift Off as a test bed and driver for Goat
Rodeo.
I expect that by the end of summer, Goat Rodeo and its associated
Lift integration will be ready for someone who isn't me to play with.
In the meantime, if you're interested in helping to shape
the direction of Goat Rodeo, please drop by the Lift list
and share your thoughts.
Oh... and if you're wondering about the name... just think
about the job of managing the piles of data that come into your
application... it's a...