Imagine that you have developed an interesting new testing technique, software analysis algorithm, or visualisation approach. You want to do what you can to encourage practitioners and organisations to adopt it. For example, you might attempt to secure some funding to begin a start-up, to develop a self-sustaining company around the tool.
At this point, things become complicated. What business model will you choose? There is the traditional “client-side” approach: you produce a tool and either sell it to customers, or let them download it for free and offer support for a fee.
Then there is the “server-side” approach [1]. You host the tool on a central server, and clients use it by uploading their data or by giving your tool access to their code repositories. This approach has a huge number of benefits. It is easier to track access to your tool, and there is a single point at which to apply updates, fix vulnerabilities, and add features. You can collect accurate usage statistics, and can offer support in a more direct way. If you wish to come up with some sort of pay-per-use arrangement, you do not need to fiddle with web-linked license files, as so many client-side tools still require.
It was surprising to me that such centralised tools are not more prevalent.
The one major factor seems to be privacy and security. For many organisations, their source code can in effect represent their entire intellectual property. Making this available to a third party represents an unacceptable risk, regardless of how useful an external tool might be. It is only in a few cases (e.g. Atlassian’s Bitbucket) that there seems to be a sufficient degree of trust to use server-side tools.
Does this mean that, if you have a tool, you must immediately go down the route of a client-side IDE plugin? Otherwise you seemingly rule out an overwhelming proportion of your potential user base.
The flip-side is that, as soon as you release a client-side tool, it becomes harder to deploy within a large organisation, and your business case has to be overwhelming (which is not the case for most academic tools).
To me this is a huge “barrier to impact” for academic software engineering tools. Are there workarounds?
One way to operate the server-side option might be to have a “data-filtering” step that removes any sensitive information (e.g. identifiers) from the code before it leaves the client, but maintains the information required for analysis. I’m sceptical, though, that this would be enough to satisfy potential customers.
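To make the “data-filtering” idea concrete, here is a minimal sketch of what such a step could look like for Python source, using only the standard `ast` module (Python 3.9+ for `ast.unparse`). The names `IdentifierScrubber` and `scrub`, and the `id_N` placeholder scheme, are my own illustrative assumptions and not part of any existing tool. The sketch renames functions, parameters, and variables consistently, so the structure of the code survives while the names do not.

```python
import ast
import builtins


class IdentifierScrubber(ast.NodeTransformer):
    """Replace user-chosen names with opaque but consistent placeholders."""

    def __init__(self):
        # Original name -> placeholder. In this sketch the mapping is meant
        # to stay on the client so results can be translated back.
        self.mapping = {}

    def _alias(self, name):
        # Keep builtins such as len or print, since they reveal nothing private.
        if hasattr(builtins, name):
            return name
        if name not in self.mapping:
            self.mapping[name] = f"id_{len(self.mapping)}"
        return self.mapping[name]

    def visit_FunctionDef(self, node):
        node.name = self._alias(node.name)
        self.generic_visit(node)  # continue into arguments and body
        return node

    def visit_arg(self, node):
        node.arg = self._alias(node.arg)
        self.generic_visit(node)  # handles an annotation, if present
        return node

    def visit_Name(self, node):
        node.id = self._alias(node.id)
        return node


def scrub(source):
    """Return (scrubbed_source, mapping) for a piece of Python source."""
    scrubber = IdentifierScrubber()
    tree = scrubber.visit(ast.parse(source))
    return ast.unparse(tree), scrubber.mapping


if __name__ == "__main__":
    example = (
        "def compute_bonus(salary, rate):\n"
        "    total = salary * rate\n"
        "    return total\n"
    )
    scrubbed, mapping = scrub(example)
    print(scrubbed)
    print(mapping)
```

Only the scrubbed text would be sent to the server. The sketch is deliberately incomplete: it leaves attribute names, class names, imports, keyword-argument names, and string literals untouched, and each of these can leak information. Comments disappear only as a side effect of round-tripping through the AST. A filter that customers could actually trust would need to handle all of these, which is part of the reason for my scepticism.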
Are there any more reliable cryptographic approaches that enable a client to use a server-side tool, without the server being able to make use of the data? Could there be something akin to the “Zero Knowledge Protocol” [2] that could be applied to server-side software analysis tools?
[1] Ghezzi, Giacomo, and Harald C. Gall. "Towards software analysis as a service". Automated Software Engineering-Workshops, 2008. ASE Workshops 2008. 23rd IEEE/ACM International Conference on, 2008.
[2] Quisquater, Jean-Jacques, Louis C. Guillou, and Thomas A. Berson. "How to Explain Zero-Knowledge Protocols to Your Children". Advances in Cryptology – CRYPTO '89: Proceedings 435: 628–631, 1990.