Cloud Services

To Rewrite or Not To Rewrite

Back in 2017, when I was looking around for my next challenge, I wrote this article while preparing for an Amazon interview. I just came across it again and thought it was a good story to share about the early days of the Power BI SaaS offering.

Most decisions are made with analysis, but some are judgment calls not susceptible to analysis due to time or information constraints.  Please write about a judgment call you’ve made recently that couldn’t be analyzed.  It can be a big or small one, but should focus on a business issue.  What was the situation, the alternatives you considered and evaluated, and your decision-making process?  Be sure to explain why you chose the alternative you did relative to others considered.

An existential question most engineering leaders face at least once in their career is whether to reuse or rewrite a core piece of their technology when changing trends impact their business. Recently, as a Group Engineering Manager on Microsoft’s PowerBI team, I had to make such a decision. The choice is usually a hard one since you don’t always have the data required to make an analyzed decision; it is more a judgment call based on past experiences and future trends on which you need to take a bet.

Before I describe the specific decision, it would be helpful to provide some context on where we were as a business. PowerBI’s core value proposition is to provide self-service BI tools targeted at business users. Towards this goal, we had released PowerBI 1.0 as an add-on for Office 365 which leveraged the analysis and reporting capabilities of Excel on the web. As part of that effort, the visualization stack used for the charting capabilities was based on an existing implementation from Excel. For the web application, our team had transformed it into a server-hosted rendering component paired with client-side support for drawing the charts. The team built a new concept called Visual Representation Markup for communicating the shape (or geometry) of charts from the server side to the client. Nine months into the project, the MVP of the visualization stack was released along with PowerBI 1.0. At that time the team was aware of the high technical debt it had accrued due to the complex nature of the stack. One of the primary areas of concern was performance, since the service was chatty and each user interaction required several round trips from the client to the service. The team had drawn up a plan to address these performance issues and implement other optimization techniques.

After the initial release of PowerBI 1.0, the senior leadership team, challenged by slow user adoption, decided to pivot PowerBI from an Excel-based workload to a more generic SaaS service. The service was focused on allowing business users to connect to a variety of data sources and gain insights using interactive charts. As part of this pivot, I was promoted to the role of Group Engineering Manager for the SaaS service and inherited the visualization stack. Given the intense pressure to release the new service, we had to decide the execution strategy for our visualization stack. Our choice at that point was either –

  • double down on the existing server-based rendering stack and improve the performance of the service, or 
  • invest in rewriting the visualization stack on a client-based technology, leveraging the best-of-breed charting solutions available in the market like D3.js, C3.js, Highcharts, etc. 

One of the main reasons for the server-based stack was the desire to use a single implementation across desktop and web; the goal was to provide a consistent experience for our users. Given the team’s and my previous experience with web technologies and the requirements around experience, we trusted our instincts that a rewrite using the latest HTML5/JavaScript technologies would make the team more agile and help meet our goals on both user requirements and engineering cost. After an assessment of the available frameworks, we chose D3.js, which had industry-wide adoption and strong community support. The technical challenge was to wrap this framework within our visualization stack and implement the experience consistency. We didn’t think a continued investment in the server-based stack would have resulted in a better outcome. The reasoning was based on two fronts – 

  • User Experience – a client-based implementation would provide the best experience in terms of performance and fluidity. Making a chatty server-based implementation perform better would have been prohibitively expensive. 
  • Engineering Cost – leveraging a community-built solution would lower our engineering cost, since we could devote our focus to making the integration solid and invest more in the user experience.  

Armed with a few early prototypes demonstrating the proposed new architecture, we undertook the task of convincing senior leadership that this was the right call. Microsoft has long struggled with the “not-invented-here” syndrome, where distrust of community solutions caused it to build everything internally. But through persistence and a presentation of the pros & cons, we convinced leadership to give us a 3-month window to deliver an MVP to replace the existing stack. Executing on an aggressive plan, we delivered a performant and delightful user experience on the new stack, on time. With the engineering agility afforded by the new stack, we were able to react quickly to user feedback and make our offering one of the most compelling BI solutions in the market, with close to 6 million subscribers in the first year after launch. 

Programming

Running a SaaS/Web app for zero cost

Recently I helped a friend with a web application which she wanted implemented very quickly while spending as little as possible to keep it running (she was on a shoestring budget). I looked around at the various PaaS/IaaS/SaaS building blocks she could use, and most of them provided a free option. There are two types of free options in the market – time-based vs. quota-based. Given my friend wanted to see if her idea had legs, a quota-based option was the best for her (having to pay because your usage went up is a good problem to have). So the solution was built using the following technologies/services –

  • Technology – Modern web application implemented using the latest JavaScript framework (AngularJS).
  • CI/CD – We used GitHub for the code repository, bug/work item tracking and milestone planning. For continuous integration, we leveraged the Travis CI (travis-ci.org) integration with GitHub.
  • Data storage – We used Firebase (firebase.google.com) since it provided the best JSON document storage, with the added benefits of –
    • integration with the latest web frameworks (AngularJS)
    • realtime change notifications (this feature is amazingly good)
    • up to 1 GB of storage on the free plan.
  • Image/File storage – We used Cloudinary (cloudinary.com) for storing images; it provides a best-in-class solution for image storage, image manipulation and caching. For regular file storage, Firebase provides 5 GB in its free plan.
  • Hosting – We used Firebase here again since it provides custom domains in its free plan. Being part of Google, its hosting is backed by Google’s CDN.
  • Telemetry – We used Localytics (localytics.com) for logging telemetry, measuring and tracking usage for the application. They provide a free subscription for up to 10,000 MAU.
  • Backend Services – any web application will require some server-side processing (mainly for security reasons). Azure (portal.azure.com) provides a free plan for running basic web services in a shared environment. Azure Functions is also a great alternative for rapid deployment of backend functionality. Both integrate really well with GitHub repositories, which enables reliable CI/CD.
  • Error Tracking – any web application needs a way of tracking errors customers hit in the browser. There are several options in this space, but we ended up using Rollbar (rollbar.com) since it was one of the best, with good support for tracking deployments and deduping errors across them. Also, the free plan allows you to push up to 5,000 events per month (a good incentive to keep errors low in your application).

So, working through the laundry list above, we built a web application that costs zero dollars to keep running. Pretty sweet deal!

PowerBI

Realtime in PowerBI

We just released an amazing set of capabilities for bringing realtime data to PowerBI. I am very proud of how we analyzed the market, defined our strategy, and partnered with third-party players to make the most critical aspect of an IoT deployment – analytics – as simple as possible. Check out these links for more information –

Programming

No more back button screw ups!

I just blogged about the AJAX navigation feature we added to IE8, which will greatly help provide a better AJAX experience with respect to navigation. Remember those frustrating moments when you want to share the web address of the page you are viewing in a maps app or a home on redfin.com, and your friend gets the first/unrelated page? This will hopefully help web developers fix that in the future.

http://blogs.msdn.com/ie/archive/2008/07/14/ie8-ajax-navigation.aspx

Programming

Never miss an error value

One of the most frustrating parts of debugging is when, at a certain point, the return value of a call is just a generic error code (e.g. E_FAIL) or false without the proper last error set. The failure might have happened deep in the call stack, and now you have to step into each call to figure out exactly where it failed. This arises most of the time because your API’s return type and the return type of something you call don’t match, forcing you to use a generic failure code. But most of the time you can map error codes of one type to another. Consider a simple example:
 
1   HRESULT SomeAPI()
2   {
3       HANDLE hThread = NULL;
4       HRESULT hr = S_OK;
5       hThread = CreateThread(…);
6       if (NULL != hThread)
7       {
8           if (SomeAPI2()) // SomeAPI2 sets the last error code
9           {
10              hr = SomeAPI3();
11          }
12          else
13          {
14              hr = E_FAIL;
15          }
16      }
17      else
18      {
19          hr = E_FAIL;
20      }
21      return hr;
22  }
 
Now, after calling SomeAPI, if the return value is E_FAIL, I am at a loss as to where the failure was. But if the programmer had been a bit more diligent and changed lines 14 and 19 to more meaningful return values using HRESULT_FROM_WIN32(GetLastError()), I would have a better idea of what failed (an error in creating the thread, or a failure in SomeAPI2). This, I believe, greatly helps in debugging issues faster. It also has the useful side effect of preserving the error value up the call stack: for example, if this code segment did some file-system-specific operations (assuming no other calls in the stack did), and I learn the error was file related (ERROR_FILE_NOT_FOUND), I would instantly know which call/code segment to investigate deeper.

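To make the mapping concrete, here is a minimal, portable sketch of the idea. The type and constant definitions are stand-ins mirroring the Windows SDK values so this compiles without <windows.h>, and SomeAPI2 here is a hypothetical helper that fails and records a last-error code the way Win32 APIs do via SetLastError:

```cpp
#include <cstdint>

// Portable stand-ins for the Windows definitions (values mirror winerror.h),
// used only so the mapping can be demonstrated without <windows.h>.
typedef int32_t HRESULT;
static const HRESULT  S_OK                 = 0;
static const HRESULT  E_FAIL               = (HRESULT)0x80004005;
static const uint32_t ERROR_FILE_NOT_FOUND = 2;
static const uint32_t FACILITY_WIN32       = 7;

// Same logic as the SDK's HRESULT_FROM_WIN32 macro: values that are already
// negative pass through unchanged; positive Win32 codes get the failure bit
// and FACILITY_WIN32 stamped into the upper half.
static HRESULT HresultFromWin32(uint32_t err) {
    return (HRESULT)err <= 0
        ? (HRESULT)err
        : (HRESULT)((err & 0x0000FFFFu) | (FACILITY_WIN32 << 16) | 0x80000000u);
}

// Hypothetical failing helper standing in for SomeAPI2(): it fails and
// records a last-error code, as Win32 APIs do via SetLastError.
static uint32_t g_lastError = 0;
static bool SomeAPI2() { g_lastError = ERROR_FILE_NOT_FOUND; return false; }

// The diligent version of the wrapper: instead of collapsing every failure
// to E_FAIL, it propagates the underlying cause up the call stack.
HRESULT SomeAPI() {
    if (!SomeAPI2()) {
        return HresultFromWin32(g_lastError);   // 0x80070002, not 0x80004005
    }
    return S_OK;
}
```

With this change, the caller receives 0x80070002 (ERROR_FILE_NOT_FOUND wrapped in a FACILITY_WIN32 HRESULT) instead of an opaque E_FAIL, and knows immediately to look at file-system operations.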
Programming

Never trust a pointer parameter (especially strings)

 
Recently I was working on an API which took a string as one of the parameters. The code did something like
 
1 STDAPI FooBarAPI(LPCWSTR pszArg)
2 {
3      bool fOpAllowed = IsSomeOpAllowed(pszArg);
4  
5      if (true == fOpAllowed)
6      {
7          DoSomeOp(pszArg);
8      }
9      return S_OK;
10 }
 
Now, there was a security issue (or an inconsistency, depending on what you are doing) lurking in this API. I shouldn’t be performing an action based on a decision made using the string passed to me. Why? Because there is a window of opportunity between lines 3 & 7 when the caller could change what pszArg points to. It’s always recommended to copy pointer data into a local copy before performing any action, since that guarantees the data cannot be changed in the middle of your function.
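A minimal sketch of the fix, using portable stand-ins for the Windows types (IsSomeOpAllowed and DoSomeOp are hypothetical helpers): snapshot the argument into a local copy first, then validate and act only on that copy.

```cpp
#include <string>

// Portable stand-ins for the Windows types used in the post.
typedef const wchar_t* LPCWSTR;
typedef long HRESULT;
static const HRESULT S_OK           = 0;
static const HRESULT E_ACCESSDENIED = (HRESULT)0x80070005;

// Hypothetical helpers; both the check and the operation now see the SAME
// immutable local copy of the argument.
static bool IsSomeOpAllowed(const std::wstring& arg) { return arg == L"safe"; }
static std::wstring g_lastOp;                          // records what ran
static void DoSomeOp(const std::wstring& arg) { g_lastOp = arg; }

HRESULT FooBarAPI(LPCWSTR pszArg)
{
    // Snapshot the caller's buffer once, up front. Even if another thread
    // rewrites the memory pszArg points to between the check and the use,
    // both steps operate on the local copy, closing the TOCTOU window.
    std::wstring localArg(pszArg);

    if (!IsSomeOpAllowed(localArg))
    {
        return E_ACCESSDENIED;
    }

    DoSomeOp(localArg);   // acts on the validated copy, never the raw pointer
    return S_OK;
}
```

The copy costs one allocation, which is a cheap price for the guarantee that the string you validated is the string you act on.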