The Continuing Transformation of the Tech Industry

Assumed Axioms:

  1. Technology has impacted the world in 3 waves (List of companies by founding year)
    1. Hardware – Building the infrastructure
    2. Software – Easily scalable businesses e.g. social media
    3. Hybrid – Combining real-life and assuming always available internet.
  2. Software is eating the world (source) – “More and more major businesses and industries are being run on software and delivered as online services”
  3. Startups and tech in general can be thought of as an expanding fractal, reaching all areas of our lives (fractal idea).
  4. The rise of open source can be explained as eating its way up the stack as each layer becomes commoditized.

The Past: 1980-1995 – The Transformation of the Computer Industry

source: Andrew S Grove – Only the Paranoid Survive

The industry:

  • Had been vertically aligned, i.e. a handful of suppliers each delivered the full end-to-end package of hardware and software.
  • A significant change was the standardization of the IBM PC
  • This and other changes allowed shifting to a horizontal marketplace, allowing companies to specialize within each layer and to save costs and earn more profits.
    Microsoft/Intel probably being the biggest winners.

The Present

Any company essentially bundles a set of processes and people that solve problems within one encapsulated entity. I’ve blogged before about banks as a stack.

The Future

  1. What are the stacks of services that currently deliver value end to end?
  2. Is that market vertical or horizontal?
  3. What model is likely to win under which scenario? You may even see a mix.
    e.g. Apple has vertical integration of iPhones, with some horizontal experimentation allowed within the App Store, which they then take over.
  4. You may even find that what wins in the early-stages becomes replaced later.
    e.g. Tableau was excellent software and got deployed at 1000s of companies.
    However, once it has proven the model, the more successful it becomes, the easier it may be to replace with an open source alternative.

Prediction: Open Source Eats its Way up the Pyramid

Stack of oranges in pyramid shape against grey background - CSF015212 -  Dieter Heinemann/Westend61

If you accept that most companies are a stack, that software is expanding everywhere, and that an open source version becomes more likely once a piece of software has a significant number of users, then I think we will see open source companies appear at the bottom of the stack and work their way to the top.

Question: What would stop it winning a layer?

Thoughts:

  • A small developer user base. If most users are not programmers, no one will write it.
  • A small user base. If only around 1% of users pay for or contribute to open source, a small base cannot sustain the project. In fact, if a piece of poor software is introduced it could result in a failed market, i.e. the open product is bad but good enough that a “better” commercial alternative isn’t viable.

Technology Companies by Founding Year

Steve Case – The Third Wave.
https://www.youtube.com/watch?v=u4WS4DhekWU

  • Hardware
  • Pure Software = Overnight
  • Hybrid – Real-Life+Software mixed. Internet integrated sometimes invisibly.

1968 – Intel BIG
1969 – AMD BIG
1974 – Foxconn BIG
1975 – Microsoft BIG
1976 – Apple BIG

1977 – Oracle
1984 – Sybase Database, MathWorks Matlab, Cisco, Dell BIG
1985 – Corel Draw
1987 – McAfee virus, Wolfram, Peoplesoft ERP
1988 – Avast Security, Trend Micro Security
1989 – Citrix Remote
1990 – BusinessObjects SAP
1991 – AVG security
1991 – Macromedia Multimedia, Palm PDAs, SUSE linux
1993 – Informatica, Qlik Visualization, Redhat Linux, Tucows Domains, Nvidia
1994 – Amazon, RealNetworks Media
1995 – Cisco Webex, MySQL AB
1996 – Ancestry.com, Taleo HR, Sleepycat DB, Juniper Networks, F5 Networks
1997 – Blackboard Education, Kaspersky Lab Security, TIBCO
1998 – BroadSoft Networking, Symbian OS, VMware VMs, Google BIG
1999 – Apache Foundation, Basecamp, DivX, Fieldglass, Napster, Salesforce
2000 – Tripadvisor, Mobipocket, Sportradar, Jetbrains Java, zipcar
2001 – Foxit, LumenVox Speech
2002 – Shazam, Linkedin BIG, Meetup, GoPro
2003 – DocuSign, LogMeIn, Palantir, ServiceNow, Splunk, Tableau, Skype, Tesla
2004 – APIgee, Canonical OS, SugarCRM, Facebook BIG
2005 – Automattic WordPress, Infobright DB, Mozilla Corp, WorkDay HR, Etsy
2006 – MuleSoft, OpenDNS, Xero Accounting
2007 – Heroku, Lucidworks Search, ZenDesk HR, ZeroTurnaround Java, FitBit
2008 – Balsamiq, Github BIG, airbnb BIG, Twilio
2009 – Grindr, NetObjects, PagerDuty, Okta ID, SendGrid Email, WhatsApp BIG, Uber BIG
2010 – Datadog Monitoring
2011 – Firebase, Hortonworks Data, Zoom Video, Twitch
2012 – G2crowd, Looker, PlanGrid, Instagram BIG
2013 – Databricks, Docker

Developers, Code Cowboys and Architecture Astronauts

Developers, Code Cowboys and Architecture Astronauts.
Slums, Skyscrapers and Ghost Cities.
Similar to constructing buildings, there are (at least) three approaches to software development:


Coders = Slums – Quickly built using the material and knowledge at hand to develop for a small audience quickly. Good ideas will be copy-pasted from one area to another and modified to suit the individual’s needs. We can cover a lot of ground quickly but it doesn’t scale; plumbing and electricity break down.

Developers = Skyscrapers – Construction takes longer and the outcome can be uniform, often piecing together existing architectural concepts or libraries into a fairly standard shape. We can scale to a higher level (density of people) but we need more upfront planning and accept less individuality.

Architecture Astronauts = Ghost Cities – Master architects devise grand schemes of hugely scalable systems, but there are fundamental flaws in the plan and often the needs of actual end users are ignored.

If this conceptual metaphor holds, what could we learn from the building industry?

  • Don’t employ a coder when you need an architect?
  • Sometimes you need to clear a slum, displeasing those residents to replace it with an efficient residential building, which will take time and investment?
  • Building quality needs to be enforced by external parties?
    Similar to governmental building inspections.
  • Always get the core plumbing right, the facade/paint can be changed later?
  • …?
  • Is there anything they could learn from software development?

Perhaps the most important thing is to decide which category you are aiming for.

 

Trillion Dollar Coach – Bill Campbell – Book

Overall: 8/10

I initially breezed through this book in a week thinking it mostly contained nice stories and glib niceties. However, going back to write the Book Notes took 3 weeks, and I found myself scribbling down page upon page compared to my usual amount of notes. Upon reflection, it was a really good book with lots of good points and anecdotal stories to help remember them. If Bill was getting this many “easy” parts right, I can see how the overall impact would have been large.

My main take-away: Team First – A company is formed from teams: get the right people, build an envelope of trust, then support and love them.

Actions:

  • Improve work roadmap meetings. Given we have the whole team present, anything but very effective use of that time is a waste.
  • Re-read Project Aristotle
  • Think about what makes a good meeting
  • Always set a measurable goal, sometimes a Big-Hairy-Goal to stretch people

Book Notes:

  1. Caddie and CEO
  2. Title makes you a Manager, your people make you a leader.
  3. Build an envelope of trust
  4. Team first
  5. The power of love
  6. The yardstick
  • Caddie and CEO
    • background and some hero-worshiping
    • Teams are building block of a company, not individuals.
    • pg26 Raises interesting possibility of coaches for managers.
      Given the leverage, why are there not coaches providing real-time feedback
    • Mentor vs coach
  • Title makes you a Manager, your people make you a leader.
    • A manager’s authority emerges as they establish credibility with subordinates, peers and superiors.
    • “It’s the people” – Support = Respect + Trust
      • Support = tools, info, roadmap, training
      • Respect = Career goals / Life choices
      • Trust = Autonomy and Decision Making
    • Lying in bed at night, the CEO should worry most about his staff
    • One-to-ones and staff meetings are critical.
    • Trip Reports – Having one person tell a personal story of their weekend at a Monday meeting
    • Decision Making – “Making the right decision is important
      Just as important is getting the whole team there.”
    • The manager’s job is to run the decision-making process, ensure all voices are heard, break ties when stuck and remind everyone of the purpose and root truths.
  • Build an envelope of trust
    • The first thing some managers focus on is building a product or getting people working. The priority should be to build trust.
    • Bill saw the world as a network of people with different skills,
      learning to trust each other as a primary mechanism of achieving goals.
    • Psychological Safety – The ability for a team member to voice crazy ideas and feel safe from negative repercussions has been found to be critical to success.
    • Coach the coachable – pg86 “A coach is someone that tells you what you don’t want to hear, who has you see what you don’t want to see, so you can be who you have always known you can be.”
    • Honesty, humility, perseverance and constant openness to learning.
    • Leadership is about service to something that is bigger than you.
    • Practice active-listening
    • Diane Greene – “When I’m really annoyed or frustrated with what someone is doing, I step back to think about what they are doing well and what their value is”
    • No gap between statements and facts. Give feedback close to the time, in public if good, in private if negative. Always give it from a place of love.
  • Team first
    • Work the team, then the problem.
      When faced with a problem or opportunity, the first step is to ensure the right team is in place and working on it.
    • Pick the right players. The ability to learn fast, a willingness to work hard, integrity, grit, empathy, and a team first attitude.
    • Bill saw peer relationships as critical and instituted a regular survey amongst peers at Google to assess performance at job/relationships/meetings/leadership/innovation.
    • Winning depends on having the best team and the best teams have more women.
    • Identify the biggest problem, the elephant in the room, bring it front and centre and tackle it first.
    • Listen, Observe and fill the communications gaps
  • The power of love
    • Get to know and care for people as individuals
    • Cheer people and their successes

The Unicorn Project – Gene Kim – Book

 


Overall 5/10 – Having read and loved The Phoenix Project, I had high expectations for this book, perhaps too high. It felt like the same message and story regurgitated to sell another book. Perhaps if I hadn’t seen most of the ideas before elsewhere it would have felt newer and more impactful.

Book Notes:

  • Compared to the previous book there is a lot more emphasis on people skills,
    it’s great to see this highlighted in a book for programmers where that kind of networking isn’t as common.
    Examples:

    • The “rebellion” team was formed as a ragtag coalition of people that wanted to make a difference
    • Kurt operated at the edge of permitted staff behaviour to get the resources the team needed
    • Maxine visited people outside her own department in person to build alliances
    • She asked how they completed their work and helped them find where they fitted into the overall flow to increase throughput overall
    • Sarah’s toxic behaviour and the need for psychological safety
  • Some problems seem highly exaggerated to reach foregone conclusions to point at fashionable technology.
    • For example getting a working build takes weeks = containers.
    • Concurrency issue = Immutability and functional programming save the day
    • I wouldn’t disagree that those technologies are great for some problems, it just seemed they were thrown into the book to namedrop.
  • Near the very end it proposes that large companies can outpace their smaller rivals as they have the relationships, resources and data.
    I’m not sure I entirely buy that. One of the hardest things to change is values and perceptions.
  • Project Shamu sounded interesting, taking 23 API calls that have their own SLAs and reducing them to one dependency without caching. I wondered what technology this was referring to but googling didn’t help. Any ideas?

The Five Ideals

These are the ideals presented at the back of the book. I can certainly agree on their importance:

  1. Locality and Simplicity
  2. Focus, Flow and Joy
  3. Improvement of Daily Work
  4. Psychological Safety
  5. Customer Focus

jq – An open source implementation of q

During this lockdown I was due to take some holidays, originally to visit Pisa with Elaine. Instead of visiting Pisa I took a week off to code, for me it was just as much fun, I’m not sure Elaine agreed. This is the outcome of that week:


jq – http://timestored.com/jq/

So far it’s extremely limited: casting, parsing, list definitions and a handful of operations. It has however been insightful. The first 2 days were spent hashing out code to make the pure fundamentals work in any way possible. On the 4th day I began to realise some very verbosely implemented operations could be done in a much simpler way. Then I began to see such savings again and again. Perhaps after the first decade I would have it whittled down to Arthur’s two-pager.

An inordinate amount of fun was had when I discovered I could host the application fully in the browser, as Doppio provides a way of running a full JVM there:

jq Online Sandbox – http://timestored.com/jq/

So far it’s useful for basic snippets but I really think such a safe and easily launched environment would be great for onboarding new users to the language.

The Understated Simplicity of Good Code

Bad Code Accretes

Sometimes while reading code, I get the impression that the person:

  1. Kept throwing more code at the problem until it “worked”.
  2. Never for a moment stepped back and thought about making it simpler.

This small thought can then be applied on a larger scale to languages themselves. PHP was quickly thrown out there at version 1 and early on added to as new features were needed. Java, C#, JavaScript: almost all of them have grown by adding features over time. How many have gone back and removed significant features?

Great Code Simplifies

Contrast that approach to Ken Iverson in this video from 1974:


I went from application to application trying to use the same techniques. The most encouraging thing is that they would work. After 2-3 years during which time the language had grown by accretion, it grew and grew, eventually I found it was shrinking.

Essentially the idea was once you look at enough different applications you begin to see what is the general notion. So I came to generalisations that allowed me to take out whole chunks of special things I had put in.

Furthermore to my surprise it turns out the general ideas are usually much simpler to understand than any of the special cases.

Modern Languages are Simplifying Common Cases

Looking at some of the recent changes, for example arrow functions in JavaScript or records and lambdas in Java, you can see this attempt to go back, simplify and reduce the noise needed to perform common actions. The question is whether many will remove the old noise.

An Example from KDB

I find it worth mentioning how KDB supplies the user with handles to send data. Here we open a handle h to send a query to a remote process and get the result.

q)h:hopen `:localhost:5000;
q)h "2+2"
4
q)h
7

That last line shows that the handle is 7. Why is KDB using 7 for handles?
Because Linux maps files/sockets etc. using those exact same integers. In fact in kdb standard out/error can be used as 1/2. When people first encounter this, they find it confusing, possibly because they are coming from other languages that wrap handles ten layers deep in abstractions. I can’t help but imagine:

  • Some coders take hours to work out what code can be removed
  • Other developers like Arthur may never consider introducing unnecessary abstractions in the first place
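
For readers who have not met raw handles before, here is a minimal Python sketch of the same idea (not kdb-specific, and the variable names are made up for illustration): the operating system hands back plain integers for pipes and sockets, and you can read and write through them directly.

```python
import os
import socket

# A pipe gives us two plain integers: one end to read from, one to write to.
read_fd, write_fd = os.pipe()
print(type(read_fd).__name__, read_fd, write_fd)

# A socket object is also backed by an integer descriptor.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock_fd = sock.fileno()
print(sock_fd)

# Write and read through the raw integers, no wrapper object required.
os.write(write_fd, b"2+2")
received = os.read(read_fd, 16)
print(received)

# Release the descriptors again.
os.close(read_fd)
os.close(write_fd)
sock.close()
```

The exact numbers printed depend on what the process already has open, which is also why the kdb handle above happened to be 7.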

 

Please, for the sake of your reviewers, take a moment before pushing code to ask yourself: can this be made simpler?

SRE – Site Reliability Engineering – BOOK

6/10 – Overall.    8/10 for early chapters, 4/10 for later chapters.
The first 100 pages were excellent but the later chapters were a mixed bag, partially due to rotating authors. I skim-read the later chapters as they mostly focussed on a broad spectrum of not closely related topics.
Chapters that covered topics I interact with were too shallow to interest me, while many chapters were not of interest to me. Perhaps if I was an SRE rather than a developer I would have found the entire book better.

Key Takeaways for Me

  1. Every large firm I’ve worked at has been structured incorrectly and had the wrong metrics for measuring stability.
    In banks, the production support team has typically been tasked with “zero outages” whilst the developers are incentivised to develop and release as quickly as possible, with some front-office “quant-devs” not being held accountable for stability at all. The handover often looks like throwing the release over a wall.
  2. This book suggests a much better approach:
    Rather than trading pace against stability, agree a global “Error Budget” target for everyone, using SLOs/SLIs that, if not met, can result in responsibilities moving back and forth between DEV and SRE ownership. Importantly the target, e.g. between 99.8% and 99.9% uptime, should have an upper and lower bound; it should NOT be an absolute. If you go above it, developers should be taking more risks; if you fall below it, developers should work on stability.
  3. 100% is the wrong reliability target. I always intuitively knew this but the book provided useful arguments, e.g. if you build a 100% reliable service but your users’ wifi is 99% reliable, you wasted a lot of effort that users could never benefit from and that took time away from other work.
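
To make points 2 and 3 concrete, here is a rough Python sketch. The function names and the default 99.8% to 99.9% band are my own illustration taken from the example figures above, not something lifted from the book, and multiplying availabilities assumes failures are independent.

```python
def perceived_availability(*components: float) -> float:
    """Availability a user sees when every component in the chain must work.

    Assumes failures are independent, which is a simplification.
    """
    result = 1.0
    for availability in components:
        result *= availability
    return result


def release_guidance(measured: float, lower: float = 0.998, upper: float = 0.999) -> str:
    """Map measured availability onto a target band with a lower and upper bound."""
    if measured > upper:
        return "take more risks"
    if measured < lower:
        return "work on stability"
    return "within budget"


# A perfectly reliable service behind 99% reliable wifi is still ~99% for the user.
print(perceived_availability(1.0, 0.99))

# Above the band: there is budget left to spend on faster releases.
print(release_guidance(0.9995))
# Below the band: stability work comes first.
print(release_guidance(0.997))
```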

Book Notes

Note the full book is actually available online here.
An outage is NOT a bad thing, it is an expected part of innovation.

Monitoring

  • Alerts – Immediate human action required
  • Ticket – Human action required within a few days to prevent damage
  • Logging – For forensics/diagnostics only
  • MTTF – Mean Time To Failure
  • MTTR – Mean Time To Repair
  • Humans add latency. MTTR speed critical to availability -> automation is best.

Google Specific Terms

  • Campus > Data centre > cluster > row > rack > server
  • Borg – Automates resources for applications
  • Chubby – Uses paxos to provide global locks
  • Users -> GFrontEnd -> AppFrontEnd -> AppBackEnd -> DB  (all coordinate via Load Balancer / DNS)

Embrace Risk

  • Time Availability = uptime / (uptime+downtime)
  • Aggregate Availability = successful Requests / Total Requests
    This metric is more useful when there are regional outages etc.
  • There are different types of failure
    • Global outages, regional outages
    • Full outages, partial functionality
    • Choose which you want
  • Error Budget = Control loop to manage release velocity
  • Error Budget – Aligns incentives
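
The two availability definitions and the error budget can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function names and the request counts are hypothetical, and the budget here is counted in failed requests against a request-based SLO.

```python
def time_availability(uptime_s: float, downtime_s: float) -> float:
    """Time availability = uptime / (uptime + downtime)."""
    return uptime_s / (uptime_s + downtime_s)


def aggregate_availability(successful: int, total: int) -> float:
    """Aggregate availability = successful requests / total requests."""
    return successful / total


def error_budget_remaining(slo: float, successful: int, total: int) -> int:
    """Failed requests still allowed before the SLO is breached (negative = overspent)."""
    allowed_failures = round(total * (1 - slo))
    actual_failures = total - successful
    return allowed_failures - actual_failures


# Hypothetical month: 1,000,000 requests, 600 of them failed, SLO of 99.9%.
print(aggregate_availability(999_400, 1_000_000))
print(error_budget_remaining(0.999, 999_400, 1_000_000))
```

A positive remainder is the control loop saying "keep releasing"; a negative one says "slow down and fix stability".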

SLOs

  • SLI – Service Level Indicators – Measure a level of service e.g. latency/availability
  • SLO – Service Level Objective – A target value or range of values for a service level that is measured by an SLI e.g. average response <100ms
  • SLA – Agreement – agreed with customers, including consequences for missed SLOs
  • Choosing Targets:
    • Don’t base it on current performance (it could be way off)
    • keep it simple
    • Have as few as possible
    • Keep a safety margin (tighter internal number)
    • Don’t overachieve, each “9” is costly
  • Percentiles – are better measurement than averages in case of long tail
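
A small Python sketch shows why percentiles beat averages when there is a long tail. The latency numbers are invented for illustration, and the percentile helper uses the simple nearest-rank method rather than any particular monitoring tool's definition.

```python
import statistics


def percentile(values, pct: int):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceiling of pct% of n
    return ordered[rank - 1]


# Hypothetical latencies in ms: 98 fast requests and 2 very slow ones.
latencies_ms = [40] * 98 + [2000] * 2

mean_ms = statistics.mean(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p99_ms = percentile(latencies_ms, 99)

# The mean would pass an "average response <100ms" objective,
# while the 99th percentile exposes the slow tail.
print(mean_ms, p50_ms, p99_ms)
```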

Toil

  • -> Manual repetitive work devoid of enduring value, that could be automated
  • Toil = Lower morale, career stagnation, slower progress
  • Some amount of toil is unavoidable and can even be calming

Automation

Automation allows super-linear scaling of users vs human effort.

Levels of automation:

  1. Fully automated  – DB self identifies problem and preemptively resolves it
  2. Internally maintained – Generic – script shipped with database
  3. Externally maintained – Generic – shared DB recovery script
  4. Externally Maintained – System Specific – A script on someone’s desktop
  5. No Automation

Simplicity

  • Less code = Less maintenance
  • Simplicity = Stability

The later chapters held less of interest.
“You want a data recovery system NOT a data backup system.”

SRE Engagement Model – Not all services require SRE attention as they don’t need high reliability and availability. Those teams get given advice and documentation.

Accelerate – The Science of Lean Software and DevOps – Book

Overall 8/10 – Good book that presents good ideas and clear evidence for why.
I was aware of slightly over half the best practices from this book but not all of them have been adopted by large firms. I picked up a few actions I’d take away but really the usefulness in this book may be in presenting it as evidence to try and drive change in others.


Book Notes:

Measuring Performance:

  • Use capabilities to measure performance not maturity levels as maturity suggests mission complete.
  • (Scrum) Velocity is only a capacity planning tool
  • Utilization isn’t the correct measure, it should not be 100%
  • Should measure global outcome to ensure teams are not pitted against each other
  • Software Delivery Performance Depends on:
    • Lead time
    • Deployment Frequency
    • Mean Time To Restore
    • Change Fail %
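
As a rough sketch of how these four measures could be computed from a log of deployments, here is some Python. The `Deployment` record, its field names and the sample data are all hypothetical, and measuring restore time from the moment of deployment is a simplification of my own.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean, median
from typing import Optional


@dataclass
class Deployment:
    committed_at: datetime
    deployed_at: datetime
    failed: bool = False
    restored_at: Optional[datetime] = None


def hours(delta: timedelta) -> float:
    return delta.total_seconds() / 3600


def delivery_metrics(deployments, window_days: int) -> dict:
    """Compute the four delivery measures over a window of deployments."""
    lead_times = [hours(d.deployed_at - d.committed_at) for d in deployments]
    failures = [d for d in deployments if d.failed]
    restores = [hours(d.restored_at - d.deployed_at)
                for d in failures if d.restored_at is not None]
    return {
        "median_lead_time_hours": median(lead_times),
        "deployments_per_day": len(deployments) / window_days,
        "mean_time_to_restore_hours": mean(restores) if restores else 0.0,
        "change_fail_percent": 100 * len(failures) / len(deployments),
    }


# Hypothetical four-day window with one deployment per day, one of which failed.
start = datetime(2024, 1, 1, 9, 0)
log = [
    Deployment(start, start + timedelta(hours=2)),
    Deployment(start + timedelta(days=1), start + timedelta(days=1, hours=4),
               failed=True, restored_at=start + timedelta(days=1, hours=5)),
    Deployment(start + timedelta(days=2), start + timedelta(days=2, hours=6)),
    Deployment(start + timedelta(days=3), start + timedelta(days=3, hours=8)),
]
metrics = delivery_metrics(log, window_days=4)
print(metrics)
```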

Measuring and Changing Culture

  • Don’t try to change how people think, first change what people do (or change the people :))
  • Westrum Theory: Orgs with better information flow function more effectively
  1. Level 1 – Things we just know
  2. Level 2 – Culture – We can debate these within the team, e.g. importance of security
  3. Level 3 – Written artifacts and established processes

Culture Types:

  1. Pathological – based on power
  2. Bureaucratic – based on rules
  3. Generative – based on performance

Continuous Delivery

Key Principles

  1. Build quality in
  2. Work in small batches
  3. Automate repetition
  4. Relentlessly pursue continuous improvement
  5. Everyone is responsible
  6. Foundations:
    1. Comprehensive config management
    2. Continuous Integration – Small daily branch merges
    3. Continuous Testing

What Works:

  • Version control
  • Test Automation
  • Test data management
  • Trunk based development

Architecture

Goal is loose coupling to ensure bandwidth between teams isn’t swamped with implementation details.
Can the team by itself without speaking to outsiders:
– Change architecture significantly
– Do a deployment? now? during business hours? anytime?

Critical = Testability and Deployability
Systems are loosely coupled and can be developed and validated independently.

Management Practices

Components of Lean Management

  • Limit work in progress
  • Visual Management
  • Feedback from production
  • Lightweight change approvals

CAB – doesn’t work to increase stability!
External approvals are negatively correlated with lead time, deploy freq. and restore time.
Lean Management <-> Software delivery performance, becomes a virtuous cycle.
Lean: Build -> Measure -> Learn

Capabilities

  • Small batches
  • Flow of work from requirements to user known by team
  • Actively seek user feedback
  • Authority to create/change specs during dev without approval

Sustainable

  • Invest in employee development
  • Foster supportive work environment (no blame)
  • Ask employees what’s preventing them from achieving their objectives
  • Give time to experiment and learn

Factors Causing Employee Burnout:

  • Work overload
  • Lack of control
  • Insufficient rewards
  • Community breakdown
  • Unfairness
  • Value conflicts

Transformational Leadership

  • Vision – Clear understanding of where to be in 5 years
  • Inspiring Communication – Says things that make employees proud to be part of the org
  • Intellectually Stimulates – Challenges my assumptions, makes me rethink principles
  • Supportive – Considers and acts to benefit my feelings
  • Personal Recognition – Commends me when I do a good job

 

Key Takeaways for Me:

  1. Most of the suggestions from other books I’ve read, and that I had seen work myself, were correct. The large survey conducted by these authors gives me the evidence to back up my opinions.
  2. Action: In my current work, we need to find a way to get the 3 critical measurements improved. Increased release frequency and lower-overhead change management would seem to offer the highest reward for the effort.
  3. The importance of loosely-coupled architecture gives me a clearer way to conceptualise interactions between teams and why it’s important. (limited bandwidth)