Independent IT review
3. Why are we here?
3.1. Cultural context
The RP/RT programmes existed within a cultural context at the CQC which defined “how we do things round here”. Some of this context may have influenced the chances of success. Particular cultural concerns are described in the paragraphs that follow.
3.1.1. Lack of a data-first culture
Culturally, data is not seen as a strategic asset, yet the CQC is a “data business”, handling (and relying upon) large volumes of data at every step of the Service Value Chain (SVC), from contact, notification and registration through assessment, inspection and finally enforcement. Accurate and timely data is also critical for external reporting, such as the legal register and ratings, and for statutory reports like State of Care. RP was built without a data-first approach – i.e. without sufficient upfront consideration of the downstream reporting requirements. This has had unintended consequences as the data is used and surfaced differently, and fundamental data changes were made as part of the move to the SAF (for example, rating at Assessment Service Group (ASG) level rather than location level).
It is felt that the lesson – that any project or change must be given enough time, funding and resource to consider the underlying data model and the downstream reporting requirements – has not yet been learnt and is not baked into the culture at the CQC. Technology changes are not seen as, fundamentally, an exercise in modifying the flow of data through the business. Data personnel are engaged only as a secondary activity rather than being seen as fundamental to the initial scope and design.
3.1.2. Lack of adherence to standards
There are many technical, business case, project/programme management and service management standards that should underpin large scale organisational and technical change programmes. Organisations are expected (as compliance to best practice) or mandated (by a stronger control – e.g. legal enforcement, making funding contingent on compliance etc) by central government to adopt these standards. Some of these will be referred to throughout this IIR.
For example, the RT programme negotiated an exemption from the Government Functional Standard for Digital (GDS), which among other things expects digital services to be accessible and inclusive, “ensuring that any potential user is able to use the service regardless of their personal characteristics, situation, capabilities or access needs, and is given equal access and opportunity to do so” [iv].
3.1.3. Lack of clear accountability and controlled governance
It was reported to this IIR that the “voice of the business” was eroded during the timeline of this programme:
- Dilution of clinical leadership: There was a significant reduction in clinical leadership at the senior level. Key figures such as the Chief Inspectors for primary care and health left, and their roles were either not filled or were combined with other responsibilities, leading to a dilution of clinical expertise and challenge at the senior leadership level.
- Impact on decision-making: The reduction in clinical leadership led to a lack of practical and clinical input in decision-making processes.
- Disconnect between intentions and practicality: There was a disconnect between the intentions of the new IT system and the practical needs of the organisation. The system was designed to be highly intelligent and automated, but it did not adequately capture the qualitative and subjective information necessary for effective regulation.
- Churn of senior leaders: There was a high turnover of senior leaders, which further exacerbated the lack of continuity and stability in leadership. This churn affected the organisation's ability to maintain a clear and consistent direction.
Internally, this was referred to as "toxic positivity" as leaders maintained an overly optimistic narrative that conflicted with the realities on the ground.
3.1.4. Reliance on contract staff
The programme consisted of layers of contract resources (also referred to as contingent labour), with the CDO as the overseeing permanent CQC resource. This resulted in significant costs and a lack of continuity in knowledge and information transfer. Contractors inevitably lacked a deep understanding of the organisation's operations and needs, and permanent staff spent considerable internal time and resource bringing them up to speed, alongside their regular duties.
With the reliance on external staff, there was insufficient internal expertise and investment in understanding the long-term implications of decisions made during the programme. This lack of internal ownership made it harder to achieve successful implementation and buy-in from staff, as change management was effectively outsourced, which is not an effective strategy.
3.1.5. Siloed working, lack of collaboration and social capital
It was reported that the original programme was well set up in terms of understanding the interdependencies between the different functions of the CQC but after the 2022 reset the timelines drove a new approach that created silos. The following quote from a very experienced CQC staff member gives voice to this concern:
“The whole RP programme was split into discrete units of work, e.g. Enforcement, Assessment, Registration (2 parts) and Contact... There did not seem to be a functioning programme board that coordinated this. Registration and Enforcement built their systems on the old ways of assessing providers, yet Assessment had to build their system on the new SAF. This was never going to work in practice.”
Studies (Appendix 6) have shown that remote working can make collaboration and communication much harder and can hinder the building of social capital, which is crucial for effective teamwork. It is understood that the programme did make efforts to meet face to face, but the default way of working at the CQC is home-based.
3.2. Framework to answer the question
The 5-stage IT Infrastructure Library (ITIL – internationally recognised best practice in IT Service Management) service lifecycle is a simple and powerful framework with which to analyse the history of the RP programme and answer the question “What went wrong?”:
- Service Strategy: This stage focuses on defining the market, developing the service portfolio, and setting strategic objectives. It involves understanding customer needs, market spaces, and how to create value through services. This is the stage where investment is justified through the use of Business Cases.
- Service Design: This stage involves designing new IT services or modifying existing ones. It includes the design of service solutions, processes, policies, and documentation to meet current and future business requirements.
- Service Transition: This is the “build or buy” stage, with recruitment or procurement taking place respectively. This stage ensures that new or changed services are effectively transitioned into operation. It includes planning and managing changes, release and deployment management, and ensuring that service knowledge is available and accurate.
- Service Operation: This stage focuses on the effective and efficient delivery and support of services. It includes incident management, problem management, request fulfilment, and operational monitoring.
- Continual Service Improvement (CSI): This stage aims to continually improve the effectiveness and efficiency of services and processes. It involves identifying and implementing improvements to services, processes, and overall service management practices.
3.3. The order of events in the Service Lifecycle
The first point to make reflects the headline statement from the initial findings from a lessons learned review which was carried out by the CQC PMO: “The Transformation Portfolio has resulted in an incomplete implementation of a new organisation and business process which is ineffectively supported by the IT systems”.
This is a critical framing of the issue and correctly orientates the thinking necessary to understand where we are and, indeed, what went wrong. IT systems must be designed to support existing or desired future business processes. Often, with advances in digital technologies and thinking, IT systems can provide new opportunities for an organisation to transform its business processes beyond what was previously considered possible. However, the order must always be maintained: the business process (current or desired future) must be clearly and fully articulated so that detailed designs can be drawn up (analogous to an architect's drawings of a house), which lead to technical specifications (analogous to structural engineering plans), which can in turn be provided to software engineers (analogous to house builders) to undertake the configuration and programming of software assets (in this case D365 and associated tools).
More simply put, this is an example where form must follow function - if you don't know the function(s) that the IT system must serve (or if the articulation of this function is unstable, changing, emergent etc) then it is impossible to design, build and test it to be in the correct form.
3.4. Immature business processes
The statement above of there being “an incomplete implementation of a new organisation and business process” means that it was very challenging to complete the design of the required underpinning IT support system which of course in turn made it impossible to build and test the same effectively.
Of the five levels of the Business Process Maturity Model (BPMM) (further details at Appendix 2), arguably, since the move to the Single Assessment Framework the revised assessment process (which is at the heart of the end-to-end business process of the CQC) remains at the first, “Initial”, level, where processes are ad hoc and chaotic, success depends on individual effort, and there is little to no process discipline, although there are notable exceptions, e.g. Oral Health.
3.5. Service Strategy and the Business cases
The business case is the most critical instrument in any programme. It justifies the investment, linked to the organisational requirements (business needs, investment objectives) and establishes the appropriate controls and boundaries to give the programme the best possible chance of success. Given the importance of starting a programme well, this report gives greater weight to the Service Strategy phase of the Service Lifecycle than the other phases.
From around 2008, public sector business cases have been governed by the HM Treasury guide [v], which establishes the 3-stage (Strategic Outline Case (SOC), Outline Business Case (OBC), Full Business Case (FBC)) and 5-case (Strategic, Economic, Commercial, Financial and Management) approach. In 2013 a programme of training, referred to as Better Business Cases (BBC), was established to ensure that public sector business cases were written and managed to the correct standard, to avoid the failures of the past. In the NHS it is not possible to gain approval for a business case that is partly or fully funded from Treasury funds without the author being accredited as a BBC practitioner.
A review of the RP/RT business cases against BBC best practice shows a number of significant problems, as set out below. Smaller inconsistencies have been ignored; the focus is on the major concerns that would have made the programme less likely to succeed.
3.5.1. Unclear Spending Objectives
The strategic outcomes and strategic benefits documented in the FBC (Economic case) are very high level and arguably very difficult to measure:
- “Our ways of working meet people's needs because they are developed in partnership with them”
- “We are an effective, proportionate, targeted, and dynamic regulator”
- “There is improvement in safety cultures across health and care services and local systems that benefit people because of our contribution”
One can argue that these strategy statements are intended to be high level and set the context for more detailed objectives to be established, which in turn will be subject to the SMART requirements for an objective.
Unfortunately, in the FBC, while a clear effort was made to document spending objectives (2.5 Economic Case, Spending Objectives), none of them is Time-bound or Specific enough, and no Measures were established for any of them. They are all clearly Relevant to the business needs, but without being Time-bound, Specific or Measurable there is no way of knowing whether they are Achievable. Moreover, loosely defined objectives such as these make it impossible to hold the programme board to account for delivery, as there is no clear guidance as to “what good looks like”.
Examples of spending objectives from the FBC are shown in Figure 6 below:
Programme Spending objective | Strategic Benefit | Benefit criteria |
---|---|---|
To improve our effectiveness, focus on reducing inequalities during design, implementation and ongoing operation in all our spending. To improve our effectiveness, encourage others to reduce inequalities in access, experiences and outcomes for people who use, or need to use, care services, through all our work | B1. People experience reduced health inequalities – related to a) access; b) experiences; c) health and care outcomes | People's human rights are upheld when using health and social care services, noting this is one important aspect of this benefit |
Figure 6: spending objectives excerpt from the FBC
3.5.2. Options appraisal
Best practice expects a long list of options to be considered within the Economic Case of the OBC (typically around 12) and a clear description of the options appraisal method.
The OBC only contains 3 options that were carried forward:
- Option 1: Do nothing
- Option 2: Technology replacement only
- Option 3: Business change programme (business process re-engineering and technology replacement)
and 3 that were discounted:
- Cloud Hosting – this is not really a discrete option to address the business need; it is simply a method of hosting any solution. Although this option was discounted, the option selected is cloud hosted, so it is difficult to make sense of this statement.
- Develop a new ECM system alongside the legacy CRM system – this is a fair option to appraise.
- Develop a bespoke CQC system – although this option was discounted, the option selected is the development of a bespoke CQC system, so it is difficult to make sense of this statement.
Given the magnitude of this case, a wider range of options should have been generated using benchmarking with other organisations that are providing a similar service.
Within business case guidance [ii] an Options Framework is used to identify the long list, shown in figure 7:
Key Dimension | Description |
---|---|
Scope | The “what”, in terms of the potential coverage of the project. Potential scopes are driven by business needs, service requirements, and the scale of organisational change required to improve service capabilities. Examples include coverage in terms of: business functions, levels of service, geography, population, user base, and other parts of the business. |
Service Solution | The “how” in terms of delivering the “preferred” scope for the project. Potential service solutions are driven by available technologies, recognised best practices, and what the marketplace can deliver. These solutions provide the potential “products” (inputs and outputs) and as such, the enabling work streams and key activities required. |
Service Delivery | The “who” in terms of delivering the “preferred” scope and service solution for the project. Potential options for service delivery are driven by available resources, competencies, and capabilities – both internal and external to the organisation. Examples include: in-house provision, outsourcing, alliances, and strategic partners. |
Service Implementation | The “when” in terms of delivering the “preferred” scope, solution, and service delivery arrangements for the project. Potential implementation options are driven by deadlines, milestones, dependencies (between outputs), economies of scale, benefit realisation, and risk management. The optimal option provides the critical path for delivery of the agreed products and activities and the basis for the project plan. Options for implementation include: piloting, modular delivery, big bang, and phasing (tranches). |
Funding | The “funding” required for delivering the “preferred” scope, solution, service delivery, and implementation path for the project. Potential funding options are driven by the availability and opportunity cost of public funding, value for money, and the characteristics of the project. Potential funding options include public or private capital, the generation of alternative revenue streams, operating and financial leases, and mixed market arrangements. |
Figure 7: The Options Framework from the International Guide to Business Case Development
This framework is a useful prompt to ensure the full range of possible options is considered. For example, if this had been used then it would have generated the following type of thinking:
- Service Scope – should the whole of the CQC business process (Registration through to Enforcement and Cancellation) be in scope, or just subsets of it?
- Service Solution – when the decision was made to replace some legacy systems, why was this decision made, and what further options were considered and discounted, and why? For example, should the solution include the external-facing website provision? Should the solution be restricted to just the replacement of the legacy CRM solution?
And so on across all the dimensions of the Options Framework.
3.5.3. Cost drift
As can be seen from the following table in the FBC (figure 8), the Whole Life Costs (WLC) between the OBC and the FBC grew from £57.5M over 5 years to £131.8M over 10 years in line with a significant increase in the programme scope.
Item | Outline Business Case June 2020 | Full Business Case March 2023 |
---|---|---|
Preferred option | Option 3 | Unchanged from OBC |
Appraisal period | 8 years to FY 27/28 (from FY 19/20) | 14 years to FY 2033/2034 (from FY 19/20) |
Monetised benefits | Total: £41.6m over 8 years | Total: £122m |
Whole Life Costs | £57.53m including VAT | £131.8m including VAT |
Optimism Bias (inc. VAT) | £5.10m (Optimism Bias at 10%), of which Revenue £4.08m | No Optimism Bias applied |
Risk contingency (inc. VAT) | £1.74m, of which Revenue £1.22m | No risk contingency applied |
Net Present Value | -£21.70m (best case) to -£30.68m (worst case) | -£9.8m (NPV – discounted); -£8.9m (cashflow – undiscounted) |
Figure 8: Excerpt from the FBC showing financial headlines
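The optimism bias and risk contingency rows in Figure 8 are simple uplifts on the base whole-life cost: a percentage uplift for optimism bias (as in the OBC's 10%) plus a fixed contingency sum. A minimal sketch of the arithmetic, using purely illustrative figures rather than the FBC's actual cost breakdown:

```python
def adjusted_cost(base_cost, optimism_bias_rate, risk_contingency):
    """Apply an optimism-bias percentage uplift plus a fixed risk
    contingency to a base whole-life cost estimate (all in £m)."""
    return base_cost * (1 + optimism_bias_rate) + risk_contingency

# Illustrative: a £50m base cost with a 10% optimism bias uplift
# and a £1.7m risk contingency
total = adjusted_cost(50.0, 0.10, 1.7)

# The FBC applied neither adjustment, i.e. a zero uplift and zero
# contingency, leaving the base estimate unchanged
fbc_style = adjusted_cost(50.0, 0.0, 0.0)
```

The point of the sketch is that removing both adjustments, as the FBC did, leaves no financial headroom for the risks the programme actually carried.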
There is a statement in the FBC that the recommended option is unchanged from the OBC. One of the options rejected at the OBC stage was to upgrade the existing CRM technology, and there is no evidence in the FBC that this or any other option (retained or rejected) was re-appraised in the context of the new business requirements (a full-scale organisational change and a fundamental change to the CQC's core processes).
The FBC states:

“Since 2020, the programme has been in the process of delivering the Option 3 approach”
By the time of the FBC, the CQC was already heavily financially committed to Option 3 (replacing the existing CRM with MS Dynamics 365 (D365)), and the costs had increased significantly. The board considered the options and decided that moving forward was the right thing to do, as the alternatives would present a greater cost.
3.5.4. Risk allocation process
The principle of risk allocation, as per best practice, involves several key elements (Appendix 17).
There is no mention of risk allocation within the OBC. The FBC contains a paragraph (4.2.2 “Commercial risks”) that identifies some risks without any explanation of how they are allocated, e.g.
We are aware that there is a potential risk for programme scope-creep which in turn could potentially result in the MS Dynamics 365 system not going-live in March 2024 as expected. This type of delay would result in the services currently delivered by the incumbent BAU suppliers being required longer than the anticipated decommissioning date.
It is reasonable that the customer would own the scope-creep risk, but there are no counterbalancing risks on the supplier side, e.g. substandard delivery of products that fail user acceptance testing.
As there is no statement as to whom risks are allocated to, the default would be that the CQC owns all the commercial risks fully.
3.5.5. Overestimated and missing benefits
The RT business case was established in the context of the organisation wide transformation of its Operating Model. Arguably the CQC was naïve about the extent and complexity of the technological changes required. This led to a lack of recognition of the need for continuous investment and improvement after the initial release of the product.
This is evidenced by the way in which the cashable benefits were stated in the FBC:
Decommissioning of existing systems (£46.1M) and staff efficiencies (£76.8M) over 10 years from 1 April 2024
Decommissioning savings
All of the decommissioning cost savings are profiled at 100% from 1 April 2024. This is too ambitious and is not usual practice, noting that this was just 1 year from when the FBC was signed off. It is more typical to have a tapered savings profile for decommissioned systems for a number of reasons:
- It is usual that there are underpinning contracts that end at different times, rather than all conveniently ending the date at which the organisation expects to make the savings.
- There are typically good reasons why some of the legacy technical services need to run in parallel for a period of time – e.g. incomplete data migration, deliberate partial delivery of future technical services (based on a Minimum Viable Product (MVP) approach to delivery), providing a business continuity solution during the early life of the new system in case roll back is required, etc.
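The difference between the FBC's cliff-edge profile and the more usual tapered profile can be sketched as follows. The annual saving and taper fractions here are illustrative assumptions for the sake of the comparison, not figures drawn from the FBC:

```python
def savings_profile(annual_saving, years, ramp):
    """Return year-by-year savings (£m). `ramp` gives the fraction of
    the full annual saving realised in each early year (reflecting
    parallel running and staggered contract end dates); any years
    beyond the ramp realise 100% of the saving."""
    profile = []
    for year in range(years):
        fraction = ramp[year] if year < len(ramp) else 1.0
        profile.append(annual_saving * fraction)
    return profile

# FBC-style: 100% of the saving assumed from day one
cliff = savings_profile(4.6, 5, ramp=[])

# More typical: savings ramp up over three years as legacy systems
# and contracts are wound down
tapered = savings_profile(4.6, 5, ramp=[0.25, 0.5, 0.75])
```

Under these illustrative assumptions the tapered profile forgoes roughly a year and a half of the headline saving across the first three years, which is exactly the kind of shortfall a 100% day-one profile hides.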
Staff efficiency savings
Again, the full cost savings are profiled at 100% from 1 April 2024, and 50% of the Inspector Staff Efficiencies are expected during FY 2023/24, with the FBC stating that these savings were already being made: “We are achieving £1.3m staff efficiencies through the move to the new operating model and the deployment of the new ways of working throughout 23/24”. These staff savings are so ambitious as to appear not credible.
It is worth noting that as of Dec 2024 the actual quantum of cost saving for the programme to date is £0.3M against a plan of over £10M.
3.5.6. The investment was not Value for Money
The purpose of the Economic Case within the five-case model is to ask the question: “is the investment value for money (VfM)?”. This is typically expressed in summary by a positive Net Present Value (NPV), indicating that there will be a return on the upfront investment at least commensurate, in cash-releasing or non-cash-releasing terms, with the spend, discounted for changes in the value of money over time.
The NPV of the OBC was between £21.7M negative (best case) and £30.7M negative (worst case) against a spend of £57.5M. A risk contingency is described, but in the high-level summary (in the FBC) there is no statement about how the existing risks (of failing technology and out-of-date business processes) are estimated in monetary terms, nor how these would be changed by the investment.
So the OBC indicates an extreme loss and, in the author's experience, would not have been approved in this state – i.e. the answer to the Economic Case question “is this investment VfM?” is no.
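For reference, NPV discounts each year's net cashflow (benefits minus costs) back to present value; HM Treasury's Green Book uses a standard discount rate of 3.5%. A minimal sketch of the calculation, with purely illustrative cashflows rather than the programme's actual figures:

```python
def npv(cashflows, rate=0.035):
    """Net Present Value of a series of annual net cashflows (£m).
    cashflows[0] is year 0 (undiscounted); later years are discounted
    at the given rate, defaulting to the Green Book's 3.5%."""
    return sum(cf / (1 + rate) ** year for year, cf in enumerate(cashflows))

# Illustrative: £20m upfront spend, then £4m net benefit per year
# for five years. £20m of nominal benefit does not recover a £20m
# spend once discounting is applied.
example = npv([-20.0, 4.0, 4.0, 4.0, 4.0, 4.0])
```

A negative result, as in the OBC's best and worst cases, means the discounted benefits fall short of the spend, which is precisely the “is this VfM?” test the Economic Case exists to answer.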
3.5.7. Project Plan in the FBC far too ambitious
The programme effectively started in the summer of 2020 and (as of Dec 2024) has delivered only a small proportion of the products and benefits it aspired to. This suggests that the lessons learned during 2020-2023 (the project plan was written in Feb 2023) had not been applied, and that there was an inaccurate understanding of the number of projects that could be successfully completed within any given time period. The UK government best practice guidance for managing programmes (Managing Successful Programmes (MSP)) [vi] expects a lessons learned report at the end of every major milestone.
3.5.8. Risks not fully exposed and hence managed
The programme risk log embedded in the FBC is very limited (15 risks) for a programme this complex and costly, with very limited articulation of the typical technical or change management risks. Without consideration of these risks, there could be no proactive attempt to mitigate them or to build avoidance strategies (e.g. by using commercial levers within contracts for third parties, or adjustments to governance models). Appendix 3 shows a best practice set of risk headings for a large-scale digital transformation programme, colour coded to show which risk headings were considered in the FBC (n=8) out of the total possible number of headings (n=24).
3.5.9. Lack of a Data and Reporting Strategy
The FBC makes no reference to a data strategy within the CQC. As a result, data validation, standards, access control, interoperability, and the need for routine data management (merging, deletion etc.) have not been effectively defined.
The lack of agreed strategy has led to the following impacts:
- All delivery support was stripped out of the Data and Insight unit with a view that this would be handled centrally. The central team is not large enough to support this activity, meaning data projects are often left without support or having to rely on costly contingent labour.
- Many change projects were moving or changing data at the same time without recognition of interdependency and sequencing, e.g. changing the unit of analysis for CQC assessments and inspections from location (a physical site) to service (e.g. Surgery, Maternity). Whilst this may have been a laudable and desirable policy shift, the impact on the underlying data model was enormous and insufficiently thought through in advance. It affected how ratings are calculated and compared over time, the risk assessment processes, etc. There was little understanding of the impact of this change on data and reporting, or on internal and external users of CQC data (e.g. DHSC), and hence no funding stream was established to resource the redevelopment required.
- There is a lack of ownership of data taxonomies across the organisation – e.g. who 'owns' the new ASGs? - who controls the list? who decides if new ones are needed and how they are operationally defined?
- There is confusion about the architecture and the role of different systems, such as whether the dataverse (within RP) or Enterprise Data Platform (EDP) should serve as the primary data warehouse and some data (e.g. externally sourced) remains on the legacy data warehouse. This lack of clarity complicates data management and affects data quality.
3.6. Service Design
3.6.1. Trying to hit a moving target
As has been examined in other lessons learned reports, the CQC was undergoing change at many levels of the organisation as it recreated its operating model, moving from sector-specific assessments to the SAF with the aim of following a patient through their entire journey as they navigate different sectors of health and social care. The SAF methodology was emergent during the years 2020 to 2023, and consequently attempting to build the support platform for a business process that was not mature was akin to trying to hit a moving target. Inevitably, technical designs had to be revised as decisions were made about the business processes, which may well have led to wasted effort and rework. This is confirmed by the statement in the FBC (3.1 Economic Case):
“the Regulatory Transformation Programme has undergone a series of scope changes in order to respond to changing circumstances and requirements since the previous business case. This has resulted in increases in programme spend, and changes to quantitative and qualitative benefits”
3.6.2. Design Decisions
Naturally, the design of the new system was influenced by policy requirements created by the Executive Team (ET). The goal was to create a system that supported a proactive, data-led regulatory approach. The business processes to support the policy intentions were still emerging during the design phase, which made it challenging to create a system that fully aligned with both the ET’s required policy and the actual workflows of the users.
There was a disconnect between the policy-driven design and the actual user experience. The ET's policy dictated certain design choices that did not always align with users' needs and workflows, as the three examples below show:
3.6.2.1. Individual accounts in the Notification App:
The programme decided that Notifications should be submitted, via the Provider Portal (PP) through individual accounts rather than shared accounts. This decision was driven by the desire to know exactly who was completing the forms inside the portal.
User research indicated that the process of completing forms was often collaborative, with multiple people involved, which necessitated shared accounts.
There was an attempt to create different types of user role within the portal to accommodate the collaborative nature of form completion, including roles for administrative users who could complete forms and more senior users who could review and send them. However, the effort ran out of time and the application was deployed without that functionality.
3.6.2.2. Design choice: Drive for Streamlined Reports:
The ET’s position was that reports should be more streamlined and concise. At some point in the design process this was translated into a technical requirement: a 2,000-character restriction on fields within the reports, which some users found unnecessary and restrictive.
Reports were compartmentalised into different sections, and there was a screen where all parts could be seen together. However, this did not come across as a single document, which some users found problematic.
Supervisors had checkpoints where they could see what had been written and suggest pre-writes to the authors. However, the authors themselves did not see the compiled report before it was sent for factual accuracy testing.
Overall, the push for shorter, more streamlined reports was intended to improve efficiency and reduce the time spent on rewriting, but it also introduced some challenges and resistance from users who found the new restrictions and processes difficult to work with.
3.6.2.3. Algorithm versus professional judgement
The new system moved from relying on professional judgement to using an algorithm or calculation-based scoring. This shift was driven by the ET’s policy decisions.
Many users expressed concerns about this change, feeling that it undermined their professional judgement. They reported having to manually adjust scores to reflect reality better, which caused frustration and rejection of the Assessment app.
The design of the scoring system dictated that certain breaches would automatically result in an "inadequate" rating for a quality statement, which some users found problematic.
The issue of scoring was described as "fraught", indicating significant tension and disagreement among stakeholders. The ET’s intention to adhere strictly to algorithm-based scoring, in spite of the Policy team pushing hard for a more balanced approach, clashed with users' preference for professional judgement.
3.7. Service Transition
Figure 9 below is another excerpt from the PMO report into RP/RT relating to the way in which the programme was governed.
| Decision point | Status of business readiness | Decision date |
|---|---|---|
| Go Live 2 - Go decision | Overall: Silver; Adoption: Bronze; DPIA: Approved | 13/07/2023 |
| Go Live 2.1 - Go decision (4 attempts at Go/No-Go) | Overall: Silver; Adoption: Silver (NCSC adoption: Bronze; Training: Bronze); System support: Gold; DPIA: Not required | 19/10/2023 |
| Go Live 3.1 - Go decision | Overall: Not rated; Adoption: Not rated (Training: NCSC 88% / SOAD 20%); System support: UAT not complete / OAT not complete; DPIA: Not required | 01/02/2024 |
Figure 9: PMO analysis of the RP Go live decisions
This table provides evidence that “Go decisions” were taken in full knowledge that the status of readiness of the programme was poor or had not been effectively assessed. For example, Go Live 3.1 shows there was no documented assessment of readiness (Overall and Adoption: not rated) and that User Acceptance Testing (UAT) and Organisational Acceptance Testing (OAT) were not complete.
3.7.1. Build scope
Many people interviewed for this IIR described regular “descoping” of functional deliverables as the programme struggled to meet its deadlines, from the programme reset in 2022 through to April 2024.
3.7.2. Commercial control of the building work
The CQC entered into contractual relationships with its main delivery partners based on a capped Time and Materials (T&M) approach rather than a Fixed Cost (FC) approach. The contracts were established as zero value with the costs being agreed relating to individual Statements of Work (SOW). There is nothing in the contract Terms and Conditions that would have prevented individual SOWs being based on FC, but they were mainly based on T&M.
In discussion, the main supplier was clear that it would never enter into an FC-based contract. The NHS would rarely enter into a T&M contract, as it moves all the risk of non-delivery onto the customer (as per section 3.5.4).
3.7.3. Build quality
At least five major concerns with the quality of the build and the quality assurance process have been noted; the first three have been independently studied by Microsoft [vii] and Littlefish [viii]:
- For part of the technical build a tool was selected (Canvas App) to build front-end screens for user data input. There is a known limit to the number of controls that can be added to a Canvas App, and performance (e.g. speed of page refresh) is known to degrade as this limit is reached. The app has been over-customised to the point where it reaches the limits of its capabilities.
- In the current implementation, the file upload process caches the entire file before uploading it to the server. This leads to significant performance issues, particularly with large files, causing delays and slowing the system. This is a known problem with Canvas apps and has been documented by Microsoft [ix].
- The Canvas App's approach to integrating with SharePoint document libraries is not scalable, leading to known performance issues. This too is a known problem with Canvas apps and has been documented by Microsoft [xiii].
- Architectural choices – monolithic versus microservices. The RP design is monolithic, which complicates maintenance and updates. This design choice has led to significant technical debt and performance bottlenecks.
- A demographic matching process was programmed into RP based on a match of only the first name and surname of the identity. This led to flawed matching of entities within RP and the creation of mismatched records.
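The demographic matching concern in the final bullet can be made concrete with a small sketch. The record structure, names and fields below are invented for illustration, not taken from RP: matching identities on first name and surname alone merges distinct people who happen to share a name, while adding a further demographic attribute (here, date of birth) keeps them separate.

```python
# Two distinct (fictional) people who share a name.
records = [
    {"first_name": "Jane", "surname": "Smith", "date_of_birth": "1970-01-02"},
    {"first_name": "Jane", "surname": "Smith", "date_of_birth": "1985-09-14"},
]

def match_key_name_only(record):
    # Name-only key, analogous to the approach described in the bullet above.
    return (record["first_name"].lower(), record["surname"].lower())

def match_key_with_dob(record):
    # Stronger key: the same names plus date of birth.
    return (record["first_name"].lower(), record["surname"].lower(),
            record["date_of_birth"])

def group_by(key_fn, records):
    """Group records sharing the same match key into one 'identity'."""
    groups = {}
    for r in records:
        groups.setdefault(key_fn(r), []).append(r)
    return groups

# Name-only matching collapses two different people into a single identity...
assert len(group_by(match_key_name_only, records)) == 1
# ...while including date of birth keeps them distinct.
assert len(group_by(match_key_with_dob, records)) == 2
```

Real matching engines typically combine several attributes with fuzzy or probabilistic comparison; the point here is only that a two-field exact match guarantees collisions at scale.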
3.7.4. User Acceptance Testing
UAT is a critical aspect of any deployment, and the effort invested must be commensurate with the level of tailoring a Commercial Off The Shelf (COTS) system has been through. Nobody would be expected to UAT Microsoft Office out of the box; but if, for example, you built a complex spreadsheet with many worksheets, formulae and macros to semi-automate the decision-making process for bidding against a national fund (as DHSC is known to do), it would be foolish to cut corners on the UAT phase: the effort needed to repair the mess created by hundreds of erroneous responses would be enormous compared with getting it right first time. The UAT phase is effectively the final safeguard (before deployment) against catastrophe.
Whilst RP was built on a COTS product (i.e. D365), it was fully configured to the CQC's requirements and the out-of-the-box functionality was not used. As such, best practice dictates that the UAT phase should have been fully resourced with the appropriate experts and given enough time, with an expectation that "the build" would not pass UAT on the first pass.
Some testing was undertaken, e.g. day-long walkthroughs of the Assessment app with assessors and inspectors, at which improvements were made and a backlog of required changes recorded. Even these events, however, were insufficient to bring UAT to a satisfactory conclusion, given the scale and complexity of the solution. Throughout the RP programme, the pressure on time and budget made it impossible to undertake testing at a sufficiently detailed level, and the testing that was done did not adequately address the issues.
Essentially, the UAT process was also where change management was happening in practice: the system was built to execute a policy position that changed the ways in which staff would be expected to work (e.g. automatic scoring of evidence). The change had not been effectively managed with the staff (as represented by their SMEs), and so a conflict arose in which SMEs indicated that the new system (as a practical manifestation of a new policy and way of working) would not work in practice, yet were expected to get on board with the new approach.
3.7.5. Deployment
3.7.5.1. Training
Training is a critical aspect of deployment planning and execution. System training was fragmented and inconsistent as a result of issues with trainer engagement and the absence of a cohesive training strategy.
There was insufficient distinction between the roles of the technical training team and the super users. The super user community was set up by the programme without knowledge of the training team's existence. This community was intended to assist people having problems with RP, which overlaps with the responsibilities of the technical training team, and super users sometimes got access to information and pilot systems before the technical trainers did.
Trainers were not sufficiently involved in the initial stages of the training development process, and their organisational knowledge and deep expertise (e.g. in supporting people from an accessibility point of view) was ignored. This lack of engagement led to challenges in ensuring that the training content was accurate and effectively communicated to users.
When undertaking IT training it is critical to have a training environment that accurately mimics the live production environment. This was not the case for RP, which caused confusion and wasted effort for learners.
The reliance on external learning consultants further complicated the situation, as it created a disconnect between the trainers and the training material.
3.7.5.2. Release and Deployment
Continuous Integration, Continuous Deployment
The RP release and deployment method utilised Continuous Integration, Continuous Deployment (CI/CD), a relatively recent addition to formal best practice (it was incorporated into ITIL v4 in 2019). CI/CD is a very powerful approach that automates the integration of code changes and the deployment process, allowing for quicker releases and updates. It automates some of the technical testing, running various types of test such as unit tests, integration tests and end-to-end tests, with the goal of ensuring that code changes do not introduce new bugs or break existing functionality. However, it cannot replace the need for UAT: while some aspects of UAT can be automated, such as predefined test cases and scenarios, UAT still requires manual intervention to validate the user experience and gather feedback.
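The distinction drawn above, between what a pipeline can automate and what UAT must still do, can be sketched with the kind of automated check a CI/CD pipeline runs on every code change. This is a generic illustration, not CQC's actual pipeline or code; the function and the limit echo the 2,000-character field restriction described earlier but are hypothetical.

```python
def field_within_limit(text: str, limit: int = 2000) -> bool:
    """Hypothetical validation rule: a report field must fit the character limit."""
    return len(text) <= limit

# Automated tests like these specify behaviour in advance, and a CI/CD
# pipeline can run them on every change to catch regressions...
def test_field_at_limit():
    assert field_within_limit("x" * 2000)

def test_field_over_limit():
    assert not field_within_limit("x" * 2001)

test_field_at_limit()
test_field_over_limit()

# ...but no automated test can say whether a 2,000-character limit is usable
# in practice. That judgement is exactly what UAT exists to capture.
print("automated checks passed")
```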
To support CI/CD the CQC is using Azure DevOps, which is a comprehensive suite of development tools and services designed to support the entire software development lifecycle.
The CQC's instance of Azure DevOps is, however, incomplete (e.g. descriptions of live services are only available for Assessment) and out of date in some areas (e.g. it still names responsible personnel who have left).
It has not been possible in this review to assess the integrity of the technical data within the tool, or whether the controls (i.e. who can change or delete data, and who can execute the “pipelines” through which new developments are integrated into the live environment) are appropriately assigned to trained staff working within an accountability framework. However, the history of RP releases has not always been positive, as this quote illustrates:
“Each time we were assured problems were to be fixed with an upgrade, we ended up with spectacular problems – version 7 upgrade being the worst.”
Insufficient learning from pilots
The following quote illustrates a point that was mentioned many times over about a lack of sharing of lessons from deployment pilots to inform the rollout process:
“The south was put forwards to pilot the system after it had been delayed repeatedly due to issues. When they did go live absolutely nothing of what was happening at the ground floor was ever shared, just happy messages about all the work they were doing to make things get better at tech level. It later turns out colleagues made it so abundantly clear that the workarounds and system was not fit for rolling out, but this feedback from staff was just ignored.”
Bypassing controls
Concerns were expressed that key pieces of functionality were rushed through at the end of the programme to meet the deadline, and that normal checks and controls were bypassed for expediency.
3.8. Service Operation
By this stage of the lifecycle, all of the problems from the earlier phases had manifested, and the organisation could only troubleshoot to the best extent possible. The Service Operation (SO) phase of the lifecycle followed the typical, expected arrangements, with effective support from first line (Technical Support Officers) and second line (Apps Support), and escalation to the external support partner as third line. As the data in section 2.5.4 shows, there were high volumes of support calls, commensurate with the poor quality of the user experience and the aspects of RP that did not work as planned.
The difficulties experienced by the PP necessitated the establishment of a separate team (Provider Portal Queries), which has responded to nearly 36,000 emails and escalated around 2,800 tickets for further investigation.
3.9. Continual Service Improvement
Following the closure of the RT programme in April 2024, a Service Improvement Programme was established. This was beyond the typical scope of CSI (which is to identify and implement improvements to services, processes, and overall service management practices) as its scope was to implement functionality that had been descoped in the RP programme. Implementing new functionality requires a full project or programme (depending on size) governance rather than the CSI model which is proposed in ITIL.
During this time there was an improvement programme team tasked with implementing functionality that had been descoped, and a live services team fixing issues with the deployed software. Both teams were working on the same code base in different ways, resulting in misalignment and functionality issues at the point of go-live. This was costly to resolve: it impacted operational colleagues using the system and technology colleagues trying to support and fix it, and it damaged the reputation of RP. The situation has been likened to “two surgeons working on one body at the same time without talking to each other”.
Excellent work already in progress
CQC colleagues and partners have, of course, already spent countless hours engaging with end users to understand concerns and repairing technical issues with RP. The documents entitled Raised, Resolved, Reported, communication methods such as the “Whilst I’ve Got You” video casts, and the resolution of over 14,500 incidents raised to the Service Desk all provide evidence that many people have been working very hard to improve the experience for staff.
Notes
iv Government Functional Standard - GovS 005: Digital
v Guide to developing the Programme Business Case
vi Project and programme management - GOV.UK
vii Microsoft Root Cause Analysis CQC 1 (DS)
viii Care Quality Commission (CQC) Assessment Canvas App Review Report (1)
ix Common canvas apps performance issues and resolutions - Power Apps | Microsoft Learn