Clinical Review

NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Anifrolumab (Saphnelo): CADTH Reimbursement Review: Therapeutic area: Systemic lupus erythematosus [Internet]. Ottawa (ON): Canadian Agency for Drugs and Technologies in Health; 2023 Apr.

Executive Summary

An overview of the submission details for the drug under review is provided in Table 1.

Table 1

Submitted for Review.

Introduction

Lupus is an autoimmune disease that affects approximately 1 in 1,000 Canadians, and the most serious form of lupus is systemic lupus erythematosus (SLE).¹,² The precise etiology and pathophysiology are unknown; however, females are more commonly afflicted than males at a ratio of 9:1.²⁻⁴ Onset is primarily between the ages of 16 and 55, although the disease can present at any age.³ The symptoms of lupus can vary greatly.²,³ Patients can experience fatigue and joint pain, which can be disabling, as well as neurologic, renal, and cardiovascular sequelae, rash, and a variety of other symptoms.² The disease has a variable course, and patients can cycle from a chronic state to flares (acute worsening of their condition) to remission.⁵ Long-term organ damage is the main risk factor for mortality and may occur from the disease pathology as well as during periods of low disease activity due to toxicity from treatment.⁶

SLE is treated with medications that are taken acutely on an as-needed basis, as well as chronically. The first-line drug among the chronically administered drugs is an antimalarial, which interferes with intracellular toll-like receptor signalling. Given that SLE is an autoimmune disorder, immunosuppressants also play an important role, and a variety are used (azathioprine, cyclophosphamide, methotrexate, mycophenolate, cyclosporine). Immunosuppressants are associated with multiple harms, including the risk of serious infection and malignancy, and they present significant tolerability issues for patients. Corticosteroids are used to reduce inflammation and pain.⁷ This treatment is well known for toxicities such as osteoporosis, psychiatric issues, cataracts, diabetes, hypertension, weight gain, hirsutism, and glaucoma, among others, particularly when used chronically. Chronic use is therefore avoided as much as possible, although corticosteroids are still relied on to treat flares.⁷

Anifrolumab is a human immunoglobulin G1 kappa monoclonal antibody that binds to the interferon-alpha and -beta receptor subunit 1 (IFNAR1).⁸ Anifrolumab also induces the internalization of IFNAR1, reducing the number of receptors available for binding and therefore reducing inflammation and immunological processes.⁸ Type I interferons play an important role in the pathogenesis of SLE.⁸ Approximately 60% to 80% of adult SLE patients have high levels of type I interferon–inducible genes, which are associated with increased disease activity and severity.⁸ Anifrolumab is administered as an IV infusion over 30 minutes every 4 weeks and is indicated in addition to standard therapy for the treatment of adult patients with active, autoantibody-positive SLE.⁸

The objective of this report is to perform a systematic review of the beneficial and harmful effects of anifrolumab 300 mg, administered as an IV infusion in addition to standard therapy, for the treatment of adult patients with active, autoantibody-positive SLE.

Stakeholder Perspectives

This section summarizes input provided by the patient groups who responded to CADTH’s call for patient input and from a clinical expert consulted by CADTH for the purpose of this review.

Patient Input

Four responses to CADTH’s call for patient input for the anifrolumab submission were received. These consisted of submissions from Arthritis Consumer Experts (ACE), Lupus Canada, Lupus Ontario, and a cooperative submission from the Canadian Arthritis Patient Alliance, Arthritis Society, and Canadian Skin Patient Alliance. Patient input was gathered from 148 responses to surveys of patients with lupus across Canada, including 34 respondents (88% female) from ACE, 112 (96.4% female) from Lupus Canada, and 2 respondents with SLE from Lupus Ontario. The cooperative submission conducted a focus group of 10 patients (90% female) with SLE. The submission from ACE also conducted an in-depth interview with 1 patient. None of the patients in the included submissions had experience with the treatment under review.

Patients reported that managing SLE was difficult given the severity of the physical symptoms, such as debilitating fatigue, pain, persistent headaches, and difficulty breathing. Respondents reported that current treatments are difficult to tolerate because of the many side effects, such as headaches, brain fog, additional fatigue, frequent infections, osteoporosis, gastric issues, insomnia, hair loss, weight gain or loss, mood swings, allergic reactions, nausea, anxiety, and tremors, as well as concerns about organ damage.

The key outcomes patients would like to see addressed by a new therapy are a reduction of side effects and the number of medications used; reduction in fatigue, flares, headaches, brain fog, joint and muscle pain, and rash and skin irritations; increased lifespan; overall improvement in quality of life (QoL); and improvement in sleep patterns. Patients would also like to see enhanced mobility, improved tolerance to UV light, productivity, and ability to work and carry out activities of daily living (ADLs) and social roles. Overall, it is clear that SLE significantly impairs health-related quality of life (HRQoL), impairs function, and elicits a number of serious symptoms.

Clinician Input

Input From Clinical Experts Consulted by CADTH

SLE is currently treated chronically with immune modulators such as high-dose corticosteroids, antimalarials, azathioprine, methotrexate, mycophenolate mofetil, cyclophosphamide, and cyclosporine and/or tacrolimus. The clinical expert consulted by CADTH identified side effects as the major limitation of current treatment, namely prednisone and immunosuppressants. Other unmet needs include nonresponse, lack of adherence, polypharmacy, chronic organ damage, and recurrent flares that cause progressive organ damage. Currently no treatments provide a long-term cure or long-term medication-free survival. According to the clinical expert, the current place in therapy for anifrolumab would be after nonresponse or toxicity with an antimalarial and an oral corticosteroid (OCS) or prednisone dependency. In patients with major organ involvement, anifrolumab could be used as a second-line therapy in combination with at least 1 immunosuppressive drug plus hydroxychloroquine after failure on standard of care. According to the clinical expert, the patients most likely to benefit from anifrolumab are those with moderately to severely active disease (e.g., active skin manifestations and polyarthritis), those who are prednisone-dependent or intolerant, and those for whom adherence to standard medication is an issue. In addition, the clinical expert noted that treatment effects with anifrolumab can be seen regardless of previous treatments, such as standard of care, and/or failure to successfully taper prednisone. The clinical expert identified those least likely to benefit from anifrolumab as patients with severe nephritis or a disease of the central nervous system (CNS); clinicians are less likely to use anifrolumab in place of standard of care because of the severity of illness in these cases.

In the opinion of the clinical expert, a clinically meaningful response to anifrolumab would be a meaningful reduction in disease activity as measured by clinical and laboratory outcomes such as autoantibodies, complement levels, hemoglobin levels, improvement in ADLs, reduction of signs and symptoms, and tapering of steroids. Treatment response should generally be assessed every 2 to 3 months for those with active disease. The rapidity of response depends on the treatment (e.g., corticosteroids are the most rapid). In the opinion of the clinical expert, treatment should be administered by a rheumatologist or physician with extensive experience in the diagnosis and management of SLE. Treatment should be discontinued in the case of nonresponse, life-threatening adverse events (AEs), or steroid dependency (e.g., an inability to taper prednisone after 4 to 6 months of treatment or an increased dose of prednisone for more than 3 months).

Clinician Group Input

The 20 clinicians who provided input for this review represented 2 clinician groups: the Canadian Network for Improved Outcomes for Systemic Lupus Erythematosus (CaNIOS) and the Toronto Lupus Program at the University of Toronto.

Overall, the views of the clinician groups were consistent with those of the clinical expert consulted by CADTH. The clinician groups indicated that an ideal treatment would have a meaningful impact on overall survival by reducing disease activity, risk of subsequent flares, use of an OCS, risk of AEs, and long-term complications, while inducing remission and improving HRQoL. The goal of treatment with anifrolumab should be the reduction of the daily prednisone dose to below 7.5 mg/day in the first 12 months of treatment or a 50% reduction of the initial baseline dose. Both clinician groups indicated that all patients with SLE would benefit from anifrolumab regardless of previous treatment history. According to the clinician groups, anifrolumab is expected to cause a shift in the current treatment paradigm as its novel interferon-blocking mechanism of action renders it most suitable for patients with serologically active disease, frequent flares, and “steroid dependence,” which is the population with the greatest unmet need.

Drug Program Input

The drug programs provide input on each drug being reviewed through CADTH’s reimbursement review processes by identifying issues that may affect their ability to implement a recommendation. The drug plans identified implementation issues related to considerations for initiation of therapy, continuation and/or renewal of therapy, discontinuation of therapy, prescribing, and generalizability. The clinical expert consulted by CADTH weighed evidence from 2 trials, TULIP-1 and TULIP-2, and other clinical considerations to provide responses to drug programs’ implementation questions. Table 4 provides more details.

Clinical Evidence

Pivotal Studies and Protocol-Selected Studies

Description of Studies

Two sponsor-submitted trials, TULIP-1 and TULIP-2,⁹,¹⁰ were included in this review. The TULIP-1 trial (123 sites in 18 countries, N = 457) and the TULIP-2 trial (119 sites in 16 countries, N = 365) are phase III, multicentre, randomized, double-blind, placebo-controlled studies that evaluated the efficacy and safety of an IV treatment regimen of anifrolumab 300 mg in adult patients (aged 18 to 70 years) with moderate to severe autoantibody-positive SLE while receiving standard-of-care treatment. The primary objective was to evaluate the effect of anifrolumab 300 mg compared to placebo on disease activity as measured by the difference in the proportion of patients who achieve an improvement of 4 points or greater on the Systemic Lupus Erythematosus Responder Index (SRI-4) at week 52 for the TULIP-1 trial or a British Isles Lupus Assessment Group-based Composite Lupus Assessment (BICLA) response at week 52 in the TULIP-2 trial. In the TULIP-1 trial, the key secondary objectives were to evaluate the effect of anifrolumab 300 mg compared to placebo on:

the proportion of patients with SRI-4 at week 52 who were in the subgroup with a high result on a type I interferon gene signature test
the proportion of patients who achieved an OCS dosage of no more than 7.5 mg/day at week 40, which was maintained through week 52 in the subgroup of patients with a baseline OCS dosage of 10 mg/day or higher
the proportion of patients with a 50% or greater reduction in the Cutaneous Lupus Erythematosus Disease Area and Severity Index (CLASI) activity score at week 12 in the subgroup of patients with baseline CLASI activity score of 10 or higher
the number of patients who achieved an SRI-4 at week 24
the annualized flare rate through 52 weeks.

The key secondary objectives in the TULIP-2 trial were the same as TULIP-1 with the addition of:

the proportion of patients with a BICLA response at week 52 (replaces SRI-4 response at week 52)
the proportion of patients with a BICLA response at week 52 in the type I interferon gene signature test high subgroup
the proportion of patients with a 50% or greater reduction in joint counts at week 52 in the subgroup of patients with at least 6 swollen and at least 6 tender joints at baseline (the number of patients who achieved an SRI-4 at week 24 was removed).

Patients who were automatically considered nonresponders included those who withdrew or discontinued the investigational product, those who received concomitant medications beyond the protocol-allowed threshold, those who required OCS doses beyond their baseline maximum dose, and those who had missing data for a component for 2 or more consecutive visits. While there was some variance between trials in terms of the participating countries, most sites in both trials were based in the US (40.7% in the TULIP-1 trial and 36.5% in TULIP-2) and Europe (37.9% in the TULIP-1 trial and 26.8% in TULIP-2), with no Canadian sites in the TULIP-1 trial and 2 Canadian sites in TULIP-2. Except for different primary outcomes and some variance in key secondary outcomes, the trials were similar in terms of blinding, randomization, inclusion and exclusion criteria, and drug administration procedures. Baseline patient characteristics, including age, race, sex, height, weight, and body mass index, were balanced between groups in both trials. The median ages of enrolled patients were 41 and 43 years in the TULIP-1 and TULIP-2 trials, respectively, and patients were predominantly female (92.3% in the TULIP-1 trial and 93.4% in TULIP-2) and white (71.3% in the TULIP-1 trial and 59.9% in TULIP-2). The TULIP-2 trial had a larger proportion of missing data on race (4.4% versus 0) compared to the TULIP-1 trial. The majority of patients tested high on the type I interferon gene signature test (approximately 82% across groups and studies). SLE measures, including the Systemic Lupus Erythematosus Disease Activity Index 2000 (SLEDAI-2K), British Isles Lupus Assessment Group 2004 (BILAG-2004), Physician’s Global Assessment (PGA), CLASI, and joint count, were balanced between treatment groups and similar between studies. The mean time from initial SLE diagnosis to randomization was highest in patients in the treatment arm of the TULIP-2 trial (mean = 130.2 months; standard deviation [SD] = 109.28). Cushingoid features were more frequent in the TULIP-1 trial than in TULIP-2 (39% versus 26%, respectively) and there was a slightly higher proportion of patients with a baseline dose of OCS of greater than 10 mg in the TULIP-1 trial (56.3%) than in TULIP-2 (47%). Overall previous medication use at baseline was balanced between groups and between studies.

Efficacy Results

The key outcomes from the TULIP-1 and TULIP-2 trials are summarized in Table 2. In the TULIP-1 trial, the primary end point, SRI-4 response at week 52, was not statistically significant (36.2% in the anifrolumab 300 mg group versus 40.4% in the placebo group; treatment difference of −4.2%; 95% confidence interval [CI], −14.2% to 5.8%; P value = 0.412) and the key secondary end points did not demonstrate statistical significance for the SRI-4 interferon-test high subgroup (P value for the between-group difference = 0.549), maintained OCS dose (P value for the between-group difference = 0.180), CLASI activity (P value for the between-group difference = 0.054), or annualized flare rate (P value for the between-group difference = 0.258).

In the TULIP-2 trial, the primary end point of a BICLA response at week 52 was statistically significant in favour of the anifrolumab 300 mg group (47.8% in the anifrolumab 300 mg group versus 31.5% in the placebo group; treatment difference = 16.3%; 95% CI, 6.3% to 26.3%; P value = 0.0013). In addition, statistically significant differences in favour of the anifrolumab 300 mg group were reported for the key secondary end points of BICLA in patients with a high result on an interferon test, a maintained OCS reduction with a baseline OCS of 10 mg/day or higher, and a CLASI response with a baseline CLASI activity score of 10 or higher. However, no statistically significant differences were seen in the proportion of patients with 50% or greater reduction in joint count (42.2% in the anifrolumab 300 mg group versus 37.5% in the placebo group; between-group difference = 4.7%; 95% CI, −13.5 to 17.6; P value = 0.5469) or in the annualized flare rate (0.43 in the anifrolumab 300 mg group versus 0.64 in the placebo group; rate ratio = 0.67; 95% CI, 0.48 to 0.94; P value = 0.0809) in the TULIP-2 trial.
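The arithmetic behind the TULIP-2 point estimates above can be checked with a minimal sketch. It uses only the rounded proportions and rates reported in this paragraph; the function names are illustrative, and the calculation is unadjusted, so it is a consistency check on the point estimates rather than a reproduction of the sponsor's analysis, which may rely on stratified or model-based methods. Confidence intervals and P values cannot be recovered from these summary figures alone.

```python
def risk_difference(pct_treatment: float, pct_control: float) -> float:
    """Unadjusted difference in response rates, in percentage points."""
    return pct_treatment - pct_control


def rate_ratio(rate_treatment: float, rate_control: float) -> float:
    """Ratio of annualized event rates (treatment divided by control)."""
    return rate_treatment / rate_control


# Values as reported for TULIP-2 in the text above.
bicla_diff = risk_difference(47.8, 31.5)   # BICLA response at week 52
flare_ratio = rate_ratio(0.43, 0.64)       # annualized flare rate

print(round(bicla_diff, 1))   # 16.3 percentage points
print(round(flare_ratio, 2))  # 0.67
```

Both values match the reported treatment difference and the reported ratio of flare rates.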

The primary and key secondary end points were also measured in the subgroup of patients with an OCS dose of 10 mg/day or higher at baseline. However, statistical analyses were not conducted for this subgroup, except for the key secondary end point of maintenance of OCS reduction. Overall, a numerically higher proportion of patients in the anifrolumab group compared with the placebo group for this subgroup of patients achieved the primary and key secondary end points (except joint count reduction) in the TULIP-2 trial. In the TULIP-1 trial, the results were mixed, with only the outcomes of CLASI activity and annualized flare rate showing an improved response in the anifrolumab group compared to placebo.

In both studies, the difference in responses between the treatment groups was minimal for HRQoL (measured by the Short Form (36) Health Survey [SF-36], Lupus QoL, and 5-Level EQ-5D [EQ-5D-5L] questionnaires) and symptom scores (measured by the pain numerical rating score [NRS] and Functional Assessment of Chronic Illness Therapy–Fatigue [FACIT-F]). The proportion of patients who exceeded the estimated minimal important difference (MID) was provided only for the SF-36 and FACIT-F. In the TULIP-1 trial, at week 52, the proportion of mental component summary (MCS) responders (defined as change from baseline of at least 4.6 points, the MID for MCS) was 20.9% in the anifrolumab 300 mg group and 16.7% in the placebo group, with a between-group difference of 4.2% (95% CI, −4.1 to 12.6), and the proportion of physical component summary (PCS) responders (defined as change from baseline of at least 3.4 points, the MID for PCS) was lower in the anifrolumab 300 mg group than in the placebo group (25% versus 26.7%), with a between-group difference of −1.7% (95% CI, −10.9 to 7.5). In the TULIP-2 trial at week 52, the proportion of MCS responders in the anifrolumab 300 mg group compared with the placebo group was 27.4% versus 21.2%, respectively, with a between-group difference of 6.2% (95% CI, −2.71 to 15.2), and the proportion of PCS responders in the anifrolumab 300 mg group compared with the placebo group was 32.8% versus 24.4%, respectively, with a between-group difference of 8.4% (95% CI, −1.1 to 17.8). In the TULIP-1 trial, a slightly higher proportion of patients in the anifrolumab 300 mg group had reduced fatigue at week 52, as measured by the FACIT-F responder rate (defined as improvement from baseline to week 52 of > 3 points), compared with the placebo group (29.3% versus 26.8%, respectively; between-group difference = 2.4%; 95% CI, −0.9 to 17.9). The TULIP-2 trial also had a numerically higher proportion of patients in the anifrolumab 300 mg group who had reduced fatigue at week 52, as measured by the FACIT-F responder rate, compared with the placebo group (33.2% versus 24.7%, respectively; between-group difference = 8.5%; 95% CI, 6.9 to 11.8).

Harms Results

Key harms reported in the TULIP-1 and TULIP-2 trials are summarized in Table 2.

Rates of AEs were similar across treatment groups and across trials (approximately 85% to 90% prevalence). In the TULIP-1 and TULIP-2 trials, the most common AEs were nasopharyngitis (20.0% and 15.6% in the anifrolumab 300 mg group versus 12.0% and 11% in the placebo group, respectively), upper respiratory tract infection (12.2% and 21.7% versus 9.8% and 9.9%), and urinary tract infection (12.2% and 11.1% versus 14.7% and 13.7%). Serious adverse events (SAEs) were less common in the anifrolumab group than in the placebo group in both the TULIP-1 trial (13.9% versus 16.3%) and the TULIP-2 trial (8.3% versus 17%). In the TULIP-1 trial, the most common SAEs were SLE (1.7% and 1.6%) and pneumonia (1.7% and 0.5%). In the TULIP-2 trial, the most common SAEs were pneumonia (1.7% and 3.8%), followed by SLE (0.6% and 3.3%).

Withdrawals were greater in the anifrolumab group versus the placebo group in the TULIP-1 trial (6.7% versus 3.8%, respectively). Withdrawals were lower in the anifrolumab group compared to the placebo group in the TULIP-2 trial (2.8% versus 7.7%, respectively). In the TULIP-1 trial, the most common reason for withdrawal in the anifrolumab group was herpes zoster (1.1%). In the TULIP-2 trial, the most common reason for withdrawal in the placebo group was SLE (1.6%) followed by pneumonia (1.1%).

There was a total of 2 deaths during the TULIP-1 study and 1 death in the TULIP-2 study. One patient in the anifrolumab 300 mg group of each trial had a fatal SAE of pneumonia during the treatment period. In the TULIP-1 trial, 1 patient in the placebo group had a fatal SAE of encephalitis during the follow-up period. The study investigators determined that these deaths were not related to the investigational product.

In the TULIP-1 trial, notable harms included hypersensitivity reactions (6.1% anifrolumab 300 mg versus 1.1% placebo), infusion-related reactions (8.9% versus 7.1%), herpes zoster (5.6% versus 1.6%), serious nonopportunistic infections (5.0% versus 4.3%), malignancies (1.7% versus 0.5%), depression (2.8% versus 2.7%), and suicidal ideation or behaviour (1.1% versus 1.6%). In the TULIP-2 trial, notable harms included infusion-related reactions (13.9% anifrolumab 300 mg versus 7.7% placebo), herpes zoster (7.2% versus 1.1%), serious nonopportunistic infections (2.8% versus 5.5%), hypersensitivity (1.1% versus 0.5%), malignancy (0% versus 0.5%), depression (2.8% versus 1.6%), and suicidal ideation or behaviour (1.7% versus 4.4%). Herpes zoster was more common among the anifrolumab group across both trials, but no cases were considered SAEs. Depression was measured by the 8-item Patient Health Questionnaire (PHQ-8), but no clinically meaningful changes were observed for any group across either trial. Suicidal ideation and behaviour were measured by the Columbia Suicide Severity Rating Scale (C-SSRS). Overall, few patients reported suicidal ideation or suicidal behaviour at any time during the studies, with no imbalance observed between treatment groups.

Table 2

Summary of Key Results from Pivotal and Protocol-Selected Studies.

Critical Appraisal

A number of factors between the 2 pivotal trials contributed to bias or general uncertainty of the outcomes. The primary outcome for the TULIP-1 and TULIP-2 trials was the composite score of SRI-4 and BICLA, respectively. The decision to switch the primary end point in the TULIP-2 trial was based on the results of the TULIP-1 and MUSE trials, and this decision was made before the unblinding of the data in the TULIP-2 trial at week 52. The risk of operational bias is therefore low. Both trials followed the same procedures for blinding, database lock, unblinding, and data analysis, and concerns for potential investigator bias are low. Potential confounding variables were accounted for through stratification (e.g., SLEDAI-2K score at screening, baseline OCS dose, and type I interferon gene signature test results). Baseline imbalances of these factors could affect efficacy and/or safety assessments of anifrolumab versus placebo. Overall baseline characteristics and disease activity scores (e.g., CLASI activity, SLEDAI-2K scores) were generally similar and balanced between groups across both trials; however, a greater percentage of patients in the treatment arm than in the placebo arm had a CLASI damage score of 10 or higher, particularly in the TULIP-2 trial (8.9% versus 4.4%, respectively, compared with 6.1% versus 4.3% in TULIP-1), which may allow for greater improvement in patients with more severe disease for this outcome. Other concerns include potential ceiling effects for patients with lower disease activity scores (e.g., a patient with a baseline SLEDAI-2K score of 6 would be less likely to achieve a 4-point drop compared with someone who starts with a score of 12).

In the TULIP-1 trial, there were similar rates of withdrawal in both study arms (18.9% anifrolumab versus 19% placebo), while discontinuation was much lower in the treatment arm of TULIP-2 versus placebo (13.3% versus 25.3%, respectively). Discontinuations were primarily due to patient request, AEs, lack of efficacy, and worsening of the condition under investigation. In the TULIP-2 trial, a slightly higher proportion of patients discontinued due to patient request in the placebo group (10.4%) than in the anifrolumab group (6.1%), and more patients in the placebo group withdrew due to AEs (3.8% versus 1.7%) and lack of efficacy (4.4% versus 1.1%) before the end of the study.

The sponsor adhered to its statistical testing hierarchy for the multiplicity adjustment, testing outcomes in sequence. Sensitivity analyses and multiplicity adjustments were only conducted in the TULIP-2 trial as the TULIP-1 trial did not meet its primary end point. The sponsor used a nonresponder imputation approach in which patients who withdrew from the study or received restricted medications beyond the protocol-allowed threshold would be considered nonresponders. With this approach, when more patients withdraw in the placebo group, this may bias the results in favour of anifrolumab as these patients would be considered nonresponders whether or not they were responding at the time of withdrawal. The sensitivity analyses performed by the sponsor support the findings of its primary analysis of the TULIP-2 trial, using approaches such as last observation carried forward (LOCF) as well as tipping-point analyses. LOCF was also used to impute missing data where individual components of the primary composite outcome were missing. Missing data rates were higher for the BILAG-2004 component in both studies.
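The direction of the potential bias from nonresponder imputation can be illustrated with a minimal sketch. The patient records below are entirely hypothetical and are not trial data; the `Patient` fields and helper names are assumptions chosen for illustration, and the rule shown covers only the two conditions named in this paragraph (withdrawal and restricted medication use), not the full protocol definition.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    arm: str
    responding_at_last_visit: bool
    withdrew: bool = False
    exceeded_restricted_meds: bool = False


def nri_response(p: Patient) -> bool:
    """Nonresponder imputation: withdrawal or restricted medication use
    counts as nonresponse regardless of clinical status."""
    if p.withdrew or p.exceeded_restricted_meds:
        return False
    return p.responding_at_last_visit


def as_observed(p: Patient) -> bool:
    """Comparison rule: take the last observed status at face value."""
    return p.responding_at_last_visit


def response_rate(patients, arm, classify) -> float:
    in_arm = [p for p in patients if p.arm == arm]
    return sum(classify(p) for p in in_arm) / len(in_arm)


# Hypothetical cohort: the same underlying response in both arms, but one
# responding placebo patient withdraws before the end of the study.
patients = [
    Patient("anifrolumab", True), Patient("anifrolumab", True),
    Patient("anifrolumab", True), Patient("anifrolumab", False),
    Patient("placebo", True), Patient("placebo", True),
    Patient("placebo", True, withdrew=True), Patient("placebo", False),
]

for rule in (as_observed, nri_response):
    diff = (response_rate(patients, "anifrolumab", rule)
            - response_rate(patients, "placebo", rule))
    print(rule.__name__, diff)
```

In this hypothetical cohort the between-group difference is 0 when the last observed status is used and 0.25 under nonresponder imputation, which is the mechanism described above: differential withdrawal in the placebo group alone can produce an apparent treatment effect.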

The clinical expert consulted by CADTH agreed that the baseline patient characteristics of the TULIP-1 and TULIP-2 trials were reflective of patients they see in Canadian clinical practice for the present indication. Although the majority of patients in each study were enrolled in trial sites from the US and Europe, the population enrolled in the trial was consistent with the population expected to be treated in Canadian clinical practice. The clinical expert noted that prescribing patterns may differ between countries (e.g., higher use of nervous system medication, or use of mizoribine, which is not prescribed in Canada); however, no differences in treatment effects would be expected based on different disease-management practices. Additionally, American College of Rheumatology (ACR) criteria were used to identify patients with SLE in both trials, and these are rigorous criteria that are designed for use in clinical trials, rather than clinical practice. There is therefore a higher risk of misdiagnosis of SLE occurring in clinical practice, although the clinical expert consulted by CADTH for this review noted that diagnosis of SLE should be straightforward for clinicians with specialty training. Furthermore, the subgroup analyses (e.g., high versus low interferon-test results) had no statistical comparisons and even smaller sample sizes, which limits the generalizability to a broader population.

According to the clinical expert, improvements in organ damage or other longer-term outcomes (e.g., mortality) while on anifrolumab are unlikely to be detected during a 52-week double-blind treatment phase because of insufficient duration. The composite primary outcome, patients with an SRI-4 or BICLA response, would not be used routinely to assess patient status in clinical practice; however, the components of the composite would be an important part of the assessment of patients with SLE (e.g., clinical Systemic Lupus Erythematosus Disease Activity Index [SLEDAI] score). Given that anifrolumab has not been studied versus an active comparator, the efficacy and harms of this drug compared to the addition of other drugs used in the treatment of SLE is unknown. Although a variety of drugs are used chronically to manage SLE, none were specifically developed to manage this disease.

Other Relevant Evidence

Description of Studies (MUSE and Study 1145)

Two submitted studies provided in the sponsor’s submission to CADTH were considered to address the long-term efficacy of the treatment under review. These include a phase II, multinational, multicentre, randomized, double-blind, placebo-controlled study (MUSE)¹¹ and a phase II, single-arm, open-label, long-term extension (LTE) study to evaluate the long-term safety of anifrolumab (Study 1145).¹² Inclusion and exclusion criteria and baseline demographics were consistent with the TULIP-1 and TULIP-2 clinical trials. The primary efficacy end point for the MUSE study was the proportion of patients who at day 169 (week 24) achieved an SRI-4 response as defined in the TULIP-1 trial. Patients who were not able to taper their OCS dosage to less than 10 mg/day (prednisone or equivalent) or to a dosage equal to or less than their day 1 dosage by day 85 (week 12) and maintain this decrease until day 169 (week 24) were declared nonresponders for the primary end point. Subgroup analyses included the proportion of patients who tested positive on a type I interferon signature diagnostic test achieving an SRI-4 response with OCS tapering. Secondary efficacy end points included the proportion of patients achieving an SRI-4 response at day 365 and the proportion of patients on 10 mg/day or higher of oral prednisone (or equivalent) at baseline who were able to taper to no more than 7.5 mg/day at day 365 (week 52).

Study 1145¹² (N = 218) was a single-arm, open-label, long-term safety (up to 3 years; 70.6% of patients were treated for 30 months or longer) and tolerability study of anifrolumab 300 mg every 4 weeks by IV infusion in adult patients with chronic, moderate to severe SLE who were previously treated with any dose of anifrolumab or placebo in the MUSE trial. Safety assessments consisted of reporting all AEs, including treatment-emergent adverse events (TEAEs) and SAEs, as well as adverse events of special interest (AESIs). The primary end points of the study were the safety and tolerability of IV anifrolumab in adult patients with moderately to severely active SLE who were assessed primarily by summarizing TEAEs, SAEs, withdrawals due to adverse events, and AESIs. The secondary safety outcome included evaluating the immunogenicity results of anifrolumab by summarizing the proportion of patients who developed detectable antidrug antibodies (ADAs). Other outcomes were also assessed in the trial as exploratory efficacy outcomes; however, they are not reported further in this review. These included outcomes to evaluate the efficacy, pharmacokinetic, pharmacodynamic, and HRQoL impacts of anifrolumab.

Efficacy Results

In the MUSE study, a total of 34.3% of patients had an SRI-4 response with OCS tapering at week 24 in the anifrolumab group compared to 17.6% in the placebo group, with a statistically significant odds ratio (OR) of 2.38 (90% CI, 1.33 to 4.26; P value = 0.014). The proportion of patients with a high result on a type I interferon test who had an SRI-4 response with OCS tapering at week 24 was 36.0% for the anifrolumab group and 13.2% for the placebo group, with an OR of 3.55 (90% CI, 1.72 to 7.32). The difference was statistically significant, with a P value of 0.004. For the secondary end point of SRI-4 response at week 52, a total of 51.5% of patients had an SRI-4 response with OCS tapering in the anifrolumab group compared to 25.5% in the placebo group, with an OR of 3.08 (90% CI, 1.86 to 5.09; P value < 0.001). For the secondary end point of OCS tapering, a total of 56.4% of patients in the anifrolumab group on 10 mg/day or higher of oral prednisone (or equivalent) at baseline were able to taper to no more than 7.5 mg/day by week 52 compared to 26.6% in the placebo group, with an OR of 3.59 (90% CI, 1.87 to 6.89; P value < 0.001).
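As a rough consistency check on the MUSE odds ratios, a crude (unadjusted) OR can be computed directly from the reported response proportions. The sketch below is illustrative only: the function name is an assumption, the inputs are the rounded percentages given in this paragraph, and the reported ORs presumably come from an adjusted or stratified model, so small discrepancies are expected.

```python
def crude_odds_ratio(p_treatment: float, p_control: float) -> float:
    """Unadjusted odds ratio from two response proportions (0 to 1)."""
    odds_treatment = p_treatment / (1 - p_treatment)
    odds_control = p_control / (1 - p_control)
    return odds_treatment / odds_control


# (anifrolumab proportion, placebo proportion, OR reported in the text)
muse_results = {
    "SRI-4 with OCS tapering, week 24": (0.343, 0.176, 2.38),
    "SRI-4, interferon-test high, week 24": (0.360, 0.132, 3.55),
    "SRI-4 with OCS tapering, week 52": (0.515, 0.255, 3.08),
    "OCS taper to 7.5 mg/day or less, week 52": (0.564, 0.266, 3.59),
}

for label, (p_t, p_c, reported) in muse_results.items():
    crude = crude_odds_ratio(p_t, p_c)
    print(f"{label}: crude OR {crude:.2f}, reported OR {reported:.2f}")
```

For the week 24 primary end point, the crude OR is about 2.44 against a reported 2.38; the other crude values likewise fall close to, but not exactly on, the reported estimates.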

Harms Results

In the MUSE trial, 84.8% of patients in the anifrolumab group and 77.2% of patients in the placebo group reported 1 or more TEAEs, the most common being headache, upper respiratory tract infection, nasopharyngitis, and urinary tract infection. Nasopharyngitis occurred at a higher frequency in the anifrolumab group (12.1%) than in the placebo group (4.0%).

The proportion of patients with 1 or more SAEs was similar between the anifrolumab and placebo groups, the most common being increased SLE activity and pneumonia. The most common AESIs were infusion, hypersensitivity, and anaphylactic reactions, which were reported in a greater proportion of the placebo group (5.9%) than in the anifrolumab group (2.0%). No deaths were reported in the anifrolumab 300 mg or placebo groups.

In the LTE (Study 1145) through to week 52, the total numbers of patient-years of exposure were 93.4 for the anifrolumab group and 84.3 for the placebo group. A higher proportion of patients in the anifrolumab group (65.7%) received the full course of treatment (13 doses) compared with those in the placebo group (53.5%). A total of 78% of patients (n = 170) experienced an AE, with the most common being nasopharyngitis (14.7%), bronchitis (13.8%), and upper respiratory tract infection (9.2%). A total of 22% (n = 48) of patients had a drug-related TEAE and 22.9% (n = 50) had 1 or more SAEs, with an exposure-adjusted SAE rate of 8.56 per 100 patient-years. The most common SAEs were increased SLE activity and pneumonia, each of which occurred in 2.3% of patients. The death of 1 patient from community-acquired pneumonia was determined by the investigator to be related to treatment. In terms of AESIs, 7 patients (3.2%) had infusion, hypersensitivity, or anaphylactic reactions, and 5 patients (2.3%) had latent tuberculosis. Five patients in Study 1145 had ADA-positive measurements at any time, 3 at baseline only and 2 persistently.
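
The patient counts and percentages quoted for Study 1145 can be checked against the study size (N = 218) with simple arithmetic. The sketch below is only an illustrative consistency check; the variable names, labels, and the 0.1 percentage-point tolerance are assumptions made for this example.

```python
N_PATIENTS = 218  # Study 1145 sample size, as stated in the review

# (label, reported patient count, reported percentage) from the paragraph above.
reported_counts = [
    ("any AE", 170, 78.0),
    ("drug-related TEAE", 48, 22.0),
    ("1 or more SAEs", 50, 22.9),
    ("infusion, hypersensitivity, or anaphylactic reaction", 7, 3.2),
    ("latent tuberculosis", 5, 2.3),
]


def percent(count: int, total: int) -> float:
    """Percentage of `total` represented by `count`."""
    return 100.0 * count / total


def is_consistent(count: int, total: int, reported_pct: float, tol: float = 0.1) -> bool:
    """True if the recomputed percentage is within `tol` points of the reported one."""
    return abs(percent(count, total) - reported_pct) <= tol


for label, count, reported_pct in reported_counts:
    recomputed = percent(count, N_PATIENTS)
    status = "consistent" if is_consistent(count, N_PATIENTS, reported_pct) else "check"
    print(f"{label}: {count}/{N_PATIENTS} = {recomputed:.1f}% (reported {reported_pct}%) -> {status}")
```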

Critical Appraisal

In the MUSE study, a number of factors may have contributed to bias in favour of anifrolumab or to general uncertainty. A higher proportion of patients in the placebo group used an OCS dosage of 10 mg/day or higher at baseline compared with those in the anifrolumab group (62.7% versus 55.6%, respectively). A risk of attrition bias may be present due to the greater number of withdrawals in the placebo group. The decision to classify discontinued patients as nonresponders in the primary analyses may have biased the results in favour of treatment. Furthermore, it was unclear whether the patients who discontinued were different from those who did not. The primary outcome, SRI-4, is a reliable and valid composite measure for disease activity and response in SLE. The primary outcome was measured at 24 and 52 weeks in the MUSE study, which provided data on long-term treatment effects. The clinical expert consulted for this review agreed that a treatment response is expected within 24 weeks. In terms of statistical analyses, multiplicity was not controlled across populations and there was no control for multiplicity in the secondary efficacy outcomes, which increases the likelihood of a type I error.
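
To illustrate why uncontrolled multiplicity raises the chance of a type I error, the sketch below computes the familywise error rate for a set of tests under the simplifying assumption that the tests are independent. The significance level and numbers of tests used here are hypothetical values chosen for illustration; they are not taken from the MUSE statistical analysis plan.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least 1 false-positive result across `n_tests`
    independent tests, each run at significance level `alpha`."""
    return 1.0 - (1.0 - alpha) ** n_tests


# Hypothetical per-test significance level, for illustration only.
ALPHA = 0.05

for n_tests in (1, 3, 5, 10):
    rate = familywise_error_rate(ALPHA, n_tests)
    print(f"{n_tests} test(s) at alpha = {ALPHA}: familywise error rate = {rate:.3f}")
```

Real end points within a trial are usually correlated, so the independence assumption gives only an approximate upper-range picture of how quickly the error rate grows.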

While baseline demographics of the patients in the MUSE trial were representative of moderately to severely active SLE in Canada, the high dropout rate in the placebo group may have led to patients who are less representative of the recruited population, decreasing the generalizability of the results of the study.

The extension study allowed for the investigation of long-term efficacy and harms. However, the absence of an active comparator limits the ability to draw causal conclusions. Furthermore, the analysis does not take account of the frequency or recurrence of AEs. As a greater proportion of patients in Study 1145 had previously been treated with anifrolumab in the MUSE study, observations based on frequencies of overall AEs in Study 1145 should be interpreted with caution. This could have resulted in a population of patients who were more tolerant of anifrolumab and therefore potentially less likely to experience harms. A high proportion of patients (36.2%) discontinued the study, which can increase the risk of attrition bias in favour of the intervention as patients who do not do well on an intervention tend to withdraw from studies. Although these patients were included in the safety analyses, their characteristics were not reported, making it unclear whether the patients who discontinued were different from those who did not.

Description of Study (TULIP LTE)

The TULIP LTE was a 3-year, double-blind, placebo-controlled study of adult patients who had moderately to severely active SLE at the start of the TULIP-1 and TULIP-2 studies. The TULIP LTE study enrolled patients who had completed the 52-week double-blind treatment period in either of the phase III studies (TULIP-1 or TULIP-2), met all TULIP LTE eligibility criteria, and were willing to continue into the extension study. Patients who received anifrolumab in the TULIP-1 or TULIP-2 trial and entered the LTE remained on anifrolumab. Patients who received placebo and entered the LTE were rerandomized 1:1 to receive either anifrolumab or placebo in the LTE. This resulted in an approximate ratio of 4:1 anifrolumab 300 mg (n = 435; of these, 257 patients treated with anifrolumab 300 mg continued on anifrolumab 300 mg) versus placebo (n = 112) in the LTE study. The primary objective was to characterize long-term safety and tolerability of IV anifrolumab in patients who completed the TULIP-1 or TULIP-2 trial (as measured by AESIs and SAEs, for example). The exploratory objectives were efficacy assessments of overall disease activity (SLEDAI-2K), OCS use, damage accrual (Systemic Lupus International Collaborating Clinics/American College of Rheumatology Damage Index [SDI]) and HRQoL. The LTE study consisted of a 156-week treatment period, after which patients continued in the study for another 8 weeks to complete a 12-week safety follow-up after receiving the last dose of the investigational product.¹³

Efficacy Results

The proportion of patients who achieved a reduction of 4 or more points in the SLEDAI-2K from baseline was consistently higher in the anifrolumab 300 mg group than in the placebo group. In the anifrolumab 300 mg group, 76.1% of patients who reached the week 52 visit and 90.0% of those who reached week 208 had a reduction of 4 or more points, compared with 69.5% and 81.8%, respectively, in the placebo group. In addition, greater improvements were seen from baseline to week 208 across all domains in the anifrolumab group compared to placebo.

In terms of OCS use, for each year of study, the mean OCS standardized area under the curve was lower for the anifrolumab 300 mg group compared to placebo.

In terms of organ damage, overall, 30% to 40% of patients had organ damage (i.e., SDI score ≥ 1) at baseline in the TULIP-1 and TULIP-2 trials. Organ damage remained stable in both groups throughout the LTE; at week 208, the mean SDI score in patients with a baseline SDI score of 1 or higher was 2.1 in the anifrolumab 300 mg group and 2.0 in the placebo group.

HRQoL was measured by the SF-36 Version 2 (SF-36v2) and EQ-5D-5L. Larger improvements in HRQoL, as measured by SF-36v2 PCS and MCS response rates, were observed for the anifrolumab 300 mg group compared with the placebo group. In terms of EQ-5D-5L, the improvements in QoL as measured by change from baseline were small but consistently higher for the anifrolumab 300 mg group compared to the placebo group throughout the 4 years.

Harms Results

The safety profile for up to 4 years of exposure, including assessment of rare events, remains unchanged. In addition, there was no increase in malignancy, major adverse cardiac events, anaphylaxis, or active tuberculosis. During the 52-week period, 87.5% of patients in the anifrolumab group and 81.3% of patients in the placebo group reported 1 or more TEAEs, the most common being nasopharyngitis, urinary tract infection, upper respiratory tract infection, bronchitis, and headache.

The proportion of patients with 1 or more SAEs was similar between the anifrolumab and placebo groups, the most common being infections and infestations. The most common AESI was nonopportunistic infection. Three deaths were reported in the anifrolumab group (1.2%) and 1 death was reported in the placebo group (0.9%). Overall, no new safety signals were identified.

Critical Appraisal

Demographics and baseline characteristics were generally well balanced between groups. At the start of the LTE study, fewer anifrolumab patients were on steroids compared to those on placebo. This may have contributed to bias in terms of reducing OCS use if a greater number of patients in the anifrolumab group were already not using an OCS. Approximately 72% and 68% of eligible anifrolumab and placebo patients, respectively, completing treatment in the predecessor studies (TULIP-1 and TULIP-2) were enrolled in the TULIP LTE. More patients on anifrolumab completed the 3-year extension (66% across all anifrolumab groups versus 48% in placebo). The differential dropout rate may have increased the risk of attrition bias in favour of anifrolumab.

The efficacy and HRQoL outcomes were limited by their exploratory nature and the lack of formal statistical testing. Although a higher proportion of patients in the anifrolumab group had lower OCS use and improved SLEDAI-2K scores compared to those in the placebo group, no firm conclusions can be drawn about the efficacy of anifrolumab and its steroid-sparing effect based on the presented data. Also, the ability to draw conclusions on the effectiveness of anifrolumab in preventing organ damage was limited due to the lack of statistical testing.

While the patient population was considered representative of patients with moderate to severe SLE in Canada, patients enrolled in the TULIP LTE had to have participated in the 52-week double-blind treatment period in 1 of the phase III studies (TULIP-1 or TULIP-2). This made the population selective, as it included only those who were able to complete the TULIP studies. While the baseline characteristics of the patients enrolled in the TULIP LTE might not differ from those enrolled in the TULIP-1 or TULIP-2 studies, results from the TULIP LTE cannot be generalized to all patients enrolled in the TULIP-1 and TULIP-2 trials.

Conclusions

The clinical expert consulted by CADTH, and the input received from the clinician groups for this review, indicated that the ideal treatment would have a meaningful impact on overall survival by reducing disease activity, risks of subsequent flares, use of an OCS, risks of AEs, and long-term complications, while inducing remission (low disease activity) and improving HRQoL. Two multinational, sponsor-submitted, double-blind, randomized controlled trials (RCTs), TULIP-1 and TULIP-2, were included in this review, along with 2 additional studies that provided long-term safety data. Results of the 2 pivotal RCTs were inconsistent with each other. In 1 study, anifrolumab statistically significantly reduced disease activity after 52 weeks compared to placebo, as measured by BICLA response. The second study showed no statistically significant difference in responses as measured by SRI-4. While 1 of the studies showed a difference in maintained reduction of OCS dosages to less than 7.5 mg/day and a reduction in cutaneous manifestations of lupus, the other did not. The inconsistent results contribute to uncertainty in forming conclusions regarding the impact of anifrolumab on disease activity, OCS dosage reduction, and CLASI reduction. Despite numerical improvements in symptoms and HRQoL across the included measures, these results were not tested statistically, and the improvements were generally the same between anifrolumab and placebo groups; the impact of anifrolumab on HRQoL is therefore unknown. The duration of the study was not sufficient to study the effects of anifrolumab on organ damage and survival. Data from the included studies do not raise any issues of tolerability or safety, although the extension study was limited by the lack of a control group.

Introduction

Disease Background

Lupus is an autoimmune disease characterized by inflammatory processes that can occur in various tissues and organs of the body.¹^,² Approximately 1 in 1,000 Canadians is afflicted with lupus.² The most common form is SLE.²^,⁵ Estimated incidence rates are 1 to 25 per 100,000 in North America.³ The age of onset is primarily between 16 and 55 years, with females of childbearing age more commonly afflicted than males (9:1).³^,⁴ Additionally, research suggests that people of African descent, in addition to Asian, Hispanic, and Indigenous peoples, are at increased risk for SLE and may exhibit more severe manifestations compared to white counterparts.³^,⁴ The etiology and pathophysiology are unknown.³ Given that lupus affects so many systems, its symptoms can vary greatly from patient to patient. Patients can experience fatigue and joint pain, which can seriously affect ADLs.² The most common manifestations are neurologic, renal, cardiovascular, rash, and a variety of other symptoms. Musculoskeletal (arthritis [e.g., joint involvement] and myositis) and mucocutaneous manifestations (severe skin rashes, hair loss, and ulcers in the oral and nasal cavities) occur in up to 95% and 80% of patients, respectively.¹⁴ The disease has a variable course, and patients can cycle among a chronic state, flares (acute worsening of their condition), and remission.⁵ Long-term organ damage is the main risk factor for mortality and may occur from the disease pathology as well as during periods of low disease activity due to toxicity from treatment. Aside from lupus nephritis, patients with lupus may develop early severe cardiovascular disease and have an increased risk of malignancy. Evidence suggests that SLE progression, organ damage, and death are a chain of events that can only be interrupted by better control of disease activity.⁶ The uncertainty of the disease course affects the HRQoL of patients, many of whom are unable to maintain a job or schooling because of their disease. 
Patients with SLE are diagnosed and treated primarily by rheumatologists, and in some cases, other specialties such as immunology. Diagnosis typically occurs through the presentation of key clinical manifestations and supporting laboratory tests.

Standards of Therapy

There is currently no long-term cure for SLE.² Instead, SLE is treated with medications that are taken acutely on an as-needed basis, as well as chronically.² Treatment varies from patient to patient and is generally guided by the predominant disease manifestation.³ The main treatments used are antimalarials, immunosuppressants, corticosteroids, and nonsteroidal anti-inflammatory drugs (NSAIDs). First-line chronically administered drugs are antimalarials, such as hydroxychloroquine, that interfere with intracellular toll-like receptor signalling. Given that SLE is an autoimmune disorder, immunosuppressants also play an important role, and a variety are used (e.g., methotrexate, azathioprine, mycophenolate, and cyclosporine). These drugs are all approved for other conditions and are used off-label for lupus. Immunosuppressants are well known for their toxic effects, such as serious infections (e.g., respiratory tract, urinary tract, and skin) and certain malignancies, and therefore present significant tolerability issues for patients. Opportunistic infections such as salmonella and herpes zoster are also common in SLE, given the altered immune status brought on by immunosuppressive and steroidal medications.¹⁵ OCS treatments are used to reduce pain and inflammation by decreasing the activity of overactive white blood cells. Prolonged and/or high doses of an OCS, namely prednisone, are also well known for toxic effects such as osteoporosis, psychiatric issues, cataracts, glaucoma, diabetes, hypertension, and many others, particularly when used chronically. Although they are relied upon for flares, the chronic use of an OCS is avoided as much as possible. B-lymphocyte–depleting therapies, such as belimumab and (off-label) rituximab, are also used in SLE given that B lymphocytes play a pivotal role in SLE. 
Belimumab is the only biologic approved for use in Canada, while rituximab is used off-label as a short-term treatment for acute flares (i.e., it is not suited for chronic management).¹⁶

The most important treatment goals are to minimize damage to major organs, most commonly the kidneys; prevent premature death; reduce symptom severity; improve HRQoL; and maintain independence and ADLs, such as employment.

Drug

Anifrolumab is a human immunoglobulin G1 kappa monoclonal antibody that binds to IFNAR1, blocking the activity of type I interferons such as interferon-alpha and interferon-beta.⁸ Anifrolumab also induces the internalization of IFNAR1, thereby reducing the number of receptors available for binding and reducing inflammation and immunological processes.⁸ Type I interferons play an important role in the pathogenesis of SLE.⁸ Approximately 60% to 80% of adult SLE patients have high levels of type I interferon–inducible genes, which are associated with increased disease activity and severity.⁸

Anifrolumab is indicated in addition to standard therapy for the treatment of adult patients with active, autoantibody-positive SLE.⁸ The Health Canada–recommended dose is 300 mg, administered as an IV infusion over a 30-minute period, every 4 weeks. The Health Canada–approved product monograph also states the infusion rate may be slowed or interrupted if the patient develops an infusion reaction. In the event of a serious infusion-related or hypersensitivity reaction (e.g., anaphylaxis), treatment should be discontinued immediately, and appropriate therapy should be administered. The sponsor-requested reimbursement indication for anifrolumab differs from the Health Canada indication. The sponsor’s reimbursement request is for anifrolumab in addition to standard therapy for patients with moderate to severe SLE (based on an SLEDAI-2K score ≥ 6), whose disease activity cannot be controlled despite an OCS dosage of 10 mg/day or higher of prednisone or its equivalent.

Anifrolumab was approved by the FDA on July 30, 2021, for the treatment of adult patients with moderate to severe SLE who are receiving standard therapy. It is currently under review by the National Institute for Health and Care Excellence and is authorized by the European Medicines Agency, the Therapeutic Goods Administration in Australia, and the Medicines and Healthcare products Regulatory Agency in the UK. Anifrolumab has not been reviewed previously by CADTH for any other indication.

Key characteristics of the biologic drugs used in the treatment of SLE are presented in Table 3.

Table 3

Key Characteristics of Anifrolumab, Belimumab, and Rituximab.

Stakeholder Perspectives

Patient Group Input

The information in this section is a summary of input provided by the patient groups who responded to CADTH’s call for patient input and from a clinical expert consulted by CADTH for the purposes of this review.

Patient Input

Four responses to CADTH’s call for patient input for the anifrolumab submission were received. These consisted of submissions from ACE, Lupus Canada, Lupus Ontario, and a cooperative submission from the Canadian Arthritis Patient Alliance, the Arthritis Society, and the Canadian Skin Patient Alliance. Patient input was gathered from 148 lupus patients across Canada, including 34 respondents (88% female) from ACE, 112 (96.4% female) from Lupus Canada, and 2 respondents with SLE from Lupus Ontario. The cooperative submission conducted a focus group of 10 patients (90% female) with SLE. ACE also conducted an in-depth interview with 1 patient. None of the patients in the included submissions had experience with the treatment under review.

Lupus was described as a chronic disease characterized by inflammation in 1 or more parts of the body. Those with lupus often experience flares, which are unpredictable bouts of increased disease activity resulting in symptoms such as debilitating fatigue, pain in muscles and joints, difficulty breathing, or persistent headaches. Respondents reported challenges in managing the physical symptoms of lupus, which can be severe and debilitating, particularly during disease episodes or flares. Treatments described in the submissions as those used to manage SLE include NSAIDs, antimalarial medications (hydroxychloroquine and chloroquine), corticosteroids, immunomodulation drugs (methotrexate, azathioprine, mycophenolate mofetil, and cyclophosphamide), rituximab, belimumab, and over-the-counter pain medications. Respondents indicated that current treatments are difficult to tolerate because of their many side effects, such as headaches, brain fog, additional fatigue, frequent infections, osteoporosis, gastric issues, insomnia, hair loss, weight gain or loss, mood swings, allergic reactions, nausea, anxiety, and tremors, as well as concerns about organ damage.

According to the patient input received, respondents reported that they expect the following key outcomes from any new drug or treatment: reduction of side effects from medications such as weight gain; reduction in fatigue, joint and muscle pain, flares, rash and skin irritations, headaches, and brain fog; reduction in the number of medications used; increased lifespan; overall improvement in QoL; ability to engage in ADLs and social roles; improvement in sleep patterns; increased mobility and participation in physical activities; improvement in joint mobility; and improvement in tolerance to UV light.

Clinician Input

Input From Clinical Experts Consulted by CADTH

All CADTH review teams include at least 1 clinical specialist with expertise in the diagnosis and management of the condition for which the drug is indicated. Clinical experts are a critical part of the review team and are involved in all phases of the review process (e.g., providing guidance on the development of the review protocol, assisting in the critical appraisal of clinical evidence, interpreting the clinical relevance of the results, and providing guidance on the potential place in therapy). The following input was provided by 1 clinical specialist with expertise in the diagnosis and management of active, autoantibody-positive SLE.

Unmet Needs

According to the clinical expert, the major limitations of current treatments are the side effects of prednisone and immunosuppressants. Other unmet needs include nonresponse, noncompliance due to dosing schedules, polypharmacy, long-term organ damage, and recurrent flares that cause progressive organ damage (e.g., renal failure). Approximately 60% to 70% of patients do not have a positive long-term response to therapy without intermittent or continuous use of corticosteroids. This is a significant limitation due to the high burden of the side effects of this class of drugs. Patients are also frequently reluctant to increase corticosteroid doses during flares due to their awareness of these side effects. Nonadherence to therapy is a significant issue due to the serious consequences of flares, such as renal failure. No therapies provide a long-term cure or long-term medication-free survival in a majority of patients, and no therapies specifically address the underlying disease mechanisms in all patients.

Place in Therapy

The clinical expert noted that anifrolumab could lead to a paradigm shift, given its novel mechanism for treating SLE and preventing cytokine-induced inflammation. Anifrolumab would be used in combination with other treatments and potentially early in the disease course to control the disease with reduced side effects compared with standard of care. It is the clinical expert’s opinion that patients should begin treatment with antimalarials such as hydroxychloroquine and an OCS (e.g., prednisone) until nonresponse, toxicity, or prednisone dependency, at which point anifrolumab can be initiated. For patients with major organ involvement, anifrolumab can be offered after failure of standard of care to induce or maintain remission off prednisone with the use of at least an immunosuppressive drug plus hydroxychloroquine (if tolerant). The clinical expert added that anifrolumab treatment would likely assist patients for whom compliance with treatment is an issue.

Patient Population

Patients most suitable for treatment with anifrolumab would be those with active disease such as active skin disease or polyarthritis because they are more likely to respond. The current therapy has not been studied in patients with severe nephritis or CNS disease and the clinical expert indicated that anifrolumab would not be considered standard of care in patients with these diseases until there is further evidence. Presymptomatic patients, or those who are not diagnosed with active skin disease or polyarthritis, should not be considered for treatment with anifrolumab until further evidence is available.

Patients with active disease are diagnosed based on their history, physical testing, and routine SLE lab testing such as antinuclear antibody tests. The clinical expert noted that patients diagnosed with active diseases are most likely to exhibit a response to treatment with anifrolumab regardless of previous treatments, such as standard of care and/or failure to successfully taper prednisone. The clinical expert also indicated that there are no issues related to diagnosis. However, active disease may be underdiagnosed if an SLE expert is not reviewing the patient.

Assessing Response to Treatment

According to the clinical expert, a clinically meaningful response to anifrolumab would be a meaningful reduction in disease activity as measured by clinical and laboratory outcomes. However, because each patient has target organ(s) for treatment, it is impossible to classify magnitudes of response to the treatment. The alternative is to monitor specific signs and assess symptoms to determine the response to treatment. Other indications of a clinically meaningful response include improvement of ADLs, stabilization of signs and symptoms, tapering of steroid use, and improvement in fatigue and pain, which are important and significant issues for patients diagnosed with active disease. The clinical expert indicated that tapering steroid use without causing a disease flare is indicative of a positive response to treatment. Treatment response should generally be assessed every 2 to 3 months.

Discontinuing Treatment

The decision to discontinue treatment should be based on an assessment of the treatment response. Specifically, treatment should be discontinued if there is failure to taper prednisone after 4 to 6 months of therapy; a prolonged increase in prednisone (greater than 3 months); disease flare after 3 to 6 months of remission; a lack of response to a short-term increase in prednisone (approximately 3 months); a life-threatening infection; or a severe infusion reaction that is unresponsive to conventional therapy and/or prophylaxis.

Prescribing Conditions

Rheumatologists should prescribe anifrolumab for patients, and if no local rheumatologist is available, another health care specialist may administer the drug after consulting a rheumatologist. An infusion centre is an appropriate setting for administering anifrolumab. Although no diagnostic test is required, a confirmed SLE diagnosis meeting the criteria outlined previously would be needed to permit treatment with anifrolumab.

Additional Considerations

The clinical expert indicated that there is a considerable need for new medications to decrease the side effects of current therapies, and the dependence on prednisone in particular. Prednisone not only has significant physical side effects, such as osteonecrosis, vertebral collapse, and cataracts, but also significant psychological side effects that can affect all facets of life.

Clinician Group Input

Twenty clinicians representing the following 2 clinician groups provided input for this review: CaNIOS and the Toronto Lupus Program at the University of Toronto.

CaNIOS is a not-for-profit group of Canadian clinicians and researchers in Ontario, Alberta, Manitoba, and Nova Scotia. Its overarching mission is to facilitate the care of Canadian lupus patients and to improve the outcomes of lupus patients across Canada through collaborative research. CaNIOS members collectively provide care for more than 4,000 SLE patients.

The Toronto Lupus Program is a lupus clinic that promotes expert care for patients with lupus, trains future rheumatologists, and facilitates research into the disease. More than 1,300 patients are registered in the lupus clinic, making it 1 of the largest centres for specialized lupus care and research internationally. Patients are referred to the clinic from all areas of Ontario.

Unmet Needs

According to clinician groups, existing standard-of-care treatment has failed to adequately control SLE disease activity. Patients with SLE have a higher mortality rate, particularly in the first 3 decades after diagnosis. SLE has a profound effect on HRQoL and is a significant cause of loss of work productivity, sick leave, and physical disability. SLE and its treatment, particularly steroids, lead to significant irreversible damage in multiple organs. The immunosuppressive drugs currently in use frequently fail to induce a complete remission or do so only after prolonged exposure. Recurrent flares are common and result in significant organ damage over the total disease duration, requiring prolonged use of immunosuppressive drugs. Newer medications that help induce remission more quickly and prevent flares are urgently needed.

SLE is associated with onerous health care costs, and no immunosuppressive treatments are currently available through special access programs. Current SLE treatment continues to rely heavily on steroids, which are major drivers of organ damage, increasing the burden on the health care system. Almost 80% of lupus patients exhibit a relapsing-remitting or persistent active disease course requiring large and chronic doses of steroids. Cohort studies have clearly demonstrated the failure of the current standard-of-care treatment to maintain remission in SLE patients. Often remission is induced by steroids and fails upon tapering the steroid dose. Aggressive use of steroids, along with the currently available immunosuppressants (methotrexate, azathioprine, cyclosporin, and mycophenolate mofetil), has been associated with recurrent infections in SLE patients, requiring multiple hospital admissions and imposing a significant health care burden. The lack of effective treatment has also been the cause of multiple hospital admissions in many patients with SLE.

Place in Therapy

Anifrolumab employs a mechanism of action that targets the interferon pathway, which is central in lupus pathogenesis. The active interferon pathway characterizes 60% to 80% of patients with SLE. According to the clinician groups, anifrolumab is expected to cause a shift in the current treatment paradigm as its unique mechanism of action renders it most suitable for patients with unmet needs, including the subpopulation of patients with serologically active disease, frequent flares, and “steroid dependence.” The goal of treatment with anifrolumab should be the reduction of the daily prednisone dosage to below 7.5 mg/day in the first 12 months of treatment or a reduction by 50% of the initial (baseline) dose.

Based on current knowledge, anifrolumab should be used as an add-on treatment in combination with pre-existing drugs, antimalarials, glucocorticoids, and immunosuppressives. Specifically, it should be used in refractory cases in which treatment goals have not been achieved in a reasonable time. Clinician groups agree that it is reasonable to expect a meaningful impact on disease activity with anifrolumab for multiple organ systems, not just musculoskeletal and mucocutaneous systems. When a combination of antimalarials, low-dose glucocorticoids, and immunosuppressive therapy is not effective, or other factors (e.g., intolerance) are prohibitive, anifrolumab should be offered.

Patient Population

SLE affects more than 1 in every 1,000 Canadians, primarily women of childbearing age (the female-to-male patient ratio is 9:1), typically presenting between the ages of 14 and 45 years. People of different ethnicities can develop SLE but those of African descent and Hispanic and Indigenous populations are affected much more often compared with their white counterparts.

According to CaNIOS, patients with the greatest unmet need are those who have not reached remission within 3 to 6 months of initiating standard of care; patients dependent on steroids (e.g., those who cannot withdraw or reduce their daily prednisone dose to below 7.5 mg/day); patients who experience frequent flares from any organ or system; and patients for whom adherence is a major factor in treatment failure. These patients represent approximately 10% to 20% of the general SLE population. A significant proportion of such refractory patients can be expected to respond in the first 12 months. Patients who experience frequent flares (> 1 per year for more than 2 or 3 years) from any organ or system are the most likely to have an activated “interferon signature” as demonstrated by recent studies. In such patients, CaNIOS would recommend anifrolumab as an add-on to existing therapies with the goal of reducing the frequency and intensity of flares and optimizing prognoses.

Patients best suited for treatment with anifrolumab include those intolerant to standard-of-care medication or who have failed this therapy, those who experience frequent flares from any organ system, and patients who are steroid-dependent. For patients without private access, there are currently no available options after treatment failure. Steroid-dependent patients will also incur significant costs to the health system. Anifrolumab should become available to such patients through public access. This is not a significant departure from current practice but addresses the management of patients who are refractory to current therapies. There should be an opportunity to treat patients with frequent flares or steroid dependency in a subsequent line of therapy.

Patients who are least suitable are those sustained in remission under antimalarials alone or in combination with immunosuppressives and low-dose prednisone (< 7.5 mg/day). Patients may be identified by a physician with SLE expertise and assessed before receiving anifrolumab. SLE diagnosis can be challenging and may evade detection for years in cases that are nonspecific or involve spontaneously remitting symptoms. Most required diagnostic tests are available in Canada through hospital- or community-based laboratories. Underdiagnosis may occur, particularly in mild cases. Patients may be diagnosed using clinical as well as laboratory criteria. Serologic activity (increased anti–double-stranded DNA [anti-dsDNA] and/or decreased complement C3 and/or C4 proteins) can be assessed in most hospital- and community-based labs, and these tests are widely available in Canada.

Assessing Response to Treatment

According to the clinician groups, the outcomes used to determine response to treatment in academic centres are similar to those used in most clinical trials. These include structured indices such as the SLEDAI-2K and the British Isles Lupus Assessment Group (BILAG). Both indices assess a variety of manifestations from various organs and systems as well as laboratory parameters relevant to lupus activity. Other measures include the PGA, which relies on the physician’s impression as expressed on a standardized scale. Other outcomes include the decrease in the daily prednisone dose and the delay in damage accumulation, as well as the normalization of serologic activity.

A clinically meaningful response to treatment should include any of the following: reduction in the severity and frequency of symptoms (disease activity) as reflected by a SLEDAI-2K and/or BILAG score, reduction of the daily prednisone dosage to less than 7.5 mg/day, and a reduction of the frequency and intensity of flares.

These outcomes will lead to a significant improvement of the patients’ prognosis. Response to treatment should be assessed every 4 months. Sufficient time for outcomes to be observed would be at least 12 months.

Discontinuing Treatment

Treatment should be discontinued immediately in cases of allergy and/or intolerance and after 12 months if no response is demonstrated, if the daily prednisone dose exceeds 7.5 mg (or more than 50% from baseline) in steroid-dependent patients, and if severe flares requiring treatment escalation (particularly with glucocorticoids and/or immunosuppressives) continue to occur in patients with frequent flares.

Prescribing Conditions

Hospital and specialty infusion clinics with experience in the IV administration of biologic drugs are the most appropriate settings for anifrolumab infusion. Physicians with expertise in the management and treatment of patients with SLE would be required to monitor patients treated with anifrolumab.

Additional Considerations

The ideal treatment would have a meaningful impact on overall survival by reducing disease activity, risk of subsequent flares, use of an OCS, risk of AEs, and long-term complications, while inducing remission (low disease activity), and improving QoL.

Drug Program Input

The drug programs provide input on each drug being reviewed through CADTH’s reimbursement review processes by identifying issues that may affect their ability to implement a recommendation. The implementation questions and corresponding responses from the clinical experts consulted by CADTH are summarized in Table 4.

Table 4

Summary of Drug Plan Input and Clinical Expert Response.

Clinical Evidence

The clinical evidence included in the review of anifrolumab is presented in 2 sections. The first section, the systematic review, includes pivotal studies provided in the sponsor’s submission to CADTH and Health Canada, as well as those studies that were selected according to an a priori protocol. The second section includes sponsor-submitted LTE studies and additional relevant studies that were considered to address important gaps in the evidence included in the systematic review.

Systematic Review (Pivotal and Protocol-Selected Studies)

Objectives

To perform a systematic review of the beneficial and harmful effects of anifrolumab 300 mg, administered as an IV infusion, in addition to standard therapy for the treatment of adult patients with active, autoantibody-positive SLE.

Methods

Studies selected for inclusion in the systematic review included pivotal studies provided in the sponsor’s submission to CADTH and Health Canada, as well as those meeting the selection criteria presented in Table 5. Outcomes included in the CADTH review protocol reflect those considered to be important to patients, clinicians, and drug plans.

Table 5

Inclusion Criteria for the Systematic Review.

The literature search for clinical studies was performed by an information specialist using a peer-reviewed search strategy according to the PRESS Peer Review of Electronic Search Strategies checklist.¹⁷

Published literature was identified by searching the following bibliographic databases: MEDLINE All (1946—) via Ovid and Embase (1974—) via Ovid. All Ovid searches were run simultaneously as a multifile search. Duplicates were removed using Ovid deduplication for multifile searches, followed by manual deduplication in EndNote. The search strategy comprised both controlled vocabulary, such as the National Library of Medicine’s MeSH (Medical Subject Headings), and keywords. The main search concept was Saphnelo (anifrolumab). Clinical trials registries searched included the US National Institutes of Health’s clinicaltrials.gov, WHO’s International Clinical Trials Registry Platform (ICTRP) search portal, Health Canada’s Clinical Trials Database, and the European Union Clinical Trials Register.

No filters were applied to limit the retrieval by study type. Retrieval was not limited by publication date or by language. Conference abstracts were excluded from the search results. Appendix 1 provides detailed search strategies.

The initial search was completed on March 1, 2022. Regular alerts updated the search until the meeting of the CADTH Canadian Drug Expert Committee on June 22, 2022.

Grey literature (literature that is not commercially published) was identified by searching relevant websites from the Grey Matters: A Practical Tool For Searching Health-Related Grey Literature checklist.¹⁸ Included in this search were the websites of regulatory agencies (US FDA and European Medicines Agency). Google was used to search for additional internet-based materials. Appendix 1 provides more information on the grey literature search strategy.

In addition, the sponsor of the drug was contacted for information regarding unpublished studies.

Two CADTH clinical reviewers independently selected studies for inclusion in the review based on titles and abstracts, according to the predetermined protocol. Full-text articles of all citations considered potentially relevant by at least 1 reviewer were acquired. Reviewers independently made the final selection of studies to be included in the review, and differences were resolved through discussion.

Findings from the Literature

Two studies were identified from the literature for inclusion in the systematic review (Figure 1). The included studies are summarized in Table 6. A list of excluded studies is presented in Appendix 2.

115 citations were identified in the literature search, of which 2 were considered potentially relevant reports. Six potentially relevant reports were identified from other sources. Of these, 6 full-text reports were retrieved for scrutiny. In total, 4 reports of 2 studies were included in the CADTH review (Figure 1).

Figure 1

Flow Diagram for Inclusion and Exclusion of Studies.

Table 6

Details of Included Studies.

Description of Studies

Two sponsor-submitted trials, TULIP-1 and TULIP-2, were included in this review. The TULIP-1 trial (123 sites in 18 countries) and the TULIP-2 trial (119 sites in 16 countries) are phase III, multicentre, randomized, double-blind, placebo-controlled studies evaluating the efficacy and safety of anifrolumab in adult patients (aged 18 to 70 years) with moderate to severe autoantibody-positive SLE who were receiving standard-of-care treatment. Patients in both trials had moderate to severe disease, with a SLEDAI-2K score of 6 points or more; severe disease activity in 1 or more organs or moderate activity in 2 or more organs as measured by BILAG-2004 organ domain scores of 1 or more A items or 2 or more B items; and a PGA score of 1 or more. Patients continued to receive their existing SLE therapy, consisting of either 1 or any combination of OCSs, antimalarials, and/or immunosuppressants at baseline, with the exception of OCS (prednisone or equivalent), for which tapering was part of the protocol. The primary objective was to evaluate the effect of anifrolumab 300 mg compared to placebo on disease activity as measured by the difference in the proportion of patients who achieved an SRI-4 response at week 52 in TULIP-1 or a BICLA response at week 52 in TULIP-2. In the TULIP-1 trial, the key secondary objectives were to evaluate the effect of anifrolumab 300 mg compared to placebo on the following:

the proportion of patients with SRI-4 at week 52 in the type I interferon gene signature test high subgroup
the proportion of patients who achieved an OCS dosage of no more than 7.5 mg/day at week 40, which was maintained through week 52 in the subgroup of patients with baseline OCS dosage of 10 mg/day or higher
the proportion of patients with a 50% or greater reduction in CLASI activity score at week 12 in the subgroup of patients with baseline CLASI activity score of 10 or higher (moderate to severe disease)
the proportion of patients with SRI-4 at week 24
the annualized flare rate through 52 weeks.

The key secondary objectives in the TULIP-2 trial were the same as in TULIP-1, except that, following Amendment 5, the 2 SRI-4-based objectives were replaced with the following:

the proportion of patients with a BICLA response at week 52 in the type I interferon gene signature test high subgroup
the proportion of patients with a 50% or greater reduction in joint counts at week 52 in the subgroup of patients with at least 6 swollen and at least 6 tender joints at baseline.

The switching of the primary end point was based on the TULIP-1 and MUSE study results, which demonstrated that the BICLA had produced consistent results across time. This switch took place after data collection for the TULIP-2 trial was completed but before the unblinding of the results at week 52. Other major reasons for this switch included the ability of the BICLA to capture both partial and complete improvements; its required improvement in all organ systems affected at baseline; and equal weighting applied to all organs in its scoring.

While there was some variance between trials in terms of the participating countries, the majority of sites in both trials were based in the US and Europe, with no Canadian sites in the TULIP-1 trial and 2 Canadian sites in the TULIP-2 trial. Enrolment took place June 9, 2015, to June 16, 2017, for the TULIP-1 trial and July 9, 2015, to September 27, 2018, for the TULIP-2 trial. Both trials included a screening period of up to 30 days to confirm eligibility of patients and a 52-week double-blind treatment period. At week 52, patients either continued the study for another 8 weeks to complete a 12-week safety follow-up after the last dose of the investigational product (given at week 48) or, if eligible, enrolled in a separate LTE study (described in the Other Relevant Evidence section). The total study duration could be up to approximately 64 weeks (including screening period) for patients who did not enrol in the LTE study and up to approximately 56 weeks (including screening period) for those patients who enrolled in the LTE study.

A total of 457 eligible patients in the TULIP-1 trial were block-randomized in a 1:2:2 ratio to receive a fixed IV dose of 150 mg anifrolumab (N = 92), 300 mg anifrolumab (N = 180), or placebo (N = 184). This CADTH review focuses only on the 300 mg anifrolumab and placebo groups, as the 150 mg anifrolumab dose was not part of the requested reimbursement criteria to CADTH and not approved in the Health Canada Notice of Compliance and is therefore beyond the scope of this review. A total of 365 eligible patients in the TULIP-2 trial were block-randomized in a 1:1 ratio to receive a fixed IV dose of anifrolumab 300 mg (N = 180) or placebo (N = 182). Patients in both trials were stratified by disease severity (SLEDAI-2K score < 10 points versus ≥ 10 points), OCS dose at baseline (< 10 mg/day versus ≥ 10 mg/day prednisone or equivalent), and results of the type I interferon gene signature test (high versus low).
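The 3 stratification factors define 8 strata (2 x 2 x 2). As a rough illustration only, and not a reproduction of the trials' interactive voice or web response system, the following Python sketch assigns a patient to a stratum. The function name and the stratum labels are hypothetical.

```python
from itertools import product

# Hypothetical labels for the 3 stratification factors described above.
SLEDAI_LEVELS = ("SLEDAI-2K < 10", "SLEDAI-2K >= 10")
OCS_LEVELS = ("OCS < 10 mg/day", "OCS >= 10 mg/day")
IFN_LEVELS = ("IFN test low", "IFN test high")


def stratum(sledai_2k: int, ocs_mg_per_day: float, ifn_high: bool) -> tuple:
    """Return the randomization stratum for one patient (illustrative helper)."""
    return (
        SLEDAI_LEVELS[1] if sledai_2k >= 10 else SLEDAI_LEVELS[0],
        OCS_LEVELS[1] if ocs_mg_per_day >= 10 else OCS_LEVELS[0],
        IFN_LEVELS[1] if ifn_high else IFN_LEVELS[0],
    )


# Every combination of the 3 two-level factors.
ALL_STRATA = list(product(SLEDAI_LEVELS, OCS_LEVELS, IFN_LEVELS))
```

Block randomization would then be applied separately within each stratum, which is what keeps the treatment groups balanced on these factors.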

Patients in both trials received the investigational product every 4 weeks for a total of 13 doses (week 0 to week 48), with the primary end point evaluated at the week 52 visit (Figure 2 and Figure 3 depict study designs of the TULIP-1 and TULIP-2 studies, respectively). At the time of randomization, patients were taking either 1 or any combination of an OCS, antimalarial, and/or immunosuppressant. From week 0 (day 1) to week 12, patients were permitted only 1 burst of corticosteroids for an increase in SLE disease activity or to control non–SLE-related disease (e.g., asthma). Patients receiving more than 1 burst during the first 12 weeks of treatment were considered nonresponders for subsequent assessments of disease activity, regardless of the reason for the burst (SLE or non-SLE activity). Patients treated with concomitant medications beyond the protocol-allowed threshold (restricted medications) or who prematurely discontinued the investigational product were also considered nonresponders for any binary efficacy outcomes. Steroid tapering to a target OCS dose of no more than 7.5 mg/day was required to be attempted in all patients with a baseline OCS dose of 10 mg/day or higher. This commenced at week 8 and continued stepwise until the target dose was reached, except in the event of disease worsening as defined by changes to the SLEDAI-2K, CLASI, and number of active and/or swollen joints. Tapering the OCS dose beyond the target of 7.5 mg/day up to week 40 was permitted based on disease activity. Steroid tapering was not permitted after week 40.

The study was unblinded upon database lock after the last patient last visit. The analyses included all data captured during the study, regardless of whether the study treatment was prematurely discontinued, or delayed, and/or irrespective of protocol adherence. In both trials, the database lock occurred when the last patient reached the week-52 visit, at which point all available data were extracted, cleaned, coded, validated, and unblinded. Both trials were sponsored by AstraZeneca Inc.

A diagram detailing the different phases of the TULIP-1 trial including the screening, stratification, randomization, treatment period from 0 to 52 weeks, the different treatment arms (anifrolumab 150 mg versus anifrolumab 300 mg versus placebo IV), steroid tapering, study follow-up, and long-term extension study portion.

Figure 2

Flow Chart of TULIP-1 Study Design.

The TULIP-1 trial included a screening period of up to 30 days to confirm eligibility of patients, after which patients were randomized to receive a fixed IV dose of 150 mg anifrolumab, 300 mg anifrolumab, or placebo. The double-blind treatment period was 52 weeks. At week 52, patients either continued the study for another 8 weeks to complete a 12-week safety follow-up after the last dose of the investigational product (given at week 48) or, if eligible, enrolled in a separate LTE study. The total study duration could be up to approximately 64 weeks (including screening period) for patients who did not enrol in the LTE study and up to approximately 56 weeks (including screening period) for those patients who enrolled in the LTE study.

A diagram detailing the different phases of the TULIP-2 trial including the screening, stratification, randomization, treatment period from 0 to 52 weeks, the different treatment arms (anifrolumab 300 mg versus placebo IV), steroid tapering, study follow-up, and long-term extension study portion.

Figure 3

Flow Chart of TULIP-2 Study Design.

The TULIP-2 trial included a screening period of up to 30 days to confirm eligibility of patients, after which patients were randomized to receive a fixed IV dose of 300 mg anifrolumab or placebo. The double-blind treatment period was 52 weeks. At week 52, patients either continued the study for another 8 weeks to complete a 12-week safety follow-up after the last dose of the investigational product (given at week 48) or, if eligible, enrolled in a separate LTE study. The total study duration could be up to approximately 64 weeks (including screening period) for patients who did not enrol in the LTE study and up to approximately 56 weeks (including screening period) for those patients who enrolled in the LTE study.

Amendments and Protocol Deviations

The study protocol of the TULIP-1 trial was amended 3 times and the TULIP-2 trial was amended 5 times after start of patient recruitment.

TULIP-1 amendments included:

Amendment 1 (April 9, 2015) — no substantial changes made
Amendment 2 (February 1, 2016) — the addition of HIV testing at screening
Amendment 3 (March 23, 2016) — updates to restricted medications washout periods to provide additional clarification; the washout periods for anakinra, apremilast, atacicept, belimumab, and blisibimod (AMG 623) were corrected; the order of restricted medications was made alphabetical
Amendment 4 (May 18, 2016) — clarification to the study design and the inclusion and exclusion criteria from Amendments 2 and 3 that were not incorporated.

TULIP-2 amendments included:

Amendment 5 (May 23, 2019) — BICLA response at week 52 replaced SRI-4 as the primary end point and 2 key secondary end points were updated: SRI-4 response at week 52 in the interferon-test high only subpopulation was replaced with the BICLA response; and SRI-4 at week 24 was replaced with an organ-specific assessment of joints. In addition, the statistical methodology regarding analysis of the primary and key secondary end points, the testing strategy, and power estimation were updated. The reasons for the amendments were to better measure the efficacy of anifrolumab and inform clinicians about the specific effect of anifrolumab on joint disease. The use of prescription and nonprescription NSAIDs in the Japanese population was clarified and the modified BILAG-2004 disease activity scoring was added.

Two sites were closed over the course of both trials (1 site per trial) due to noncompliance with protocol procedures and specifications (i.e., blinding plan). Data from these sites were not included in analyses or summaries.

Populations

Inclusion and Exclusion Criteria

Key inclusion and exclusion criteria are listed in Table 6. The inclusion and exclusion criteria for the TULIP-1 and TULIP-2 trials were the same. Patients in both trials were 18 years of age or older and had moderate to severe disease, with a SLEDAI-2K score of 6 points or more; severe disease activity in 1 or more organs or moderate activity in 2 or more organs as measured by BILAG-2004 organ domain scores of 1 or more A items or 2 or more B items; and a PGA score of 1 or more. Patients continued to receive their existing SLE therapy, consisting of either 1 or any combination of OCS, antimalarials, and/or immunosuppressants at baseline, with the exception of OCS (prednisone or equivalent) for which tapering was part of the protocol.
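The 3 disease activity thresholds described here can be expressed as a single check. The Python sketch below is a simplified illustration that covers only these thresholds and none of the other inclusion or exclusion criteria in Table 6; the function and parameter names are hypothetical.

```python
def meets_disease_activity_thresholds(
    sledai_2k: int,
    bilag_a_items: int,
    bilag_b_items: int,
    pga: float,
) -> bool:
    """Check the disease activity thresholds described for both trials.

    Simplified sketch: other eligibility criteria (age, autoantibody
    status, background therapy, exclusions) are not modelled here.
    """
    sledai_ok = sledai_2k >= 6                             # SLEDAI-2K score of 6 or more
    bilag_ok = bilag_a_items >= 1 or bilag_b_items >= 2    # >= 1 A item or >= 2 B items
    pga_ok = pga >= 1                                      # PGA score of 1 or more
    return sledai_ok and bilag_ok and pga_ok
```

For example, a hypothetical patient with a SLEDAI-2K score of 8, no BILAG-2004 A items, 2 B items, and a PGA score of 1.5 would meet all 3 thresholds, whereas the same patient with only 1 B item would not.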

Baseline Characteristics

Key demographic and disease baseline characteristics are presented in Table 7 for both trials. Baseline patient characteristics, including age, race, sex, height, weight, and body mass index, were balanced between groups and were similar between the 2 studies. Patients in the TULIP-1 and TULIP-2 trials had median ages of 41 and 43 years, respectively, and were predominantly female (92.3% and 93.4%) and white (71.3% and 59.9%). The TULIP-2 trial had a greater percentage of Asian patients (60 of 362 [16.6%] versus 16 of 364 [4.4%]) and more missing data on race (4.4%) compared to the TULIP-1 trial (0). The proportion of patients aged older than 65 years was 6.1% in the anifrolumab 300 mg group, and 3.3% in the placebo group in the TULIP-1 trial, and 2.8% and 0.5% in the treatment and placebo groups of the TULIP-2 trial, respectively. In both trials, the largest proportions of patients were enrolled in the US (40.7% in the TULIP-1 trial and 36.5% in TULIP-2) and Europe (37.9% in the TULIP-1 trial and 26.8% in TULIP-2). In the TULIP-1 trial there were no Canadian sites, but in the TULIP-2 trial, 2 Canadian sites were added. Within each geographic region, the proportions of patients were generally balanced across treatment groups. The majority of patients had a high result on the type I interferon gene signature test (approximately 82% across groups and studies).

SLE characteristics (SLEDAI-2K, BILAG-2004, PGA scores, CLASI, and joint counts) were balanced between treatment groups and were similar between studies. The mean time from initial SLE diagnosis to randomization was highest in patients in the treatment group of the TULIP-2 trial (mean = 130.2 months; SD = 109.28). Cushingoid features were more common in the TULIP-1 trial than in TULIP-2 (39% versus 26%). There was a slightly higher proportion of patients with an OCS dose of 10 mg or higher at baseline in the TULIP-1 trial (56.3%) compared with the TULIP-2 trial (47%). Overall previous medication use at baseline was balanced between groups and between studies.

Table 7

Summary of Baseline Characteristics of Included Trials (Full Analysis Set).

Interventions

Both pivotal trials were similar in terms of design (e.g., blinding, randomization, and drug administration procedures). Block randomization using an interactive voice or web response system was used to randomize patients in a 1:2:2 ratio (TULIP-1) to receive a fixed IV dose of 150 mg anifrolumab, 300 mg anifrolumab, or placebo, or in a 1:1 ratio (TULIP-2) to receive a fixed IV dose of 300 mg anifrolumab or placebo. The investigational products, anifrolumab 300 mg or placebo, were administered via a controlled IV infusion pump into a peripheral vein over at least 30 minutes every 4 weeks. The preparation of anifrolumab and placebo was performed by an unblinded qualified person (e.g., study nurse or pharmacist) at the site. When diluted, anifrolumab and placebo appeared identical and were administered by blinded study-site personnel. There was no mention of allowable dose reductions, interruptions, or delays for tolerability from the sponsor.

In addition to the investigational product, all patients were receiving standard-of-care treatment at the start of the study in concordance with European Alliance of Associations for Rheumatology and ACR management guidelines. Permitted medications included OCS, intramuscular and intra-articular corticosteroids, antimalarial medication, immunosuppressants, prescription and nonprescription NSAIDs, acetaminophen, low-dose Aspirin, and topical therapy. Patients were allowed to adjust their concomitant medication use under certain circumstances (described in the Concomitant Medications section).

For both trials, the total study duration could be up to approximately 64 weeks for patients who were not enrolled in the LTE study (including screening period) and up to approximately 56 weeks (including screening period) for those patients who were enrolled in the LTE study. Exposure to treatment was defined as the number of days between the start and the end dates of administration of the investigational product plus the dosing frequency time: duration of exposure (days) = (last dosing date + 28 days) – first dosing date + 1. The total number of patient-years of exposure was the sum of duration of exposure (in days) of all patients in the respective treatment group divided by 365.25 (days/year).
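The exposure definition above can be illustrated with a short calculation. The following Python sketch applies the stated formula to a hypothetical patient who received all 13 doses; the dates and function names are assumptions made for illustration and are not trial data.

```python
from datetime import date, timedelta

DOSING_INTERVAL_DAYS = 28   # investigational product was given every 4 weeks
DAYS_PER_YEAR = 365.25


def exposure_days(first_dose: date, last_dose: date) -> int:
    """Duration of exposure = (last dosing date + 28 days) - first dosing date + 1."""
    return (last_dose + timedelta(days=DOSING_INTERVAL_DAYS) - first_dose).days + 1


def patient_years(dose_dates: list) -> float:
    """Sum exposure days over (first_dose, last_dose) pairs and divide by 365.25."""
    total_days = sum(exposure_days(first, last) for first, last in dose_dates)
    return total_days / DAYS_PER_YEAR


# Hypothetical patient dosed at week 0 through week 48 (13 doses in total).
first = date(2016, 1, 4)
last = first + timedelta(weeks=48)
full_course_days = exposure_days(first, last)   # 336 + 28 + 1 days
```

Under this definition, a patient who completes the full dosing schedule contributes 365 days, or just under 1 patient-year, and a patient who receives a single dose contributes 29 days.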

Database lock and unblinding occurred after the last patient completed week 52 (visit 14 or early discontinuation visit) in both trials. Blinding of patients and investigators was maintained after the database lock at week 52 until the last patient visit in the LTE study.

Steroid Burst and Tapering

In both trials, from baseline to week 12, patients were allowed to receive only 1 burst of corticosteroids for an increase in SLE disease activity or to control non–SLE-related disease (e.g., asthma or chronic obstructive pulmonary disease exacerbation). Patients receiving more than 1 burst during the first 12 weeks of treatment were considered nonresponders for subsequent assessments of disease activity, regardless of the reason for the burst (SLE or non-SLE activity). Beginning at week 8, tapering to a target OCS dosage of no more than 7.5 mg/day was attempted in all patients with a baseline OCS dosage of 10 mg/day or higher. Tapering continued stepwise until the target was reached, unless at least 1 of the following criteria was met:

SLEDAI-2K activity that worsened compared to baseline in major organ systems (renal, CNS, cardiopulmonary, vasculitis, fever, thrombocytopenia, hemolytic anemia, or gastrointestinal activity)
newly affected organ system(s) based on the SLEDAI-2K, excluding serological abnormalities (double-stranded DNA antibodies, hypocomplementemia)
moderate to severe skin disease as reflected by a CLASI activity score of 10 or higher
moderate to severe arthritis disease as reflected by an active joint count of at least 8 tender and/or swollen joints.
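The taper rule and its 4 stopping criteria can be read as a check performed at each visit. The Python sketch below is a hypothetical rendering of the criteria listed above; the function and parameter names are assumptions, and the investigator judgment involved in the actual trials is not captured.

```python
TAPER_BASELINE_THRESHOLD_MG = 10.0   # taper attempted if baseline OCS >= 10 mg/day
CLASI_HOLD_THRESHOLD = 10            # moderate to severe skin disease
ACTIVE_JOINT_HOLD_THRESHOLD = 8      # tender and/or swollen joints


def taper_required(baseline_ocs_mg_per_day: float) -> bool:
    """Tapering was to be attempted in patients with baseline OCS of 10 mg/day or higher."""
    return baseline_ocs_mg_per_day >= TAPER_BASELINE_THRESHOLD_MG


def taper_hold(
    major_organ_sledai_worsened: bool,
    new_nonserologic_organ_system: bool,
    clasi_activity: int,
    active_joint_count: int,
) -> bool:
    """Return True if at least 1 of the 4 listed criteria is met."""
    return (
        major_organ_sledai_worsened
        or new_nonserologic_organ_system
        or clasi_activity >= CLASI_HOLD_THRESHOLD
        or active_joint_count >= ACTIVE_JOINT_HOLD_THRESHOLD
    )
```

In this reading, a hypothetical patient with no organ worsening, a CLASI activity score of 4, and 8 active joints would not continue the stepwise taper at that visit because the arthritis criterion is met.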

Investigators had the option to continue tapering the OCS dosage beyond the target of 7.5 mg/day up to week 40 based on disease activity. If a patient had an increase in disease activity secondary to OCS tapering, the dose could be increased up to a maximum of the baseline OCS therapy dose from week 8 up to week 40 without the patient being considered a nonresponder for subsequent assessments of disease activity. Patients who required an OCS dose above their baseline level could continue in the study but were considered nonresponders for subsequent assessments of disease activity.

Concomitant Medications

All patients in both trials received at least 1 concomitant medication, including SLE-related treatment. Within the TULIP-1 trial the most common concomitant medications were:

hormonal preparations (excluding sex hormones) (92.3%)
alimentary tract and metabolism medications (79.9%)
antiparasitic products, insecticides, and repellents (74.0%)
nervous system medications (63.0%).

Within the TULIP-2 trial, the most common concomitant medications were:

systemic hormonal preparations (excluding sex hormones) (92.3%)
antiparasitic products, insecticides and/or repellent medications (83.4%; namely, hydroxychloroquine)
antineoplastic and immunomodulating drugs (71.5%)
alimentary tract and metabolism medications (65.2%).

Use of nervous system medication (oxycodone and Vicodin in particular) was higher in the placebo group compared with the anifrolumab 300 mg group (57.7% versus 49.4%, respectively).

Within the TULIP-2 trial, during the investigational product administration and after investigational product discontinuation, a higher proportion of beyond-protocol concomitant medication use was reported in the placebo group versus the anifrolumab group (25.3% versus 16.7% and 14.3% versus 5.0%, respectively). This was driven primarily by the use of prednisone or prednisone equivalents. As mentioned, these patients were considered nonresponders for binary end points in the efficacy analysis at subsequent visits. More patients in the placebo group (≥ 10%) received medications for the nervous system, and antiparasitic and insecticide and/or repellent medications (e.g., hydroxychloroquine) compared with those in the treatment group.

Medications considered necessary for the patient’s safety and well-being could be given at the discretion of the investigator. Adjustments to permitted medications were allowed.

Medications that led to immediate discontinuation of the investigational product were cyclophosphamide, interferon therapy (alpha 2a and 2b, beta 1a and 1b, and pegylated interferons alpha 2a and 2b), investigational drugs, biologic immunomodulators (including, but not limited to, belimumab, abatacept, or rituximab), live or attenuated vaccines (the sponsor recommended that investigators ensure all patients were up to date on required vaccinations before entry into the study), plasmapheresis, Bacille Calmette-Guérin vaccine, any immunoglobulin therapy, and IV corticosteroids exceeding 1 g of methylprednisolone or equivalent.

Outcomes

A list of efficacy end points identified in the CADTH review protocol that were assessed in the clinical trials included in this review is provided in Table 8 and summarized in the following section. A detailed discussion and critical appraisal of the outcome measures is provided in Appendix 4.

Table 8

Summary of Outcomes of Interest Identified in the CADTH Review Protocol.

Disease Activity

Disease activity is measured by several SLE instruments such as the SRI-4 and BICLA and their components (e.g., SLEDAI-2K, BILAG-2004, and PGA). Assessments of disease activity and organ damage were performed at baseline and every 4 weeks until the end of each trial. Details of how each scale was calculated are outlined in the following sections. Evaluation of disease activity and organ damage across study sites was completed by trained investigators and designated site personnel. The Disease Activity Adjudication Group (also known as the Central Review Group) determined eligibility during screening and, throughout the study, confirmed SLEDAI-2K, BILAG-2004, and PGA scoring and the quality and accuracy of efficacy assessments completed by the investigators. The Disease Activity Adjudication Group consisted of medically qualified individuals and support staff who assisted in the ongoing central review of disease activity assessments in the pivotal trials. For all measures, baseline was defined as the last measurement before randomization and dose administration on day 1.

Improvement of 4 Points or Greater on the Systemic Lupus Erythematosus Responder Index

The SRI-4 response at week 52 was the primary composite end point of the TULIP-1 trial and a secondary end point of the TULIP-2 trial. SRI-4 was assessed at baseline and every 4 weeks until week 52 in both trials. The Systemic Lupus Erythematosus Responder Index (SRI) is a composite outcome that is rated dichotomously: whether a patient has or has not achieved response. The end point is designed to detect improvements without worsening in disease manifestations and disease activity. The SRI composite index comprises the SLEDAI-2K, BILAG-2004, and PGA measurement tools for SLE. Organ systems are weighted unequally with the Systemic Lupus Erythematosus Disease Activity Index (SLEDAI) scale (e.g., arthritis improvement considered greater than rash improvement), and only complete improvements are captured. A score of 6 or higher is considered moderate to severe disease activity. The SRI-4 is achieved when all 5 of the following components are met:

reduction of 4 or more points from baseline in the SLEDAI-2K
no new organ systems affected as defined by 1 or more new BILAG-2004 A items or 2 or more new BILAG-2004 B items compared to baseline
no worsening from baseline in patients’ lupus disease activity (where worsening is defined by an increase of 0.30 points or more on a 3-point PGA visual analogue scale [VAS])
no permanent premature discontinuation of the investigational product
no use of restricted medications beyond the protocol-allowed threshold on or before the date of last week-52 assessment used to derive SRI-4.
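The 5 components above can be read as a single all-or-nothing check. The following Python sketch is illustrative only: the function and argument names are hypothetical and are not taken from the trials' statistical analysis plans, and the counts of new BILAG-2004 A and B items are assumed to have been derived beforehand.

```python
def sri4_responder(
    sledai_baseline: int,
    sledai_week52: int,
    new_bilag_a: int,
    new_bilag_b: int,
    pga_baseline: float,
    pga_week52: float,
    discontinued_product: bool,
    restricted_medication_use: bool,
) -> bool:
    """Return True only when all 5 SRI-4 components are met."""
    # Component 1: reduction of 4 or more points in SLEDAI-2K.
    sledai_reduced = (sledai_baseline - sledai_week52) >= 4
    # Component 2: no new organ systems affected
    # (1 or more new A items, or 2 or more new B items, counts as affected).
    no_new_organ_involvement = not (new_bilag_a >= 1 or new_bilag_b >= 2)
    # Component 3: worsening is an increase of 0.30 or more on the 0-to-3 PGA VAS.
    # Rounding guards against floating-point noise at the 0.30 boundary.
    no_pga_worsening = round(pga_week52 - pga_baseline, 2) < 0.30
    # Components 4 and 5: no discontinuation, no restricted medication use.
    return (
        sledai_reduced
        and no_new_organ_involvement
        and no_pga_worsening
        and not discontinued_product
        and not restricted_medication_use
    )
```

Because the end point is dichotomous, failing any single component (for example, discontinuing the investigational product despite a large SLEDAI-2K reduction) classifies the patient as a nonresponder.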

The SRI-4 has been correlated with other measures of disease activity, biomarkers, and HRQoL measures.²¹ However, the SRI-4 has been shown to be less responsive to change than the BILAG or PGA for musculoskeletal SLE.²²

British Isles Lupus Assessment Group-based Composite Lupus Assessment

A BICLA response at week 52 was the primary composite end point of the TULIP-2 trial and a secondary end point of the TULIP-1 trial. BICLA was assessed at baseline and every 4 weeks until week 52 in both trials. In contrast to the SRI, improvement in the BICLA is guided by the BILAG-2004 and worsening is assessed using the BILAG-2004, SLEDAI-2K, and PGA.²³ The BILAG-2004 can discern inactive disease, partial or complete improvement, and deterioration of disease activity, while the SLEDAI-2K requires complete resolution of disease activity of the specific element to capture improvement.²³ With this end point, organ systems are weighted equally. Any improvement (partial or complete) had to be achieved in all BILAG-2004 organ systems affected by the disease from baseline. BICLA was achieved when all 5 of the following components were met:

improvement in involved BILAG organs (A [severe] and B [moderate]) at baseline (e.g., reduction of all baseline BILAG-2004 A to B, C, or D and baseline BILAG-2004 B to C or D, and no BILAG) with no worsening (where worsening is defined as 1 or more new BILAG-2004 A items or 2 or more new BILAG-2004 B items)
no worsening from baseline in SLEDAI-2K, with worsening defined as an increase from baseline of greater than 0 points in SLEDAI-2K
no worsening from baseline in the patients’ lupus disease activity, with worsening defined by an increase of 0.30 points or more on PGA VAS (scale of 0 to 3)
no discontinuation of investigational product
no use of restricted medications beyond the protocol-allowed threshold before assessment.
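As a rough illustration of how the BICLA components combine, the Python sketch below checks BILAG-2004 improvement per organ system and then applies the remaining conditions. All names are hypothetical, organ-system grades are assumed to be supplied as letters A to E, and the counts of new A and B items are assumed to be precomputed; it is not the sponsor's derivation.

```python
def bilag_improved(baseline: dict[str, str], current: dict[str, str]) -> bool:
    """All baseline A grades must fall to B, C, or D; all baseline B grades to C or D."""
    for system, grade in baseline.items():
        now = current[system]
        if grade == "A" and now not in {"B", "C", "D"}:
            return False
        if grade == "B" and now not in {"C", "D"}:
            return False
    return True


def bicla_responder(
    baseline_bilag: dict[str, str],
    current_bilag: dict[str, str],
    new_bilag_a: int,
    new_bilag_b: int,
    sledai_baseline: int,
    sledai_current: int,
    pga_baseline: float,
    pga_current: float,
    discontinued_product: bool,
    restricted_medication_use: bool,
) -> bool:
    """Return True only when all 5 BICLA components are met."""
    # Worsening: 1 or more new A items, or 2 or more new B items.
    no_bilag_worsening = not (new_bilag_a >= 1 or new_bilag_b >= 2)
    # Worsening: any increase from baseline in SLEDAI-2K.
    no_sledai_worsening = sledai_current <= sledai_baseline
    # Worsening: increase of 0.30 or more on the 0-to-3 PGA VAS (rounded to
    # avoid floating-point noise at the boundary).
    no_pga_worsening = round(pga_current - pga_baseline, 2) < 0.30
    return (
        bilag_improved(baseline_bilag, current_bilag)
        and no_bilag_worsening
        and no_sledai_worsening
        and no_pga_worsening
        and not discontinued_product
        and not restricted_medication_use
    )
```

Note that the improvement check is driven entirely by the BILAG-2004 grades, so a partial improvement (for example, A to B) counts, whereas the SLEDAI-2K and PGA enter only as guards against worsening.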

The difference between anifrolumab and placebo in the proportion of patients achieving a BICLA response was assessed longitudinally over time up to week 52.

In addition, time to a BICLA response sustained up to week 52 was measured in TULIP-2. Time to a BICLA response was defined as the first BICLA response visit that is sustained up to, and including, week 52. A patient was considered to have achieved a BICLA response sustained up to week 52 if response was achieved at week 52 with “time to” defined as the first time point where a BICLA response was achieved when maintained through week 52.

British Isles Lupus Assessment Group 2004

The BILAG-2004 is a component of both the SRI-4 and BICLA. Individual assessment of the BILAG-2004 was a secondary end point in both trials. The BILAG-2004 was also used to evaluate the annualized flare rate, which was defined as either 1 or more new BILAG-2004 A items or 2 or more new BILAG-2004 B items compared to the previous visit; these are defined as severe and moderate flares in the literature, respectively.²⁴ BILAG-2004 assessments took place every 4 weeks starting from baseline to week 52. BILAG-2004 grades were presented by organ system and global scores were also provided. BILAG index scoring (BILAG-2004 version September 1, 2009)²⁵ was used in the central review process. BILAG system scores were assigned scores of A, B, C, D, or E at all study visits by strictly following this index scoring. BILAG-2004 global scores were derived by summing the numerical-score equivalents for each organ system, with A = 12, B = 8, C = 1, D = 0, and E = 0. Results from the original scores are used to calculate the primary efficacy end points in both trials. Although the BILAG was developed based on the principle of physicians’ intention to treat, the treatment had no bearing on the scoring index within the trials and was based solely on active manifestations.
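The global score derivation described above amounts to a lookup and a sum. A minimal Python sketch follows, using the numerical equivalents stated in the text (A = 12, B = 8, C = 1, D = 0, E = 0); the function name and the dictionary layout are assumptions made for illustration.

```python
# Numerical equivalents for BILAG-2004 grades, as stated in the text.
BILAG_POINTS = {"A": 12, "B": 8, "C": 1, "D": 0, "E": 0}


def bilag_global_score(grades: dict[str, str]) -> int:
    """Sum the numerical equivalents of the grade recorded for each organ system."""
    return sum(BILAG_POINTS[grade] for grade in grades.values())


# Hypothetical visit: one A, one B, one C, and the remaining systems D or E.
example_visit = {
    "general": "D",
    "mucocutaneous": "A",
    "neuropsychiatric": "E",
    "musculoskeletal": "B",
    "cardiorespiratory": "E",
    "gastrointestinal": "E",
    "ophthalmic": "E",
    "renal": "C",
    "hematologic": "D",
}
print(bilag_global_score(example_visit))  # 12 + 8 + 1 = 21
```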

The BILAG-2004 is an updated version of the original BILAG that grades clinical features as being new, the same, worse or improving, and incorporates severity in the scoring.²⁶ The classic BILAG had 8 domains and consisted of fewer items that were more related to damage than to disease activity and did not properly include disease activity in the gastrointestinal or ophthalmic systems.²⁴ The BILAG-2004 is an ordinal scale of 97 clinical and laboratory variables covering 9 organ systems (general, mucocutaneous, neuropsychiatric, musculoskeletal, cardiorespiratory, gastrointestinal, ophthalmic, renal, and hematologic), with scores ranging from A (severe disease) to E (never involved) for each organ system. BILAG-2004 records disease activity across the different organ systems by comparing the immediate past 4 weeks to the 4 weeks preceding them. The first 7 organ systems (except renal and hematologic) contain clinical parameters that are assessed by the treating physician as new (4), worse (3), the same (2), improving (1) and not present (0). The assessment is based on disease manifestation, the physician’s intention to treat, and categorization of disease activity (e.g., grades A to E, where A is most severe and E is never present). The renal and hematologic scoring is based on laboratory values. A total score is not usually calculated. The BILAG-2004 gives equal weight to all affected body systems and can measure incremental improvements or worsening within a body system, unlike the SLEDAI-2K, which can only record clinical manifestations as absent or present. The BILAG-2004 requires improvement in all baseline manifestations within a system to result in a change in that system’s BILAG-2004 level. For example, a patient with skin eruption and severe mucosal ulceration at baseline must show improvement in both symptoms to result in a change in the BILAG-2004 mucocutaneous index level.²⁷ Appendix 4 provides further details.

The BILAG-2004 tool has been found to be valid, reliable, and sensitive to change over time.²⁷^-²⁹ The BILAG-2004 index is a valid measure of disease activity and was recommended for use in clinical trials and outcome studies.²⁹ It has been found to be more responsive to change than the SLEDAI-2K.²⁷ In terms of clinically meaningful difference, a minor improvement is considered a change from grade A to B or grade B to C; minor deterioration is considered a change from grade C to B.³⁰ In terms of flare index, the BILAG-2004 had better inter-rater reliability than did the Safety of Estrogens in Lupus Erythematosus National Assessment flare index and PGA; however, agreement was less consistent with mild and moderate flares than with severe flares.²⁴

Modified BILAG-2004

In the TULIP-2 trial, modified BILAG rules were used in the sensitivity analyses for the primary end point of BICLA, and for flares. The modified BILAG assessment utilizes modified BILAG-2004 index scoring rules. The modified BILAG uses an algorithm that eliminates categories, such as BILAG A and BILAG B, which result from manifestations assessed as “same” when there is neither improvement nor worsening from the last visit’s assessments. The Disease Activity Adjudication Group differentiated these A and B scores by reviewing all BILAG-2004 index scores for each patient’s visits, using the modified BILAG-2004 index scoring rules. According to the sponsor, the resulting categories from the modified BILAG are more clinically relevant in a clinical trial setting when measuring disease activity that remained at the “same” level of improvement compared to previous visits. The modified BILAG rules and the review process and scoring as well as references used that justify the modification are detailed in a charter; however, these were not provided to CADTH.

Systemic Lupus Erythematosus Disease Activity Index 2000

The SLEDAI-2K is a component of both the SRI-4 and BICLA. In both trials, individual assessment of the SLEDAI-2K was a secondary end point. SLEDAI-2K assessments took place at baseline and every 4 weeks (28 days) until week 52. A certified investigator or designated physician assessed each manifestation as being either “present” or “absent” in the previous 4 weeks. The assessment also includes blood and urine sampling for assessment of the SLEDAI-2K laboratory categories. SLEDAI-2K scores were derived from the sum of the scores for all items and evaluated using the difference in mean change from baseline longitudinally over time to week 52. Scores for the SLEDAI organ systems were derived in the same way as SLEDAI-2K but using the scores for the respective items only. For each SLEDAI organ system, the proportion of patients with an improvement (i.e., a SLEDAI organ system score less than the corresponding score at baseline) at week 24 and week 52, respectively, will be assessed for patients with an organ system involvement at baseline (i.e., a SLEDAI organ system score greater than 0).

The SLEDAI-2K is a 24-item weighted score of lupus activity that ranges from 0 to 105, with higher scores indicating greater disease activity and 0 indicating inactivity.³¹ It is a modified version of the original SLEDAI that allows for persistent active disease in alopecia, mucous membrane ulcers, rash, and proteinuria to be scored.³² The SLEDAI-2K is based on the presence of 24 descriptors in 9 organ systems that are defined by the investigator as “present” or “absent” in the patient in the past 4 weeks and includes the use of laboratory samples. It is a weighted instrument, in which descriptors are multiplied by a particular organ’s “weight.” For example, renal descriptors are multiplied by 4 and CNS descriptors by 8, and these weighted organ manifestations are totalled into a final score.

SLEDAI-2K scores are valid and reliable assessments of lupus disease activity, but less responsive to change compared with other measures such as the BILAG-2004 and PGA.²⁷^,³³ Clinically meaningful responses are + 3 to + 4 points for worsening, −1 to −2 points for improvement,³⁴ and + 3 points for associated flares.³⁵ More details are provided in Appendix 4.

Clinical Systemic Lupus Erythematosus Disease Activity Index 2000

Clinical SLEDAI-2K scores were secondary end points in both trials. In both trials, the clinical SLEDAI-2K score was the sum of the scores for the SLEDAI-2K vasculitis, arthritis, myositis, rash, alopecia, mucosal ulcers, pleurisy, and pericarditis items. Measurement of clinical SLEDAI-2K followed the same schedule as the SLEDAI-2K. The “clinical” SLEDAI-2K score is the SLEDAI-2K assessment score without the inclusion of points attributable to any urine or laboratory results, including immunologic measures.⁹^,¹⁰ Its use could permit earlier clinical decisions to be made without waiting for immunologic measures. In both trials, in any circumstance in which the clinical SLEDAI-2K score was used, sites had to subsequently update the SLEDAI-2K assessment when laboratory data became available so that the full SLEDAI-2K score was made available to the sponsor.
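To make the relationship between the full and clinical SLEDAI-2K scores concrete, the Python sketch below sums descriptor weights for the items marked "present." The descriptor weights are those of the published SLEDAI-2K instrument as commonly reported (they are not listed in this review and should be checked against the instrument itself); the identifiers are hypothetical. The clinical subset follows the 8 items named above.

```python
# Assumed descriptor weights from the published SLEDAI-2K (24 items, maximum 105).
_WEIGHT_GROUPS = {
    8: ["seizure", "psychosis", "organic_brain_syndrome", "visual_disturbance",
        "cranial_nerve_disorder", "lupus_headache", "cva", "vasculitis"],
    4: ["arthritis", "myositis", "urinary_casts", "hematuria", "proteinuria", "pyuria"],
    2: ["rash", "alopecia", "mucosal_ulcers", "pleurisy", "pericarditis",
        "low_complement", "increased_dna_binding"],
    1: ["fever", "thrombocytopenia", "leukopenia"],
}
SLEDAI_2K_WEIGHTS = {
    item: weight for weight, items in _WEIGHT_GROUPS.items() for item in items
}

# Items counted in the clinical SLEDAI-2K, as listed in the text.
CLINICAL_ITEMS = {
    "vasculitis", "arthritis", "myositis", "rash",
    "alopecia", "mucosal_ulcers", "pleurisy", "pericarditis",
}


def sledai_2k_score(present: set[str]) -> int:
    """Full score: sum of weights for all descriptors marked present."""
    return sum(SLEDAI_2K_WEIGHTS[item] for item in present)


def clinical_sledai_2k_score(present: set[str]) -> int:
    """Clinical score: same sum, restricted to the 8 clinical items."""
    return sum(SLEDAI_2K_WEIGHTS[item] for item in present & CLINICAL_ITEMS)
```

For a hypothetical patient with arthritis, rash, low complement, and proteinuria, the full score would be 4 + 2 + 2 + 4 = 12, while the clinical score (arthritis and rash only) would be 6.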

Physician’s Global Assessment

Individual assessments of the PGA were secondary end points in both trials. The difference between anifrolumab and placebo in the mean change from baseline in PGA (measured on a VAS ranging from 0 to 3) was assessed by visit every 4 weeks until week 52.

The PGA uses a VAS scored between 0 and 3, with physicians asked the following question: “How do you assess your patient’s current disease activity?” Possible answers are 0 = none, 1 = mild, 2 = moderate, and 3 = severe. When scoring the PGA, the score from the previous visit is reviewed and the mark moved relative to the score from the previous visit. This is a global assessment, factoring in all aspects of the patient’s lupus disease activity. It does not reflect medical conditions not associated with lupus. Any disease rated greater than 2.5 is very severe. The instrument is similar to a logarithmic scale, with greater distances or demarcations possible among milder to moderate symptoms.

The threshold for “no worsening” on the PGA is a change of less than 0.3 points based on the SRI-4.³⁶ In a trial for epratuzumab, a significant improvement was considered a 20% decrease in PGA score evaluated after 12 months of treatment.³⁷

Maintenance of OCS Dose of No More Than 7.5 mg From Week 40 to 52 in Patients With a Baseline Dose of 10 mg or Greater

In both trials, a key secondary end point was the difference in the proportion of patients with a baseline OCS dosage of 10 mg/day or higher of prednisone or equivalent in the anifrolumab group versus the placebo group who maintained OCS reduction to no more than 7.5 mg from week 40 to week 52. Patients who achieved the reduction and were able to maintain it to week 52 were considered responders. A maintained OCS reduction is defined as meeting all the following criteria:

achieve an OCS dosage of no more than 7.5 mg/day prednisone or equivalent by week 40
maintain an OCS dosage of no more than 7.5 mg/day prednisone or equivalent from week 40 to week 52 (a maintained OCS dose is defined as no dose increase [i.e., no dose greater than the dose at week 40 plus 1 day] between week 40 plus 2 days and week 52, inclusive)
the date of last assessment used for efficacy analysis (SLEDAI-2K, PGA, and BILAG-2004) in the time window of week 52 will be used as the date of week 52; if no such assessment falls into the respective time window, then the target date for the time point will be used instead
no permanent premature discontinuation of the investigational product
no use of restricted medications beyond the protocol-allowed threshold on or before the date of week 52.

If any of these conditions were not fulfilled or could not be evaluated at week 52 (e.g., due to missing values) the patient was defined as a nonresponder.
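The responder rule for maintained OCS reduction can be sketched as follows. This Python example is a simplification under stated assumptions: the function and argument names are hypothetical, doses are supplied as a chronological list of daily prednisone-equivalent doses (mg/day) covering week 40 through week 52, and the 1-day and 2-day offsets in the protocol definition are not modelled.

```python
def ocs_maintained_reduction(
    doses_week40_to_52: list[float],
    discontinued_product: bool,
    restricted_medication_use: bool,
    evaluable: bool = True,
) -> bool:
    """Return True when the patient counts as a responder for maintained OCS reduction."""
    # Missing or unevaluable data classifies the patient as a nonresponder.
    if not evaluable or not doses_week40_to_52:
        return False
    dose_at_week40 = doses_week40_to_52[0]
    # Target dose of no more than 7.5 mg/day reached by week 40.
    reached_target = dose_at_week40 <= 7.5
    # Maintained: no dose above the week-40 dose through week 52.
    no_increase = all(dose <= dose_at_week40 for dose in doses_week40_to_52)
    return (
        reached_target
        and no_increase
        and not discontinued_product
        and not restricted_medication_use
    )
```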

Health-Related Quality of Life

Measures of HRQoL were secondary end points in both trials. HRQoL assessments that aligned with this CADTH review included the Short Form (36) Health Survey Version 2 (SF-36v2), the Lupus QoL scale, and the EQ-5D-5L. The difference between anifrolumab and placebo in the mean change from baseline in HRQoL measures was analyzed using descriptive statistics.

Short Form 36-item Health Survey

The SF-36 was administered at baseline, every 8 weeks, and at week 52. The SF-36 is a generic, self-reported health assessment questionnaire that has been used in clinical trials to study the impact of chronic disease on HRQoL.³⁸ It yields scale scores for 8 health domains, and 2 summary measures of physical and mental health: the PCS and the MCS. Only the PCS and MCS will be reviewed for this CADTH review. According to the sponsor, the meaningful change threshold was defined as 3.4 points on the PCS and 4.6 points on the MCS.

According to a literature review of 8 studies,³⁹ anchor-based MIDs for improvement were estimated to be from 2.1 to 2.4 for summary scores in patients with SLE. These estimates are consistent with estimates from other rheumatological conditions (2.5 to 5 points for summary scores).

Lupus Quality of Life

The Lupus QoL was assessed at baseline and every 12 weeks until week 52. The Lupus QoL is a 34-item SLE-specific HRQoL measure.⁴⁰ The instrument consists of 8 domains: physical health (8 items), pain (3 items), planning (3 items), intimate relationships (2 items), burden to others (3 items), emotional health (6 items), body image (5 items), and fatigue (4 items). Domain scores were derived when at least 50% of the items were answered. The mean raw domain score was the total of the item response scores of the answered items divided by the number of answered items. A nonapplicable response was treated as unanswered. The mean raw domain scores were transformed to domain scores (ranging from 0 as worst QoL to 100 as best QoL) as mean raw domain score divided by 4 and multiplied by 100.
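The domain score transformation described above can be illustrated with a short Python sketch. It assumes item responses are coded 0 to 4 (implied by the division by 4 but not stated explicitly) and represents unanswered or nonapplicable items as None; the function name is hypothetical.

```python
from typing import Optional, Sequence


def lupusqol_domain_score(responses: Sequence[Optional[int]]) -> Optional[float]:
    """Transform item responses for one domain to a 0-to-100 domain score.

    Returns None when fewer than 50% of the domain's items were answered.
    """
    answered = [r for r in responses if r is not None]
    # Domain scores are derived only when at least 50% of items are answered.
    if not responses or len(answered) < 0.5 * len(responses):
        return None
    mean_raw = sum(answered) / len(answered)
    return mean_raw / 4 * 100


# Hypothetical 3-item pain domain with one nonapplicable response.
print(lupusqol_domain_score([4, 2, None]))  # mean raw 3.0 -> 75.0
```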

Anchor-based MIDs ranged from 2.4 to 8.7 for deterioration and from 3.5 to 7.3 for improvement. MIDs derived using distribution-based approaches based on an SD of 0.5 ranged from 12.9 to 16.7.³⁹

EQ-5D-5-Level

The EQ-5D-5L was assessed at baseline and every 12 weeks until week 52. The EQ-5D-5L comprises 5 dimensions: mobility, self-care, usual activities, pain/discomfort and anxiety/depression, and includes 5 response levels of severity (no problems, slight problems, moderate problems, severe problems, and unable to/extreme problems) in each of the dimensions.⁴¹ The UK value set was used for all patients in the study. The EQ-5D-5L health states were converted into a single index value using value sets from the EQ-5D-5L Crosswalk Project. The questionnaire also includes a VAS (the EQ VAS), in which the patients were asked to rate their health on a scale of 0 to 100, with 0 being worst imaginable health state and 100 being best imaginable health state.

The EQ-5D-3L has been shown to be a valid measure that can discriminate between patients with higher disease activity (SLEDAI score > 5) versus lower disease activity (SLEDAI score ≤ 5). However, it is not able to discriminate between patients with higher disease damage versus those with lower damage and is not responsive to longitudinal changes in disease activity based on SLEDAI scores. In addition, multiple studies have shown that the EQ VAS was not responsive to self-reported changes in health.³⁹^,⁴² SLE-specific MIDs for the EQ-5D-5L have not been reported.

Mortality

Mortality was documented in the TULIP-1 and TULIP-2 trials as the number of patients who died by the end of the study period as part of the safety analysis.

Morbidity: Organ Damage

The SDI was a secondary end point in both trials. The difference in mean change in SDI global score from baseline to week 52 was used to evaluate the effect of anifrolumab 300 mg versus placebo on irreversible damage in SLE patients.

The SDI was developed to assess irreversible damage in SLE patients independently of its cause (SLE activity, therapy, comorbidities) but occurring after disease onset. Damage is usually defined as a clinical feature that must be continuously present for at least 6 months to receive a score. The SDI consists of 42 items in 12 domains, with a maximum score of 46 (higher scores denote more damage). At SLE diagnosis, the SDI score is 0. Damage according to the SDI score is defined as an SDI global score of 1 or higher, while no damage is defined as an SDI global score of 0.⁴³ The SDI is defined for 12 organ systems (possible scores): peripheral vascular (0 to 5), ocular (0 to 2), neuropsychiatric (0 to 6), renal (0 to 3), pulmonary (0 to 5), cardiovascular (0 to 6), gastrointestinal (0 to 6), musculoskeletal (0 to 7), skin (0 to 3), endocrine (diabetes) (0 to 1), gonadal (0 to 1), and malignancies (0 to 2). The SDI global score is the sum of the damage scores for all 12 organ systems. Postbaseline categories used for the presentation of change in damage are “no change,” “+1 point,” “+2 points,” and “+3 or more points.”
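As an illustration of the global score and the postbaseline change categories, the Python sketch below sums per-system damage scores and maps the change from baseline to the 4 categories named above. The per-system maxima are copied from the list in the text; the function names are hypothetical, and a decrease from baseline is treated as a data error on the assumption that SDI damage is irreversible.

```python
# Maximum damage score per organ system, as listed in the text.
SDI_MAX_BY_SYSTEM = {
    "peripheral_vascular": 5, "ocular": 2, "neuropsychiatric": 6, "renal": 3,
    "pulmonary": 5, "cardiovascular": 6, "gastrointestinal": 6,
    "musculoskeletal": 7, "skin": 3, "endocrine": 1, "gonadal": 1,
    "malignancies": 2,
}


def sdi_global_score(scores: dict[str, int]) -> int:
    """Sum the damage scores across organ systems after a range check."""
    for system, value in scores.items():
        if not 0 <= value <= SDI_MAX_BY_SYSTEM[system]:
            raise ValueError(f"score out of range for {system}: {value}")
    return sum(scores.values())


def sdi_change_category(baseline: int, postbaseline: int) -> str:
    """Map change in SDI global score to the presentation categories."""
    change = postbaseline - baseline
    if change < 0:
        # Assumption: irreversible damage should never decrease.
        raise ValueError("SDI global score decreased from baseline")
    if change == 0:
        return "no change"
    if change >= 3:
        return "+3 or more points"
    return "+1 point" if change == 1 else "+2 points"
```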

An SDI score of 1 or higher indicates worsening.⁴³ The SDI is a valid and reliable instrument.⁴⁴^,⁴⁵ The SDI was found to be a predictor of mortality and SDI scores have been shown to increase with disease duration.⁴⁶ Correlation with the SLEDAI and BILAG was low, although 1 study found strong correlation with the SLEDAI.⁴⁴

Reduction in Symptoms

Cutaneous Lupus Erythematosus Disease Area and Severity Index

The CLASI was a key secondary end point in both trials. The difference in the proportion of patients with a 50% or greater reduction in CLASI activity score at week 12 in the subgroup of patients with a baseline CLASI activity score of 10 or higher from baseline to week 52 was used to evaluate the effect of anifrolumab 300 mg versus placebo on skin lesions. The CLASI describes the extent of cutaneous disease in terms of the intensity of involvement measured in 13 different anatomic locations. It has 2 scores, 1 for disease activity (scored from 0 to 70) and 1 for disease damage (scored from 0 to 80).⁴⁷^,⁴⁸ The activity score considers erythema, scale and/or hypertrophy, mucous membrane lesions, recent hair loss, and nonscarring alopecia. The damage score represents dyspigmentation; scarring, atrophy, and/or panniculitis; and scarring of the scalp. Patients were asked if their dyspigmentation lasted 12 months or longer, in which case the dyspigmentation score was doubled. Each of the above parameters was measured in 13 different anatomic locations that were included specifically because they are most often involved in cutaneous lupus erythematosus. The most severe lesion in each area was measured.

The CLASI is a validated and reliable index to assess SLE patients.⁴⁸^,⁴⁹ A clinically important improvement in the CLASI was found to be a mean of 3 points or an 18% decrease in the CLASI activity score.⁵⁰

Joint Reduction Rate

In the TULIP-1 trial, the difference in the proportion of patients with at least 8 swollen and at least 8 tender joints at baseline who achieved at least a 20% or at least a 50% reduction from baseline in both the number of swollen and tender joints at week 52 was a supporting secondary end point. In the TULIP-2 trial, the difference in the proportion of patients with a 50% or greater reduction in joint counts at week 52 in the subgroup of patients with at least 6 swollen and at least 6 tender joints at baseline was a key secondary end point.

In both trials, an active joint is defined as a joint with swelling and tenderness. In the TULIP-1 trial, at least 20% reduction and at least 50% reduction are reached if the percentage of change is no more than −20% and no more than −50%, respectively. No restricted medications beyond the protocol-allowed threshold were used on or before the assessment, and there was no permanent premature discontinuation of the investigational product. To achieve at least a 20% reduction and at least a 50% reduction, respectively, the reduction in the number of joints needs to be reached in swollen and tender joints separately. In the TULIP-2 trial, an at least 50% reduction is reached if all the following criteria are met: the percentage reduction from baseline in both the number of swollen joints and the number of tender joints, separately, is 50% or greater; no permanent premature discontinuation of investigational product; and no use of restricted medications beyond the protocol-allowed threshold on or before the assessment.
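The joint reduction criteria can be expressed as a small check in which the threshold must be met for swollen and tender joints separately. The following Python sketch is illustrative, with hypothetical names; it assumes baseline counts are greater than 0 (which holds for the subgroups described above) and uses integer arithmetic to avoid rounding at the threshold.

```python
def joint_reduction_met(
    swollen_baseline: int,
    swollen_now: int,
    tender_baseline: int,
    tender_now: int,
    threshold_pct: int,
    discontinued_product: bool,
    restricted_medication_use: bool,
) -> bool:
    """Return True when the percentage reduction is met for both joint counts."""
    if swollen_baseline <= 0 or tender_baseline <= 0:
        raise ValueError("baseline joint counts must be greater than 0")

    def reduced_enough(baseline: int, now: int) -> bool:
        # Equivalent to (baseline - now) / baseline * 100 >= threshold_pct.
        return (baseline - now) * 100 >= threshold_pct * baseline

    return (
        reduced_enough(swollen_baseline, swollen_now)
        and reduced_enough(tender_baseline, tender_now)
        and not discontinued_product
        and not restricted_medication_use
    )
```

For example, a hypothetical patient going from 8 to 4 swollen joints and from 10 to 6 tender joints would meet the 20% threshold but not the 50% threshold, because the tender joint count fell by only 40%.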

In both trials, the swollen and tender joint count was based on the left and right shoulder, elbow, wrist, all metacarpophalangeal and proximal interphalangeal joints of the upper extremities and the left and right knees of the lower extremities. An active joint for the SLEDAI-2K calculation is defined as a joint with pain and tenderness and at least 1 of the following (warmth, erythema, swelling, or effusion). However, in the TULIP-1 and TULIP-2 trials, an active joint for the joint count assessment was defined as a joint with tenderness and swelling only. Each of 28 joints was then evaluated separately for tenderness (by palpating the joint) and swelling. Joints with intra-articular injections within 4 weeks were not evaluable for the assessment. The joint count assessment included questions regarding limitation of range of movements and effects of joint symptoms on basic and functional ADLs.

Functional Assessment of Chronic Illness Therapy–Fatigue

The FACIT-F was a secondary end point in both trials. The FACIT-F was assessed at baseline and every 4 weeks until week 52. The FACIT-F is completed by patients to assess fatigue. Patients were presented with a list of 13 statements (e.g., “I am too tired to eat”) and asked to rate each on a 5-point Likert scale (0 = not at all, 1 = a little bit, 2 = somewhat, 3 = quite a bit, and 4 = very much), to indicate how true the statement was during the past 7 days.⁹^,¹⁰ Final scores are the sum of the responses from the 13 items and range from 0 to 52; items are reverse-scored, with higher scores indicating better QoL. According to the sponsor, a clinically meaningful response was considered a change from baseline of more than 3 points, with no restricted medication use beyond the protocol-allowed thresholds on or before the assessment, and no permanent premature discontinuation of the investigational product.

The FACIT-F is a valid and reliable instrument for use in patients with SLE. The FACIT-F is responsive to clinical improvement but not clinical deterioration. It can differentiate groups defined on the BILAG general and musculoskeletal domains. It is correlated with the SF-36 and Patient Global Assessment, with weak to moderate correlation with the PGA, BILAG, and Safety of Estrogens in Lupus Erythematosus National Assessment (SELENA) SLEDAI.⁵¹ According to the evidence, the anchor-based MIDs ranged from 2.5 to 8.4 points.⁵¹ The distribution-based MIDs fell within 3.8 to 4.6 points (based on an SD of 0.33) and 5.8 to 6.8 points (based on an SD of 0.5; standard error of the mean = 2.7 to 2.9 points).⁵¹

Pain Numerical Rating Scale

The pain NRS is a secondary end point in both trials to capture patient-reported pain. The pain NRS is an 11-point Likert scale to capture overall patient-reported pain (0 = no pain; 10 = worst pain imaginable) with a 1-week recall period. The pain NRS has acceptable test-retest reliability.⁵²

Achievement of Low Disease Activity

Lupus Low Disease Activity State (LLDAS) is a secondary end point in both trials. LLDAS is a state that, if sustained, is “associated with a low likelihood of adverse outcome, considering disease activity and medication safety.”⁵³ This is a binary end point used to evaluate the difference in the proportion of patients with response in LLDAS at week 52. Patients were considered LLDAS responders at a specific visit if they met the following criteria:

SLEDAI-2K score of no higher than 4, with no activity in major organ systems (renal, CNS, cardiopulmonary, vasculitis, fever) and no hemolytic anemia or gastrointestinal activity
no new lupus disease activity compared with the previous assessment as measured by the SLEDAI-2K and BILAG-2004; PGA score of 1 or lower (scale of 0 to 3); current prednisone (or equivalent) dosage of no more than 7.5 mg/day
no discontinuation of investigational product
well-tolerated standard maintenance doses of immunosuppressive drugs and approved biologic drugs (i.e., no use of restricted medications beyond the protocol-allowed threshold before assessment).

The LLDAS has good criterion validity; according to the literature, patients who spent 50% or more of their observed time in LLDAS had significantly reduced organ damage accrual and were less likely to have an SDI increase of 1 or greater.⁵³

Disease Flare Frequency and Severity

In both trials, the difference in annualized flare rates through week 52 was a key secondary end point. Flares were defined as either 1 or more new BILAG-2004 A items or 2 or more new BILAG-2004 B items compared to the previous visit (i.e., a worsening from an E, D, or C score to a B score in at least 2 organ systems or a worsening from an E, D, C, or B score to an A score in any 1 organ system compared to the previous visit). The occurrence of a new flare was checked for each available visit versus the previously available visit up to week 52. If no flare occurred, the number of flares was set to 0. Otherwise, all flares were counted, leading to a maximum number of 13 flares. The annualized flare rate is the number of flares divided by the flare exposure time in days multiplied by 365.25. The flare exposure time is the time up to week 52 (date of BILAG-2004 assessment at week 52) or, in cases of premature study discontinuation, up to the date of the last available BILAG-2004 assessment up to and including week 52; it was derived as the date of week 52 (or the date of the last BILAG-2004 assessment) minus the date of the first administration of the investigational product plus 1.
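The flare definition and the annualized rate can be sketched in Python as follows. This is an illustration under stated assumptions rather than the sponsor's derivation: names are hypothetical, flares are detected from organ-system grades (following the parenthetical definition above), and the exposure time is read as the date of the last BILAG-2004 assessment minus the date of first administration plus 1 day.

```python
from datetime import date


def is_flare(previous: dict[str, str], current: dict[str, str]) -> bool:
    """Flare: worsening to A in any 1 system, or to B in at least 2 systems."""
    new_a = sum(
        1 for system, grade in current.items()
        if grade == "A" and previous[system] in {"B", "C", "D", "E"}
    )
    new_b = sum(
        1 for system, grade in current.items()
        if grade == "B" and previous[system] in {"C", "D", "E"}
    )
    return new_a >= 1 or new_b >= 2


def annualized_flare_rate(n_flares: int, first_dose: date, last_assessment: date) -> float:
    """Number of flares divided by exposure time in days, multiplied by 365.25."""
    exposure_days = (last_assessment - first_dose).days + 1
    return n_flares / exposure_days * 365.25
```

As a usage example, a hypothetical patient with 2 flares over an exposure of 365 days would have an annualized flare rate of 2 / 365 × 365.25, or approximately 2.0 flares per year.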

The BILAG-2004 can easily distinguish between severe flares and no flare, but mild and moderate flares are more difficult to distinguish. Overall, the BILAG-2004 appears to be a reliable instrument for measuring flares, and studies have shown that it is better at capturing flares than the Safety of Estrogens in Lupus Erythematosus National Assessment Flare Index (SFI).²⁴ The BILAG-2004 had better inter-rater reliability than the SFI and PGA; however, agreement was less consistent with mild and moderate flares than with severe flares.²⁴

Safety Assessments

Key safety assessments were AEs (including AESIs), safety laboratory tests, vital signs, electrocardiograms, and physical examination (including assessment of Cushingoid features). In addition, the C-SSRS and PHQ-8 were utilized as safety assessments for depression in both trials. A modified SFI was used to assess flares.

The PHQ-8 assesses symptoms of depression over the last 2 weeks. There are 8 item scores that range from 0 to 3; a total score higher than 10 is considered indicative of major depression and greater than 20 is considered indicative of severe major depression.⁵⁴ The difference between anifrolumab and placebo in the mean change from baseline in PHQ-8 total score will be assessed by visit up to week 52. The PHQ-8 is completed by the patient and scored by the investigator. No evidence related to the validity, reliability, responsiveness, or MID of the instrument among SLE patients was identified.

The C-SSRS is an assessment tool that evaluates suicidal ideation and behaviour. It is made up of 10 categories, all of which maintain binary responses (yes or no) to indicate the presence or absence of behaviour that is significantly predictive of completed suicide.⁵⁵ The outcome of the C-SSRS is a numerical score obtained from the aforementioned categories. Two different versions of the questionnaire were used in the pivotal trials⁹^,¹⁰: 1 assessing the last 12 months before the assessment and a second assessing the time since last visit. The score will be derived at each assessment for each patient up to week 52. Suicidal ideation was defined as a “yes” answer at any time in the respective study period to any 1 of the 5 (re-ordered) suicidal ideation questions, ranging from category 1 (“wish to be dead”) to category 5 (“active suicidal ideation with specific plan and intent”) on the C-SSRS. Suicidal behaviour was defined as a “yes” answer at any time in the respective study period, to any 1 of the 5 (re-ordered) suicidal behaviour questions, ranging from category 6 (“preparatory acts or behaviour”) to category 10 (“completed suicide”) on the C-SSRS. Nonsuicidal self-injurious behaviour is assigned if no ideation or behaviour is present. No evidence related to the validity, reliability, responsiveness or MID of the instrument among SLE patients was identified.

The pivotal trials used a modified version of the SFI, with the SLEDAI-2K used instead of the SELENA SLEDAI to identify flares and severity for the safety analysis.⁹^,¹⁰ This is a disease-specific composite measure that classifies flares as mild to moderate or severe, based on criteria of clinical activity, need for additional treatment, or PGA score.⁴⁶ In the pivotal trials, a mild to moderate flare and a severe flare were defined according to the following criteria:

Mild to moderate flare:
- change in SLEDAI-2K score of 3 or more points but less than 7 points compared to previous visit, or
- new or worse discoid, photosensitive, profundus, cutaneous vasculitis, or bullous lupus, or
- nasopharyngeal ulcers, pleuritis, pericarditis, arthritis, or SLE fever, or
- increase of 1.0 or greater in PGA score (but not greater than 2.5).
Severe flare:
- change in SLEDAI-2K score of 7 points or greater compared to previous visit, or
- new or worse CNS-SLE, vasculitis, nephritis, myositis, hemolytic anemia (hemoglobin less than 70 g/L or decrease in hemoglobin of greater than 30 g/L with positive Coombs) and at least 1 of the following: decreased haptoglobin, increased total bilirubin not due to Gilbert’s disease, increased reticulocyte count, or
- hospitalization for SLE, or
- increase in PGA score to greater than 2.5.⁹^,¹⁰
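
The flare criteria above amount to a decision rule in which the severe criteria are checked first. The following is a minimal sketch of that rule, not the trials' derivation algorithm. The `Visit` fields and the function name are hypothetical, the clinical manifestations are collapsed into 2 boolean flags, and the PGA rule for a mild to moderate flare is read as "an increase of at least 1.0 with a resulting PGA score of no more than 2.5", which is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    """Hypothetical per-visit record used only for this illustration."""
    sledai_change: float               # change in SLEDAI-2K score versus previous visit
    pga: float                         # PGA score at this visit
    pga_increase: float                # increase in PGA score versus previous visit
    mild_moderate_signs: bool = False  # any manifestation from the mild to moderate list
    severe_signs: bool = False         # any manifestation from the severe list
    hospitalized_for_sle: bool = False


def classify_flare(v: Visit) -> str:
    """Return 'severe', 'mild to moderate', or 'none' for a single visit."""
    # Severe criteria take precedence over mild to moderate criteria.
    if (v.sledai_change >= 7 or v.severe_signs
            or v.hospitalized_for_sle or v.pga > 2.5):
        return "severe"
    if (3 <= v.sledai_change < 7 or v.mild_moderate_signs
            or (v.pga_increase >= 1.0 and v.pga <= 2.5)):
        return "mild to moderate"
    return "none"
```

Under this reading, a visit that meets any severe criterion is never downgraded, even if it also meets a mild to moderate criterion.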

AEs and SAEs were collected from the time of informed consent, throughout the treatment period and including the follow-up period until follow-up visit 2 (12 weeks after the final dose) or week 52 for the patients who enrolled in the LTE study. Any AEs that were unresolved at the patient’s last visit in the study were to be followed up by the study staff for as long as medically indicated. An AE was defined as the development of an undesirable medical condition or the deterioration of a pre-existing medical condition following or during exposure to the investigational product, regardless of whether or not the event was considered causally related to the product. An undesirable medical condition could be symptoms (e.g., nausea, chest pain), signs (e.g., tachycardia, enlarged liver) or the abnormal results of an investigation (e.g., laboratory findings, electrocardiogram).

An SAE was defined as an AE that fulfilled 1 or more of the following criteria:

resulted in death
was immediately life-threatening
required inpatient hospitalization or prolongation of existing hospitalization
resulted in persistent or significant disability/incapacity or substantial disruption of the ability to conduct normal life functions
was a congenital abnormality or birth defect
was an important medical event that could jeopardize the patient or may have required medical intervention to prevent 1 of the earlier outcomes listed.

Statistical Analysis

Sample Size Determination and Power Calculation

TULIP-1

The sample size was primarily driven by the need to acquire a sufficiently large safety database, as well as the ability to assess key secondary end points. In the TULIP-1 trial, assuming that 39% and 63% of patients in the placebo and anifrolumab 300 mg groups, respectively, achieve SRI-4, treatment groups of 180 patients would yield more than 99% power to reject the hypothesis of no difference using a 2-sided alpha of 0.05. This sample size provides a minimal detectable difference of approximately 10% in SRI-4 between anifrolumab 300 mg versus placebo.

In the TULIP-1 trial, estimates of power for 2 key secondary end points were calculated. For the type I interferon gene signature test high subgroup, assuming that 75% of patients are type I interferon gene signature test high, and the proportions of SRI-4 in the type I interferon gene signature test high subgroup were 35% and 61% in the placebo and anifrolumab groups, respectively, a 2-sided alpha of 0.04 yields 98% power. For the OCS dosage of no more than 7.5 mg/day at week 40, which is maintained through week 52 in the subgroup of patients with a baseline OCS dosage of 10 mg/day or higher, a 2-sided alpha of 0.004 yields 87% power, assuming that 60% of patients have an OCS dose of 10 mg or more at baseline and the proportions of patients were 32% and 59% in the placebo and anifrolumab groups, respectively. Power calculations for these 2 key secondary outcomes assumed that the primary end point was met, and testing of the key secondary end points was therefore allowed. Each end point was tested using a weighted Holm procedure, and the alpha was given by the assigned weight in the first step of the algorithm. The assumptions of the effect sizes and sizes of subgroups used for these calculations were based on results from an interim analysis of the MUSE study.

TULIP-2

The power calculation for the TULIP-2 trial was updated from the TULIP-1 trial due to the modified primary end point (BICLA at week 52); however, these calculations yielded no changes to the study sample size. The purpose of the power calculations was to justify updates to the primary and key secondary end points. Assuming that 30% and 46% of patients in the placebo and anifrolumab 300 mg groups, respectively, achieve BICLA, 180 patients per study group yields approximately 88% power to reject the hypothesis of no difference using a 2-sided alpha of 0.05. Effect sizes were based on observed results from the TULIP-1 trial. The minimal detectable difference in BICLA response between anifrolumab 300 mg versus placebo is approximately 10% with this sample size. Calculations are based on a 2-group chi-square test of equal proportions.
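
The power figures quoted for the primary end points can be approximated with a standard normal-approximation formula for a 2-group test of equal proportions. The sketch below is illustrative only: the function name is hypothetical, and the sponsor's software and exact method (beyond "2-group chi-square test of equal proportions") are not described in the source.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p_control: float, p_treat: float,
                          n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a 2-sided test of equal proportions (equal group sizes)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Standard error under the null hypothesis uses the pooled proportion.
    p_bar = (p_control + p_treat) / 2
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n_per_group)
    # Standard error under the alternative uses the group-specific proportions.
    se_alt = sqrt(p_control * (1 - p_control) / n_per_group
                  + p_treat * (1 - p_treat) / n_per_group)
    return z.cdf((abs(p_treat - p_control) - z_crit * se_null) / se_alt)


tulip1_power = power_two_proportions(0.39, 0.63, 180)  # SRI-4 assumptions
tulip2_power = power_two_proportions(0.30, 0.46, 180)  # BICLA assumptions
```

With the stated assumptions, this approximation gives more than 99% power for the TULIP-1 trial and approximately 88% for the TULIP-2 trial, in line with the values reported above. The minimal detectable difference of approximately 10% corresponds to `z_crit * se_null` at these sample sizes.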

Analysis Populations

Both pivotal trials utilized a full analysis set (FAS) for reporting efficacy and safety data. The FAS included all patients randomized into the study who received at least 1 dose of the investigational product. The FAS was analyzed according to randomized treatment (a modified intention-to-treat approach).

Statistical Test or Model

For both pivotal trials, descriptive statistics (number, mean, SD, median, minimum, and maximum) were provided for continuous variables, and counts and percentages were presented for categorical variables. For treatment comparisons, 95% CIs were presented. If a model was used to estimate the treatment difference, the corresponding CI according to the model was presented.

Primary Outcome of the Studies

The main components of the statistical test and model for both trials are discussed in Table 9. The primary outcome for both trials utilized the Cochran-Mantel-Haenszel (CMH) approach. The CMH estimates were stratified by SLEDAI-2K score at screening (< 10 points versus ≥ 10 points), baseline OCS dosage (< 10 mg/day versus ≥ 10 mg/day prednisone or equivalent), and results of a type I interferon test (high versus low).
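
A stratified difference in response proportions can be illustrated with Mantel-Haenszel weights. The source does not state the exact estimator behind the reported treatment differences, so this is a sketch of one common choice. The function name and the counts in the usage example are hypothetical.

```python
from typing import Iterable, Tuple

# One stratum: (responders on treatment, n on treatment,
#               responders on control,   n on control)
Stratum = Tuple[int, int, int, int]


def mh_risk_difference(strata: Iterable[Stratum]) -> float:
    """Mantel-Haenszel weighted difference in response proportions."""
    weighted_sum = 0.0
    total_weight = 0.0
    for resp_t, n_t, resp_c, n_c in strata:
        # The weight favours large, balanced strata.
        weight = n_t * n_c / (n_t + n_c)
        weighted_sum += weight * (resp_t / n_t - resp_c / n_c)
        total_weight += weight
    return weighted_sum / total_weight


# Hypothetical counts for 2 strata; the trials used 3 binary stratification
# factors, which would give up to 8 strata.
example = mh_risk_difference([(50, 100, 30, 100), (20, 50, 10, 50)])
```

With a single stratum, the estimate reduces to the crude difference in proportions between the 2 groups.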

Key Secondary Outcomes of the Studies

In both trials, the same CMH approach as described for the primary end point was used for 4 key secondary end points of SRI-4 (TULIP-1) or BICLA (TULIP-2) at week 52 in the subgroup of patients with high results on an interferon test; maintained OCS reduction to no more than 7.5 mg/day in the subgroup of patients with a baseline OCS dosage of 10 mg/day or higher; CLASI reduction in patients with a baseline CLASI activity score of 10 or higher; and joint count reduction by at least 50% in patients with at least 8 swollen and at least 8 tender joints at baseline (TULIP-1), and at least 6 swollen and at least 6 tender joints at baseline (TULIP-2). The analysis was repeated for patients achieving a reduction of at least 20% in swollen and tender joints. For maintenance of OCS reduction, stratification factors were reduced to SLEDAI-2K score at screening and results of the type I interferon gene signature test.

The final key secondary outcome, annualized flare rate through week 52, was analyzed using a negative binomial regression model in both trials. The response variable in the model was the number of flares over the 52-week treatment period. The model included covariates of treatment group and the stratification factors. The logarithm (base e) of the follow-up time (flare exposure time) was used as an offset variable in the model to adjust for patients with different exposure times.
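
In generic notation, a negative binomial model with a log-exposure offset can be written as follows. The symbols are introduced here for illustration and do not come from the study reports: Y_i is the number of flares for patient i, t_i the follow-up time in years, trt_i the treatment indicator, z_i the stratification factors, and k the dispersion parameter.

```latex
\log \mathbb{E}[Y_i] = \log(t_i) + \beta_0 + \beta_1\,\mathrm{trt}_i + \boldsymbol{\gamma}^{\top}\mathbf{z}_i,
\qquad
\operatorname{Var}(Y_i) = \mu_i + k\,\mu_i^{2}, \quad \mu_i = \mathbb{E}[Y_i]
```

Because the coefficient of the offset is fixed at 1, the model describes the flare rate per unit of follow-up time, and exp(β₁) is interpreted as the rate ratio for anifrolumab versus placebo.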

Other Secondary Outcome Variables of the Studies

Change from baseline and observed values in SDI global score were presented by visit with descriptive statistics. Change from baseline in SLEDAI-2K and PGA was analyzed using a repeated measures model with fixed effects for baseline value, treatment group, visit, treatment-by-visit interaction, and stratification factors. Covariance parameters were estimated using a restricted maximum likelihood method, and Kenward-Roger denominator degrees of freedom were used for the tests of fixed effects. An unstructured covariance matrix was used. In case of convergence issues, the following alternative structures were used for fitting (in this order): heterogeneous Toeplitz, heterogeneous autoregressive (1), heterogeneous compound symmetry, homogeneous compound symmetry. This analysis was repeated for other supportive outcome variables, including HRQoL measures and symptom scores (e.g., SF-36v2 [acute] domain scores, PCS and MCS, pain NRS, FACIT-F, and Lupus QoL). LLDAS followed the same CMH approach as the primary end point.

Table 9

Statistical Analysis of Key Efficacy End Points (TULIP-1 and TULIP-2).

Data Imputation Methods

For binary efficacy responder end points, any criteria with a missing value were imputed using the LOCF; however, this was only done if the missing data point was for a single visit for that component. Nonresponders were asked to continue to attend scheduled assessments through week 52 in both trials. In the event of 2 or more consecutive visits with missing data for the same component, the LOCF was used for the first missing value of each sequence, after which the data were imputed as nonresponse for the specific responder end point. If a component (e.g., SLEDAI-2K) was based on several data points, the LOCF was used for the single data points. Missing safety data were generally not imputed.
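
The imputation rule for a single component can be sketched as follows. This is an illustration of the rule as described above, not the sponsor's program. The function name and the `NONRESPONDER` marker are hypothetical, and treating a missing value with no earlier observation as nonresponse is an assumption, because the source does not address that case.

```python
from typing import List, Optional, Sequence, Union

NONRESPONDER = "nonresponder"  # hypothetical marker for nonresponder imputation


def impute_component(values: Sequence[Optional[float]]) -> List[Union[float, str]]:
    """Apply LOCF to the first missing visit of each run, then nonresponder imputation."""
    imputed: List[Union[float, str]] = []
    last_observed: Optional[float] = None
    missing_run = 0  # number of consecutive missing visits so far
    for value in values:
        if value is not None:
            last_observed = value
            missing_run = 0
            imputed.append(value)
            continue
        missing_run += 1
        if missing_run == 1 and last_observed is not None:
            # First missing visit of a sequence: carry the last observation forward.
            imputed.append(last_observed)
        else:
            # Second or later consecutive missing visit (or nothing to carry forward).
            imputed.append(NONRESPONDER)
    return imputed


example = impute_component([4, None, 6, None, None, None, 2])
```

In the example, the isolated gap after the first visit is filled by LOCF, while the run of 3 missing visits is filled by LOCF once and then marked as nonresponse until the next observed value.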

For the primary outcome of both studies, if any of the criteria could not be evaluated at week 52 due to a missing value, that criterion was imputed using the LOCF and the primary end points were derived based on the complete data. This applied only if week 48 data were not missing; otherwise, the patient was defined as not achieving the primary end point at week 52.

Censoring Rules for Time-to-Event Analyses

In the TULIP-2 trial, for the outcome of time to BICLA response, patients without a BICLA response sustained up to week 52 were censored at the date of premature discontinuation of the investigational product or week 52, whichever occurred earlier. If a patient did not prematurely discontinue treatment but also did not have a week 52 assessment, then the date of the last-available BICLA assessment (latest of BILAG, SLEDAI, and PGA date) before week 52 was used as the censoring date.

Subgroup Analyses

Subgroup analyses were planned a priori in the statistical analysis plan for groups of patients in both the TULIP-1 and TULIP-2 trials. For each subgroup the respective outcome and 95% CI was provided. Subgroup analyses were conducted for the primary outcomes of SRI-4 and BICLA in the TULIP-1 and TULIP-2 trials, respectively, and key secondary outcome of maintaining OCS dose reduction of no more than 7.5 mg between weeks 40 and 52 for the subgroup of patients with an OCS dose of 10 mg or higher at baseline. Subgroup analyses were conducted for the following factors:

SLEDAI-2K score at screening (< 10 points, ≥ 10 points)
OCS dose at baseline (< 10 mg/day, ≥ 10 mg/day prednisone or equivalent)
result of type I interferon gene signature test (high, low)
sex (female, male)
age (≥ 18 to < 65 years, ≥ 65 years)
onset of disease (pediatric, adult)
BMI (≤ 30 kg/m², > 30 kg/m²)
race (white, Black or African American, Asian, native Hawaiian or other Pacific Islander, American Indian or Alaska native, other)
ethnicity (Hispanic or Latino, non–Hispanic or Latino)
ADA result (positive at any time, negative, persistently positive, ADA-positive with a titre > median of maximum titre)
baseline anti-dsDNA positive or abnormal complement 3 and/or abnormal complement 4 proteins versus complementary group (≥ 1 positive/abnormal, all negative/normal).

Subgroup analyses were suppressed if any of the subpopulations in any treatment group consisted of fewer than 25 patients.

The following subgroups, planned a priori in the statistical analysis plan, aligned with the subgroups prespecified in the protocol for this CADTH review: SLEDAI-2K score at screening (< 10 points, ≥ 10 points); OCS dose at baseline (< 10 mg/day, ≥ 10 mg/day prednisone or equivalent); and type I interferon gene signature test (high, low). Only the subgroups identified in the CADTH review protocol are reported in the following efficacy section. The subgroup of OCS dose of 10 mg or higher is of importance to this CADTH review as the sponsor is requesting reimbursement for this subgroup of patients.

Sensitivity Analyses

Sensitivity analyses were performed using LOCF imputation on the responding population, and tipping-point analyses were performed to examine the impact of missing data and nonresponder imputations (e.g., permanent discontinuation of the investigational product) on the primary and key secondary end points. Tipping-point analyses are intended to identify the point at which the results would tip from statistically significant to not statistically significant. Tipping-point analyses were only performed for the primary and key secondary end points that achieved a nominally statistically significant result (a P value < 0.05). These analyses varied the assumptions about outcomes among the subgroup of patients in the trial groups who prematurely discontinued the investigational product. Because the proportions of patients achieving the primary objective and key secondary end points were analyzed using a Pearson chi-square test, the stratification factors used in the main (CMH) analysis were disregarded. In addition, patients who prematurely discontinued the investigational product were altered from nonresponder to responder in an iterative manner.
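
The iterative logic of a tipping-point analysis can be sketched in 1 dimension: discontinued placebo patients are altered from nonresponder to responder one at a time, and an unstratified Pearson chi-square test is repeated until the result is no longer significant. This is a simplified illustration under stated assumptions. The function names and the counts in the usage example are hypothetical, no continuity correction is applied, and the number of responders in the active group is held fixed, whereas the trial analyses also varied that group.

```python
from math import erfc, sqrt
from typing import Optional


def pearson_chi2_p(resp_a: int, n_a: int, resp_b: int, n_b: int) -> float:
    """2-sided P value of the Pearson chi-square test (1 df) for a 2x2 table."""
    a, b = resp_a, n_a - resp_a
    c, d = resp_b, n_b - resp_b
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 1.0  # degenerate table: no evidence of a difference
    chi2 = (n_a + n_b) * (a * d - b * c) ** 2 / denominator
    # Survival function of a chi-square distribution with 1 degree of freedom.
    return erfc(sqrt(chi2 / 2))


def placebo_tipping_point(resp_trt: int, n_trt: int, resp_pbo: int, n_pbo: int,
                          discontinued_pbo: int, alpha: float = 0.05) -> Optional[int]:
    """Smallest number of discontinued placebo nonresponders that must be
    altered to responders for the test to lose significance, or None."""
    for flipped in range(discontinued_pbo + 1):
        if pearson_chi2_p(resp_trt, n_trt, resp_pbo + flipped, n_pbo) >= alpha:
            return flipped
    return None


# Hypothetical counts chosen for illustration only.
tip = placebo_tipping_point(resp_trt=86, n_trt=180, resp_pbo=57, n_pbo=182,
                            discontinued_pbo=41)
```

A full analysis would repeat this search for each assumed number of additional responders in the active group, producing a grid rather than a single number.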

For the primary end points of each trial, an extra sensitivity analysis was performed to examine the impact of intermediate missing data. Intermediate missing values of SRI-4 in the TULIP-1 trial and BICLA in TULIP-2 were imputed using multiple imputations based on the imputed values of the BILAG-2004, PGA, and SLEDAI-2K components. In addition, the primary end point of the TULIP-2 trial would be repeated using the modified BILAG-2004.

In terms of CLASI score for both trials, a further sensitivity analysis would be provided if at least 10 patients in the anifrolumab 300 mg or placebo treatment group have a burst and taper of OCS or intramuscular steroids during the first 12 weeks of treatment.

For flares in both trials, to examine the sensitivity of the results of the main analysis to deviations from the underlying assumptions, an additional analysis was performed using the controlled multiple-imputation method.⁵⁶ As with the main analysis, the sensitivity analysis includes all data until patients complete or withdraw from the study regardless of whether they discontinue from randomized treatment. For this method, the number of flares after withdrawal from study will be imputed according to the observed number of flares before the withdrawal, a post withdrawal model assumption, the baseline covariates included in the main analysis model, and the time the patient would have remained in the study if not withdrawn (i.e., date of first administration of the investigational product + 364 days – date of last-available BILAG-2004 assessment).

Multiplicity Testing

If the primary end point was statistically significant, the 5 key secondary end points would be tested using the weighted Holm procedure⁵⁷^,⁵⁸ to strongly control the familywise error rate at the 2-sided 5% level. The procedure applies alpha recycling according to the weights given in Figure 4 and Figure 5. The weights were chosen based on a combination of estimated power for the individual key secondary end points and their relative clinical importance. If any key secondary end point achieved statistical significance (i.e., had a 2-sided P value of less than or equal to the corresponding alpha level in the weighted Holm procedure), a statistically significant difference between the treatment groups for the key secondary end point would be declared.
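
The general weighted Holm procedure can be sketched as a step-down loop in which the alpha of each rejected hypothesis is redistributed to the remaining hypotheses in proportion to their weights. This is a generic illustration. The function name, end point labels, P values, and weights below are hypothetical and are not the weights shown in Figure 4 or Figure 5.

```python
from typing import Dict, List


def weighted_holm(p_values: Dict[str, float], weights: Dict[str, float],
                  alpha: float = 0.05) -> List[str]:
    """Return the hypotheses rejected by the weighted Holm procedure, in order."""
    remaining = set(p_values)
    rejected: List[str] = []
    while remaining:
        total_weight = sum(weights[h] for h in remaining)
        # Candidate: the hypothesis with the smallest weight-adjusted P value.
        candidate = min(remaining, key=lambda h: p_values[h] / weights[h])
        # Its local alpha is its share of the weight still in play.
        if p_values[candidate] <= alpha * weights[candidate] / total_weight:
            rejected.append(candidate)
            remaining.remove(candidate)
        else:
            break  # no further hypotheses can be rejected
    return rejected


# Hypothetical end points, P values, and weights.
result = weighted_holm(
    p_values={"end point A": 0.001, "end point B": 0.02, "end point C": 0.2},
    weights={"end point A": 0.5, "end point B": 0.3, "end point C": 0.2},
)
```

In the example, end point B would not be significant at its initial alpha (0.05 × 0.3 = 0.015) but becomes significant after the alpha assigned to end point A is recycled, which illustrates why the order of rejection matters.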

Figure 4

Alpha Recycling Strategy for SRI-4 (TULIP-1).

Figure 5

Alpha Recycling Strategy for BICLA (TULIP-2).

Results

Patient Disposition

A summary of patient disposition in the pivotal trials is provided in Table 10. In the TULIP-1 and TULIP-2 trials, 847 and 649 patients, respectively, were screened for eligibility. In total, the TULIP-1 trial had 180 and 184 patients who met eligibility and were randomized into the anifrolumab 300 mg and placebo groups, respectively (N = 364). In the TULIP-2 trial, 181 and 184 patients were randomized, respectively; however, 1 patient in the treatment group and 2 patients in the placebo group were not treated because of an AE and failure to meet randomization criteria, respectively. In total, 365 patients were randomized into the TULIP-2 trial. In the TULIP-1 trial, the rate of study discontinuation was similar between treatment groups (18.9% and 19% in the anifrolumab and placebo groups, respectively). In the TULIP-2 trial, there were fewer discontinuations in the treatment group (13.3%) versus the placebo group (25.3%). In the TULIP-1 trial, the major reason for discontinuation was withdrawal by patient (8.3% and 8.2% in the anifrolumab and placebo groups, respectively) followed by AEs (6.7% and 2.7%, respectively). Similarly, in the TULIP-2 trial, the major reason for discontinuation was withdrawal by patient (6.1% and 10.3%), followed by AEs (1.7% and 3.8%) and lack of efficacy (1.1% and 4.4%).

Table 10

Patient Disposition (Full Analysis Set).

Exposure to Study Treatments

Exposure data from the TULIP-1 and TULIP-2 trials are summarized in Table 11. In the TULIP-1 trial, exposure to the investigational product was similar, but slightly higher on average in the placebo group than in the anifrolumab group throughout the 52 weeks. However, in the TULIP-2 trial, more patients in the anifrolumab 300 mg group were exposed to at least 48 weeks of the investigational product compared with those in the placebo group (85.6% versus 73.1%). Similarly, in the TULIP-2 trial, the total number of patient-years of treatment exposure was higher in the anifrolumab group compared with the placebo group (166.2 versus 155.5 patient-years of exposure, respectively). In addition, more patients in the anifrolumab group of the TULIP-2 trial received 13 infusions in total (70.0% versus 57.1% for the placebo group) compared to those in the TULIP-1 trial (62.8% versus 70.7%, respectively). For both studies, most patients were on some form of combination therapy for SLE as background, in addition to the study drugs. Missed infusions were not counted. Dose reductions or delays were not discussed in either pivotal trial.

Table 11

Duration of Exposure and Number of Infusions (Full Analysis Set).

Protocol Deviations

Important protocol deviations in both the TULIP-1 and TULIP-2 trials are summarized in Table 12. Overall, rates of protocol deviations were similar in the 2 trials and balanced across groups. Between the 2 studies, the treatment arm of the TULIP-2 trial had the lowest rate of deviations (2.8%) while the placebo arm had the highest rate (4.9%).

Table 12

Important Protocol Deviations — TULIP-1 and TULIP-2 (Full Analysis Set).

Efficacy

Only those efficacy outcomes and analyses of subgroups identified in the review protocol are reported here. Results of the subgroup analysis, specifically the subgroup of patients with a high interferon-test result and the subgroup of patients with a baseline OCS dosage of 10 mg/day or higher (i.e., the population matching the reimbursement request), were available for certain end points and will be presented under each respective efficacy outcome. Detailed efficacy data are available in Appendix 3.

Disease Activity

British Isles Lupus Assessment Group-based Composite Lupus Assessment

A summary of BICLA responses for both pivotal trials is presented in Table 13.

In the TULIP-1 trial, BICLA was a secondary end point, and it was not tested for statistical significance. Despite this, treatment response was numerically greater among the anifrolumab group versus placebo group (10.1% treatment difference; 95% CI, 0.6 to 19.7) and this remained true for the individual components of the BICLA as well.

In the TULIP-2 trial, BICLA was the primary end point. There was a statistically significant improvement in BICLA response in the anifrolumab 300 mg group versus the placebo group (47.8% versus 31.5%; 16.3% treatment difference; 95% CI, 6.3% to 26.3%; P value = 0.0013). The difference between groups was apparent as early as week 4 of the trial (Figure 6). The hazard ratio for time to BICLA response was 1.55 (95% CI, 1.11 to 2.18). All components of the BICLA contributed to the treatment effect, with a numerically larger proportion of patients in the treatment group meeting each component of the composite outcome compared to the placebo group (Table 13).

Results of the TULIP-2 sensitivity analyses are available in Appendix 3. The results were consistent with the results of the primary efficacy analysis and support the strength of the primary efficacy results. The tipping-point analysis, which examined the impact of nonresponder imputations due to discontinuation of the investigational product, found that 29% of nonresponders (12 of 41) in the placebo group needed to have been altered to responders to tip the conclusion from statistically significant to nonsignificant. The placebo response rate is a likely scenario assuming no additional patient on anifrolumab is considered a responder. However, because it is likely that more than 7 nonresponders could be altered to responders among the 26 discontinued patients in the anifrolumab group, it is unlikely that the results would tip from statistically significant to nonsignificant based on this analysis.

Subgroup Analysis by BICLA Response

A summary of BICLA response stratified by subgroups is presented in Table 14. A key secondary end point of the TULIP-2 trial was achievement of BICLA response through week 52 in the type I interferon high-status subgroup. BICLA response rates were higher in the subgroup of interferon high patients treated with anifrolumab 300 mg compared with placebo (17.3% treatment difference; 95% CI, 6.5 to 28.2, adjusted P value = 0.0022). A tipping-point analysis was used to assess the impact of patients who discontinued the investigational product. In the TULIP-2 trial, 24 and 32 patients in the anifrolumab and placebo groups, respectively, discontinued the product. Based on the tipping-point analysis, 31.3% nonresponders (10 of 32) in the placebo group would need to be altered to responders to tip the conclusion from statistically significant to nonsignificant, if no additional patient in the anifrolumab group is considered a responder. However, because more than 5 responders are likely to be observed among the 24 discontinued patients, it is unlikely for the response to tip from statistically significant to nonsignificant based on this analysis.

In the TULIP-2 trial, a numerically larger proportion of the subgroup of patients with an OCS dosage of 10 mg/day or higher in the anifrolumab group achieved a BICLA response compared to placebo (12% treatment difference; 95% CI, −2.5 to 26.6). Overall, the subgroup analyses (interferon-test result; SLEDAI-2K score at screening) support the results seen in the main analysis of the TULIP-2 trial. In the TULIP-1 trial, BICLA results in the interferon-test high subgroup were consistent with the TULIP-2 trial; other subgroups were not evaluated for BICLA response in the TULIP-1 trial.

Table 13

Summary of Key Response Variables in TULIP-1 and TULIP-2.

Table 14

BICLA Response by Subgroup at Week 52 — Interferon-Test High, SLEDAI Greater Than 10, and OCS Dosage of 10 mg/day or Higher (Full Analysis Set).

Figure 6

Time to BICLA Response TULIP-2 (Full Analysis Set).

Improvement of 4 points or Greater on the Systemic Lupus Erythematosus Responder Index

Key summary results from the pivotal trials are presented in Table 13. SRI-4 response at week 52, the primary end point of the TULIP-1 trial, did not demonstrate statistical significance (−4.2% treatment difference; 95% CI, −14.2% to 5.8%; P value = 0.412). Overall, disease activity as measured by SRI-4 response at week 52 in the TULIP-1 trial was similar between patients receiving anifrolumab 300 mg and placebo, as were the individual components of the SRI-4 response. In the TULIP-2 trial, SRI-4 at week 52 was a secondary end point and was not tested for statistical significance. Despite this, a numerically larger proportion of patients in the treatment group versus the placebo group achieved an SRI-4 at week 52 (18.2% treatment difference; 95% CI, 8.1 to 28.3).

Improvements by 5, 6, 7, or 8 points on the SRI were not tested for statistical significance in either study (Appendix 3). In the TULIP-1 trial, the results of these analyses were variable, while in the TULIP-2 trial, the results of these end points were consistent with the SRI-4 response, with a greater number of patients achieving a response in the anifrolumab group for all SRI values (Table 39).

Results of the TULIP-1 sensitivity analyses are available in Appendix 3. The results for multiple imputations showed a larger proportion of patients in the placebo group achieving an SRI-4 at week 52 compared to the anifrolumab 300 mg group (Table 40). In contrast, the TULIP-2 sensitivity analyses supported the finding that SRI-4 responses were more common in the anifrolumab group (18.5% treatment difference; 95% CI, 8.3 to 28.7) compared to placebo (Table 41).

Subgroup Analysis by SRI-4 Response

A summary of SRI-4 response stratified by subgroups is presented in Table 15. A key secondary end point of the TULIP-1 trial was the achievement of SRI-4 response through week 52 in the type I interferon high-status subgroup. Results were nonsignificant in the TULIP-1 trial (−3.4% treatment difference; 95% CI, −14.4 to 7.6, P = 0.549). In the subgroup of patients with an OCS dosage at baseline of 10 mg/day or higher (i.e., the subgroup matching the reimbursement request), there was a 5.3% higher SRI-4 response in the placebo group versus the anifrolumab group (95% CI, −20.2 to 9.6). Overall, a numerically higher proportion of patients in the placebo group compared to anifrolumab 300 mg achieved an SRI-4 response across all subgroups in the TULIP-1 trial.

The TULIP-2 trial did not statistically assess SRI-4 responses in interferon high patients as it was a secondary end point; however, a larger proportion of patients in the anifrolumab group achieved an SRI-4 response versus the placebo group (20.3% treatment difference; 95% CI, 9.2 to 31.3). In the subgroup of patients with an OCS dosage at baseline of 10 mg/day or higher there was a 16.5% higher response in SRI-4 in the treatment group versus the placebo group (95% CI, 2.6 to 30.4). Overall, the subgroup analysis was consistent with the results from the SRI-4 FAS analysis of the TULIP-2 trial, with a greater response achieved in the anifrolumab group.

Table 15

SRI-4 Response by Subgroup at Week 52 — Interferon-Test High, SLEDAI 10 or Greater, and OCS Dosage 10 mg/day or Higher (Full Analysis Set).

SLEDAI-2K

Key summary results from the pivotal trials are presented in Table 13. In the TULIP-1 and TULIP-2 trials, the mean total SLEDAI-2K scores at baseline were 11.5 (SD = 3.5) and 11.5 (SD = 3.88), respectively, in the placebo group, and 11.3 (SD = 4.04) and 11.4 (SD = 3.64), respectively, in the anifrolumab group. In the TULIP-1 trial, there was virtually no difference in the change from baseline in total SLEDAI-2K score between the anifrolumab and placebo groups at week 52 (−0.7 treatment difference; 95% CI, −1.6 to 0.2). In the TULIP-2 trial, the difference was small but favoured anifrolumab, with a CI that excluded 0 (−1.2 treatment difference; 95% CI, −2.0 to −0.3). Generally higher improvement rates were observed across the individual domains of the SLEDAI-2K in the anifrolumab 300 mg group compared with the placebo group for both trials (Appendix 3).

British Isles Lupus Assessment Group 2004

Key summary results from the pivotal trials are presented in Table 13. Both trials saw similar improvements from baseline to week 52 in BILAG global scores. In the TULIP-1 trial, the mean changes (improvements) at week 52 from baseline in the BILAG global score were −13.0 (SD = 8.01) and −10.7 (SD = 7.72) in the anifrolumab 300 mg and placebo groups, respectively. In the TULIP-2 trial, the mean changes from baseline were −12.4 (SD = 7.43) and −10.9 (SD = 7.58) in the anifrolumab 300 mg and placebo groups, respectively.

Detailed information on BILAG by A/B versus C/D at baseline and week 52 is presented in Table 42 and Table 43 in Appendix 3. In both trials, the most frequently involved organ systems at baseline were the musculoskeletal and mucocutaneous organ systems, then cardiovascular and renal, in both groups. Numerically higher proportions of patients in the anifrolumab 300 mg group showed improvements in BILAG A, B, or C scores compared with the placebo group starting at week 4 in the musculoskeletal organ system and at week 16 in the mucocutaneous organ system.

Physician’s Global Assessment

Key summary results from the pivotal trials are presented in Table 13. Mean change in PGA global scores from baseline to week 52 was similar across both groups in both trials. In the TULIP-1 trial, the improvements in PGA were slightly higher in the anifrolumab group compared to the placebo group (−0.22 treatment difference; 95% CI, −0.36 to −0.08). In the TULIP-2 trial, the results were similar (−0.15 treatment difference; 95% CI, −0.28 to −0.01).

Maintenance of Oral Corticosteroid Reduction

Results of the key secondary end point of a maintained OCS reduction of up to 7.5 mg/day between week 40 and 52 from the pivotal trials are presented in Table 16. In the TULIP-1 trial, for patients with a baseline OCS dosage of 10 mg/day or higher, there was no statistically significant difference between the anifrolumab (N = 103) and placebo groups (N = 102) (8.9% treatment difference; 95% CI, −4.1% to 21.9%; P value = 0.180) on maintained OCS dose reduction. In the TULIP-2 trial, a statistically significant difference was observed in the anifrolumab group (N = 87), with 51.5% of patients able to taper their OCS dosage from 10 mg/day or higher to 7.5 mg/day or lower at week 40 and maintain this lower dosage through week 52 versus 30.2% in the placebo group (N = 83) (21.2% treatment difference; 95% CI, 6.8 to 35.7; adjusted P value = 0.0135). The mean changes from baseline in OCS dosage to week 52 are portrayed in Figure 7.

A tipping-point analysis was conducted to examine the impact of nonresponders (e.g., patients treated with restricted medication beyond protocol-allowed thresholds, including those with an increase in their OCS dose after week 40, and those who discontinued the investigational product) on the results. Given that the TULIP-1 trial did not have a statistically significant result, sensitivity analyses were not performed on this key secondary end point. In the TULIP-2 trial, 13 and 26 patients in the anifrolumab and placebo groups, respectively, discontinued the investigational product, without having received restricted medication before discontinuation. The tipping-point analysis shows that 23.1% (6 of 26) of these discontinued placebo patients would have to be altered from nonresponders to responders to tip the conclusion from statistical significance to nonsignificance, assuming that no additional patient on anifrolumab is considered a responder. The placebo response rate is likely, and it is also likely to observe more than 8 responders among the 13 discontinued patients in the anifrolumab group. A shift to nonsignificance may occur with ease based on this analysis.

Subgroup Analysis by Maintained Oral Corticosteroid Dose Reduction

A summary of maintained OCS dose reduction stratified by subgroups is presented in Table 16. In the TULIP-1 trial, among patients with a high interferon-test result and a baseline OCS dosage of 10 mg/day or higher, the proportion who maintained an OCS dosage reduction to no more than 7.5 mg/day was 10% higher (95% CI, −3.8 to 23.9) in the anifrolumab group (N = 90) than in the placebo group (N = 86). In the TULIP-2 trial, the proportion with a maintained OCS dosage reduction was 21.2% higher in the anifrolumab group (N = 78) than in the placebo group (N = 73). However, given the small sample sizes and exploratory nature of this analysis, the results should be interpreted with caution. In the TULIP-1 trial, the subgroups of patients with SLEDAI-2K scores below 10 points and of 10 points or higher both had a higher proportion of patients maintaining an OCS dosage reduction in the anifrolumab group (N = 29 and 74, respectively) than in the placebo group (N = 25 and 77, respectively) (5.8% treatment difference; 95% CI, −20.4 to 31.9; and 10.1% treatment difference; 95% CI, −4.8 to 25.1, respectively). The results were similar in the TULIP-2 trial, with a higher proportion of patients in the anifrolumab group maintaining an OCS dosage reduction in both SLEDAI-2K subgroups at screening.

Table 16

Maintained OCS Reduction to 7.5 mg/day or Lower From Week 40 to Week 52 in Patients With OCS Dosage of 10 mg/day or Higher at Baseline: Results in TULIP-1 and TULIP-2.

Figure 7

OCS Dose (mg) in Patients With a Baseline OCS of 10 mg/day or Higher, Mean Change From Baseline by Time Point in TULIP-2 (Full Analysis Set).

Patient-Reported Outcomes at Week 52

Short Form (36) Health Survey (Acute Recall)

Summary scores for the SF-36 from the pivotal trials can be found in Table 17. Increasing scores on the SF-36 questionnaire indicate improved function. According to the sponsor, clinically meaningful thresholds of change were 3.4 points for the PCS and 4.6 points for the MCS. According to the literature, anchor-based MIDs are 2.1 to 2.4 for either summary score.³⁹ In the TULIP-1 and TULIP-2 trials, a meaningful mean change from baseline to week 52 was seen in the anifrolumab 300 mg group (N = 140 for TULIP-1; N = 132 for TULIP-2) for the PCS (3.57 [standard error (SE) = 0.67] and 3.93 [SE = 0.65], respectively), but not for the PCS in the placebo group or for the MCS in either group.

In the TULIP-1 trial, at week 52, the proportion of MCS responders (change of at least 4.6 points) was 20.9% in the anifrolumab 300 mg group versus 16.7% in the placebo group (4.2% difference; 95% CI, −4.1 to 12.6); the proportion of PCS responders (change of at least 3.4 points from baseline) was slightly lower in the anifrolumab 300 mg group than in the placebo group (25% versus 26.7%; −1.7% difference; 95% CI, −10.9 to 7.5).

In the TULIP-2 trial, at week 52, the proportion of MCS responders in the anifrolumab 300 mg group compared with the placebo group was 27.4% versus 21.2%, respectively (6.2% difference; 95% CI, −2.71 to 15.2). The proportion of PCS responders in the anifrolumab 300 mg group compared with the placebo group was 32.8% versus 24.4%, respectively (8.4% difference; 95% CI, −1.1 to 17.8).

Overall, the difference in responses between the treatment groups was minimal in both trials.

Functional Assessment of Chronic Illness Therapy–Fatigue

Fatigue was measured using the FACIT-F scale. Increasing total scores from the FACIT-F questionnaire (0 to 52) indicate decreasing severity of fatigue. A patient who displayed an improvement of more than 3 points was considered a responder.
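
The responder definitions stated so far differ in whether the threshold itself counts as a response: the SF-36 summary scores use "at least" (inclusive), whereas FACIT-F uses "more than 3 points" (strict). A minimal sketch of these rules follows; the dictionary and function names are hypothetical, and the thresholds are taken from the text above.

```python
import operator

# (threshold, comparison) for each instrument, as stated in the text.
RESPONDER_RULES = {
    "SF-36 PCS": (3.4, operator.ge),  # change of at least 3.4 points
    "SF-36 MCS": (4.6, operator.ge),  # change of at least 4.6 points
    "FACIT-F": (3.0, operator.gt),    # improvement of more than 3 points
}

def is_responder(instrument, change_from_baseline):
    """Return True if the change from baseline meets the responder rule."""
    threshold, compare = RESPONDER_RULES[instrument]
    return compare(change_from_baseline, threshold)

print(is_responder("SF-36 PCS", 3.4))  # inclusive threshold
print(is_responder("FACIT-F", 3.0))    # strict threshold
```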

In the TULIP-1 trial, a slightly higher proportion of patients in the anifrolumab 300 mg group had reduced fatigue at week 52, as measured by FACIT-F responder rate (improvement from baseline to week 52 of > 3 points), compared with the placebo group (29.3% versus 26.8%, respectively; 2.4% difference; 95% CI, −0.9 to 17.9).

In the TULIP-2 trial, a numerically higher proportion of patients in the anifrolumab 300 mg group had reduced fatigue at week 52, as measured by FACIT-F responder rates compared with the placebo group (33.2% versus 24.7%, respectively; difference = 8.5%; 95% CI, 6.9 to 11.8).

There was no notable difference between groups in the change from baseline to week 52 in FACIT-F score in either the TULIP-1 or TULIP-2 trial (TULIP-1: 5.7 with anifrolumab versus 3.7 with placebo; difference = 2.0 points; 95% CI, −0.3 to 4.3; TULIP-2: 3.7 versus 2.5; difference = 1.2 points; 95% CI, −1.0 to 3.4).

Lupus Quality-of-Life

Results from the Lupus QoL questionnaire are presented in Table 17. Increasing scores indicate improvement. In both trials, Lupus QoL domain scores at baseline were similar across treatment groups. The changes (increases) from baseline in Lupus QoL domain scores were similar between the anifrolumab 300 mg and placebo groups at week 52 across all domains.

5-Level EQ-5D

Results from the EQ-5D-5L are presented in Table 17. Increasing scores in EQ-5D-5L (single summary utility index [where 1.0 is highest score] and EQ VAS [0 to 100]) indicate improvement. In both trials, improvements in QoL as measured by change from baseline in EQ-5D-5L were similar between the treatment groups.

In the TULIP-1 trial, patients in the anifrolumab group experienced numerically greater improvements in QoL at week 52 as measured by change from baseline in EQ-5D-5L compared with patients in the placebo group. For the single summary utility index, the change from baseline was 0.107 (SD = 0.21) (from a baseline value of 0.596) at week 52 for the anifrolumab group (N = 130) versus 0.069 (from a baseline value of 0.613) in the placebo group (N = 138). Mean changes in EQ VAS scores at week 52 were 13.4 (from a baseline score of 53.3) in the anifrolumab group and 8.3 (from a baseline score of 54.6) in the placebo group.

Similarly, in the TULIP-2 trial, for the single summary utility index, the mean change from baseline at week 52 was 0.057 (from a baseline value of 0.630) for the anifrolumab group versus 0.047 (from a baseline value of 0.591) in the placebo group. Mean changes in EQ VAS scores at week 52 were 8.1 (from a baseline score of 58.1) in the anifrolumab group and 4.3 (from a baseline score of 56.6) in the placebo group.

Pain Numerical Rating Score

Results from the pain NRS are presented in Table 17. Decreasing scores from the NRS VAS (0 to 10) indicate decreased pain.

In the TULIP-1 trial, the mean NRS VAS scores at baseline were similar between treatment groups (5.7 and 5.4 in the anifrolumab 300 mg and placebo groups, respectively). At week 52, the mean change (decrease) from baseline in NRS VAS scores was similar in the anifrolumab 300 mg group compared with the placebo group (−0.1 versus −0.8; −0.3 difference; 95% CI, −0.8 to 0.3).

In the TULIP-2 trial, the mean NRS VAS scores at baseline were similar between treatment groups (5.2 and 5.5 in the anifrolumab 300 mg and placebo groups, respectively). At week 52, the mean change (decrease) from baseline in NRS VAS scores was similar in the anifrolumab 300 mg group compared with the placebo group (−0.9 versus −0.7; −0.3 difference; 95% CI, −0.8 to 0.3).

Table 17

Patient-Reported Outcomes in TULIP-1 and TULIP-2.

Mortality

There were 2 deaths (0.5%) in the TULIP-1 trial, 1 in each treatment arm, and 1 death (0.27%) in the TULIP-2 trial in the anifrolumab group. These deaths were not considered by the investigator to be related to the treatment.

Measure of Organ Damage, Systemic Lupus International Collaborating Clinics/American College of Rheumatology Damage Index

SDI global scores and mean changes from baseline to week 52 are summarized in Table 18. The mean changes in SDI at week 52 were small and similar between the anifrolumab 300 mg and placebo groups across both trials. Mean changes at week 52 were 0.1 (SD = 0.30) in the anifrolumab 300 mg group and 0.1 (SD = 0.24) in the placebo group for both trials. The number of patients with increased damage was low in both treatment groups.

Table 18

SDI Score at Baseline and Week 52 in TULIP-1 and TULIP-2 (Full Analysis Set).

Cutaneous Lupus Erythematosus Disease Area and Severity Index Activity

Results from the pivotal trials for the key secondary end point, a reduction of 50% or greater in CLASI activity from baseline to week 12 in patients with a baseline CLASI activity score of 10 or higher, are presented in Table 19. In the TULIP-1 trial, for patients with a baseline CLASI activity score of 10 or higher (n = 142), the difference in response rates was nonsignificant (P value = 0.054). However, the response rate was numerically higher in the anifrolumab 300 mg group than in the placebo group at 12 weeks (41.9% versus 24.9%, respectively), with 17.0% (95% CI, −0.30% to 34.3%) more patients able to achieve a reduction of 50% or greater from baseline in CLASI activity score.

In the TULIP-2 trial, for patients with baseline CLASI activity score of 10 or higher (n = 89), the difference in response rates was statistically significant, with 24% more patients (95% CI, 4.3% to 43.6%, adjusted P value = 0.0392) able to achieve a reduction of 50% or greater from baseline in CLASI activity score in the anifrolumab 300 mg group compared with the placebo group at week 12. Interpretation of the tipping-point analysis was limited, given the small number of patients who discontinued the investigational product (1 and 3 patients on anifrolumab and placebo, respectively).

Subgroup Analysis by CLASI Activity

A summary of CLASI activity stratified by subgroups is presented in Table 19. In the subgroup of patients with an OCS dosage at baseline of 10 mg/day or higher (i.e., the subgroup matching the reimbursement request), the CLASI response at week 52 was 13.3% higher (95% CI, −14.6 to 41.3) in the anifrolumab group (N = 20) than in the placebo group (N = 21) in the TULIP-1 trial, and 50.6% higher (95% CI, 19.0 to 82.2) in the anifrolumab group (N = 17) than in the placebo group (N = 12) in the TULIP-2 trial. In general, similar responses were seen within each subgroup (interferon-test result; SLEDAI-2K score at screening), and in all analyses a numerically higher proportion of patients achieved the CLASI end point in the anifrolumab group than in the placebo group, except in the interferon-test low group of TULIP-1 (−2.5% treatment difference; 95% CI, −41.8 to 36.8). However, given the small sample sizes and exploratory nature of this analysis, the results should be interpreted with caution.

Table 19

Summary of CLASI Activity (≥ 50% Reduction from Baseline to Week 12) in Patients With Baseline CLASI Activity Score ≥ 10 and Subgroup Analysis — TULIP-1 and TULIP-2.

Joint Reduction Rate

Key summary results for joint response in the pivotal trials are presented in Table 20. In the TULIP-2 trial, a key secondary end point was a reduction of 50% or greater in swollen and tender joint counts at week 52 in patients with at least 6 swollen and 6 tender joints at baseline. The results of this analysis were not statistically significant and there was no notable difference between treatments in the proportion of patients with at least a 50% reduction in swollen and tender joint counts at week 52 (4.7% difference; 95% CI, −13.5 to 17.6; P value = 0.5469). The results were also nonsignificant for the proportion of patients with at least a 20% reduction in swollen and tender joints at week 52. In the TULIP-1 trial, for the supporting secondary end point of the proportion of patients with at least a 20% or 50% reduction in joint counts among those with at least 8 swollen and at least 8 tender joints at baseline, numerically higher proportions of patients in the anifrolumab 300 mg group compared with the placebo group achieved at least a 20% reduction in swollen and tender joint counts (6.7% treatment difference; 95% CI, −9.7 to 23.1) and at least a 50% reduction in swollen and tender joint counts (14.7% treatment difference; 95% CI, −1.4 to 30.8).

Subgroup Analysis by Joint Reduction Rate

The results of the subgroup analysis for joint reduction rate are presented in Table 20. Overall, the results of joint reduction rate by subgroup were variable and inconsistent. Given the exploratory nature of these subgroup analyses and the small sample sizes, the results should be interpreted with caution.

Table 20

Summary of Joint Reduction Rate Among Patients With at Least 6 Swollen and 6 Tender Joints at Week 52 and Subgroup Analysis — TULIP-1 and TULIP-2.

Lupus Low Disease Activity State

In the TULIP-1 trial, the proportion of patients who achieved an LLDAS response at week 52 was generally similar in the anifrolumab 300 mg group compared with the placebo group (15.0% versus 10.4%; difference 4.6%; 95% CI, −2.9 to 12.1). In the TULIP-2 trial, the proportion of patients who achieved an LLDAS response at week 52 was numerically higher in the anifrolumab 300 mg group compared with the placebo group (14.9% versus 8.8%; difference 6.1%; 95% CI, −1.2 to 13.4).

Annual Flare Rate

Summary results for the key secondary end point of annualized flare rate from the pivotal trials are presented in Table 21. The annualized rate of flares through week 52 was numerically lower in the anifrolumab 300 mg group compared with the placebo group in the TULIP-1 trial (0.60 versus 0.72, respectively) and the TULIP-2 trial (0.43 versus 0.64, respectively). In the TULIP-1 trial, this difference was nonsignificant (rate ratio = 0.83; 95% CI, 0.60 to 1.14; P value = 0.258). Similarly, in the TULIP-2 trial, the difference was nonsignificant after adjustment for multiplicity, with a higher flare rate in the placebo group (rate ratio = 0.67; 95% CI, 0.48 to 0.94; adjusted P value = 0.0809). In the TULIP-1 trial, the total follow-up time was similar between groups, whereas in the TULIP-2 trial the total follow-up time was longer in the anifrolumab 300 mg group compared with the placebo group. In the TULIP-2 trial, 31.1% of patients in the anifrolumab 300 mg group had a flare during the study compared with 42.3% of patients in the placebo group.
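
The ratios reported above are the anifrolumab annualized flare rate divided by the placebo rate, so values below 1 favour anifrolumab. The sketch below recomputes the ratios from the reported rates and, as an approximation only, derives the nominal (unadjusted) P value implied by the TULIP-2 ratio and its 95% CI. It assumes normality on the log scale; the function names are hypothetical and this is not the model used in the trials.

```python
import math

def rate_ratio(rate_trt, rate_ctl):
    """Ratio of annualized flare rates (treatment / control)."""
    return rate_trt / rate_ctl

def nominal_p_from_ci(ratio, ci_low, ci_high, z_crit=1.959964):
    """Approximate nominal two-sided p-value implied by a ratio and its
    95% CI, assuming a normal distribution on the log scale."""
    se_log = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(ratio) / se_log
    return math.erfc(abs(z) / math.sqrt(2))

print(round(rate_ratio(0.60, 0.72), 2))  # TULIP-1
print(round(rate_ratio(0.43, 0.64), 2))  # TULIP-2
print(nominal_p_from_ci(0.67, 0.48, 0.94))  # TULIP-2, unadjusted approximation
```

The reported TULIP-2 P value is adjusted for multiplicity, which is one reason a confidence interval that excludes 1 can accompany an adjusted P value above 0.05.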

Flare severity was also captured by the studies using a modified SFI and was presented as part of the safety analysis (as detailed in the Harms section).

A summary of sensitivity analyses is presented in Appendix 3. Sensitivity analyses were not conducted in the TULIP-1 trial for this end point because the primary end point was not met. In the TULIP-2 trial, the results of the sensitivity analyses among the matrix of different flare rates after discontinuation of the investigational product showed that the estimated flare rates were consistent with the primary analysis and robust to the missing-data assumptions.

Subgroup Analysis by Annual Flare Rate

In the subgroup analyses for patients with an OCS dose at baseline of 10 mg/day or higher, the annualized flare rate was lower in the anifrolumab group than in the placebo group for both the TULIP-1 trial (rate ratio = 0.79; 95% CI, 0.53 to 1.18) and the TULIP-2 trial (rate ratio = 0.52; 95% CI, 0.33 to 0.82). Overall, the subgroup analysis for flare rates was generally consistent with that for the overall population, with no notable differences in flare rates in the anifrolumab 300 mg group through week 52 compared to the placebo group. However, given the exploratory nature of these subgroup analyses and the small sample sizes, the results should be interpreted with caution.

Table 21

Summary of Annualized Flare Rate and Subgroup Analysis in TULIP-1 and TULIP-2.

Subgroups

The results of each subgroup analysis are presented under the respective efficacy outcomes above. The main subgroup analyses were interferon-test status (high versus low), SLEDAI-2K score at screening (< 10 points versus ≥ 10 points), and baseline OCS dosage (< 10 mg/day versus ≥ 10 mg/day prednisone or equivalent). No statistical testing was conducted to compare the subgroups. Overall, results in each subgroup were similar and no notable differences were observed within each subgroup. The results of the subgroup analyses should be interpreted with caution given the lack of statistical testing within subgroups, small sample sizes, and the exploratory nature of these analyses.

In terms of differences between the anifrolumab group and placebo, the results were mixed across subgroups in the TULIP-1 trial. For SRI-4, the interferon-test high subgroup and the subgroup with an OCS dosage of 10 mg/day or higher at baseline each showed a nonsignificant difference between the anifrolumab and placebo groups. In the TULIP-2 trial, BICLA response in the interferon-test high subgroup and in the subgroup with an OCS dosage of 10 mg/day or higher demonstrated a statistically significant difference between groups. Overall, the results of the subgroup analyses for the remaining outcomes in the TULIP-2 trial included a numerically higher proportion of patients in the anifrolumab group achieving efficacy outcomes such as SRI-4 and BICLA compared to placebo.

Harms

Only those harms identified in the review protocol are reported. Table 22 provides detailed harms data.

Adverse Events

Rates of AEs were similar across treatment groups and across the pivotal trials (approximately 85% to 90% prevalence in both pivotal trials). In the TULIP-1 and TULIP-2 trials, the most common AEs were nasopharyngitis (20.0% and 15.6% in the anifrolumab 300 mg group versus 12.0% and 11% in the placebo group, respectively), upper respiratory tract infection (12.2% and 21.7% versus 9.8% and 9.9%), and urinary tract infection (12.2% and 11.1% versus 14.7% and 13.7%). Infusion-related reactions were also common in the treatment arm of the TULIP-2 trial (13.9%).

In the TULIP-1 trial, the most frequently reported AEs considered by the investigator to be related to the investigational product were infusion-related reactions (7.8% in the anifrolumab 300 mg group versus 7.1% in the placebo group), herpes zoster with cutaneous presentation (5.0% versus 0%, respectively), and hypersensitivity (5.6% versus 1.1%). The investigator considered the majority of the AEs to be unrelated to the investigational product; however, AEs considered to be related to the investigational product occurred more often in the anifrolumab 300 mg group compared to the placebo group during the study (30.6% versus 22.8%, respectively).

In the TULIP-2 trial, the most frequently reported AE during the treatment period considered by the investigator to be related to the investigational product was herpes zoster (6.1% in anifrolumab versus 0% in placebo). The investigator considered most AEs to be unrelated to the product; however, AEs considered by the investigator to be related to the investigational product occurred more often in the anifrolumab 300 mg group compared to the placebo group during the study (45% versus 30.2%).

Serious Adverse Events

SAEs were less common in the anifrolumab group than in the placebo group in both the TULIP-1 trial (13.9% versus 16.3%) and the TULIP-2 trial (8.3% versus 17%). In the TULIP-1 trial, the most common SAEs were SLE (1.7% and 1.6%, respectively) and pneumonia (1.7% and 0.5%). In the TULIP-2 trial, the most common SAE was pneumonia (1.7% and 3.8%), followed by SLE (0.6% and 3.3%).

Withdrawal due to Adverse Events

In the TULIP-1 trial, withdrawals were greater in the anifrolumab group versus the placebo group (6.7% versus 3.8%), whereas withdrawals were lower in the anifrolumab group compared to placebo group in the TULIP-2 trial (2.8% versus 7.7%). In the TULIP-1 trial, the most common reason for withdrawal in the anifrolumab group was herpes zoster (1.1%). In the TULIP-2 trial, the most common reason for withdrawal in the placebo group was SLE (1.6%), followed by pneumonia (1.1%).

Mortality

There were 2 deaths during the TULIP-1 trial and 1 death during the TULIP-2 trial. One patient in the anifrolumab 300 mg group of each trial had a fatal SAE of pneumonia during the treatment period. In the TULIP-1 trial, 1 patient in the placebo group had a fatal SAE of encephalitis during the follow-up period. The investigator did not consider these deaths to be related to the investigational product.

Notable Harms

In the TULIP-1 trial, notable harms (as outlined in the CADTH protocol) included hypersensitivity reactions (6.1% for anifrolumab 300 mg versus 1.1% for placebo), infusion-related reactions (8.9% versus 7.1%), herpes zoster (5.6% versus 1.6%), serious, nonopportunistic infections (5.0% versus 4.3%), malignancies (1.7% versus 0.5%), depression (2.8% versus 2.7%), and suicidal ideation or behaviour (1.1% versus 1.6%).

In the TULIP-2 trial, notable harms included infusion-related reactions (13.9% for anifrolumab 300 mg versus 7.7% for placebo), herpes zoster (7.2% versus 1.1%), serious, nonopportunistic infections (2.8% versus 5.5%), hypersensitivity (1.1% versus 0.5%), malignancies (0% versus 0.5%), depression (2.8% versus 1.6%), and suicidal ideation or behaviour (1.7% versus 4.4%). Herpes zoster was more common among patients in the anifrolumab group across both trials, but none of the cases were considered SAEs.

Depression was measured using PHQ-8 scores. A score of 5 to 9 indicates mild depression and a score of 10 to 14 indicates moderate depression. In both trials, results were similar between treatment groups. No clinically meaningful changes from baseline were observed for any treatment group, with small and similar decreases observed over 52 weeks of treatment across both trials. In the TULIP-1 trial, PHQ-8 scores decreased from baseline to week 52 by 2.1 points in the anifrolumab group (baseline score = 10.1) and by 1.7 points in the placebo group (baseline score = 9.4). In the TULIP-2 trial, PHQ-8 scores decreased from baseline to week 52 by 1.4 points in the anifrolumab group (baseline score = 9.2) and by 0.9 points in the placebo group (baseline score = 9.9).

Suicidal ideation and behaviour were measured using the C-SSRS. In the TULIP-1 trial, 2 patients in each of the anifrolumab (1.1%) and placebo (1%) groups experienced suicidal ideation during the treatment period; 1 report of suicidal behaviour (actual nonfatal attempt) was documented in the placebo group. During the follow-up period, 1 patient in the placebo group had suicidal ideation. In the TULIP-2 trial, 3 (1.7%) patients in the anifrolumab 300 mg group versus 8 (4.4%) patients in the placebo group had suicidal ideation during the treatment period; no patients in either treatment group exhibited suicidal behaviour. Overall, few patients reported suicidal ideation or suicidal behaviour at any time during the studies, with no imbalance observed among treatment groups.

The proportion of patients with flares and the severity of flares were measured by the modified SFI. In the TULIP-1 trial, numerically fewer patients had a flare in the anifrolumab 300 mg group (32.2%) compared with the placebo group (36.4%). The proportions of patients with at least 1 mild or moderate flare after initiation of the investigational product treatment were 31.1% in the anifrolumab 300 mg group versus 32.6% in the placebo group; the proportions of patients with at least 1 severe flare after initiation of the investigational product treatment were 2.8% in the anifrolumab 300 mg group versus 5.4% in the placebo group. In the TULIP-2 trial, the proportion of patients with flares was numerically lower in the anifrolumab 300 mg group compared with the placebo group (33.5% versus 38.5%). The proportions of patients with at least 1 mild or moderate flare after initiation of the investigational product treatment were 32.2% in the anifrolumab 300 mg group versus 36.8% in the placebo group; the proportions of patients with at least 1 severe flare after initiation of the investigational product treatment were 1.7% in the anifrolumab 300 mg group and 3.8% in the placebo group.

Table 22

Summary of Harms (Full Analysis Set).

Critical Appraisal

Internal Validity

Several factors in the 2 pivotal trials contributed to potential bias or general uncertainty in the outcomes. The primary outcomes for TULIP-1 and TULIP-2 were the composite scores of SRI-4 and BICLA, respectively. The decision to switch the primary end point in the TULIP-2 trial was based on the results of the TULIP-1 and MUSE trials and this decision was made before the unblinding of the data in the TULIP-2 trial at week 52. The risk of operational bias is therefore low. As both trials followed the same procedures for blinding, database locking, unblinding, and data analysis, concerns for potential investigator bias are low. The risk of confounding was addressed through stratification (e.g., SLEDAI-2K score at screening, baseline OCS dose, and type I interferon gene signature test results). Baseline imbalances in these factors could affect efficacy and/or safety assessments of anifrolumab versus placebo. Overall baseline characteristics and disease activity scores (e.g., CLASI activity and SLEDAI-2K scores) were generally similar and balanced between groups across both trials; however, there was a greater percentage of patients with a CLASI damage score of 10 or higher in the treatment group compared to placebo in the TULIP-2 trial (8.9% versus 4.4%, respectively) than in the TULIP-1 trial (6.1% versus 4.3%), which could potentially allow for greater leaps in improvement in patients with more severe disease for this outcome. Other concerns include potential ceiling effects for patients with lower disease activity scores (e.g., a patient with a baseline SLEDAI-2K score of 6 would be less likely to achieve a 4-point drop compared with someone who starts with a score of 12). The administration of the investigational product and measurement of variables were standardized between both pivotal trials. A disease adjudication committee was used to ensure the quality and accuracy of disease activity measurements by the investigator and to confirm the eligibility of each patient during the screening period.

In the TULIP-1 trial, there were similar rates of withdrawal in both study arms (18.9% anifrolumab versus 19% placebo), while discontinuation was much lower in the treatment group of the TULIP-2 trial than in the placebo group (13.3% versus 25.3%, respectively). Discontinuations were primarily due to patient request, an AE, lack of efficacy, and worsening of the condition under investigation. In the TULIP-2 trial, a slightly higher proportion of patients requested to discontinue in the placebo group (10.4%) than in the anifrolumab group (6.1%), and there were also more patients in the placebo group who withdrew due to AEs (3.8% versus 1.7%) and lack of efficacy (4.4% versus 1.1%) before the end of the study.

The sponsor adhered to its established statistical testing hierarchy for the multiplicity adjustment, testing outcomes in sequence. Sensitivity analyses and multiplicity adjustments were only conducted in the TULIP-2 trial because TULIP-1 did not meet its primary end point. The sponsor used a nonresponder imputation approach in which a patient who withdrew from the study or received restricted medications beyond the protocol-allowed threshold was considered a nonresponder. With this approach, when more patients withdrew in the placebo group, the results may have been biased in favour of anifrolumab, as these patients would be considered nonresponders whether or not they were responding at the time of withdrawal. The sensitivity analyses performed by the sponsor, which used approaches such as LOCF as well as tipping-point analyses, support the findings of its primary analysis of TULIP-2. The LOCF method was also used to impute missing data in cases for which individual components of the primary composite outcome were missing. Missing data were more common in the BILAG-2004 component for both studies.
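
The concern about nonresponder imputation can be illustrated with a small sketch that contrasts it with a simplified last-observation-carried-forward rule. All patient records below are hypothetical and exist only to show the mechanism; the class and function names are assumptions, and the LOCF rule here is reduced to carrying forward the last observed response status, which is simpler than the component-level imputation described above.

```python
from dataclasses import dataclass

@dataclass
class Patient:
    arm: str
    responder_at_last_visit: bool
    withdrew: bool = False
    restricted_medication: bool = False

def nri_response(p):
    """Nonresponder imputation: withdrawal or restricted medication
    beyond the allowed threshold means nonresponder."""
    if p.withdrew or p.restricted_medication:
        return False
    return p.responder_at_last_visit

def locf_response(p):
    """Simplified LOCF: carry the last observed response status forward."""
    return p.responder_at_last_visit

def response_rate(patients, arm, rule):
    group = [p for p in patients if p.arm == arm]
    return sum(rule(p) for p in group) / len(group)

# Hypothetical patients, for illustration only (not trial data).
patients = [
    Patient("placebo", True, withdrew=True),
    Patient("placebo", True),
    Patient("placebo", False),
    Patient("placebo", False, withdrew=True),
    Patient("anifrolumab", True),
    Patient("anifrolumab", True),
    Patient("anifrolumab", False, withdrew=True),
    Patient("anifrolumab", False),
]
for rule in (nri_response, locf_response):
    print(rule.__name__,
          response_rate(patients, "placebo", rule),
          response_rate(patients, "anifrolumab", rule))
```

In this toy data set, the placebo patient who was responding at withdrawal is counted as a nonresponder under nonresponder imputation, which lowers the placebo response rate relative to the simplified LOCF rule while leaving the anifrolumab rate unchanged.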

In terms of the difference between subgroups (e.g., SLEDAI-2K score at screening [< 10 points versus ≥ 10 points], baseline OCS dose [< 10 mg/day versus ≥ 10 mg/day prednisone or equivalent] and type I interferon-test result at screening [high versus low]), no hypotheses were provided, and therefore they can only be considered to be hypothesis-generating. In addition, given the exploratory nature of these subgroup analyses and the small sample sizes, the results should be interpreted with caution.

HRQoL, specifically symptoms such as fatigue and mental health, was identified as an important outcome by the patient and clinician groups providing input for this review. MIDs were provided by the sponsor for the SF-36 MCS and FACIT-F, which were in line with thresholds reported in the literature. Although numerical improvements were seen in the treatment group versus the placebo group in both trials for fatigue, and for the PCS and MCS of the SF-36, HRQoL results were not clinically meaningful. In general, no conclusions could be drawn based on the HRQoL data from either trial due to several limitations. Given the overlapping CIs, the small magnitude of change and difference between groups, and the lack of statistical testing and a definition of what constituted a clinically meaningful response for many of the outcome measures, it is not possible to draw conclusions with precision based on the available data.

External Validity

The clinical expert consulted by CADTH agreed that the baseline patient characteristics of the TULIP-1 and TULIP-2 trials were reflective of patients seen in Canadian clinical practice for the present indication. Although the majority of patients in each study were enrolled in trial sites from the US and Europe, the population enrolled in the trial was consistent with the population expected to be treated in Canadian clinical practice. The clinical expert noted that prescribing patterns may differ between countries (e.g., higher use of nervous system medication; or use of mizoribine, which is not prescribed in Canada); however, no different treatment effect would be expected based on different disease-management practices. Additionally, ACR criteria were used to identify patients with SLE in both trials, and these are rigorous criteria that are designed for use in clinical trials, rather than clinical practice. There is therefore a higher risk of misdiagnosis of SLE occurring in clinical practice, although the clinical expert consulted by CADTH noted that a diagnosis of SLE should be straightforward for clinicians with specialty training. Furthermore, the subgroup analyses (e.g., interferon-test high versus low) had no statistical comparisons and even smaller sample sizes, which limits the generalizability to a broader population.

According to the clinical expert, improvements in organ damage or other longer-term outcomes (e.g., mortality) while on anifrolumab are unlikely to be detected during a 52-week double-blind treatment phase because of insufficient duration. The composite primary outcome, patients with an SRI-4 or BICLA response, is not something that would be routinely used to assess patient status in clinical practice. However, the components of the composite would be an important part of the assessment of patients with SLE (e.g., clinical SLEDAI score). As anifrolumab has not been studied versus an active comparator, the efficacy and harms of this drug compared to the addition of other drugs used in the treatment of SLE is unknown. A variety of drugs are used chronically to manage SLE, none of which were specifically developed to manage this disease.

Indirect Evidence

A focused literature search for network meta-analyses dealing with SLE was run in MEDLINE All (1946–) on February 28, 2022. No limits were applied to the search. No relevant studies were identified.

Other Relevant Evidence

This section considers 2 submitted studies provided in the sponsor’s submission to CADTH to address the long-term efficacy of the treatment under review. These include a phase II, multinational, multicentre, randomized, double-blind, placebo-controlled study (MUSE)¹¹ and a phase II, single-arm, open-label, LTE study to evaluate the long-term safety of anifrolumab (Study 1145).¹²

MUSE

MUSE was a phase II study conducted to evaluate the efficacy and safety of anifrolumab in adult patients with chronic, moderately to severely active SLE.

Methods

MUSE was a phase II, multicentre, randomized, double-blind, placebo-controlled, parallel-group study to evaluate the efficacy and safety of 2 IV treatment regimens in adult patients with chronic, moderately to severely active SLE with an inadequate response to standard of care. Approximately 300 patients were to be randomized in a 1:1:1 ratio to receive a fixed IV dose of anifrolumab (300 or 1,000 mg) or placebo every 4 weeks for 48 weeks.

Results for the anifrolumab 1,000 mg group will not be described in this report given that it is not a Health Canada–recommended dose.

Randomization was stratified by SLEDAI-2K score at screening (< 10 points versus ≥ 10 points), day 1 OCS dose (< 10 mg/day versus ≥ 10 mg/day of prednisone or equivalent), and the results of a type I interferon signature test (positive versus negative). The trial assessed the efficacy of anifrolumab compared to placebo at week 24 and week 52 and the effect of anifrolumab compared to placebo in reducing background OCS dosage, with the same tapering protocol as in the pivotal studies. Safety assessments consisted of reporting all AEs, including TEAEs, and SAEs, as well as AESIs.

Populations

In the MUSE study, inclusion and exclusion criteria were consistent with the pivotal TULIP-1 and TULIP-2 clinical trials. A total of 203 of 626 screened patients with chronic, moderately to severely active SLE were randomized into either the placebo (n = 103) or 300 mg anifrolumab (n = 100) groups at 73 sites in 14 countries in North and South America, Europe, and Asia. Baseline demographics were generally similar between the anifrolumab and placebo groups and they were consistent with the pivotal trials. Most patients were ≤ 45 years of age, female, and white. There were numerically fewer patients from Asia in the anifrolumab group (3.0%) compared to the placebo group (12.7%). At screening before randomization, slightly more patients in the placebo group (62.7%) received high-dose corticosteroids (≥ 10 mg/day) compared to the anifrolumab group (55.6%). In terms of disease severity, baseline values for the SLEDAI-2K, SDI, and CLASI were consistent with those in the pivotal trials.

Table 23

Summary of Baseline Characteristics for MUSE (Modified ITT Population).

Outcomes

The primary efficacy end point for this study was the proportion of patients who at day 169 achieved an SRI-4 response as defined in the TULIP-1 trial. Patients who were unable to taper their OCS dosage to less than 10 mg/day and to less than the day 1 dose of prednisone or equivalent by day 85 and maintain an OCS dosage of less than 10 mg/day and less than the day 1 dose until day 169 were declared nonresponders for the primary end point. Subgroup analyses included the proportion of patients who tested positive for a type I interferon signature achieving an SRI-4 response with OCS tapering. Secondary efficacy end points included the proportion of patients achieving an SRI-4 response at day 365 and the proportion of patients on 10 mg/day or higher dosage of oral prednisone (or equivalent) at baseline who were able to taper to no more than 7.5 mg/day at day 365.
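The nonresponder rule above can be read as a simple classification: a patient counts as a responder only if the SRI-4 criteria are met at day 169 and the OCS tapering condition holds from day 85 through day 169. The following is a minimal sketch of that reading, not the trial's analysis code. The record layout and the names `PatientRecord`, `tapering_sustained`, and `is_primary_responder` are hypothetical, and the handling of patients with no recorded doses is an assumption.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PatientRecord:
    """Hypothetical per-patient record; field names are illustrative only."""
    sri4_at_day_169: bool                   # SRI-4 criteria met at day 169
    day1_ocs_mg: float                      # prednisone-equivalent dose on day 1 (mg/day)
    ocs_mg_day_85_to_169: Sequence[float]   # doses recorded from day 85 through day 169


def tapering_sustained(p: PatientRecord) -> bool:
    """True if every recorded dose is < 10 mg/day and < the day 1 dose.

    Assumption: a patient with no recorded doses in the window is treated
    as not having met the tapering condition.
    """
    doses = p.ocs_mg_day_85_to_169
    return bool(doses) and all(d < 10 and d < p.day1_ocs_mg for d in doses)


def is_primary_responder(p: PatientRecord) -> bool:
    """SRI-4 response at day 169 combined with the OCS tapering condition."""
    return p.sri4_at_day_169 and tapering_sustained(p)


# Illustrative (made-up) patients
tapered = PatientRecord(True, 15.0, [7.5, 7.5, 5.0])
not_below_10 = PatientRecord(True, 15.0, [10.0, 7.5])
not_below_day1 = PatientRecord(True, 5.0, [5.0])
no_sri4 = PatientRecord(False, 15.0, [5.0])
```

The comparison is strict ("less than") because that is how the condition is worded in this section; how patients who were not on OCS at day 1 were handled is not described here and is not modelled.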

Other efficacy outcomes were also assessed in the MUSE trial; however, they are not reported further in this review given that they were assessed as exploratory efficacy outcomes. These included: subgroup analysis of efficacy and safety based on type I interferon test (high and low), proportion of patients with a CLASI activity score of 10 or higher at baseline who achieve a reduction of 50% or greater, proportion of patients who achieve an improvement of more than 3 points in the FACIT-Fatigue score, proportion of patients achieving an SRI-4 or greater response with or without OCS tapering, change from baseline in BICLA, SLEDAI-2K, clinical SLEDAI, BILAG-2004, SLE flares, SELENA-SLEDAI modification of the Physician’s Global Assessment (MDGA), OCS use, painful, swollen, and tender joint count, Systemic Lupus International Collaborating Clinics (SLICC)/ACR, SF-36, Health Assessment Questionnaire, pain VAS score, EQ-5D, Lupus QoL, PGA, C3 and C4 complement proteins, and total hemolytic (CH50) complement levels.

Safety outcomes included TEAEs, SAEs, and AESIs.

Statistical Analysis

The primary analyses consisted of all efficacy and safety data collected through day 169. All efficacy analyses were conducted on the modified intention-to-treat (mITT) population, which consists of all patients who received at least 1 dose of the investigational product. The primary end point was analyzed by a logistic regression model comparing anifrolumab doses versus placebo. The independent variables in the model included treatment groups and stratification factors, including the SLEDAI-2K score at screening (< 10 points versus ≥ 10 points), OCS usage at baseline (≥ 10 mg/day versus < 10 mg/day of prednisone or equivalent), and the result of the interferon test at screening (positive versus negative). The primary analyses were evaluated in 2 study populations: the overall mITT population and the subpopulation of patients with a high result on a type I interferon test at screening. In the primary analyses, multiplicity was controlled for in the dose comparisons within each of the 2 study populations using the Cochran-Armitage trend test. Multiplicity was not controlled for across the 2 study populations. For the primary analyses, patients with missing primary or secondary end point data were imputed as nonresponders for that end point. A relevant subgroup analysis for the primary end point was performed based on interferon gene diagnostic test (positive versus negative) using univariate logistic regression. Secondary end points were analyzed by a logistic regression model in the overall population and the diagnostic-positive subpopulation without controlling for multiplicity. A 2-sided significance level of 0.10 was used.
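A 2-sided significance level of 0.10 corresponds to the 90% CIs reported for the ORs in this study. As a rough illustration of that relationship, the sketch below computes an unadjusted OR with a Wald CI from a 2 × 2 table. This is a simplification and an assumption: the trial used a logistic regression model adjusted for the stratification factors, so an unadjusted OR will generally differ from the reported values. The function name and the example counts are hypothetical.

```python
import math
from statistics import NormalDist


def odds_ratio_wald_ci(events_trt, n_trt, events_ctl, n_ctl, alpha=0.10):
    """Unadjusted odds ratio with a Wald (1 - alpha) confidence interval.

    alpha=0.10 gives a 90% CI, matching a 2-sided significance level of 0.10.
    """
    a, b = events_trt, n_trt - events_trt   # treatment: responders, nonresponders
    c, d = events_ctl, n_ctl - events_ctl   # control: responders, nonresponders
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Wald interval")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # SE of the log odds ratio
    z = NormalDist().inv_cdf(1 - alpha / 2)         # about 1.645 for alpha = 0.10
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical counts, not taken from the trial
or_90, lo_90, hi_90 = odds_ratio_wald_ci(30, 100, 15, 100, alpha=0.10)
or_95, lo_95, hi_95 = odds_ratio_wald_ci(30, 100, 15, 100, alpha=0.05)
```

For the same data, the 90% interval is narrower than the 95% interval, which is worth keeping in mind when comparing these results with the 95% CIs more commonly reported elsewhere.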

Patient Disposition

Patient disposition in the MUSE study is summarized in Table 24 according to the mITT population. A total of 626 patients were screened and 307 patients were randomized, of whom 203 were assigned to the placebo (n = 103) or anifrolumab 300 mg (n = 100) groups. Totals of 25.2% and 16.0% of patients in the placebo and anifrolumab groups, respectively, discontinued the study, mainly due to other reasons. All 307 randomized patients were included in the ITT population, and all but 2 patients who did not receive the investigational product (1 in each group) were included in the mITT and safety populations. One patient randomized to placebo received a 1,000 mg dose of anifrolumab and was removed from the placebo group for the safety analyses.

Table 24

Patient Disposition in the MUSE Study.

Exposure to Study Treatments

In MUSE, through to week 52, the total number of patient-years of exposure was 93.4 for the anifrolumab group and 84.3 for the placebo group. A higher proportion of patients in the anifrolumab group (65.7%) received the full course of treatment (13 doses) compared with those in the placebo group (53.5%).

The proportions of patients on a 10 mg/day or higher dosage of oral prednisone (or equivalent) at baseline who were able to taper to no more than 7.5 mg/day at day 169 and day 365 were 45.5% and 56.4% for the anifrolumab group and 25.0% and 26.6% for the placebo group, respectively.

Efficacy

SRI-4 Response With Oral Corticosteroid Tapering at Week 24

A total of 34.3% of patients had an SRI-4 response with OCS tapering at week 24 in the anifrolumab group compared to 17.6% in the placebo group, with an OR of 2.38 (90% CI, 1.33 to 4.26). The difference was statistically significant, with a P value of 0.014.

Proportion of Patients With a High Type I Interferon-Test Result Who Had an SRI-4 Response With OCS Tapering at Week 24

The proportion of patients with a high type I interferon-test result who had an SRI-4 response with OCS tapering at week 24 was 36.0% for the anifrolumab group and 13.2% for placebo group with an OR of 3.55 (90% CI, 1.72 to 7.32). The difference was statistically significant, with a P value of 0.034.

SRI-4 Response With Oral Corticosteroid Tapering at Week 52

For this secondary end point at week 52, a total of 51.5% of patients had an SRI-4 response with OCS tapering in the anifrolumab group compared to 25.5% in the placebo group, with an OR of 3.08 (90% CI, 1.86 to 5.09) and a P value of < 0.001.

Proportion of Patients on a 10 mg/day or Higher Dosage of Oral Prednisone (or Equivalent) at Baseline Who Were Able to Taper to No More Than 7.5 mg/day at Week 52

For this secondary end point, a total of 56.4% of patients in the anifrolumab group on a 10 mg/day or higher dosage of oral prednisone (or equivalent) at baseline were able to taper to no more than 7.5 mg/day by week 52 compared to 26.6% in the placebo group, with an OR of 3.59 (90% CI, 1.87 to 6.89) and a P value of < 0.001.

Table 25

Primary and Secondary Efficacy Outcomes in MUSE Study Through Week 24 and Week 52 (mITT Population).

Harms

A summary of TEAEs at the interim analysis is presented in Table 26. During the 52-week period, 84.8% of patients in the anifrolumab group and 77.2% of patients in the placebo group reported at least 1 TEAE, the most common being headache, upper respiratory tract infection, nasopharyngitis, and urinary tract infection. Nasopharyngitis occurred at a higher frequency in the anifrolumab group (12.1%) than in the placebo group (4.0%).

The proportion of patients with at least 1 SAE was similar between the anifrolumab and placebo groups, the most common being increased SLE activity and pneumonia. The most common AESIs were infusion, hypersensitivity, and anaphylactic reactions, which occurred in a higher proportion of the placebo group (5.9%) compared with the anifrolumab group (2.0%). No deaths were reported in the anifrolumab 300 mg or placebo groups.

Table 26

Summary of TEAEs in MUSE Extension Study (Safety Population).

Critical Appraisal

Internal Validity

This phase II study had patients randomized and stratified by SLEDAI-2K score at screening and day 1 OCS dose, and by the results of a type I interferon signature test. The trial was double-blinded, with patients and study personnel involved in patient care or outcome assessment blinded to treatment. It is possible that patients became aware of their treatment assignment because of improvement, or lack of improvement (placebo), over the study period. The baseline patient characteristics were generally well balanced between anifrolumab and placebo groups. Despite stratification by OCS dose, a higher proportion of patients in the placebo group used a dosage of 10 mg/day or higher of OCS at baseline than those in the anifrolumab group (62.7% versus 55.6%). The discontinuation rate was higher in the placebo group (25.2%) than in the anifrolumab group (16.0%), which raises the concern of a risk of attrition bias. Discontinued patients were classified as nonresponders in the primary analyses, possibly biasing the results in the direction of placebo, although sensitivity analyses using LOCF imputation produced results similar to those of the primary analyses. Furthermore, it was unclear whether the patients who discontinued were different from those who did not. The primary outcome, SRI-4, is a reliable and valid composite measure for disease activity and response in SLE. SRI-4 response was measured at 24 and 52 weeks in the MUSE study, which provided data on long-term treatment effects. The clinical expert consulted for this review agreed that a treatment response should be expected within 24 weeks for a drug to have clinical utility.

In terms of statistical analyses of the primary efficacy outcome, multiplicity was controlled for within the dose comparisons but not across populations. There was no control for multiplicity in the secondary efficacy outcomes, which increases the likelihood of a type I error.

External Validity

The MUSE trial used similar inclusion and exclusion criteria as the pivotal trials and enrolled predominantly middle-aged, white females. The expert consulted for this review considered the patients enrolled in the pivotal trials to be representative of patients with moderate to severe SLE in Canada. Nevertheless, the high dropout rate in the placebo group may have led to patients who are less representative of the recruited population, decreasing the generalizability of the results of the study.

Study 1145

Methods

Study 1145¹² was a single-arm, open-label, long-term (up to 3 years) safety and tolerability study of anifrolumab 300 mg administered by IV infusion every 4 weeks (before the February 12, 2015, protocol amendment, the dose of anifrolumab was 1,000 mg, which is not a Health Canada–recommended dose). Safety assessments consisted of reporting all AEs, including TEAEs, and SAEs, as well as AESIs.

A total of 218 adult patients with chronic, moderate to severe SLE who were previously treated with any dose of anifrolumab or placebo in the MUSE trial and who completed the treatment and the 85-day follow-up period were enrolled in Study 1145. Patients were enrolled from 59 centres in North and South America, Europe, and Asia. Permitted standard-of-care SLE medications included an OCS (up to 40 mg/day of prednisone or equivalent), intramuscular corticosteroids, intra-articular/tendon sheath/bursa corticosteroid injections, antimalarials, slow-acting immunosuppressants, NSAIDs, and topical therapy. OCS medications at dosages higher than 40 mg/day for more than 14 days could be continued unless there was a safety concern. Slow-acting immunosuppressants were not permitted in Study 1145 above the following dosages: 200 mg/day of azathioprine, 2.0 g/day of mycophenolate mofetil/mycophenolic acid, or 25 mg/week of methotrexate. Excluded concomitant medications throughout the study included biologics (e.g., belimumab), monoclonal antibodies (e.g., rituximab), IV corticosteroids, interferon therapy, live or attenuated vaccines, plasmapheresis, and immunoglobulin therapy.

Populations

At baseline, the mean disease SLEDAI-2K global score was 4.9 (SD = 3.9). This was lower than in the pivotal trials, which required a SLEDAI-2K score of 6 points or more for inclusion. A total of 72.9% of patients used corticosteroids at baseline and of these, 37.7% were on high-dose corticosteroids (≥ 10 mg/day). This was slightly lower than those in the pivotal trials, which had at least 46% using 10 mg/day or higher. A total of 68.3% of patients were on antimalarial medication at baseline. Approximately 67% of patients were type I interferon gene signature high (abnormal) and most patients were positive for antinuclear antibodies (95.8%).

Table 27

Summary of Baseline Characteristics for Study 1145 (As-Treated Population).

Outcomes

The primary end points of the study were the safety and tolerability of IV anifrolumab in adult patients with moderately to severely active SLE and were assessed primarily by summarizing TEAEs, SAEs, AEs leading to discontinuation, and AESIs. The secondary safety outcome was the immunogenicity of anifrolumab, summarized as the proportion of patients who developed detectable ADAs. The relevant exploratory outcomes assessed were the mean changes in the SLEDAI-2K global score and the SDI global score from baseline through to year 3.

Statistical Analysis

All analyses were descriptive for Study 1145, which was an open-label extension study. Categorical data were summarized by the number and percentage of patients in each category. Continuous variables were summarized by descriptive statistics, including mean, SD, median, minimum, and maximum.

Patient Disposition

A total of 218 patients completed the MUSE trial, met eligibility criteria, were enrolled into this open-label extension study, and received treatment. Of these, 152 (70%) had received anifrolumab and 66 (30%) had received placebo in the MUSE trial. Patients who permanently discontinued treatment could continue the study if they were followed up through 85 days after their last dose. Patients were considered to have not completed the study if consent was withdrawn or the patient was lost to follow-up. Overall, 63.8% of patients completed treatment and 78.9% of patients completed the study procedures. The most common reason for discontinuation of treatment or study was withdrawal by patient, which was not explored further in Study 1145. Patient disposition of the extension study is summarized in Table 28.

Table 28

Patient Disposition in Study 1145.

Efficacy Outcomes

SLEDAI-2K

The mean SLEDAI-2K score was 4.9 (SD = 3.9) at baseline (n = 218) and 3.7 (SD = 3.5) at week 168 (n = 139), with a mean change of −0.9 (SD = 4.1).

SDI

The mean SDI score was 0.6 (SD = 1.0) at baseline (n = 218) and 0.6 (SD = 1.0) at week 168 (n = 140), with a mean change of 0.1 (SD = 0.6).

Exposure to Study Treatments

All 218 patients received at least 1 dose of anifrolumab in the open-label extension study for up to 3 years. A majority of patients (64.2%) received at least 35 doses and 70.6% of patients were treated for 30 months or longer, for a total of 542 patient-years of exposure. The median duration of exposure in months was 35.877 (range = 0.03 to 36.60). During an infusion, 6 of 218 patients (2.8%) had their treatment interrupted (stopped during the infusion, then restarted), most commonly due to an AE. A total of 72 of 218 patients (33%) had at least 1 dose omitted (missed), most commonly because of an AE.

A total of 112 patients (51.4%) received concomitant immunomodulatory medications, most commonly prednisone (20.6%) and methylprednisolone (13.3%). Trends related to OCS dosing were not explored in Study 1145.

Table 29

Extent of Exposure and Dose Modifications in Study 1145 (As-Treated Population).

Harms

The summary of TEAEs reported for up to 3 years of open-label treatment is presented in Table 30. A total of 78% of patients (n = 170) experienced an AE, the most common being nasopharyngitis (14.7%), bronchitis (13.8%), and upper respiratory tract infections (9.2%). A total of 22% of patients (n = 48) had a drug-related TEAE and 22.9% (n = 50) had 1 or more SAEs, with an exposure-adjusted SAE rate of 8.56 per 100 patient-years. The most common SAEs were increased SLE activity and pneumonia, each of which occurred in 2.3% of patients. One patient died from community-acquired pneumonia and this death was assessed by the investigator as related to treatment. In terms of AESIs, 7 patients (3.2%) had infusion, hypersensitivity, or anaphylactic reactions, and 5 patients (2.3%) had latent tuberculosis. Five patients had ADA-positive measurements at any time during Study 1145, of which 3 were at baseline only and 2 were considered persistent.

Table 30

Summary of TEAEs in Study 1145 Through to Year 3 (Safety Population).

Critical Appraisal

The extension study allowed for investigation of long-term efficacy and harms of anifrolumab. Limitations of the extension study include the absence of an active comparator, which limits causal conclusions. Furthermore, the analysis does not take account of the frequency or recurrence of AEs. As a greater proportion of patients in Study 1145 had previously been treated with anifrolumab in the MUSE study, observations based on frequencies of overall AEs in Study 1145 should be interpreted with caution. This could have resulted in a population of patients who were more tolerant of anifrolumab and therefore potentially less likely to experience harms. A relatively high proportion of patients discontinued treatment (36.2%), which can increase the risk of attrition bias in favour of the intervention, as patients who do not do well on the intervention tend to withdraw. Although these patients were included in the safety analyses, their characteristics were not reported, and it was unclear whether the patients who discontinued were different from those who did not.

TULIP Long-Term Extension

The TULIP LTE was a 3-year, double-blind, placebo-controlled LTE study of the TULIP-1 and TULIP-2 trials in adults who had moderately to severely active SLE at the start of the trials. Patients who received anifrolumab in the TULIP-1 or TULIP-2 trial and entered the LTE remained on anifrolumab. Patients who received placebo and entered the LTE were rerandomized 1:1 to receive either anifrolumab or placebo in the LTE.¹³

Methods

The TULIP LTE was a 3-year phase III, global, multicentre, randomized, double-blind, placebo-controlled LTE study, characterizing the long-term safety and tolerability of anifrolumab 300 mg administered as IV monthly infusions versus placebo in patients with moderately to severely active SLE despite standard therapy.

Patients were randomized using an interactive voice-response system algorithm to the following groups during the LTE:

patients previously treated with anifrolumab 300 mg continued on blinded anifrolumab 300 mg
patients previously treated with anifrolumab 150 mg switched to blinded anifrolumab 300 mg
patients previously randomized to placebo were rerandomized 1:1 to blinded anifrolumab 300 mg or placebo.

This resulted in an approximate ratio of anifrolumab 300 mg versus placebo of 4:1 in the LTE study.

The LTE study consisted of a 156-week treatment period, after which patients continued in the study for another 8 weeks to complete a 12-week safety follow-up after the last dose (given at week 152) of the investigational product. Upon unblinding of the LTE study to support the 4-month safety update for a regulatory submission, treatment allocation for all patients became known to AstraZeneca. All study management personnel remained blinded. The blind was maintained for the investigators, investigational site staff, and for the patients. Measures were taken to minimize the potential impact related to the unblinding of data during an ongoing study, including using redacted documents for review of protocol deviations and narratives, restriction of access to documents containing unblinded data, and careful tracking of all individuals not remaining blinded.

Populations

The LTE target population comprised patients who had completed the 52-week double-blind treatment period in 1 of the phase III studies (TULIP-1 or TULIP-2), met all LTE eligibility criteria, and were willing to continue into the extension study. Similar to the TULIP-1 and TULIP-2 trials, certain SLE medications, such as cyclophosphamide, biologics, IV immunoglobulin, and IV steroids, were prohibited in the LTE to protect the safety of participating patients. However, in contrast to the TULIP-1 and TULIP-2 trials, there was no requirement for OCS tapering, and OCS bursts were allowed. In the LTE study, patients remained on background standard-of-care SLE therapy, but investigators were allowed to adjust, as clinically indicated for disease control, throughout the 3-year LTE. Patients were allowed to change dose or add or switch to a new immunosuppressant during the LTE.

Disease characteristics and baseline treatments were well balanced between groups. The mean age was 41 to 43 years and the majority of patients were female (92% to 93%) and white (65% to 69%). Overall, this patient population had moderate to severe disease activity at baseline, with a mean overall SLEDAI-2K score of 11.2 in the LTE anifrolumab 300 mg group and 11.3 in the LTE placebo group. Across treatment analysis groups, approximately 70% of patients had a total SLEDAI-2K score of 10 points or higher. At week 52, the final visit of the feeder studies and the first visit of the LTE study, the mean SLEDAI-2K score was 5.1 in the combined anifrolumab 300 mg group and 6.0 in the all-placebo group. Baseline organ damage (SDI ≥ 1) was observed in less than half of patients, with an overall mean score of 0.6. Approximately 80% of patients were classified as type I interferon gene signature test high at screening, with balanced proportions across groups.

Table 31

Summary of Baseline Characteristics.

Outcomes

Primary objective: characterize long-term safety and tolerability of IV anifrolumab in patients who completed the TULIP-1 or TULIP-2 trial (e.g., AESIs, SAEs).

Exploratory objectives: limited efficacy assessments (overall disease activity [SLEDAI-2K], OCS use, damage accrual [SDI]). Other exploratory outcomes included HRQoL (e.g., SF-36v2 and EQ-5D-5L).

Statistical Analysis

No formal comparisons were planned in this study. The LTE sample size was not based on statistical considerations but was instead defined by all patients completing the double-blind treatment period in the TULIP-1 and TULIP-2 trials who met all eligibility criteria and consented to continue into LTE.

AEs are summarized by descriptive statistics and qualitative summaries, exposure-adjusted incidence rates and adjusted cumulative proportions. Differences between treatment groups are presented for SAEs, AEs leading to discontinuation, deaths, and AESIs as adjusted differences in cumulative proportions and risk differences (based on exposure-adjusted incidence rates), and respective 95% CIs.
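To make the exposure-adjusted quantities concrete, the sketch below computes an exposure-adjusted incidence rate (EAIR) per 100 patient-years and a rate difference between 2 groups with a normal-approximation CI. This is an illustration under stated assumptions rather than the sponsor's method: the exact definition of time at risk and the CI method used in the LTE are not described in this section, the Poisson-based variance is an assumption, and the function names and example inputs are hypothetical.

```python
import math
from statistics import NormalDist


def eair_per_100_py(patients_with_event: int, patient_years: float) -> float:
    """Exposure-adjusted incidence rate per 100 patient-years."""
    return 100.0 * patients_with_event / patient_years


def rate_difference_ci(e1, py1, e0, py0, alpha=0.05):
    """Difference in EAIR (group 1 minus group 0) per 100 patient-years.

    Assumption: event counts are treated as Poisson, so the variance of each
    rate is events / patient_years**2, and a normal approximation is used.
    """
    diff = eair_per_100_py(e1, py1) - eair_per_100_py(e0, py0)
    se = 100.0 * math.sqrt(e1 / py1 ** 2 + e0 / py0 ** 2)
    z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for a 95% CI
    return diff, diff - z * se, diff + z * se


# Hypothetical inputs, not taken from the LTE
rate = eair_per_100_py(10, 200.0)                    # 10 patients with an event over 200 patient-years
diff, lower, upper = rate_difference_ci(10, 200.0, 5, 100.0)
```

Because the denominator is time at risk rather than the number of patients, groups with unequal follow-up, such as the anifrolumab and placebo groups in this LTE, can be compared on the same scale.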

Observed values and changes from baseline in SLEDAI-2K, OCS use, and SDI global score are presented by visit with descriptive statistics.

Patient Disposition

In total, 547 patients who had completed the 52-week treatment period on the investigational product in the TULIP-1 and TULIP-2 trials were enrolled and received at least 1 dose of the product in the LTE study. Of these, 257 patients treated with anifrolumab 300 mg continued on anifrolumab 300 mg (LTE anifrolumab 300 mg group). Of the 223 patients from the feeder studies’ placebo treatment groups who entered the LTE, 112 patients were rerandomized to continue on placebo (LTE placebo group) and 111 patients were rerandomized to anifrolumab 300 mg. In addition, 67 patients switched from anifrolumab 150 mg in the TULIP-1 trial to anifrolumab 300 mg.
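As a quick arithmetic check, the group counts reported above can be summed to confirm the total enrolment and the approximate 4:1 ratio of anifrolumab 300 mg to placebo described in the Methods. The counts come from this paragraph; the dictionary keys are descriptive labels chosen for illustration only.

```python
# Patients entering the LTE, by treatment path (counts from the disposition text)
lte_groups = {
    "continued anifrolumab 300 mg": 257,
    "placebo rerandomized to placebo": 112,
    "placebo rerandomized to anifrolumab 300 mg": 111,
    "switched from anifrolumab 150 mg to 300 mg": 67,
}

total_enrolled = sum(lte_groups.values())

# Patients from the feeder studies' placebo groups who entered the LTE
from_feeder_placebo = (
    lte_groups["placebo rerandomized to placebo"]
    + lte_groups["placebo rerandomized to anifrolumab 300 mg"]
)

# Everyone receiving anifrolumab 300 mg in the LTE versus those on placebo
on_placebo = lte_groups["placebo rerandomized to placebo"]
on_anifrolumab_300 = total_enrolled - on_placebo
allocation_ratio = on_anifrolumab_300 / on_placebo

print(total_enrolled, from_feeder_placebo, on_anifrolumab_300, round(allocation_ratio, 1))
```

The counts sum to 547 enrolled patients, with 223 coming from the feeder placebo groups and 435 receiving anifrolumab 300 mg against 112 on placebo, a ratio of about 3.9:1.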

A higher proportion of patients in the LTE anifrolumab 300 mg group completed the LTE study compared with the LTE placebo group (69.3% versus 48.2%). More patients in the LTE placebo group compared to the LTE anifrolumab 300 mg group discontinued the investigational product due to withdrawal by patient (22.3% versus 11.7%) or due to lack of efficacy (7.1% versus 5.4%). The proportions of patients who discontinued treatment due to AEs were low and comparable between the LTE anifrolumab 300 mg group (7.0%) and the LTE placebo group (8.0%).

Table 32

Patient Disposition (Full Analysis Set — LTE Study).

Exposure to Study Treatments

Exposure during treatment and follow-up in the LTE study was 683.5 patient-years in the anifrolumab 300 mg group and 250.3 patient-years in the placebo group. The total anifrolumab exposure to any dose at any time point during the feeder or LTE was 1,568 patient-years.

Efficacy

Systemic Lupus Erythematosus Disease Activity Index 2000

The observed treatment effect on SLEDAI-2K of anifrolumab 300 mg compared with placebo at week 52 was sustained throughout the 3-year LTE treatment period, with further improvements observed with anifrolumab 300 mg compared with placebo with longer treatment duration. A sensitivity analysis of all patients, including those excluded from the full analysis set, was conducted and had consistent results with the primary analysis.

The proportion of patients who achieved a reduction of 4 points or more from baseline was also consistently higher in the combined anifrolumab 300 mg group than in the combined placebo group. In the combined anifrolumab 300 mg group, 76.1% of patients who reached the week 52 visit had a reduction of 4 points or more and 90.0% of those who reached week 208, compared with 69.5% and 81.8%, respectively, in the combined placebo group. In addition, larger improvements were seen from baseline to week 208 across all domains in the anifrolumab group compared to placebo.

Table 33

SLEDAI-2K and Change From Baseline, Estimates and Standard Errors, Analysis of Covariance, Combined Data From Feeder and LTE Study (Full Analysis Set).

Oral Corticosteroid Use

Overall, and for each year of study, the mean OCS standardized AUC was lower for the combined anifrolumab 300 mg group compared to the all-placebo group. The proportions of patients receiving OCS bursts during the LTE were similar between the combined anifrolumab 300 mg and placebo groups.

Table 34

OCS Standardized AUC, Summary Statistics, Combined Data From Feeder and LTE Study (Full Analysis Set).

Systemic Lupus International Collaborating Clinics/American College of Rheumatology Damage Index

Overall, 30% to 40% of patients had organ damage (i.e., SDI score ≥ 1) at baseline in the TULIP-1 and TULIP-2 trials. Organ damage remained stable in both groups throughout the LTE; at week 208, the mean SDI score in patients with a baseline SDI score of 1 or higher was 2.1 in the combined anifrolumab 300 mg group and 2.0 in the combined placebo group.

The time to first SDI worsening was numerically longer in the combined anifrolumab 300 mg group (mean = 925.0; SD = 553.0) compared with the combined placebo group (mean = 754.2; SD = 523.3).

Short Form (36) Health Survey Version 2 (Acute Recall)

At week 208, the proportion of PCS responders was 53.7% in the combined anifrolumab 300 mg group versus 41.0% in the combined placebo group; the proportion of MCS responders was 35.6% versus 26.2%, respectively.

A numerically larger mean increase (indicating improving function) from baseline to week 208 was observed for the combined anifrolumab 300 mg group compared with the combined placebo group for both PCS and MCS. The mean change in PCS score at week 208 from feeder baseline was 5.51 in the combined anifrolumab 300 mg group compared with 3.82 in the combined placebo group. The mean change in MCS score at week 208 from feeder baseline was 1.00 in the combined anifrolumab 300 mg group compared with −0.11 in the combined placebo group.

Table 35

SF-36v2 (Acute Recall) Domain, MCS And PCS, Subjects With Response, Combined Data From Feeder, and LTE Study (Full Analysis Set).

5-Level EQ-5D

EQ-5D-5L assessments showed overall improvements in health status as measured by change from baseline in EQ VAS and single summary utility index. The improvements in QoL as measured by change from baseline in EQ-5D-5L were small but consistently numerically higher for the combined anifrolumab 300 mg compared to the combined placebo group throughout the 4 years. Mean change from feeder study baseline at week 208 was 0.088 (from a baseline of 0.615) for the combined anifrolumab 300 mg group versus 0.017 (from baseline value of 0.614) in the combined placebo group. Mean changes in EQ VAS scores at week 208 were 16.4 (from baseline score of 55.9) in the combined anifrolumab 300 mg group versus 9.2 (from baseline score of 56.7) in the combined placebo group.

Harms

The safety profile up to 4 years of exposure, including assessment of rare events, remains unchanged. In addition, there was no increase in malignancy or major adverse cardiac events, and no anaphylaxis or active tuberculosis. During the 52-week period, 87.5% of patients in the anifrolumab group and 81.3% of patients in the placebo group reported 1 or more TEAEs, the most common being nasopharyngitis, urinary tract infection, upper respiratory tract infection, bronchitis, and headache.

The proportion of patients with 1 or more SAEs was similar between the anifrolumab and placebo groups, the most common being infections and infestations. The most common AESI was nonopportunistic infections. Three deaths were reported in the anifrolumab group (1.2%) and 1 death was reported in the placebo group (0.9%). Overall, no new safety signals were identified.

Table 36

Summary of TEAEs in LTE Study (Full Analysis Set).

Critical Appraisal

Internal Validity

Demographics and baseline characteristics were generally well balanced between groups. At the start of the LTE, fewer anifrolumab patients were on steroids compared to placebo. This may contribute to bias in terms of reducing OCS use if greater numbers of patients in the anifrolumab group were already not using OCS. Approximately 72% of eligible anifrolumab patients and 62% of eligible placebo patients who completed treatment in the predecessor studies (TULIP-1 and TULIP-2) enrolled in the TULIP LTE. More patients on anifrolumab completed the 3-year extension (69% in the anifrolumab group versus 48% in the placebo group). The differential dropout rate may increase the risk of attrition bias in favour of anifrolumab.

Limitations regarding efficacy and HRQoL outcomes included the lack of formal statistical testing. Although a higher proportion of patients in the anifrolumab group had lower OCS use and improved SLEDAI-2K scores compared to placebo, no firm conclusions can be drawn about the efficacy of anifrolumab or its steroid-sparing effect based on the presented data. Also, the ability to draw conclusions on the effectiveness of anifrolumab in preventing organ damage was limited due to the lack of statistical testing. Last, because the sponsor was unblinded during the analysis phase, there is the potential for investigator and performance bias for efficacy and patient-reported outcomes.

External Validity

While the patient population was considered to be representative of patients with moderate to severe SLE in Canada, patients enrolled in the TULIP LTE had to have completed the 52-week double-blind treatment period in 1 of the phase III studies (TULIP-1 or TULIP-2). This is therefore a selective patient population, as it only included those who were able to complete the TULIP studies and, while the baseline characteristics of the patients enrolled in TULIP LTE might not differ from those enrolled in the TULIP-1 or TULIP-2 studies, results from the TULIP LTE cannot be generalized to all patients enrolled in the TULIP trials.

Discussion

Summary of Available Evidence

The CADTH systematic review included 2 phase III multicentre, randomized, double-blind, placebo-controlled studies evaluating the efficacy and safety of an IV treatment regimen of anifrolumab 300 mg in adult patients (18 to 70 years of age) with moderate to severe, autoantibody-positive SLE while receiving standard-of-care treatment. The primary objective was to evaluate the effect of anifrolumab 300 mg compared to placebo on disease activity as measured by the difference in the proportion of patients who achieved an SRI-4 response at week 52 in the TULIP-1 trial or a BICLA response at week 52 in the TULIP-2 trial.

In addition, 2 studies provided in the sponsor’s submission to CADTH that were considered to address long-term efficacy (up to 3 years) of the treatment under review were included. These include a phase II, multinational, multicentre, randomized, double-blind, placebo-controlled study (MUSE)¹¹ and a phase II, single-arm, open-label, LTE study to evaluate the long-term safety of anifrolumab (Study 1145).¹² The primary efficacy end point for MUSE was the proportion of patients who at day 169 achieved an SRI-4 response as defined in the TULIP-1 trial. Study 1145 (N = 218) was a single-arm, open-label, long-term (up to 3 years) safety and tolerability study of anifrolumab 300 mg administered by IV infusion every 4 weeks in adult patients with chronic, moderate to severe SLE who were previously treated with any dose of anifrolumab or placebo in the MUSE trial. Safety assessments consisted of reporting all AEs, including TEAEs and SAEs, as well as AESI results. The primary end points of the study were the safety and tolerability of IV anifrolumab in adult patients with moderately to severely active SLE and they were assessed primarily by summarizing TEAEs, SAEs, AEs associated with discontinuation, and AESIs.

Interpretation of Results

Efficacy

The clinician group input received by CADTH for this review indicated that the ideal treatment would have a meaningful impact on overall survival by reducing disease activity, risk of subsequent flares, use of OCS, risk of AEs, and long-term complications, while inducing remission (low disease activity), and improving QoL. While the TULIP-1 trial did not meet its primary end point of SRI-4 response at week 52, nor any of its key secondary end points, the TULIP-2 trial did meet its primary end point as well as key secondary end points of BICLA in patients with a high result on an interferon test, maintained OCS reduction, and CLASI response. In terms of maintained OCS reduction and CLASI response, it is uncertain why there was a discrepancy between the trials in statistical significance. The sponsor indicated that, upon review of the prespecified analyses from TULIP-1 following database lock, some of the rules for defining patients as nonresponders due to receiving restricted medications were deemed too stringent and clinically inappropriate. Specifically, the original rules inappropriately classified patients who used NSAIDs or who increased NSAID doses as nonresponders. The sponsor also noted that, because most NSAIDs have a short half-life and a slow and weak effect on inflammation and pain, they were not thought to confound the efficacy assessments at week 52, as long as they were not initiated late in the study. The rules were therefore amended to reflect that a patient would not be considered a nonresponder if such a patient used NSAIDs or increased an NSAID dose. These rules were formally agreed upon before the unblinding of TULIP-2 data. While results from the TULIP-1 trial under the amended rules were consistent with those in the TULIP-2 trial, such results should be interpreted with caution as they were post hoc analyses and were not prespecified in the study protocol of the TULIP-1 study.
Of note, the tipping-point analysis based on nonresponder imputations weakly supported the robustness of the maintained OCS dose results in the TULIP-2 trial. According to the clinical expert consulted for this review, the discrepancy between the 2 trials could simply be due to chance (e.g., regression to the mean).

The key difference between trials was the primary end point being switched from SRI-4 to BICLA in the TULIP-2 trial. The switching of the primary end point was based on the TULIP-1 and MUSE study results, which demonstrated that the BICLA had produced consistent results across time. This switch took place after data collection for TULIP-2 was completed and before the unblinding of the results at week 52, and the risk of bias due to operationalization is low. In the opinion of the clinical expert, although the SRI is a clinically relevant outcome to assess response in patients with SLE, there is a shift toward the BICLA given its ability to capture partial responses and its discriminative nature with respect to detecting differences between placebo and active treatment more effectively than the SRI-4. According to the literature, the BILAG-2004, a main component and driver of the BICLA, is a valid and reliable instrument for SLE patients and is more responsive to change than the SLEDAI-2K.²⁷^,²⁹ In comparison with BILAG, the SLEDAI is less responsive to change, it does not capture improvement or worsening, and it does not assess severity in an organ system.³³ As discussed in Appendix 4, on the one hand, using a single weighted score to summarize disease activity makes the judgment of disease activity much easier and more standardized; on the other hand, it has the potential to mask the underlying importance of the organ systems that are contributing to the total score (i.e., the same score could represent multiple mild diseases in many organs or severe disease in a single organ, or an unchanged score may occur despite worsening in 1 organ system if there is also improvement in another system). In addition, the SLEDAI is weighted toward neuropsychiatric and renal manifestations, and patients with severe neuropsychiatric and renal disease were excluded from the pivotal trials.
Responders and nonresponders on the SRI have been shown to differ on several measures of disease activity, biomarkers, and HRQoL.²¹^,²² For example, the MUSE study demonstrated a statistically significant difference in SRI-4 response (the primary end point) between patients in the anifrolumab versus the placebo group, while the TULIP-1 trial did not.

SLE also causes significant damage to many vital organs and tissues, most notably the kidneys and the CNS. These effects of the disease take longer to develop, and, according to the clinical expert consulted by CADTH for this review, it is unlikely that a 52-week study would be able to demonstrate a reduction in accumulated organ damage. Ideally, the trial would be at least 2 years (104 weeks) in length. In the opinion of the clinical expert, it is not surprising that there was no difference in the SDI, which was used to assess organ damage, between anifrolumab and placebo. As noted, longer-term results are available from extension trials; however, these are of limited value due to the lack of a comparator. It should also be noted that, because both trials excluded patients with severe renal or CNS involvement, the effects of anifrolumab cannot be ascertained in this population.

The patient input received for this CADTH review indicated that patients would like to see new therapies that reduce AEs; reduce symptoms such as fatigue, flares, headaches, brain fog, joint and muscle pain, insomnia, and rash and skin irritations; reduce the number of medications used; increase lifespan and the ability to perform ADLs; and improve overall HRQoL. According to the clinical expert, HRQoL is generally stable in patients with SLE. Across both trials, results were similar between the anifrolumab and placebo groups across a broad range of HRQoL and symptom score measures such as the SF-36 and FACIT-F. Statistical tests were not conducted and the impact of anifrolumab on HRQoL is therefore unclear.

While the Health Canada indication is for all adult patients with active autoantibody-positive SLE (in addition to standard therapy), the sponsor’s reimbursement request is for moderate to severe SLE patients with an OCS dosage of 10 mg/day or higher of prednisone or its equivalent. This subgroup of patients was not statistically assessed for the primary end point nor the key secondary end points, other than maintained OCS reduction. It would have been more appropriate to have tested the primary end points and key secondary end points on the proposed reimbursement indication rather than the FAS. Based on the available data, the efficacy of anifrolumab for this subgroup of patients is unclear. In addition, the clinical expert and clinical groups specified that the reimbursement population ideally should include patients with an OCS dosage of less than 10 mg/day and the target tapering dosage would be greater than 7.5 mg/day versus 7.5 mg/day or higher. The reimbursement request and target tapering dose in the trials therefore may have been too conservative.

Harms

Based on its mechanism of action, targeting the interferon pathway, infection would be 1 of the notable harms that should be monitored with anifrolumab. The clinical expert consulted by CADTH indicated that the safety profile of anifrolumab was in line with other treatments, and it was unsurprising that herpes zoster was a common AE. There has been no indication from the pivotal trials that there is an increased risk of mortality due to AEs while receiving anifrolumab. Between the 2 pivotal trials, there were 3 deaths, 2 in the anifrolumab group (pneumonia), and 1 in the placebo group (encephalitis). In the LTE study (N = 218, duration up to 3 years), there was 1 death (0.5%) from community-acquired pneumonia, and this death was assessed by the investigator as related to treatment. No deaths were reported in the MUSE study. Overall, a higher proportion of patients in the anifrolumab group compared to the placebo group had AEs of nasopharyngitis, upper respiratory tract infections, infusion-related reactions, bronchitis, and herpes zoster. The LTE study confirmed these findings, although the conclusions that can be drawn are limited by the lack of a control group and attrition bias. Concerns over infection risk with anifrolumab also need to be weighed against those of many current standard-of-care medications, including immunosuppressants and corticosteroids, which are known for their increased infection risk.

Conclusions

The clinical expert consulted by CADTH, and the input received from the clinician groups for this review, indicated that the ideal treatment would have a meaningful impact on overall survival by reducing disease activity, risk of subsequent flares, use of OCS, risk of AEs, and long-term complications, while inducing remission (low disease activity), and improving HRQoL. Two multinational, sponsor-submitted, double-blind RCTs, TULIP-1 and TULIP-2, were included in this review, along with 2 additional studies that provided long-term safety data. Results of the 2 pivotal RCTs were inconsistent with each other. In 1 study, anifrolumab statistically significantly reduced disease activity after 52 weeks compared to placebo, as measured by BICLA response. The other study showed no statistically significant difference in response as measured by SRI-4 response. While 1 of the studies showed a difference in maintained reduction of OCS dose to less than 7.5 mg/day and reduction in cutaneous manifestations of lupus, the other did not. The inconsistent results contribute to uncertainty in forming conclusions regarding the impact of anifrolumab on disease activity, OCS dose reduction, and CLASI reduction. Despite numerical improvements in HRQoL across the included measures, these results were not statistically tested, and the improvements were generally the same between anifrolumab and placebo groups; the impact of anifrolumab on HRQoL is therefore unknown. The duration of the studies was not sufficient to assess the effects of anifrolumab on organ damage and survival. Data from the included studies do not suggest issues of tolerability or safety, although the extension study was limited by the lack of a control group.

Abbreviations

ACE: Arthritis Consumer Experts
ACR: American College of Rheumatology
ADA: auto-antibody
ADL: activities of daily living
AE: adverse event
anti-dsDNA: anti–double-stranded DNA
BICLA: British Isles Lupus Assessment Group-based Composite Lupus Assessment
BILAG: British Isles Lupus Assessment Group
BILAG-2004: British Isles Lupus Assessment Group 2004
CaNIOS: Canadian Network for Improved Outcomes in Systemic Lupus Erythematosus
CI: confidence interval
CLASI: Cutaneous Lupus Erythematosus Disease Area and Severity Index
CMH: Cochran-Mantel-Haenszel
CNS: central nervous system
C-SSRS: Columbia Suicide Severity Rating Scale
EQ-5D-5L: 5-Level EQ-5D
FACIT-F: Functional Assessment of Chronic Illness Therapy–Fatigue
FAS: full analysis set
HRQoL: health-related quality of life
IFNAR: interferon-alpha and -beta receptor subunit 1
LLDAS: lupus low disease activity state
LOCF: last observation carried forward
LTE: long-term extension study
MCS: mental component score
MID: minimal important difference
mITT: modified intention-to-treat
NRS: numerical rating scale
NSAID: nonsteroidal anti-inflammatory drug
OCS: oral corticosteroid
OR: odds ratio
PCS: physical component score
PGA: Physician’s Global Assessment
PHQ-8: 8-item Patient Health Questionnaire
QoL: quality of life
RCT: randomized controlled trial
SAE: serious adverse event
SD: standard deviation
SDI: Systemic Lupus International Collaborating Clinics/American College of Rheumatology Damage Index
SELENA: Safety of Estrogens in Lupus Erythematosus National Assessment
SF-36: Short Form (36) Health Survey
SF-36v2: Short Form (36) Health Survey Version 2
SFI: Safety of Estrogens in Lupus Erythematosus National Assessment Flare Index
SLE: systemic lupus erythematosus
SLEDAI: Systemic Lupus Erythematosus Disease Activity Index
SLEDAI-2K: Systemic Lupus Erythematosus Disease Activity Index 2000
SLICC: Systemic Lupus International Collaborating Clinics
SRI: Systemic Lupus Erythematosus Responder Index
SRI-4: improvement of 4 points or greater on the Systemic Lupus Erythematosus Responder Index
TEAE: treatment-emergent adverse event
VAS: visual analogue scale

Appendix 1. Literature Search Strategy

Note that this appendix has not been copy-edited.

Clinical Literature Search

Overview

Interface: Ovid

Databases:

MEDLINE All (1946—)
Embase (1974—)

Note: Subject headings and search fields have been customized for each database. Duplicates between databases were removed in Ovid.

Date of search: March 01, 2022

Alerts: Bi-weekly search updates until project completion

Search filters applied: No filters were applied to limit the retrieval by study type.

Limits:

Publication date limit: none
Language limit: none
Conference abstracts: excluded

Table 37

Syntax Guide.

Multidatabase Strategy

1. (saphnelo* or anifrolumab* or MEDI-546 or MEDI546 or 38RL9AE51Q).ti,ab,kf,ot,hw,rn,nm.
2. 1 use medall
3. *anifrolumab/
4. (saphnelo* or anifrolumab* or MEDI-546 or MEDI546).ti,ab,kf,dq.
5. or/3-4
6. 5 use oemezd
7. 6 not conference abstract.pt.
8. 2 or 7
9. remove duplicates from 8

Clinical Trials Registries

ClinicalTrials.gov

Produced by the US National Library of Medicine. Targeted search used to capture registered clinical trials.

[Search -- Studies with results | Saphnelo, anifrolumab, MEDI-546, MEDI546]

WHO ICTRP

International Clinical Trials Registry Platform, produced by the WHO. Targeted search used to capture registered clinical trials.

[Search terms – Saphnelo, anifrolumab, MEDI-546, MEDI546]

Health Canada’s Clinical Trials Database

Produced by Health Canada. Targeted search used to capture registered clinical trials.

[Search terms – Saphnelo, anifrolumab]

EU Clinical Trials Register

European Union Clinical Trials Register, produced by the European Union. Targeted search used to capture registered clinical trials.

[Search terms – Saphnelo, anifrolumab, MEDI-546, MEDI546]

Grey Literature

Search dates: February 16, 2022 – February 23, 2022

Keywords: [Saphnelo, anifrolumab, MEDI-546, MEDI546, lupus, SLE]

Limits: Publication years: none

Updated: Search updated before the meeting of the CADTH Canadian Drug Expert Committee (CDEC)

Relevant websites from the following sections of the CADTH grey literature checklist Grey Matters: A Practical Tool for Searching Health-Related Grey Literature were searched:

Health Technology Assessment Agencies
Health Economics
Clinical Practice Guidelines
Drug and Device Regulatory Approvals
Advisories and Warnings
Drug Class Reviews
Clinical Trials Registries
Databases (free)
Internet Search

Appendix 2. Excluded Studies

Note that this appendix has not been copy-edited.

Table 38

Excluded Studies.

Appendix 3. Detailed Outcome Data

Note that this appendix has not been copy-edited.

Table 39

Secondary Outcomes, SRI[X], TULIP-1, and TULIP-2 (FAS).

Table 40

Sensitivity Analyses for TULIP-1 (FAS).

Table 41

Sensitivity Analyses for TULIP-2 (Full Analysis Set).

Figure 8

OCS Dose (mg), Mean Change From Baseline by Time Point in TULIP-2 (Full Analysis Set).

Table 42

Shifts From Baseline to Week 52, BILAG-2004 Individual Components — TULIP-2.

Table 43

Shifts From Baseline to Week 52, BILAG-2004 Individual Components — TULIP-1.

Appendix 4. Description and Appraisal of Outcome Measures

Note that this appendix has not been copy-edited.

Aim

To describe the following outcome measures and review their measurement properties (validity, reliability, responsiveness to change, and MID):

BICLA
SRI
BILAG-2004
SLEDAI-2K
PGA
SDI
SFI
LLDAS
CLASI
NRS
FACIT-F
Lupus QoL scale
EQ-5D-5L
SF-36 v.2.0
PHQ-8
C-SSRS.

Findings

Table 44

Summary of Outcome Measures and Their Measurement Properties.

British Isles Lupus Assessment Group-based Composite Lupus Assessment

Description and Scoring

The BICLA was derived by expert consensus as a composite index that requires patients to meet response criteria across the BILAG-2004 index, SLEDAI-2K, and PGA.⁶⁷ Details of the individual scales are given in the following sections. In the pivotal trials, the BICLA was a primary end point for TULIP-2 and a secondary end point in the TULIP-1 trial. A patient was defined as a BICLA responder if the following criteria were met:

reduction of all baseline BILAG-2004 A to B/C/D and baseline BILAG-2004 B to C/D, and no BILAG-2004 worsening in other organ systems, as defined by 1 new BILAG-2004 A or more than 1 new BILAG-2004 B item, and
no worsening from baseline in SLEDAI-2K, where worsening is defined as an increase from baseline of > 0 points in SLEDAI-2K, and
no worsening from baseline in the patient’s lupus disease activity, where worsening is defined as an increase of ≥ 0.30 points on a 3-point PGA VAS, and
no discontinuation of investigational product or use of restricted medications beyond the protocol-allowed threshold before assessment.
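The four criteria above can be read as a single yes/no rule. The following is a minimal sketch of that rule, not the trial's actual derivation algorithm. It assumes BILAG-2004 grades are available per organ system at baseline and at the assessment visit, and it approximates "new A or B items" by counting organ systems newly graded A or B; the function name and argument names are hypothetical.

```python
def bicla_responder(base_bilag, cur_bilag, base_sledai, cur_sledai,
                    base_pga, cur_pga, discontinued_or_restricted):
    """Return True if all four BICLA criteria are met (illustrative sketch).

    base_bilag / cur_bilag: dict mapping organ system -> grade "A".."E".
    discontinued_or_restricted: True if the patient stopped the
    investigational product or exceeded restricted-medication thresholds.
    """
    # Criterion 1a: every baseline A improves to B/C/D, every baseline B to C/D.
    for system, grade in base_bilag.items():
        now = cur_bilag[system]
        if grade == "A" and now not in ("B", "C", "D"):
            return False
        if grade == "B" and now not in ("C", "D"):
            return False

    # Criterion 1b: no worsening in other systems (1 new A or more than 1 new B).
    # Assumption: counted per organ system rather than per item.
    others = [s for s, g in base_bilag.items() if g not in ("A", "B")]
    new_a = sum(cur_bilag[s] == "A" for s in others)
    new_b = sum(cur_bilag[s] == "B" for s in others)
    if new_a >= 1 or new_b > 1:
        return False

    # Criterion 2: no increase from baseline in SLEDAI-2K.
    if cur_sledai - base_sledai > 0:
        return False

    # Criterion 3: PGA must not increase by 0.30 points or more (3-point VAS).
    if round(cur_pga - base_pga, 2) >= 0.30:
        return False

    # Criterion 4: no discontinuation or restricted-medication use.
    return not discontinued_or_restricted


baseline = {"mucocutaneous": "A", "musculoskeletal": "B", "renal": "D"}
week52 = {"mucocutaneous": "B", "musculoskeletal": "C", "renal": "D"}
print(bicla_responder(baseline, week52, 10, 6, 1.8, 1.0, False))
```

In this hypothetical patient every baseline A and B grade improves, nothing worsens, and the rule returns a response. Leaving the musculoskeletal grade at B would make the same patient a nonresponder, which illustrates how strict the all-systems requirement is.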

In contrast to the SRI, improvement in the BICLA is guided by the BILAG-2004 and worsening is assessed using the BILAG-2004, the SLEDAI-2K, and PGA.²³ The BILAG-2004 can discern inactive disease, partial or complete improvement, and deterioration of disease activity while the SLEDAI-2K requires complete resolution of disease activity of the specific element to capture improvement.²³

Validity

One article noted disagreement between the BICLA and SRI in the EMBLEM trial, as the BICLA criteria require a strict response in all body systems involved at baseline and do not allow for new flares in the remaining body systems.²⁶ A patient could be a responder on the SRI when a component of SLEDAI resolves, while other issues (if present at baseline) stayed the same or worsened slightly.²⁶ In a post hoc analysis of the TULIP trials, BICLA responders had improvements in patient-reported outcomes, including the physical and mental components of the SF-36, the FACIT-F, and PGA scores, indicating convergent validity using the known-groups approach.²³

No literature was identified regarding the reliability and responsiveness of the instrument.

Systemic Lupus Erythematosus Responder Index

Description and Scoring

The SRI is a composite outcome that is rated dichotomously, as to whether a patient has achieved or not achieved response. The SRI-4 response at week 52 was the primary end point in the TULIP-1 trial,⁹ the secondary end point in the TULIP-2 trial,¹⁰ and a prespecified exploratory end point in MUSE.¹¹ The SRI-4 was achieved if all the following criteria were met:

≥ 4-point reduction from baseline in SLEDAI-2K score, and
no new organ system affected as defined by ≥ 1 BILAG-2004 A or ≥ 2 BILAG-2004 B items compared to baseline using BILAG-2004, and
no worsening from baseline in patients’ SLE disease activity defined by an increase ≥ 0.30 points on a 3-point PGA VAS, and
no discontinuation of investigational product or use of restricted medications beyond the protocol-allowed threshold before assessment.⁹^,¹⁰
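As with the BICLA, the SRI-4 definition above reduces to a conjunction of four conditions. The sketch below is illustrative only and assumes the counts of new BILAG-2004 A and B items relative to baseline have already been derived; the function and parameter names are hypothetical rather than taken from the trial protocols.

```python
def sri4_responder(base_sledai, cur_sledai, new_bilag_a, new_bilag_b,
                   base_pga, cur_pga, discontinued_or_restricted):
    """Return True if all four SRI-4 criteria are met (illustrative sketch)."""
    # Criterion 1: reduction from baseline of at least 4 points in SLEDAI-2K.
    sledai_improved = (base_sledai - cur_sledai) >= 4
    # Criterion 2: no new organ system affected (>= 1 new A or >= 2 new B items).
    no_new_organ = not (new_bilag_a >= 1 or new_bilag_b >= 2)
    # Criterion 3: PGA must not increase by 0.30 points or more (3-point VAS).
    pga_stable = round(cur_pga - base_pga, 2) < 0.30
    # Criterion 4: no discontinuation or restricted-medication use.
    return (sledai_improved and no_new_organ and pga_stable
            and not discontinued_or_restricted)


# Hypothetical patient: SLEDAI-2K falls from 10 to 6 with no new BILAG items.
print(sri4_responder(10, 6, 0, 1, 1.5, 1.2, False))
```

Unlike the BICLA sketch, improvement here is driven entirely by the SLEDAI-2K total, with the BILAG-2004 and PGA acting only as safeguards against worsening.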

The SRI was developed from an exploratory analysis of a phase II belimumab trial (LBSL99), which included 449 patients with SLE over 56 weeks.³⁶ According to the developers of the SRI, the SLEDAI component was incorporated to capture global improvement, the BILAG domain to ensure no significant worsening in unaffected organ systems, and the PGA to ensure that improvements in disease activity are not at the expense of a worsening in the patient’s overall condition that is not captured with the SLEDAI or BILAG.³⁶ It is unclear how these particular outcomes for the composite were chosen amid other outcomes available for SLE.

Validity

Studies have found that the SRI is correlated with other clinical parameters of disease activity. In a post hoc analysis of pooled data from 2 52-week Phase IIb trials of sifalimumab and anifrolumab in 736 patients with SLE, changes in disease measures according to SRI responder status were assessed.²¹ Compared with nonresponders, more SRI responders demonstrated a ≥ 7-point reduction in SLEDAI-2K (P < 0.001); had a greater mean change from baseline in SLEDAI-2K score (P < 0.001), PGA score (P = 0.019), FACIT-F score and SF-36 score (P < 0.001); had more organ domains with improvement in SLEDAI-2K (P < 0.0001); experienced reduction in prednisone equivalent of ≤ 7.5 mg/d (P < 0.001); had ≥ 50% improvement in swollen and tender joint counts (P < 0.001), and ≥ 50% improvement in the Cutaneous Lupus Erythematosus Disease Area and Severity Index (CLASI) (P < 0.001).²¹ In addition, fewer SRI responders experienced ≥ 1 flare as measured by BILAG A or 2B flares compared with nonresponders (P < 0.001).²¹ SRI responders had greater mean change from baseline in anti–double-stranded DNA (anti-dsDNA) compared with nonresponders (P = 0.051), although no statistical difference was observed for C3 and C4 concentrations.²¹

Responsiveness

Among 91 patients from the Oklahoma Lupus Cohort study, SRI was compared with a physician’s assessment of improvement.⁶⁸ The SRI in this study used the SELENA SLEDAI, except that the scoring for proteinuria was based on the SLEDAI-2K. Physicians rated each patient’s disease as showing clinically significant improvement, worsening, or no change. In relation to these assessments, the SRI had a sensitivity of 85% and specificity of 74%.⁶⁸ In a small study of 20 patients with SLE who presented with inflammatory musculoskeletal symptoms, clinical and ultrasound parameters were compared at 2 and 4 weeks from baseline.²² Effect sizes from baseline to 2 or 4 weeks were calculated from paired nonparametric tests (effect size r = Z statistic/sqrt[2N]).²² Among SRI responders, large effect sizes were observed for tender joint counts and swollen joint counts (r = −0.505 and −0.492, P = 0.024 and 0.028, respectively), whereas smaller, nonsignificant effect sizes were observed in nonresponders (r = −0.365 and −0.331, and P = 0.122 and 0.160, respectively). However, the SRI was found to be less responsive to musculoskeletal SLE (e.g., SRI underestimated response as there was objective improvement in synovitis among patients classified as nonresponders) than the BILAG or a physician VAS.²²
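The effect size formula quoted above (r = Z statistic/sqrt[2N]) can be illustrated with a short calculation. This is a sketch only: the Z statistic and sample size used below are hypothetical values chosen for illustration, because the per-group Z statistics and Ns are not reported in this review, and the function name is an assumption.

```python
import math


def effect_size_r(z_statistic, n_patients):
    """Effect size for a paired nonparametric test: r = Z / sqrt(2N).

    n_patients is the number of paired patients (N), so 2N is the total
    number of observations across the two time points.
    """
    return z_statistic / math.sqrt(2 * n_patients)


# Hypothetical example: Z = -3.0 from a paired test in 20 patients.
print(round(effect_size_r(-3.0, 20), 3))
```

A negative r here reflects a decrease from baseline, consistent with the sign of the joint count effect sizes reported above.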

No literature was identified regarding the reliability of the instrument.

BILAG-2004

Description and Scoring

The BILAG-2004 is an updated version of the original ‘classic’ BILAG.²⁶ The classic BILAG had 8 domains and consisted of fewer items that were more related to damage than to disease activity and did not properly include disease activity in the gastrointestinal or ophthalmic systems.²⁴ The BILAG-2004 is an ordinal scale index with 97 organ-specific items in 9 domains (constitutional, mucocutaneous, neuropsychiatric, musculoskeletal, cardiorespiratory, gastrointestinal, ophthalmic, renal, and hematology) that is able to capture changes in clinical manifestations.¹⁰^,²⁷ The BILAG-2004 records disease activity across the different organ systems by comparing the immediate past 4 weeks to the 4 weeks preceding them. Organ manifestations are scored by the investigator as not present (= 0), improving (= 1), same (= 2), worse (= 3), or new (= 4), and these scores are then combined with laboratory tests into a single score for that organ. The numerical scoring enables comparisons with global indices by converting the assessments so that grade A = 12 points, B = 8 points, C = 1 point, and D/E = 0 points (where ‘A’ indicates severe disease, ‘B’ is moderate activity, ‘C’ is mild stable disease, ‘D’ is resolved activity, and ‘E’ indicates the organ was never involved). The BILAG-2004 gives equal weight to all affected body systems and can measure incremental improvements or worsening within a body system, unlike the SLEDAI-2K, which can only record clinical manifestations as absent or present. For example, with a 50% improvement, such as a reduction from 40% to 20% of the skin surface involved with a skin eruption, the BILAG-2004 level for that organ would change from A (severe activity) to B (moderate activity). The BILAG-2004 requires improvement in all baseline manifestations within a system to result in a change in that system’s BILAG-2004 level.
For example, a patient with skin eruption and severe mucosal ulceration at baseline must show improvement in both to result in a change in the BILAG-2004 mucocutaneous index level.²⁷ In the pivotal trials, for the annualized flare rate, a flare was defined as either ≥ 1 new BILAG-2004 A or ≥ 2 new BILAG-2004 B items compared to the previous visit, which have been defined as severe and moderate flares in the literature, respectively.²⁴
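The grade-to-points conversion and the flare definition described above can be sketched in a few lines. This is an illustrative example under stated assumptions: the organ-system grades are hypothetical, the function names are not from the source, and the flare check takes already-derived counts of new A and B items compared with the previous visit.

```python
# Numerical conversion described in the text: A = 12, B = 8, C = 1, D/E = 0.
GRADE_POINTS = {"A": 12, "B": 8, "C": 1, "D": 0, "E": 0}


def bilag_numeric_score(grades):
    """Sum the points for each organ system's BILAG-2004 grade."""
    return sum(GRADE_POINTS[grade] for grade in grades.values())


def is_flare(new_a_items, new_b_items):
    """Flare as defined for the annualized flare rate in the pivotal trials:
    >= 1 new BILAG-2004 A item or >= 2 new BILAG-2004 B items."""
    return new_a_items >= 1 or new_b_items >= 2


# Hypothetical patient with activity in three of the nine domains.
grades = {
    "constitutional": "E", "mucocutaneous": "A", "neuropsychiatric": "E",
    "musculoskeletal": "B", "cardiorespiratory": "D", "gastrointestinal": "E",
    "ophthalmic": "E", "renal": "C", "hematology": "E",
}
print(bilag_numeric_score(grades))  # 12 + 8 + 1 = 21
print(is_flare(new_a_items=0, new_b_items=2))
```

The same total could arise from different combinations of grades, which is the masking concern raised elsewhere in this review for single summary scores.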

Validity and Reliability

Hay et al. conducted validity and inter-rater reliability studies of the classic BILAG.²⁹ In the validity study, 353 patients with SLE were included.²⁹ Patients were assessed at intervals of at least 1 month apart over a 12-month period, and at least 2 BILAG assessments were conducted on each patient. Criterion validity was based on the gold standard of initiation or increase in disease-modifying therapy (i.e., corticosteroids or immunosuppressants). Construct validity was tested by comparing BILAG assessment with erythrocyte sedimentation rate (ESR), double-stranded DNA antibody titres, and need for hospitalization. In examining 1,139 BILAG assessments, compared with the gold standard criterion (starting or increasing disease-modifying therapy), the BILAG had 87% sensitivity and 99% specificity. The positive predictive value (PPV) was 80% for a BILAG A score in any system.²⁹ The PPVs for a BILAG A score by organ system were: general = 83%, mucocutaneous = 82%, neurologic = 30%, musculoskeletal = 81%, cardiorespiratory = 100%, vasculitis = 100%, renal = 100%, and hematology = 50%.²⁹ Construct validity was also demonstrated. Of those patients with ESR > 40 mm/h, 52% scored A in 1 or more systems compared with 10% with ESR < 20 mm/h (P < 0.001); 56% with anti-dsDNA antibody titre > 30 IU/L scored A in 1 or more systems compared with 13% with anti-dsDNA antibody titre < 30 IU/L (P < 0.001); 19 patients were admitted to hospital, and 18 of their assessments scored A in 1 or more systems versus 6 of the outpatients (P < 0.001).²⁹
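For readers less familiar with the criterion validity measures quoted above, the sketch below shows how sensitivity, specificity, and PPV are derived from a 2 × 2 table of index results against a gold standard. The counts are hypothetical and were chosen only to make the arithmetic easy to follow; they are not the counts from the Hay et al. study, which are not reported in this review.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Compute sensitivity, specificity, and PPV from 2 x 2 table counts.

    tp: index positive, gold standard positive
    fp: index positive, gold standard negative
    fn: index negative, gold standard positive
    tn: index negative, gold standard negative
    """
    return {
        "sensitivity": tp / (tp + fn),  # proportion of true positives detected
        "specificity": tn / (tn + fp),  # proportion of true negatives detected
        "ppv": tp / (tp + fp),          # proportion of positives that are true
    }


# Hypothetical counts, not taken from the source study.
metrics = diagnostic_metrics(tp=87, fp=1, fn=13, tn=99)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

PPV depends on how common the gold standard outcome is in the sample, which is one reason it can differ markedly by organ system even when sensitivity and specificity are high overall.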

Similarly, in a study of 369 patients with SLE in the UK, increasing overall scores on the BILAG-2004 index were associated with increasing ESRs, decreasing C3 levels, decreasing C4 levels, elevated anti-dsDNA levels, and increasing SLEDAI-2K scores, demonstrating construct validity.²⁸ A study examining the inter-rater reliability included 82 patients with SLE treated at outpatient clinics.²⁹ Two rheumatologists who were experienced with the BILAG assessed each patient (renal and hematological systems were not scored because they are based on laboratory results and not prone to inter-rater measurement error). The weighted kappas showed substantial to almost perfect agreement between assessors (general = 0.79, mucocutaneous = 0.80, neurologic = 0.72, musculoskeletal = 0.85, cardiorespiratory = 0.97, and vasculitis = 0.76).²⁹

In a study of 16 SLE patients assessed by 16 rheumatologists, the rate of complete agreement was assessed between physicians for any flare versus no flare for the BILAG-2004, the SFI, and the PGA.²⁴ Under the BILAG-2004, flares were defined as severe: ≥ 1 BILAG-2004 ‘A’ score in any system due to items that are new or worse; moderate: ≥ 2 ‘B’ scores due to items that are new or worse; mild: 1 ‘B’ score due to items that are new or worse or ≥ 3 ‘C’ scores due to items that are new or worse. Anyone who did not meet 1 of these criteria was categorized as no flare. The rate of agreement (95% CI) was 81% (55% to 94%) for the BILAG-2004, 75% (49% to 90%) for the SFI, and 75% (49% to 90%) for the PGA. The ICC (95% CI) values were 0.54 (0.32 to 0.78) for BILAG-2004 flare compared with 0.21 (0.08 to 0.48) for SELENA flare and 0.18 (0.06 to 0.45) for PGA. The agreement was less consistent in mild/moderate flares than in severe flares.²⁴
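The flare categories used in that study form an ordered set of rules, checked from most to least severe. A minimal sketch follows, assuming the inputs are the numbers of organ systems scoring A, B, or C because of items that are new or worse; the function name and inputs are hypothetical.

```python
def classify_bilag_flare(a_scores, b_scores, c_scores):
    """Classify a BILAG-2004 flare from counts of A, B, and C scores
    that are due to new or worse items (illustrative sketch)."""
    if a_scores >= 1:
        return "severe"       # >= 1 'A' score in any system
    if b_scores >= 2:
        return "moderate"     # >= 2 'B' scores
    if b_scores == 1 or c_scores >= 3:
        return "mild"         # 1 'B' score or >= 3 'C' scores
    return "no flare"         # none of the criteria met


print(classify_bilag_flare(a_scores=0, b_scores=1, c_scores=0))
```

Checking the rules in descending order of severity ensures that a patient who meets more than one criterion is assigned the most severe applicable category.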

Responsiveness

In a 2008 study, the ability to detect disease activity was assessed by determining the number of patients with high activity on the BILAG-2004 (overall score A or B) but a low SLEDAI-2K score and the number of patients with low activity on the BILAG-2004 (overall score C, D, or E) but a high SLEDAI-2K score.²⁷ Results found that 35 patients (37.6%) had high activity on BILAG-2004 but a low SLEDAI-2K score, of whom 48.6% had an increase in treatment, indicating that the SLEDAI-2K was less able than the BILAG-2004 to detect active disease. In another study of 347 SLE patients with 1,761 assessments, increases in overall BILAG-2004 index score were associated with increases in therapy and inversely associated with decreases in therapy.⁶⁹

Minimal Important Difference

Yee et al. (2012)³⁰ developed the BILAG-2004 systems tally (BST), which classified changes in BILAG-2004 index scores according to severity. In the BST, a minor deterioration was classified as a change of grade C to B and a minor improvement was classified as a change of grade A to B or grade B to C.

Systemic Lupus Erythematosus Disease Activity Index 2000

Description and Scoring

The SLEDAI is a measure of disease activity that was derived by consensus among experts in rheumatology, followed by regression models to assign relative weights to each parameter.²⁶ The SLEDAI-2K is a modified version of the original SLEDAI to allow for persistent active disease in alopecia, mucous membrane ulcers, rash, and proteinuria to be scored.³² The SLEDAI-2K is based on the presence of 24 descriptors in 9 organ systems, which are defined by the investigator as “present” or “absent” in the patient in the past 4 weeks, and includes the use of laboratory samples. Each descriptor has a weighted score and the sum of all 24 descriptor scores falls between 0 and 105, with higher scores representing higher disease activity.³¹ In the pivotal trials, the “Clinical” SLEDAI-2K score is the SLEDAI-2K assessment score without the inclusion of points attributable to any urine or laboratory results including immunologic measures.⁹^,¹⁰

Validity

In a study of 334 SLE patients in Portugal, a strong Spearman rank correlation (0.824) was observed between the SLEDAI-2K and the PGA at the 36-month follow-up, supporting the construct validity of the SLEDAI-2K in SLE patients.³³ In another study of 92 patients with SLE, a good correlation coefficient of 0.677 between the SLEDAI-2K and PGA was identified, indicating construct validity.⁷⁰

Reliability

The reliability of the SLEDAI-2K was demonstrated using inter-rater reliability between 2 raters in a study of 93 SLE patients.²⁷ Results found agreement between the raters for each of the items ranging between 81.7% and 100%.²⁷

Responsiveness

In terms of responsiveness, in 1 study, the SLEDAI-2K was not successful in detecting a clinically meaningful improvement or worsening in SLE disease activity, as it failed to identify more than 60% of cases with a worsening or improvement, which was defined as a change of 0.3 points in the patient global assessment PGA.⁵⁹ The BILAG-2004 has been found to be more responsive to change in disease activity than the SLEDAI-2K.²⁷ Using a summary score to describe disease activity, as in the SLEDAI-2K, can mask the underlying organ systems that are contributing to the score (i.e., the same score could indicate mild disease in multiple organs or severe disease in 1 organ; or an unchanged score may occur despite worsening in 1 organ system if there is also improvement in another system).⁴³

Minimal Important Difference

One study identified a minimal clinically meaningful increase of 3 or 4 points for prediction of increase in therapy (worsening) and suggested a minimal clinically meaningful decrease in score of 1 to 2 points for improvement.³⁴ Another study found that the SLEDAI-2K score increased by > 3 points when the clinician assessed that the patient was experiencing a flare.³⁵

Physician’s Global Assessment

Description and Scoring

The PGA represents the physician’s overall assessment of average SLE disease severity on a VAS with equal markings from 0 to 3, where 0 = none, 1 = mild, 2 = moderate, and 3 = severe disease.⁷¹

Validity

In a systematic review, the PGA was moderately to strongly correlated with the SLEDAI in 12 studies (r = 0.50 to 0.97) and moderately correlated with the Systemic Lupus Activity Measure (SLAM) in 4 studies (r = 0.47 to 0.65).³⁷

Reliability

Inter-rater reliability was assessed in 7 studies between 2 or more physicians, with results showing moderate to excellent reliability with ICC values ranging from 0.67 to 0.96.³⁷ Intra-rater reliability was assessed in 3 studies with ICC values ranging from 0.55 to 0.88.³⁷

Responsiveness

Studies have assessed responsiveness by correlating changes in the PGA with changes in other instrument scores. Findings have resulted in moderate correlations with SLEDAI (r = 0.39 to 0.66), SLAM (0.61), and the Lupus Activity Index (LAI) (0.56).³⁷

Minimal Important Difference

The PGA is part of the SRI and SFI. In the SRI, no worsening of PGA is defined as an increase of < 0.3 points.³⁶ The change of 0.3 points on the PGA is based on patients with rheumatoid arthritis.³⁶ In the SFI, a mild or moderate flare can occur with an increase in PGA score of ≥ 1, and a severe flare with an increase in PGA score of > 2.5.⁷² Through consensus, the Hopkins Lupus Center chose a 1-point change on the PGA over the last 93 days, as a gold standard definition of flare.⁷² Based on this definition, moderate flares were defined as a score of 2 to 2.5, and severe flares as a score of 3.⁷² In an epratuzumab trial, a significant improvement was a 20% decrease in PGA score evaluated after 12 months of treatment.³⁷

Systemic Lupus International Collaborating Clinics/American College of Rheumatology Damage Index

Description and Scoring

The SDI was developed by the international collaboration, SLICC.⁴³ The purpose of the assessment is to score irreversible damage, regardless of cause. Damage is defined as irreversible change in an organ system that has occurred since the onset of SLE, and is present for at least 6 months.⁴³ The tool is completed by a physician and consists of 42 items in 12 domains (peripheral vascular, ocular, neuropsychiatric, renal, pulmonary, cardiovascular, gastrointestinal, musculoskeletal, skin, endocrine [diabetes], gonadal, and malignancies) with a maximum score of 47 points (higher scores denote more damage).⁴³^,⁴⁵ The items are rated as present or absent and, in the case of recurring events, such as a stroke, there is a possibility of providing a rating of 2 or 3 points to an item.⁴³ At diagnosis of SLE, the SDI score is 0 by definition.⁴⁵ Damage is considered present if the SDI score is ≥ 1, and damage can remain stable or increase over time; however, points should not decrease.⁴⁵

Validity

To assess the validity of the SDI, centres who treated SLE patients submitted 2 assessments, 5 years apart, on 2 patients with active disease (one patient with increase in damage over the 5 years and 1 patient with stable damage) and 2 patients with inactive disease (one patient with increase in damage and 1 patient with stable damage).⁴⁴ The cases (14 cases in 3 separate packages) were written up in a uniform format and sent back out, in mixed order, to the centres where the SDI was completed by 20 physicians (2 assessments per patient at time 1 and time 2). The SDI scores of patients with damage after 5 years were increased by a greater degree compared with patients with stable disease (2.08 points versus 0.24 points).⁴⁴ The SDI scores of patients with active disease also increased more compared with patients with inactive disease (1.48 points versus 0.83 points).⁴⁴ A study of 71 patients found that the SDI was associated with SLEDAI 2K (r = 0.742) and the European Consensus Lupus Activity Measurement (ECLAM) (r = 0.699).⁷³ The SDI and BILAG have been found to have weak correlation (Spearman correlation coefficient 0.19).⁷⁴

Reliability

Among 20 SLICC members who completed the SDI on 42 cases, there was moderate agreement between raters (ICC = 0.553).⁴⁵ Similarly, when the SDI was completed by another physician based on retrospective review of patient cases, interobserver reliability was moderate (kappa = 0.47; 95% CI, 0.28 to 0.66).⁷⁴

Responsiveness

The SDI is a statistically significant predictor of clinically important outcomes. In a 10-year retrospective study of 80 patients with SLE, the mean SDI renal damage score at 1 year after diagnosis was a significant predictor of end stage renal failure (at 1 year: renal failure versus no renal failure, SDI renal damage score 0.33 versus 0.03; at 5 years: SDI renal damage score 1.33 versus 0.14; at 10 years: SDI renal damage score 2.80 versus 0.35).⁴⁶ The total SDI score was also associated with end stage renal failure at 5 and 10 years.⁴⁶ The SDI pulmonary damage score at 1 year after diagnosis was a significant predictor of death within 10 years, however total SDI score was not associated with death.⁴⁶ More recent studies with larger cohorts of patients have shown that the SDI is a predictor of mortality. Patients with SLE (N = 1,297) were identified within 2 years of a first clinical visit from 8 centres, and followed for 2, 5 to 10, and > 10 years.⁴⁵ The SDI increased over time and was found to be higher among patients who died.⁴⁵ In the University of Toronto Lupus Clinic, 263 patients were followed for 10 years.⁷⁵ Within 10 years, 25% of patients who exhibited damage at the first SDI assessment (i.e., 1 year after diagnosis) died, compared with 7.3% of patients who had no early signs of damage.⁷⁵

Minimal Important Difference

No formal MID has been assessed. An SDI of 1 or higher indicates damage which can remain stable or increase over time.⁴³

SELENA SLEDAI Flare Index

Description and Scoring

The SFI is used to identify and classify flares as mild/moderate or severe, based on clinical activity, need for additional treatment, or PGA score.³⁶ The original definitions of mild/moderate and severe flares were reached by consensus of the investigators of the SELENA trials.⁷² In the TULIP trials⁹^,¹⁰ a modified version of the SFI was used, using the SLEDAI-2K instead of the SELENA SLEDAI. In the pivotal trials, mild/moderate flare and severe flare were defined according to the following criteria:

Mild or moderate flare:
- change in SLEDAI-2K score of ≥ 3 points but < 7 points compared to previous visit, or
- new or worse discoid, photosensitive, profundus, cutaneous vasculitis, or bullous lupus, or
- nasopharyngeal ulcers, pleuritis, pericarditis, arthritis, or SLE fever, or
- ≥ 1.0 increase in PGA score (not > 2.5).
Severe flare:
- change in SLEDAI-2K score ≥ 7 points compared to previous visit, or
- new or worse CNS-SLE, vasculitis, nephritis, myositis, hemolytic anemia (Hb < 70 g/L or decrease in Hb > 30 g/L with positive Coombs) AND at least 1 of the following: decreased haptoglobin, increased total bilirubin not due to Gilbert’s disease, increased reticulocyte count, or
- hospitalization for SLE, or
- increase in PGA score to > 2.5.⁹^,¹⁰
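
The criteria listed above can be read as a decision rule in which the severe criteria take precedence. The sketch below is illustrative only and is not the algorithm implemented in the pivotal trials; the function and parameter names are hypothetical, and the clinical findings (including the laboratory conditions attached to hemolytic anemia) are assumed to have been judged by the investigator and passed in as booleans.

```python
def classify_sfi_flare(sledai_change, pga_previous, pga_current,
                       mild_moderate_feature=False,
                       severe_feature=False,
                       hospitalized_for_sle=False):
    """Classify a flare using the modified SFI criteria listed in the text.

    sledai_change: change in SLEDAI-2K score versus the previous visit.
    mild_moderate_feature: any new or worse finding from the mild/moderate list.
    severe_feature: any new or worse finding from the severe list.
    """
    pga_increase = pga_current - pga_previous

    # Severe criteria are evaluated first so they override mild/moderate ones.
    if (sledai_change >= 7
            or severe_feature
            or hospitalized_for_sle
            or (pga_increase > 0 and pga_current > 2.5)):
        return "severe"

    # A PGA increase of >= 1.0 reaching this point cannot exceed 2.5,
    # because that case has already been classified as severe.
    if (3 <= sledai_change < 7
            or mild_moderate_feature
            or pga_increase >= 1.0):
        return "mild/moderate"

    return "no flare"

print(classify_sfi_flare(sledai_change=4, pga_previous=1.0, pga_current=1.2))
```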

Validity

In a post hoc analysis of BLISS-52 trial data with 867 SLE patients, the occurrence of a new SFI flare using the SELENA SLEDAI was associated with a significant change in the FACIT-F and all domains of the SF-36v2 except role emotional scores, indicating convergent validity.⁶⁰ In a small study of 16 patients who were each evaluated by 4 physicians, there was 52% agreement between the SFI and BILAG-2004 flare index in classifying patients as having no flare, or mild, moderate or severe flare.²⁴ It was unclear, however, if this study used the SFI, or the modified SFI. The agreement among raters on the SFI was fair (ICC 0.21; 95% CI, 0.08 to 0.48), and lower than the BILAG 2004 assessment of flares.²⁴

Reliability

A study evaluated the modified SFI using paper-based cases of patients with SLE.⁶¹ Initially, 988 cases were assessed by 3 physicians for degree of flare or presence of disease activity and rated as severe, moderate, or mild flare, or persistent/ongoing disease. The cases on which the 3 physicians agreed (N = 451 cases) were moved on to the second part of the study and assessed by 18 pairs of physicians with 3 instruments: BILAG-2004 flare index, SFI, and modified SFI. The assessments based on these instruments were compared with the assessments conducted initially in the first stage of the study by the 3 physicians. For the modified SFI, assessments matched the conclusions of the 3 physicians in 70% of cases (weighted kappa 0.74).⁶¹ The discrepancies were concentrated in classifying moderate flares as severe flares, and identifying persistent activity as a flare.⁶¹ There was also an issue of over-scoring due to classifying treatment change as a flare, even when there were no new or worsening clinical features.⁶¹ The authors of this study indicate that “the problem of capturing lupus flare accurately” is not completely solved.⁶¹

No literature was identified regarding the responsiveness of the instrument in SLE patients.

Lupus Low Disease Activity State

Description and Scoring

The LLDAS is a state that if sustained is “associated with a low likelihood of adverse outcome, considering disease activity and medication safety.”⁵³ The LLDAS is achieved by attaining all the following 5 criteria:

- SLEDAI-2K ≤ 4, with no activity in major organ systems (renal, CNS, cardiopulmonary, vasculitis, fever) and no hemolytic anemia or gastrointestinal activity, and
- no new lupus disease activity compared with the previous assessment (SLEDAI-2K), and
- a PGA ≤ 1 (scale 0 to 3), and
- a current prednisone (or equivalent) dose ≤ 7.5 mg daily, and
- well-tolerated standard maintenance doses of immunosuppressive drugs and approved biologic drugs.⁵³
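
Because the LLDAS requires all 5 criteria to be met at the same assessment, it can be sketched as a conjunction of checks. The example below is a minimal illustration under stated assumptions: the `Visit` record and its field names are hypothetical, and each boolean is assumed to reflect the investigator's judgment of the corresponding criterion.

```python
from dataclasses import dataclass

@dataclass
class Visit:
    """Hypothetical record of one assessment; field names are illustrative."""
    sledai_2k: int
    major_organ_activity: bool              # renal, CNS, cardiopulmonary, vasculitis, fever
    hemolytic_anemia_or_gi_activity: bool
    new_disease_activity: bool              # versus the previous assessment
    pga: float                              # scale 0 to 3
    prednisone_mg_per_day: float            # prednisone or equivalent
    tolerated_standard_maintenance: bool    # immunosuppressive and approved biologic drugs

def meets_lldas(v: Visit) -> bool:
    """Return True only if all 5 criteria from the text are satisfied."""
    return (
        v.sledai_2k <= 4
        and not v.major_organ_activity
        and not v.hemolytic_anemia_or_gi_activity
        and not v.new_disease_activity
        and v.pga <= 1
        and v.prednisone_mg_per_day <= 7.5
        and v.tolerated_standard_maintenance
    )

visit = Visit(4, False, False, False, 1.0, 7.5, True)
print(meets_lldas(visit))
```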

Validity

Criterion validity was assessed by comparing the LLDAS with damage accrual as measured by the SDI in a study of 191 SLE patients in Australia followed for an average of 3.9 years. For each patient, the LLDAS was measured at each visit and the SDI was completed annually following a baseline measurement.⁵³ Results found that patients who spent ≥ 50% of their observed time in LLDAS had significantly reduced organ damage accrual compared with patients who spent < 50% of their time in LLDAS (P = 0.0007) and were less likely to have an SDI increase of ≥ 1 (relative risk 0.47; 95% CI, 0.28 to 0.79), indicating good criterion validity. The minimum amount of time that must be spent in LLDAS to improve outcomes was not calculated due to an insufficient sample size.⁵³

In a post hoc analysis of the MUSE trial, LLDAS attainment was positively associated with, but more stringent than, standard end points.⁶² For example, 16.7% of all patients achieved LLDAS at week 24, and of these patients, 80.4% achieved the primary end point of SRI-4 with OCS taper. However, of the 82 patients who achieved the primary end point, only 50% also met the LLDAS criteria. Furthermore, patients who achieved LLDAS at week 52 had a 75.2% lower BILAG flare rate during the study, had lower PGA scores, and had higher Lupus QoL scores compared with those who did not attain LLDAS at the same time point, indicating convergent validity.⁶² Similar results were found in another post hoc analysis, as 17.0% and 19.3% of patients who achieved an SRI-4 also attained LLDAS in BLISS-52 and BLISS-76, respectively.⁷⁶

No literature was identified regarding the reliability or responsiveness of the instrument in SLE patients.

Cutaneous Lupus Erythematosus Disease Area and Severity Index

Description and Scoring

The CLASI has 2 separate scores, 1 for disease activity and 1 for disease damage, both of which were used in the pivotal trials.⁴⁷^,⁴⁸ Disease activity is scored from 0 to 70 and is based on erythema, scale/hyperkeratosis, mucous membrane involvement, acute hair loss, and nonscarring alopecia.⁴⁷^,⁴⁸ Disease damage is scored from 0 to 80 and consists of dyspigmentation and scarring, including scarring alopecia. If patients’ dyspigmentation has lasted for > 12 months, their dyspigmentation score is doubled.⁴⁷^,⁴⁸ The CLASI describes the extent of disease in terms of the intensity of involvement measured in 13 different anatomic locations but does not record the percentage of body surface area or the number of lesions.⁴⁸

Validity

Convergent validity was assessed in a study of 31 patients with cutaneous lupus erythematosus, comparing the CLASI to the SLEDAI and SDI.⁴⁹ Results found a moderate correlation (r = 0.42) between CLASI activity and SLEDAI-rash and between total CLASI-damage and SDI-extensive scarring/panniculum (r = 0.51). A strong correlation (r = 0.94) was found between CLASI scalp scarring and the SDI-skin scarring/alopecia domains.⁴⁹

Reliability

One study had 9 patients with either subacute lupus erythematosus or discoid lupus erythematosus scored by 11 physicians in 2 sessions to estimate the instrument’s inter- and intra-rater reliability.⁴⁸ Results demonstrated good to excellent inter-rater reliability with ICC (95% CI) values of 0.86 (0.73 to 0.99) for the activity score and 0.92 (0.85 to 1.00) for the damage score. Good to excellent results were found for intra-rater reliability with Spearman’s ρ (95% CI) values of 0.96 (0.89 to 1.00) for the activity score and 0.99 (0.97 to 1.00) for the damage score.⁴⁸

Minimal Important Difference

In a study of 75 patients in the US with cutaneous lupus erythematosus or SLE, a clinically important improvement was associated with a mean 3-point or 18% decrease in the CLASI activity score.⁵⁰

No literature was identified regarding the responsiveness of the instrument in SLE patients.

Short Form (36) Health Survey Version 2

Description and Scoring

The SF-36 is a generic, self-reported health assessment questionnaire that has been used in clinical trials to study the impact of chronic disease on health-related quality of life.³⁸ There are 2 versions of the instrument: the original SF-36⁷⁷ and the SF-36 version 2 (SF-36v2).³⁸^,⁶³ Compared with the original SF-36, the SF-36v2 contains minor changes to the original survey, including changes to: instructions (reduced ambiguity), questions and answers (better layout), item-level response choices (increased), cultural/language comparability (increased), and elimination of a response option from the items in the mental health and vitality subscales.³⁸^,⁶³ The questionnaire consists of 36 items representing 8 subscales: physical functioning (PF; 10 items), role physical (RP; 4 items), bodily pain (BP; 2 items), general health (GH; 5 items), vitality (VT; 4 items), social functioning (SF; 2 items), role emotional (RE; 3 items), and mental health (MH; 5 items). The second question of the survey is a single item used to estimate the general health from a cross-sectional standpoint.⁷⁸ The SF-36 has a recall period of 1 week in the pivotal trials⁹^,¹⁰ and item response options are presented on a 3- to 6-point, Likert-like scale.³⁸^,⁶³ Each item is converted to a score ranging from 0 to 100 where a higher value indicates a more favourable health state and item scores are averaged together to create the 8 subscale scores. The SF-36 also provides 2 component summaries, the PCS and MCS, which are created by aggregating the 8 subscales according to a scoring algorithm. The first 4 subscales (PF, RP, BP, and GH) belong to the PCS while the next 4 subscales (VT, SF, RE, and MH) make up the MCS. Like the individual items, the 8 subscale scores, the PCS, and the MCS are each measured from 0 to 100.

Although several measures of HRQoL have been studied in SLE, the most used and accepted measure is the SF-36, a generic tool that can be used to make comparisons with other patient groups or to the population at large using the standardized PCS and MCS.³⁹^,⁶⁵

Validity

A literature review found that the Health Assessment Questionnaire was strongly correlated with physical function scores of the SF-36 (r = 0.75) and moderately correlated with role physical, bodily pain, and vitality scores (r = 0.41 to 0.48), demonstrating convergent validity.⁶⁴

Reliability

Evidence suggests the instrument has good internal consistency reliability with a Cronbach alpha of ≥ 0.71 across various studies.³⁹

Responsiveness

Studies have suggested that the responsiveness of the instrument has been poor in patients with SLE with poor to moderate SRMs across studies.³⁹ For instance, in a study of 41 SLE patients, responsiveness was found in some domains among those who flared (i.e., SRM of moderate effect of 0.64 in role physical) and improved (i.e., SRM of moderate effect of 0.60 in MCS), but not among patients in remission, when compared to their previous visit.⁶⁵

Minimal Important Difference

Minimum important differences that are specific to SLE patients have been estimated in a literature review of 8 studies.³⁹ Anchor-based MIDs for improvement are estimated to be from 2.1 to 2.4 for summary scores and 2.8 to 10.9 in domains. These estimates are consistent with estimates from other rheumatological conditions (5 to 10 points for domains and 2.5 to 5 points for summary scores). In patients reporting worsening, 1 study noted MIDs ranging from −4.4 to −15.6 in the SF-36 domains.⁷⁹

Lupus Quality of Life

Description and Scoring

The Lupus QoL is a 34-item SLE-specific health-related quality of life measure.⁴⁰ The instrument consists of 8 domains: physical health (8 items), pain (3 items), planning (3 items), intimate relationships (2 items), burden to others (3 items), emotional health (6 items), body image (5 items), and fatigue (4 items).

Validity

A recent literature review identified 7 studies which examined the psychometric properties of the instrument in patients with SLE.³⁹ The evidence suggests good construct validity, with correlations between comparable domains in the Lupus QoL and the SF-36 (r > 0.6) including physical health/physical functioning, emotional health/mental health, pain/bodily pain, and fatigue/vitality. Studies also indicated good convergent validity using the known-groups approach.³⁹ For example, in a study using the BILAG index to assess disease activity in 269 patients, patients with no disease activity (Es/Ds/Cs only) or mild activity (B in only 1 system) reported better Lupus QoL scores than those with moderate (B in ≥ 2 systems) or severe (A in any system) disease activity in all domains except fatigue.⁴⁰

Reliability

The literature review suggested good test-retest reliability in the patient population with an ICC ≥ 0.55 and good internal consistency reliability with Cronbach alpha ≥ 0.85 across all studies.³⁹ Content validity of the instrument was supported by rheumatology and/or medical experts in 4 studies and feedback was gathered from SLE patients to ensure readability and understandability of the tool.⁸⁰

Responsiveness

Regarding responsiveness, the effect size and SRMs were poor in most domains and inconsistent (poor to moderate) depending on the anchor being used.³⁹ For instance, in a study of 41 SLE patients, responsiveness was found in some domains when compared to the previous visit among patients who flared (i.e., a moderate SRM of 0.67 for fatigue) and improved (i.e., SRM of 0.73 in pain; 0.53 in fatigue, and 0.51 in physical health), but not among patients in remission.⁶⁵ Studies have validated non-English versions with similar results.⁸¹

Minimal Important Difference

MIDs derived using an anchor-based approach ranged from 2.4 to 8.7 for deterioration and from 3.5 to 7.3 for improvement. MIDs derived using distribution-based approaches based on 0.5 SD ranged from 12.9 to 16.7.³⁹

5-Level EQ-5D

Description and Scoring

The EQ-5D is a family of HRQoL instruments that may be applied to a wide range of health conditions and treatments.⁸²^,⁸³ The first part of the EQ-5D-5L is a descriptive system that classifies respondents (aged ≥ 12 years) based on the following 5 dimensions: mobility, self-care, usual activities, pain/discomfort, and anxiety/depression. In 2005, updates were made to the original EQ-5D (i.e., the EQ-5D-3L) to create the EQ-5D-5L, which includes 5 response levels (as opposed to the original 3 levels) of severity (no problems, slight problems, moderate problems, severe problems, unable to/extreme problems) in each of the dimensions.⁴¹ Respondents are asked to choose the level that reflects their health state for each dimension, resulting in 3,125 possible health states.⁸⁴ A scoring function can be used to assign a value to self-reported health states from a set of population-based preference weights.⁸²^,⁸³ The second part is the EQ VAS, which has end points labelled 0 and 100, with respective anchors of “worst imaginable health state” and “best imaginable health state.” Respondents are asked to rate their health by drawing a line from an anchor box to the point on the EQ VAS which best represents their health on that day.
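
The figure of 3,125 health states follows from 5 dimensions with 5 levels each (5⁵ = 3,125). As a minimal sketch, the states can be enumerated as 5-digit profiles, with 1 denoting no problems and 5 denoting unable to/extreme problems in each dimension; the variable and function names are illustrative, and no preference weights are applied here because no value set is given in this review.

```python
from itertools import product

DIMENSIONS = ("mobility", "self-care", "usual activities",
              "pain/discomfort", "anxiety/depression")
LEVELS = range(1, 6)  # 1 = no problems ... 5 = unable to/extreme problems

def all_health_states():
    """Enumerate every EQ-5D-5L profile as a 5-digit string, e.g. '11111'."""
    return ["".join(str(level) for level in profile)
            for profile in product(LEVELS, repeat=len(DIMENSIONS))]

states = all_health_states()
print(len(states))  # 3125
```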

Validity

One study identified in a literature review examined the psychometric properties of the EQ-5D-3L in patients with SLE.³⁹^,⁴² Related domains on the EQ-5D-3L and the SF-36 had a strong correlation (i.e., r = 0.60 for SF-36 physical functioning and EQ-5D-3L mobility) and unrelated domains had a weak/moderate correlation (i.e., r = −0.27 for SF-36 general health perception and EQ-5D-3L pain/discomfort), demonstrating convergent and divergent validity. Evidence of known-groups validity was demonstrated as the instrument was able to discriminate between patients with higher disease activity (SLEDAI > 5) versus those with lower disease activity (SLEDAI ≤ 5). Those with lower disease activity had a higher mean (SD) EQ-5D-3L score of 0.75 (0.18) compared to those with higher disease activity, who had a mean (SD) score of 0.69 (0.19). However, the instrument was not able to significantly discriminate between patients with high disease damage (SDI > 2) versus those with lower disease damage (SDI ≤ 2). The study suggested that the responsiveness of the instrument has been poor in patients with SLE when comparing self-reported change in health and the EQ VAS. Effect sizes ranged from 0.08 to 0.27 in patients who self-identified as deteriorated and 0.35 to 0.43 in patients who self-identified as improved. Evidence suggests that the instrument was not responsive to longitudinal changes in disease activity measured in 66 patients based on SLEDAI scores, with effect sizes of 0.01 in patients who deteriorated (SLEDAI increase > 3) and 0.12 in patients who improved (SLEDAI decrease > 3).

Reliability

One study assessed the reliability of the EQ-5D-5L among 100 SLE patients by determining the ICC for the EQ-5D VAS and kappa coefficients for EQ-5D-5L domains calculated in 2 assessments, 2 to 4 weeks apart, in patients whose self-assessed quality of life was rated as no change on a 15-point health status change scale (−7 to +7).⁸⁵ Results found an ICC (95% CI) for the VAS of 0.793 (0.707 to 0.856), indicating good reliability. The kappa coefficients were strong for all EQ-5D-5L domains (> 0.79) except for anxiety/depression (0.28).⁸⁵

No literature was identified regarding the responsiveness of the instrument in SLE patients. SLE-specific MIDs for the EQ-5D have not been reported.

Numerical Rating Score

Description and Scoring

The pivotal trials measured patient-reported pain with an 11-point scale (0 no pain; 10 worst imaginable) with a 1-week recall period.⁹^,¹⁰

Reliability

A cross-sectional study in Peru had 204 SLE patients rate their pain on an NRS from 0 (no disease activity) to 4 (the most disease activity possible) with a 1-week recall period.⁵² Patients completed the scale twice, before and after an encounter with a physician, to assess the reliability of the instrument. Results found that the mean (SD) NRS rating among patients was 1.5 (1.2) before and 1.4 (1.1) after the physician encounter with a Spearman rank correlation coefficient of 0.84, indicating acceptable test-retest reliability. The differences between mean scores were smaller among patients receiving a comprehensive care program versus those receiving standard care, which suggests that a comprehensive care program could reduce the variability of patients measuring their disease activity.⁵²

No literature was identified regarding the validity or responsiveness of the instrument in SLE patients. An MID was not identified for the Pain NRS in SLE patients.

FACIT-F Score

Description and Scoring

The FACIT-F is completed by patients to assess fatigue. In the pivotal trials, patients were presented with a list of 13 statements (e.g., “I am too tired to eat”) and asked to rate each on a 5-point Likert scale (0 = not at all, 1 = a little bit, 2 = somewhat, 3 = quite a bit, and 4 = very much), to indicate how true the statement was during the past 7 days.⁹^,¹⁰ Final scores are the sum of the responses and range from 0 to 52; items are reverse-scored and higher scores indicate better quality of life.⁹^,¹⁰
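
The scoring described above (13 responses scored 0 to 4, reverse-scored and summed to a total of 0 to 52) can be sketched as follows. This is an illustration under stated assumptions rather than the official scoring procedure: the function name is hypothetical, and because the text does not specify which items are reverse-scored, the sketch reverses all items by default and lets the caller supply a different set through the hypothetical `reversed_items` parameter.

```python
def facit_f_score(responses, reversed_items=None):
    """Sum 13 item responses (each 0 to 4) after reverse-scoring.

    Assumption: all 13 items are reverse-scored unless `reversed_items`
    (zero-based item positions) says otherwise.
    """
    if len(responses) != 13:
        raise ValueError("expected 13 item responses")
    if any(r not in (0, 1, 2, 3, 4) for r in responses):
        raise ValueError("each response must be an integer from 0 to 4")
    if reversed_items is None:
        reversed_items = range(13)
    reversed_set = set(reversed_items)
    # A reverse-scored response r contributes 4 - r to the total.
    return sum(4 - r if i in reversed_set else r
               for i, r in enumerate(responses))

print(facit_f_score([0] * 13))  # 52
```

With every item reversed, a respondent who answers "not at all" to all 13 fatigue statements receives the maximum score of 52, consistent with higher scores indicating less fatigue.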

Validity

The FACIT-F was validated in patients with SLE by Lai et al.⁵¹ Patients with moderately to severely active extrarenal SLE (N = 254) completed the FACIT-F, Short Form-36 (SF-36), Brief Pain Inventory, and a patient global assessment VAS at baseline, week 12, week 24, and week 52.⁵¹ Physicians also completed the BILAG and PGA at the same visits. The FACIT-F was able to differentiate between groups that were defined by BILAG General domain and Musculoskeletal domain ratings at 12 weeks.⁵¹ Using the Spearman correlation coefficient, the FACIT-F was found to be moderately to strongly correlated with the SF-36 (r = 0.69 to 0.87 at week 52), Brief Pain Inventory (r = −0.72 to −0.82 at week 52), and patient global assessment (r = −0.76 at week 52).⁵¹ However, the correlations of FACIT-F with total BILAG score and PGA at week 52 were weak, at −0.25 and −0.21, respectively.⁵¹ In a phase IIb trial that randomized 547 patients with SLE to blisibimod or placebo, FACIT-F was weakly to moderately correlated with PGA (r = −0.32, P < 0.001), SELENA SLEDAI (r = −0.13, P = 0.006), and BILAG (r = −0.18, P < 0.001).⁸⁶ The FACIT-F was responsive to clinical improvement but not clinical deterioration.³⁹

Reliability

In a post hoc analysis of 2,520 SLE patients in BLISS-SC, BLISS-52, and BLISS-76 trials, the FACIT-F showed good internal consistency reliability (Cronbach alpha > 0.90) and good test-retest reliability with an ICC of 0.84 in the pooled results which ranged from 0.76 to 0.92 in each individual trial.⁸⁷

Minimal Important Difference

The study by Lai et al. included estimation of MIDs for the FACIT-F with anchor and distribution-based techniques.⁵¹ The anchors were based on the General and Musculoskeletal domains of the BILAG. These were selected as anchors for the FACIT-F because the General domain contains physician assessment of fatigue and malaise, and the Musculoskeletal domain contains assessment of pain, which is associated with fatigue.⁵¹ The anchor-based MIDs were estimated from cross-sectional (i.e., comparing mean FACIT-F scores across groups defined by BILAG disease activity at each assessment) and longitudinal analyses (i.e., changes in FACIT-F with changes in BILAG disease activity between consecutive assessments).⁵¹ Changes in BILAG disease activity were classified as more active, less active, or stable (with stable defined as change from BILAG D/E to C or vice versa).⁵¹ The anchor-based MIDs ranged from 2.5 to 8.4 points.⁵¹ The distribution-based MIDs fell within this range (based on one-third SD: 3.8 to 4.6 points; one-half SD: 5.8 to 6.8 points; standard error of the mean = 2.7 to 2.9 points).⁵¹

No literature was identified regarding the responsiveness of the instrument in SLE patients.

8-Item Patient Health Questionnaire

The PHQ-8 assesses symptoms of depression over the last 2 weeks using 8 of the 9 criteria on which the diagnosis of depressive disorders according to the Diagnostic and Statistical Manual of Mental Disorders (Fourth Edition) is based.⁵⁴ Each item’s score ranges from 0 (not at all) to 3 (nearly every day). The scores for each item are summed to produce a total score between 0 and 24 points. A total score greater than 10 is considered indicative of major depression and greater than 20 is considered indicative of severe major depression.⁵⁴ The PHQ-8 is completed by the patient and scored by the investigator. No evidence related to the validity, reliability, responsiveness, or MID of the instrument among SLE patients was identified.
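
The PHQ-8 total and the cut-offs described above can be sketched as follows. The function names and category labels are hypothetical, and the thresholds are taken exactly as stated in this review (greater than 10 and greater than 20); they should be checked against the instrument's own scoring guidance before any use.

```python
def phq8_total(item_scores):
    """Sum the 8 item scores, each from 0 (not at all) to 3 (nearly every day)."""
    if len(item_scores) != 8:
        raise ValueError("expected 8 item scores")
    if any(s not in (0, 1, 2, 3) for s in item_scores):
        raise ValueError("each item score must be an integer from 0 to 3")
    return sum(item_scores)

def phq8_category(total):
    """Apply the cut-offs as stated in the text above."""
    if total > 20:
        return "severe major depression"
    if total > 10:
        return "major depression"
    return "below threshold"

total = phq8_total([2, 1, 2, 1, 2, 1, 2, 1])
print(total, phq8_category(total))
```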

Columbia Suicide Severity Rating Scale

The Columbia Suicide Severity Rating Scale (C-SSRS) assesses the lethality of attempts and other features of ideation (frequency, duration, controllability, reasons for ideation, and deterrents), all of which are significantly predictive of completed suicide.⁵⁵ In the pivotal trials,⁹^,¹⁰ 2 different versions of the questionnaire were used: 1 assessing the last 12 months before the assessment and another assessing the time since the last visit. Suicidal ideation was defined as a “yes” answer at any time in the respective study period to any 1 of the 5 (re-ordered) suicidal ideation questions, ranging from category 1 (“wish to be dead”) to category 5 (“active suicidal ideation with specific plan and intent”) on the C-SSRS. Suicidal behaviour was defined as a “yes” answer at any time in the respective study period to any 1 of the 5 (re-ordered) suicidal behaviour questions, ranging from category 6 (“preparatory acts or behaviour”) to category 10 (“completed suicide”) on the C-SSRS. No evidence related to the validity, reliability, responsiveness, or MID of the instrument among patients with SLE was identified.
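The two definitions above reduce to checking which of the 10 re-ordered categories received a “yes” answer during the study period. A minimal sketch follows; the function name and the input format (a collection of category numbers answered “yes”) are hypothetical and are not taken from the trial protocols.

```python
IDEATION_CATEGORIES = set(range(1, 6))    # categories 1 to 5
BEHAVIOUR_CATEGORIES = set(range(6, 11))  # categories 6 to 10


def classify_cssrs(yes_categories):
    """Flag suicidal ideation and behaviour from categories answered 'yes'."""
    answered = set(yes_categories)
    invalid = answered - (IDEATION_CATEGORIES | BEHAVIOUR_CATEGORIES)
    if invalid:
        raise ValueError(f"unknown C-SSRS categories: {sorted(invalid)}")
    return {
        "suicidal_ideation": bool(answered & IDEATION_CATEGORIES),
        "suicidal_behaviour": bool(answered & BEHAVIOUR_CATEGORIES),
    }


# Hypothetical patient with a "yes" on category 1 only (illustrative).
print(classify_cssrs({1}))
```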

Appendix 5. Summary of Pooled Data (TULIP-1 and TULIP-2)

Note that this appendix has not been copy-edited.

Methods

Pooled data from the phase III TULIP-1 and TULIP-2 trials in patients with moderate to severe SLE were analyzed to determine anifrolumab's effect on flares, including flares in patients with glucocorticoid taper.

Populations

Data were pooled for the placebo and anifrolumab 300-mg treatment groups in the phase III TULIP-1 (n = 364) and TULIP-2 (n = 362) trials. Of these 726 patients, 366 received placebo (184 in the TULIP-1 trial, 182 in the TULIP-2 trial), and 360 received anifrolumab 300 mg (180 patients in each trial). Patients in both trials were randomized to receive IV infusions of placebo or anifrolumab every 4 weeks for 48 weeks in addition to standard therapy, with a 52-week treatment period. For patients receiving oral glucocorticoid ≥ 10 mg/day (prednisone or equivalent) at baseline, a protocol-mandated attempt to taper to ≤ 7.5 mg/day was required between weeks 8 and 40; tapering was also permitted for patients receiving oral glucocorticoid < 10 mg/day at baseline. Stable oral glucocorticoid dose was required in all patients between weeks 40 and 52.

In the pooled TULIP data, baseline demographics, disease characteristics, and SLE medications were generally similar between the anifrolumab and placebo groups. At baseline, 82.8% and 82.5% of patients had a high IFNGS and 17.2% and 17.5% had a low IFNGS in the anifrolumab and placebo groups, respectively. In the pooled anifrolumab and placebo groups, 80.8% and 83.1% of patients were receiving glucocorticoids and 52.8% and 50.5% were receiving glucocorticoids at a dose of ≥ 10 mg/day, respectively. The most prevalent baseline BILAG-2004 A or B disease activity scores occurred in the musculoskeletal (88.8%) and the mucocutaneous (86.4%) domains and were balanced across treatment groups.

Table 45

Summary of Baseline Characteristics in Pooled TULIP Data.

Outcomes

In the pooled analysis, a flare was defined as at least 1 new A score or at least 2 new B scores on the BILAG-2004 compared with the prior visit. The pooled analysis aimed to evaluate the effects of anifrolumab on flares during the TULIP-1 and TULIP-2 trials, including numbers of flares per patient, annualized flare rates, time to first flare and time spent flare free, flares in individual organ domains, flares within organ domains not affected at baseline, and flares in the subset of patients who were able to achieve sustained oral glucocorticoid taper. Sustained taper was defined as a dose reduction to 7.5 mg/day or lower by week 40 that was maintained through week 52, among patients who received at least 10 mg/day of oral glucocorticoids at baseline.
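The flare rule above can be expressed as a comparison of domain scores between two consecutive visits. The sketch below is illustrative only: the function name and data layout are hypothetical, and the interpretation of "new" (a new A is any domain scored A that was not A at the prior visit; a new B is any domain scored B that was C, D, or E at the prior visit) is an assumption of this sketch rather than a definition quoted from the trial protocols.

```python
def is_bilag_flare(previous, current):
    """Return True if the current visit meets the flare rule versus the prior visit.

    Both arguments map a BILAG-2004 organ domain to its score ("A" to "E")
    and must contain the same domains.
    """
    new_a = 0
    new_b = 0
    for domain, score in current.items():
        prior = previous[domain]
        if score == "A" and prior != "A":
            new_a += 1  # assumed meaning of a "new A" score
        elif score == "B" and prior in {"C", "D", "E"}:
            new_b += 1  # assumed meaning of a "new B" score
    return new_a >= 1 or new_b >= 2


# Hypothetical visit data (illustrative only, 3 of the 9 domains shown).
prior_visit = {"mucocutaneous": "C", "musculoskeletal": "C", "renal": "E"}
this_visit = {"mucocutaneous": "B", "musculoskeletal": "B", "renal": "E"}
print(is_bilag_flare(prior_visit, this_visit))
```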

Statistical Analysis

Annualized flare rate was analyzed using a negative binomial regression model, in which the response variable was the number of flares up to week 52 or the discontinuation visit. The independent variables in the model included treatment groups and stratification factors including the SLEDAI-2K score at screening (< 10 points versus ≥ 10 points), OCS usage at baseline (≥ 10 mg/day versus < 10 mg/day of prednisone or equivalent), and the result of the interferon test at screening (positive versus negative). The model was adjusted for variations in exposure time. Time to first flare was evaluated using a Cox regression analysis, with treatment groups, stratification factors, and study as covariates. Responder rates, percentages, differences, and associated 95% confidence intervals were weighted and calculated using a stratified Cochran-Mantel-Haenszel approach with strata corresponding to the stratification factors used for randomization and an additional stratification factor for study in pooled data. Additional factors for study and study-by-treatment interactions were also included in the analysis of pooled data. Flare rates were determined in the subset of patients who attained a BICLA response, as opposed to using the primary response end point for each trial (i.e., SRI-4 in the TULIP-1 trial). For this subset analysis, TULIP-1 data were classified as responders/nonresponders according to the TULIP-2 revised restricted medication analytical rules to ensure that any patient taking an NSAID was not deemed a nonresponder. As these post hoc analyses were exploratory, there was no control for multiplicity and a significance level was not specified.
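To illustrate what adjusting for exposure time means, the sketch below computes a crude annualized flare rate and a crude rate ratio from total flare counts and total days of follow-up. This is not the negative binomial model used in the analysis: it ignores the stratification factors and any overdispersion, and the function names and all counts are hypothetical.

```python
DAYS_PER_YEAR = 365.25


def annualized_rate(total_flares, total_exposure_days):
    """Crude flares per patient-year of follow-up."""
    if total_exposure_days <= 0:
        raise ValueError("exposure must be positive")
    return total_flares / (total_exposure_days / DAYS_PER_YEAR)


def crude_rate_ratio(flares_trt, days_trt, flares_ctl, days_ctl):
    """Unadjusted ratio of annualized rates (treatment over control)."""
    return annualized_rate(flares_trt, days_trt) / annualized_rate(flares_ctl, days_ctl)


# Hypothetical totals: 100 patient-years of follow-up in each group.
ratio = crude_rate_ratio(30, 100 * DAYS_PER_YEAR, 40, 100 * DAYS_PER_YEAR)
print(ratio)
```

Dividing by follow-up time is what allows patients who discontinued early to contribute fewer patient-years rather than being treated as if they had been observed for the full 52 weeks.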

Patient Disposition

There were 726 patients in the TULIP-1 and TULIP-2 studies combined, with 366 patients in the placebo group (n = 184 in the TULIP-1 trial and n = 182 in the TULIP-2 trial) and 360 patients in the anifrolumab 300 mg group (n = 180 in each trial).

Exposure to Study Treatments

Exposure to study treatments was not examined in the pooled analysis.

Efficacy

Annualized Flare Rates, Total Number of Flares, and Time to First Flare

When comparing anifrolumab (n = 360) to placebo (n = 366) in the pooled data, the rate ratio (95% CI) of flares assessed using the BILAG-2004 scoring method was 0.75 (0.60 to 0.95) (Figure 9). Similar results were observed when flares were assessed using the modified flare analysis. The median time to first flare, assessed using the BILAG-2004 scoring method with standard flare analysis, was 140 days for patients receiving anifrolumab (range, 24 to 376 days) versus 119 days for placebo (range, 21 to 370 days), with a hazard ratio (95% CI) of 0.70 (0.55 to 0.89) (Figure 10).
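Ratio estimates such as these are conventionally reported with confidence intervals that are symmetric on the log scale, so the point estimate should sit near the geometric mean of the interval limits. The sketch below performs that check and backs out the implied standard error on the log scale. It assumes a Wald-type 95% interval (z = 1.96), which the source does not state, and the function name is hypothetical.

```python
import math


def log_scale_summary(lower, upper, z=1.96):
    """Return (implied point estimate, implied SE on the log scale).

    Assumes the interval is symmetric on the log scale with half-width z * SE.
    """
    log_mid = (math.log(lower) + math.log(upper)) / 2
    se_log = (math.log(upper) - math.log(lower)) / (2 * z)
    return math.exp(log_mid), se_log


# Interval limits quoted in the text above.
rr_estimate, rr_se = log_scale_summary(0.60, 0.95)
hr_estimate, hr_se = log_scale_summary(0.55, 0.89)
print(round(rr_estimate, 2), round(hr_estimate, 2))
```

Under that assumption, both implied estimates land close to the reported values of 0.75 and 0.70, which suggests the intervals and point estimates are mutually consistent.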

A forest plot of annualized flare rates through week 52 in the TULIP-1 and TULIP-2 trials, and pooled TULIP data. When comparing anifrolumab (n = 360) to placebo (n = 366) in the pooled data, the rate ratio of flares assessed using the BILAG-2004 scoring method was 0.75 (95% CI, 0.60 to 0.95).

Figure 9

Annualized Flare Rates Through Week 52 in TULIP-1, TULIP-2, and Pooled TULIP Data^a.

Graph of time to first flare in the TULIP-1 trial, TULIP-2 trial, and pooled TULIP data. The median time to first flare, assessed using the BILAG-2004 scoring method with standard flare analysis, was 140 days for patients receiving anifrolumab (range 24 to 376 days) versus 119 days for placebo (range 21 to 370 days) with a hazard ratio of 0.70 (95% CI, 0.55 to 0.89) with a P value of 0.003.

Figure 10

Time to First Flare in TULIP-1, TULIP-2, and Pooled TULIP Data^a.

Flares per Patient and Flare Severity

In the pooled data, the proportion of patients who were flare free was 66.4% in the anifrolumab group and 57.1% in the placebo group. The proportion of patients with ≥ 1 flare was 33.6% in the anifrolumab group and 42.9% in the placebo group, with a difference (95% CI) of −9.3 (−26.3 to −2.3). The proportion of patients with ≥ 3 flares was 5.3% in the anifrolumab group and 5.2% in the placebo group. Among IFNGS-high patients, 33.6% of patients had ≥ 1 flare with anifrolumab and 44.7% with placebo. Among IFNGS-low patients, 33.9% of patients had ≥ 1 flare with anifrolumab and 34.4% with placebo.
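The statistical analysis section states that differences in percentages were weighted using a stratified Cochran-Mantel-Haenszel approach. A minimal sketch of a Mantel-Haenszel-weighted risk difference is given below. Whether the pooled analysis used exactly these weights is an assumption, the confidence interval calculation is omitted, and the function name, strata, and counts are hypothetical.

```python
def mh_risk_difference(strata):
    """Mantel-Haenszel-weighted risk difference (treatment minus control).

    Each stratum is a tuple: (events_trt, n_trt, events_ctl, n_ctl).
    The weight for a stratum is n_trt * n_ctl / (n_trt + n_ctl).
    """
    numerator = 0.0
    denominator = 0.0
    for events_trt, n_trt, events_ctl, n_ctl in strata:
        if n_trt <= 0 or n_ctl <= 0:
            raise ValueError("each stratum needs patients in both groups")
        weight = n_trt * n_ctl / (n_trt + n_ctl)
        numerator += weight * (events_trt / n_trt - events_ctl / n_ctl)
        denominator += weight
    return numerator / denominator


# Two hypothetical strata (illustrative counts, not trial data).
hypothetical_strata = [(10, 50, 20, 50), (30, 100, 40, 100)]
difference = mh_risk_difference(hypothetical_strata)
print(round(100 * difference, 1), "percentage points")
```

Because larger strata receive more weight, the pooled difference can differ from a simple subtraction of the two overall percentages.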

Flares by Organ Domain

Flares were assessed in each of the 9 BILAG-2004 organ domains in the pooled TULIP population, with most flares occurring in the mucocutaneous (24.8%) and musculoskeletal (22.5%) domains. A total of 22.8% and 19.4% of patients in the anifrolumab group versus 26.8% and 25.4% in the placebo group had ≥ 1 flare in the mucocutaneous and musculoskeletal domains, respectively.

Flares and Oral Glucocorticoid Taper

Among patients with baseline oral glucocorticoid ≥ 10 mg/day, 50.5% (n = 96) achieved sustained oral glucocorticoid dose reduction to ≤ 7.5 mg/day with anifrolumab versus 31.8% (n = 59) with placebo. Among these patients with sustained oral glucocorticoid taper, 79.2% (n = 76) were flare free through week 52 with anifrolumab versus 54.2% (n = 32) with placebo. Among patients who were not able to taper oral glucocorticoids, the percentage of patients who were flare free was similar in the anifrolumab (50.0%, n = 47) and placebo (48.4%, n = 61) groups.
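The sustained taper criterion used for this subgroup (baseline dose of at least 10 mg/day, reduced to 7.5 mg/day or lower by week 40 and maintained through week 52) can be sketched as a simple check on a patient's dose record. The function name and data layout are hypothetical, and interpreting "maintained" as every recorded dose from week 40 through week 52 being at or below the target is an assumption of this sketch.

```python
def achieved_sustained_taper(baseline_dose, dose_by_week, target=7.5):
    """Check the sustained oral glucocorticoid taper criterion.

    baseline_dose: mg/day (prednisone or equivalent) at baseline.
    dose_by_week: mapping of study week to recorded dose in mg/day.
    """
    if baseline_dose < 10:
        return False  # outside the subgroup required to attempt a taper
    late_doses = [dose for week, dose in dose_by_week.items() if 40 <= week <= 52]
    # Assumption: at least one dose is recorded in the week 40 to 52 window.
    return bool(late_doses) and all(dose <= target for dose in late_doses)


# Hypothetical dose record for one patient (illustrative only).
doses = {0: 15, 24: 10, 40: 7.5, 44: 7.5, 48: 5, 52: 5}
print(achieved_sustained_taper(15, doses))
```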

Flares and BICLA Response

A total of 78.9% and 69.6% of BICLA responders had no flares through week 52 in the pooled anifrolumab and placebo groups, respectively. The proportion of BICLA responders at week 52 (n = 283) with ≥ 1 flare was 21.1% and 30.4% in the anifrolumab and placebo groups, respectively. The mean (SD) annualized flare rate per patient was 0.29 (0.644) with anifrolumab versus 0.42 (0.721) with placebo. In BICLA nonresponders (n = 443), 45.0% had ≥ 1 flare through week 52 with anifrolumab compared to 48.4% with placebo. The mean (SD) annualized flare rate per patient was 0.84 (1.158) with anifrolumab versus 0.42 (0.721) with placebo.

Critical Appraisal

Baseline data, inclusion and exclusion criteria, implementation approaches, and outcome measures were similar in the TULIP-1 and TULIP-2 trials, reducing between-study heterogeneity. None of the P values were adjusted for multiplicity, and given the post hoc nature of the pooled analysis, the results should be considered hypothesis-generating. The interpretation of results is also limited by the fact that neither of the individual trials was powered for analyses of flares within organ domains or in subgroups of patients able to taper glucocorticoids, limiting the ability to determine a true effect. In addition, between-group comparisons were conducted only for time to first flare and annualized flare rates; hence, the incremental benefit of anifrolumab over placebo for the other outcomes is unknown, as is whether the differences between treatment groups are clinically meaningful.

References

1.: Wallace DJ. Overview of the management and prognosis of systemic lupus erythematosus in adults. In: Post TW, ed. UpToDate. Waltham (MA): UpToDate; 2021: www.uptodate.com. Accessed 2022 Feb 17.
2.: Lupus Canada. What is lupus. 2020; https://www.lupuscanada.org/living-with-lupus/what-is-lupus/. Accessed 2022 Apr 28.
3.: Schur PH, Hahn BH. Epidemiology and pathogenesis of systemic lupus erythematosus. In: Post TW, ed. UpToDate. Waltham (MA): UpToDate; 2021: www.uptodate.com. Accessed 2022 Feb 17.
4.: Carter EE, Barr SG, Clarke AE. The global burden of SLE: prevalence, health disparities and socioeconomic impact. Nat Rev Rheumatol. 2016;12(10):605-620. [PubMed: 27558659]
5.: Lupus Canada. Lupus Q&A: Ask the Experts. 2020; https://lupuscanada.org/living-with-lupus/lupus-qa-ask-the-experts/. Accessed 2022 Mar 14.
6.: Ibanez D, Gladman DD, Urowitz MB. Adjusted mean Systemic Lupus Erythematosus Disease Activity Index-2K is a predictor of outcome in SLE. J Rheumatol. 2005;32(5):824-827. [PubMed: 15868616]
7.: Johns Hopkins Lupus Center. Treating lupus with steroids. 2022; https://www.hopkinslupus.org/lupus-treatment/lupus-medications/steroids/. Accessed 2022 Apr 28.
8.: Saphnelo (anifrolumab): 150 mg/mL single dose, sterile vial solution for intravenous infusion [product monograph]. Mississauga (ON): AstraZeneca Canada Inc.; 2021 Nov 30.
9.: Clinical Study Report: D3461C00005. TULIP 1: A multicentre, randomised, double-blind, placebo-controlled, phase III study evaluating the efficacy and safety of two doses of anifrolumab in adult subjects with active systemic lupus erythematosus [internal sponsor's report]. Gaithersburg (MD): AstraZeneca PLC; 2019 May 20.
10.: Clinical Study Report: D3461C00004. TULIP 2: A multicentre, randomised, double-blind, placebo-controlled, phase III study evaluating the efficacy and safety of anifrolumab in adult subjects with active systemic lupus erythematosus [internal sponsor's report]. Gaithersburg (MD): AstraZeneca PLC; 2019 Dec 16.
11.: Clinical Study Report: CD-IA-MEDI-546-1013. MUSE: A phase 2, randomized study to evaluate the efficacy and safety of MEDI-546 in subjects with systemic lupus erythematosus [internal sponsor's report]. Gaithersburg (MD): AstraZeneca PLC; 2016 Mar 07.
12.: Clinical Study Report: CD-IA-MEDI-546-1145. A phase 2, open-label extension study to evaluate long-term safety of MEDI-546 in adults with systemic lupus erythematosus [internal sponsor's report]. Gaithersburg (MD): AstraZeneca PLC; 2018 Dec 05.
13.: Clinical Study Report: D3461C00009. A Multicentre, Randomised, Double-blind, Placebo-Controlled Phase III Extension Study to Characterise the Long-term Safety and Tolerability of Anifrolumab in Adult Patients with Active Systemic Lupus Erythematosus [internal sponsor's report]. AstraZeneca; 6 October 2022.
14.: Schur PH, Wallace DJ. Arthritis and other musculoskeletal manifestations of systemic lupus erythematosus. In: Post TW, ed. UpToDate. Waltham (MA): UpToDate; 2022: www.uptodate.com. Accessed 2022 Feb 17.
15.: Cojocaru M, Cojocaru IM, Silosi I, Vrabie CD. Manifestations of systemic lupus erythematosus. Maedica (Bucur). 2011;6(4):330-336. [PMC free article: PMC3391953] [PubMed: 22879850]
16.: Health Canada. Regulatory decision summary for Benlysta. 2021; https://hpr-rps.hres.ca/reg-content/regulatory-decision-summary-detail.php?linkID=RDS00843. Accessed 2022 Apr 28.
17.: McGowan J, Sampson M, Salzwedel DM, Cogo E, Foerster V, Lefebvre C. PRESS Peer Review of Electronic Search Strategies: 2015 guideline statement. J Clin Epidemiol. 2016;75:40-46. [PubMed: 27005575]
18.: Grey matters: a practical tool for searching health-related grey literature. Ottawa (ON): CADTH; 2019: https://www.cadth.ca/grey-matters. Accessed 2022 Feb 16.
19.: Furie RA, Morand EF, Bruce IN, et al. Type I interferon inhibitor anifrolumab in active systemic lupus erythematosus (TULIP-1): a randomised, controlled, phase III trial. Lancet Rheumatol. 2019;1(4):e208-e219.
20.: Morand EF, Furie R, Tanaka Y, et al. Trial of Anifrolumab in Active Systemic Lupus Erythematosus. N Engl J Med. 2020;382(3):211-221. [PubMed: 31851795]
21.: Furie R, Wang L, Illei G, Drappa J. Systemic Lupus Erythematosus (SLE) Responder Index response is associated with global benefit for patients with SLE. Lupus. 2018;27(6):955-962. [PubMed: 29460699]
22.: Mahmoud K, Zayat AS, Yusof Y, et al. Responsiveness of clinical and ultrasound outcome measures in musculoskeletal systemic lupus erythematosus. Rheumatology (Oxford). 2019;58(8):1353-1360. [PMC free article: PMC6649792] [PubMed: 30608614]
23.: Furie R, Morand EF, Bruce IN, et al. What Does It Mean to Be a British Isles Lupus Assessment Group-Based Composite Lupus Assessment Responder? Post Hoc Analysis of Two Phase III Trials. Arthritis Rheumatol. 2021;73(11):2059-2068. [PMC free article: PMC8596929] [PubMed: 33913260]
24.: Isenberg DA, Allen E, Farewell V, et al. An assessment of disease flare in patients with systemic lupus erythematosus: a comparison of BILAG 2004 and the flare version of SELENA. Ann Rheum Dis. 2011;70(1):54-59. [PubMed: 20833737]
25.: Cresswell L, Yee CS, Farewell V, et al. Numerical scoring for the Classic BILAG index. Rheumatology (Oxford). 2009;48(12):1548-1552. [PMC free article: PMC2777486] [PubMed: 19779027]
26.: Mikdashi J, Nived O. Measuring disease activity in adults with systemic lupus erythematosus: the challenges of administrative burden and responsiveness to patient concerns in clinical research. Arthritis Res Ther. 2015;17:183. [PMC free article: PMC4507322] [PubMed: 26189728]
27.: Yee CS, Isenberg DA, Prabu A, et al. BILAG-2004 index captures systemic lupus erythematosus disease activity better than SLEDAI-2000. Ann Rheum Dis. 2008;67(6):873-876. [PubMed: 17519277]
28.: Yee CS, Farewell V, Isenberg DA, et al. British Isles Lupus Assessment Group 2004 index is valid for assessment of disease activity in systemic lupus erythematosus. Arthritis Rheum. 2007;56(12):4113-4119. [PMC free article: PMC2659367] [PubMed: 18050213]
29.: Hay EM, Bacon PA, Gordon C, et al. The BILAG index: a reliable and valid instrument for measuring clinical disease activity in systemic lupus erythematosus. Q J Med. 1993;86(7):447-458. [PubMed: 8210301]
30.: Yee CS, Gordon C, Isenberg DA, et al. The BILAG-2004 systems tally--a novel way of representing the BILAG-2004 index scores longitudinally. Rheumatology (Oxford). 2012;51(11):2099-2105. [PMC free article: PMC3475981] [PubMed: 22908329]
31.: Touma Z, Urowitz MB, Gladman DD. Systemic lupus erythematosus disease activity index 2000 responder index-50 website. J Rheumatol. 2013;40(5):733. [PubMed: 23637378]
32.: Gladman DD, Ibanez D, Urowitz MB. Systemic lupus erythematosus disease activity index 2000. J Rheumatol. 2002;29(2):288-291. [PubMed: 11838846]
33.: Jesus D, Rodrigues M, Matos A, Henriques C, Pereira da Silva JA, Ines LS. Performance of SLEDAI-2K to detect a clinically meaningful change in SLE disease activity: a 36-month prospective cohort study of 334 patients. Lupus. 2019;28(5):607-612. [PubMed: 30895904]
34.: Yee CS, Farewell VT, Isenberg DA, et al. The use of Systemic Lupus Erythematosus Disease Activity Index-2000 to define active disease and minimal clinically meaningful change based on data from a large cohort of systemic lupus erythematosus patients. Rheumatology (Oxford). 2011;50(5):982-988. [PMC free article: PMC3077910] [PubMed: 21245073]
35.: Gladman DD, Urowitz MB, Kagal A, Hallett D. Accurately describing changes in disease activity in Systemic Lupus Erythematosus. J Rheumatol. 2000;27(2):377-379. [PubMed: 10685800]
36.: Furie RA, Petri MA, Wallace DJ, et al. Novel evidence-based systemic lupus erythematosus responder index. Arthritis Rheum. 2009;61(9):1143-1151. [PMC free article: PMC2748175] [PubMed: 19714615]
37.: Chessa E, Piga M, Floris A, Devilliers H, Cauli A, Arnaud L. Use of Physician Global Assessment in systemic lupus erythematosus: a systematic review of its psychometric properties. Rheumatology (Oxford). 2020;59(12):3622-3632. [PubMed: 32789462]
38.: Maruish M, Maruish ME, Kosinski K, et al. User's manual for the SF-36v2 health survey; 3rd edition. Lincoln (RI): QualityMetric Incorporated; 2011.
39.: Izadi Z, Gandrup J, Katz PP, Yazdany J. Patient-reported outcome measures for use in clinical trials of SLE: a review. Lupus Sci Med. 2018;5(1):e000279. [PMC free article: PMC6109821] [PubMed: 30167315]
40.: McElhone K, Abbott J, Shelmerdine J, et al. Development and validation of a disease-specific health-related quality of life measure, the LupusQol, for adults with systemic lupus erythematosus. Arthritis Rheum. 2007;57(6):972-979. [PubMed: 17665467]
41.: van Reenen M, Janssen B, Stolk E, et al. EQ-5D-5L user guide: basic information on how to use the EQ-5D-5L instrument, version 3.0. Rotterdam (NL): EuroQol Research Foundation; 2019: https://euroqol.org/wp-content/uploads/2021/01/EQ-5D-5LUserguide-08-0421.pdf. Accessed 2021 Nov 24.
42.: Aggarwal R, Wilke CT, Pickard AS, et al. Psychometric properties of the EuroQol-5D and Short Form-6D in patients with systemic lupus erythematosus. J Rheumatol. 2009;36(6):1209-1216. [PubMed: 19369452]
43.: Castrejon I, Tani C, Jolly M, Huang A, Mosca M. Indices to assess patients with systemic lupus erythematosus in clinical trials, long-term observational studies, and clinical care. Clin Exp Rheumatol. 2014;32(5 Suppl 85):S-85-95. [PubMed: 25365095]
44.: Gladman D, Ginzler E, Goldsmith C, et al. The development and initial validation of the Systemic Lupus International Collaborating Clinics/American College of Rheumatology damage index for systemic lupus erythematosus. Arthritis Rheum. 1996;39(3):363-369. [PubMed: 8607884]
45.: Romero-Diaz J, Isenberg D, Ramsey-Goldman R. Measures of adult systemic lupus erythematosus: updated version of British Isles Lupus Assessment Group (BILAG 2004), European Consensus Lupus Activity Measurements (ECLAM), Systemic Lupus Activity Measure, Revised (SLAM-R), Systemic Lupus Activity Questionnaire for Population Studies (SLAQ), Systemic Lupus Erythematosus Disease Activity Index 2000 (SLEDAI-2K), and Systemic Lupus International Collaborating Clinics/American College of Rheumatology Damage Index (SDI). Arthritis Care Res (Hoboken). 2011;63(Suppl 11):S37-46. [PMC free article: PMC3812450] [PubMed: 22588757]
46.: Stoll T, Seifert B, Isenberg DA. SLICC/ACR Damage Index is valid, and renal and pulmonary organ scores are predictors of severe outcome in patients with systemic lupus erythematosus. Br J Rheumatol. 1996;35(3):248-254. [PubMed: 8620300]
47.: Chakka S, Krain RL, Concha JSS, Chong BF, Merola JF, Werth VP. The CLASI, a validated tool for the evaluation of skin disease in lupus erythematosus: a narrative review. Ann Transl Med. 2021;9(5):431. [PMC free article: PMC8033342] [PubMed: 33842652]
48.: Albrecht J, Taylor L, Berlin JA, et al. The CLASI (Cutaneous Lupus Erythematosus Disease Area and Severity Index): an outcome instrument for cutaneous lupus erythematosus. J Invest Dermatol. 2005;125(5):889-894. [PMC free article: PMC3928016] [PubMed: 16297185]
49.: Jolly M, Kazmi N, Mikolaitis RA, Sequeira W, Block JA. Validation of the Cutaneous Lupus Disease Area and Severity Index (CLASI) using physician- and patient-assessed health outcome measures. J Am Acad Dermatol. 2013;68(4):618-623. [PubMed: 23107310]
50.: Klein R, Moghadam-Kia S, LoMonico J, et al. Development of the CLASI as a tool to measure disease severity and responsiveness to therapy in cutaneous lupus erythematosus. Arch Dermatol. 2011;147(2):203-208. [PMC free article: PMC3282059] [PubMed: 21339447]
51.: Lai JS, Beaumont JL, Ogale S, Brunetta P, Cella D. Validation of the functional assessment of chronic illness therapy-fatigue scale in patients with moderately to severely active systemic lupus erythematosus, participating in a clinical trial. J Rheumatol. 2011;38(4):672-679. [PubMed: 21239746]
52.: Elera-Fitzcarrald C, Vega K, Gamboa-Cardenas RV, et al. Reliability of Visual Analog Scale and Numeric Rating Scale for the Assessment of Disease Activity in Systemic Lupus Erythematosus. J Clin Rheumatol. 2020;26(7S Suppl 2):S170-S173. [PubMed: 31899713]
53.: Franklyn K, Lau CS, Navarra SV, et al. Definition and initial validation of a Lupus Low Disease Activity State (LLDAS). Ann Rheum Dis. 2016;75(9):1615-1621. [PubMed: 26458737]
54.: Kroenke K, Strine TW, Spitzer RL, Williams JB, Berry JT, Mokdad AH. The PHQ-8 as a measure of current depression in the general population. J Affect Disord. 2009;114(1-3):163-173. [PubMed: 18752852]
55.: Posner K, Oquendo MA, Gould M, Stanley B, Davies M. Columbia Classification Algorithm of Suicide Assessment (C-CASA): classification of suicidal events in the FDA's pediatric suicidal risk analysis of antidepressants. Am J Psychiatry. 2007;164(7):1035-1043. [PMC free article: PMC3804920] [PubMed: 17606655]
56.: Keene ON, Roger JH, Hartley BF, Kenward MG. Missing data sensitivity analysis for recurrent event data using controlled imputation. Pharm Stat. 2014;13(4):258-264. [PubMed: 24931317]
57.: Burman CF, Sonesson C, Guilbaud O. A recycling framework for the construction of Bonferroni-based multiple tests. Stat Med. 2009;28(5):739-761. [PubMed: 19142850]
58.: Zhang J, Quan H, Ng J, Stepanavage ME. Some statistical methods for multiple end points in clinical trials. Control Clin Trials. 1997;18(3):204-221. [PubMed: 9204221]
59.: Ward MM, Marx AS, Barry NN. Comparison of the validity and sensitivity to change of 5 activity indices in systemic lupus erythematosus. J Rheumatol. 2000;27(3):664-670. [PubMed: 10743805]
60.: Jolly M, Annapureddy N, Arnaud L, Devilliers H. Changes in quality of life in relation to disease activity in systemic lupus erythematosus: post-hoc analysis of the BLISS-52 Trial. Lupus. 2019;28(14):1628-1639. [PubMed: 31674267]
61.: Isenberg D, Sturgess J, Allen E, et al. Study of Flare Assessment in Systemic Lupus Erythematosus Based on Paper Patients. Arthritis Care Res (Hoboken). 2018;70(1):98-103. [PMC free article: PMC5767751] [PubMed: 28388813]
62.: Morand EF, Trasieva T, Berglind A, Illei GG, Tummala R. Lupus Low Disease Activity State (LLDAS) attainment discriminates responders in a systemic lupus erythematosus trial: post-hoc analysis of the Phase IIb MUSE trial of anifrolumab. Ann Rheum Dis. 2018;77(5):706-713. [PMC free article: PMC5909750] [PubMed: 29420200]
63.: Jenkinson C, Stewart-Brown S, Petersen S, Paice C. Assessment of the SF-36 version 2 in the United Kingdom. J Epidemiol Community Health. 1999;53:46-50. [PMC free article: PMC1756775] [PubMed: 10326053]
64.: Nantes SG, Strand V, Su J, Touma Z. Comparison of the Sensitivity to Change of the 36-Item Short Form Health Survey and the Lupus Quality of Life Measure Using Various Definitions of Minimum Clinically Important Differences in Patients With Active Systemic Lupus Erythematosus. Arthritis Care Res (Hoboken). 2018;70(1):125-133. [PubMed: 28320078]
65.: Touma Z, Gladman DD, Ibanez D, Urowitz MB. Is there an advantage over SF-36 with a quality of life measure that is specific to systemic lupus erythematosus? J Rheumatol. 2011;38(9):1898-1905. [PubMed: 21724700]
66.: EuroQol Research Foundation. EQ-5D-5L | About. 2021; https://euroqol.org/eq-5d-instruments/eq-5d-5l-about/. Accessed 2022 Jan 04.
67.: Murphy CL, Yee CS, Gordon C, Isenberg D. From BILAG to BILAG-based combined lupus assessment-30 years on. Rheumatology (Oxford). 2016;55(8):1357-1363. [PubMed: 26589244]
68.: Thanou A, Chakravarty E, James JA, Merrill JT. Which outcome measures in SLE clinical trials best reflect medical judgment? Lupus Sci Med. 2014;1(1):e000005. [PMC free article: PMC4225744] [PubMed: 25396057]
69.: Yee CS, Farewell V, Isenberg DA, et al. The BILAG-2004 index is sensitive to change for assessment of SLE disease activity. Rheumatology (Oxford). 2009;48(6):691-695. [PMC free article: PMC2681285] [PubMed: 19395542]
70.: Uribe AG, Vila LM, McGwin G, Jr., Sanchez ML, Reveille JD, Alarcon GS. The Systemic Lupus Activity Measure-revised, the Mexican Systemic Lupus Erythematosus Disease Activity Index (SLEDAI), and a modified SLEDAI-2K are adequate instruments to measure disease activity in systemic lupus erythematosus. J Rheumatol. 2004;31(10):1934-1940. [PubMed: 15468356]
71.: CDR submission: Benlysta (belimumab), subcutaneous (SC) autoinjector [CONFIDENTIAL sponsor's submission]. Mississauga (ON): GlaxoSmithKline Inc.; 2019 May 29.
72.: Petri M, Buyon J, Kim M. Classification and definition of major flares in SLE clinical trials. Lupus. 1999;8(8):685-691. [PubMed: 10568907]
73.: Shariati-Sarabi Z, Monzavi SM, Ranjbar A, Esmaily H, Etemadrezaie H. High disease activity is associated with high disease damage in an Iranian inception cohort of patients with lupus nephritis. Clin Exp Rheumatol. 2013;31(1):69-75. [PubMed: 23190627]
74.: Castrejon I, Rua-Figueroa I, Rosario MP, Carmona L. Clinical composite measures of disease activity and damage used to evaluate patients with systemic lupus erythematosus: A systematic literature review. Reumatol Clin. 2014;10(5):309-320. [PubMed: 25022441]
75.: Feld J, Isenberg D. Why and how should we measure disease activity and damage in lupus? Presse Med. 2014;43(6 Pt 2):e151-156. [PubMed: 24791651]
76.: Oon S, Huq M, Golder V, Ong PX, Morand EF, Nikpour M. Lupus Low Disease Activity State (LLDAS) discriminates responders in the BLISS-52 and BLISS-76 phase III trials of belimumab in systemic lupus erythematosus. Ann Rheum Dis. 2019;78(5):629-633. [PubMed: 30679152]
77.: Ware JE, Sherbourne CD. The MOS 36-item short-form health survey (SF-36). I. Conceptual framework and item selection. Med Care. 1992;30(6):473-483. [PubMed: 1593914]
78.: The Short-Form-36 health survey (Rand Corporation and John E. Ware Jr., 1990, revised 1996). Excerpt from Ian McDowell Measuring health: a guide to rating scales and questionnaires. New York (NY): Oxford University Press; 2006: http://www.med.uottawa.ca/courses/CMED6203/Index_notes/SF36%20fn%20.pdf. Accessed 2021 Apr 26.
79.: Devilliers H, Amoura Z, Besancenot JF, et al. Responsiveness of the 36-item Short Form Health Survey and the Lupus Quality of Life questionnaire in SLE. Rheumatology (Oxford). 2015;54(5):940-949. [PubMed: 25361539]
80.: Delis PC, Dowling J. An Integrative Review of the LupusQoL Measure. J Nurs Meas. 2020;28(2):E139-E174. [PubMed: 32430357]
81.: Carrion-Nessi FS, Marcano-Rojas MV, Freitas-DeNobrega DC, Romero Arocha SR, Antuarez-Magallanes AW, Fuentes-Silva YJ. Validation of the LupusQoL in Venezuela: A specific measurement of quality of life in patients with systemic lupus erythematosus. Reumatol Clin (Engl Ed). 2021. [PubMed: 34373232]
82.: Brooks R. EuroQol: the current state of play. Health Policy. 1996;37(1):53-72. [PubMed: 10158943]
83.: EuroQol Group. EuroQol--a new facility for the measurement of health-related quality of life. Health Policy. 1990;16(3):199-208. [PubMed: 10109801]
84.: Ameri H, Yousefi M, Yaseri M, Nahvijou A, Arab M, Akbari Sari A. Mapping the cancer-specific QLQ-C30 onto the generic EQ-5D-5L and SF-6D in colorectal cancer patients. Expert Rev Pharmacoecon Outcomes Res. 2019;19(1):89-96. [PubMed: 30173585]
85.: Wang SL, Hsieh E, Zhu LA, Wu B, Lu LJ. Comparative Assessment of Different Health Utility Measures in Systemic Lupus Erythematosus. Sci Rep. 2015;5:13297. [PMC free article: PMC4543990] [PubMed: 26293686]
86.: Petri MA, Martin RS, Scheinberg MA, Furie RA. Assessments of fatigue and disease activity in patients with systemic lupus erythematosus enrolled in the Phase 2 clinical trial with blisibimod. Lupus. 2017;26(1):27-37. [PubMed: 27353505]
87.: Rendas-Baum R, Baranwal N, Joshi AV, Park J, Kosinski M. Psychometric properties of FACIT-Fatigue in systemic lupus erythematosus: a pooled analysis of three phase III randomised, double-blind, parallel-group controlled studies (BLISS-SC, BLISS-52, BLISS-76). J Patient Rep Outcomes. 2021;5(1):33. [PMC free article: PMC8032841] [PubMed: 33830377]
88.: Furie R, Morand EF, Askanase AD, et al. Anifrolumab reduces flare rates in patients with moderate to severe systemic lupus erythematosus. Lupus. 2021;30(8):1254-1263. [PubMed: 33977796]

Copyright © 2023 - Canadian Agency for Drugs and Technologies in Health. Except where otherwise noted, this work is distributed under the terms of a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International licence (CC BY-NC-ND).

Bookshelf ID: NBK596622
