Getting Started

There are several ways of working with the IRW data. Below we will first describe how to get data from the IRW and then offer some suggestions for how to analyze it.

Getting IRW data

There are several ways of getting data from the IRW.

  • You can use the Data Browser to investigate individual datasets and then download them directly via Redivis.

  • You can also access IRW data programmatically. There are several ways of doing this that we describe below.

    • You can use a Redivis notebook. Consider some example workflows here.

    • You can use the Redivis API for R or Python (note that you will first need to generate and set an API token). Given that we anticipate this being a popular means of using the IRW, we elaborate on how this can be done in the next section.

Programmatic access of IRW data

Below we offer examples in both R and Python for how to access data programmatically from the IRW. In R, we make use of the irwpkg package, which was written to facilitate handling of IRW data; more details available here.

Code
# individual dataset
library(irwpkg)
df <- irw_fetch("4thgrade_math_sirt")
Code
import redivis

# individual dataset
dataset = redivis.user('datapages').dataset('item_response_warehouse')
df = dataset.table('4thgrade_math_sirt').to_pandas_dataframe()
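As noted above, the Redivis API needs an API token before any table can be downloaded. The sketch below is a small pre-flight check and not part of the IRW tooling: it assumes the token is supplied through the REDIVIS_API_TOKEN environment variable (our understanding of how the Redivis client libraries read it), and the helper name redivis_token_is_set is hypothetical.

```python
import os


def redivis_token_is_set(env=None):
    """Return True if a non-empty REDIVIS_API_TOKEN exists in the given mapping.

    `env` defaults to os.environ; passing a plain dict makes the check easy to test.
    The variable name is an assumption about how the Redivis client is configured.
    """
    env = os.environ if env is None else env
    return bool(env.get("REDIVIS_API_TOKEN", "").strip())


if not redivis_token_is_set():
    # Fail early with a readable message rather than a confusing
    # authentication error in the middle of a download.
    print("REDIVIS_API_TOKEN is not set; generate a token in Redivis and export it first.")
```

Running a check like this at the top of a script makes a missing token obvious before any table is requested.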

Analysis of IRW data

We next provide a first example of working with IRW data. The code blocks below import multiple datasets from the IRW and compute some simple metadata for each (e.g., the number of responses, participants, and items). This should be a useful starting point for conducting your own analyses of the data.

A first analysis

Code
library(dplyr)
library(purrr)


compute_metadata <- function(df) {
  df <- df |> filter(!is.na(resp)) |> mutate(resp = as.numeric(resp))
  tibble(
    n_responses = nrow(df),
    n_categories = n_distinct(df$resp),
    n_participants = n_distinct(df$id),
    n_items = n_distinct(df$item),
    responses_per_participant = n_responses / n_participants,
    responses_per_item = n_responses / n_items,
    density = (sqrt(n_responses) / n_participants) * (sqrt(n_responses) / n_items)
  )
}

dataset_names <- c("4thgrade_math_sirt", "chess_lnirt", "dd_rotation")
tables <- irwpkg::irw_fetch(dataset_names)
summaries_list <- lapply(tables, compute_metadata)
summaries <- bind_rows(summaries_list)
summaries <- cbind(table = dataset_names, summaries)
summaries
table               n_responses  n_categories  n_participants  n_items  responses_per_participant  responses_per_item    density
4thgrade_math_sirt        19920             2             664       30                  30.000000               664.0  1.0000000
chess_lnirt               10240             2             256       40                  40.000000               256.0  1.0000000
dd_rotation                1178             2             121       10                   9.735537               117.8  0.9735537
Code
import pandas as pd
from math import sqrt
import redivis

dataset_names = ["4thgrade_math_sirt", "chess_lnirt", "dd_rotation"]

def compute_metadata(df):
    # Drop missing responses, then coerce the remaining ones to numeric
    df = (df
          .loc[~df['resp'].isna()]
          .assign(resp=lambda d: pd.to_numeric(d['resp']))
         )
    
    return pd.DataFrame({
        'n_responses': [len(df)],
        'n_categories': [df['resp'].nunique()],
        'n_participants': [df['id'].nunique()],
        'n_items': [df['item'].nunique()],
        'responses_per_participant': [len(df) / df['id'].nunique()],
        'responses_per_item': [len(df) / df['item'].nunique()],
        'density': [(sqrt(len(df)) / df['id'].nunique()) * (sqrt(len(df)) / df['item'].nunique())]
    })

dataset = redivis.user('datapages').dataset('item_response_warehouse')

def get_data_summary(dataset_name):
    # Download one IRW table and summarize it
    df = dataset.table(dataset_name).to_pandas_dataframe()

    summary = compute_metadata(df)
    summary.insert(0, 'dataset_name', dataset_name)
    return summary

summaries_list = [get_data_summary(name) for name in dataset_names]
summaries = pd.concat(summaries_list, ignore_index=True)
print(summaries)
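The density statistic computed above is written as a product of two square-root terms, but it simplifies to the share of the participant-by-item grid that contains a response: n_responses / (n_participants * n_items). The sketch below checks that equivalence using the counts reported in the R output above; the function names are ours and purely illustrative.

```python
from math import sqrt


def density_as_written(n_responses, n_participants, n_items):
    # Form used in compute_metadata above
    return (sqrt(n_responses) / n_participants) * (sqrt(n_responses) / n_items)


def density_simplified(n_responses, n_participants, n_items):
    # Algebraically identical: fraction of participant-item cells that are observed
    return n_responses / (n_participants * n_items)


# (n_responses, n_participants, n_items) taken from the summary table above
counts = {
    "4thgrade_math_sirt": (19920, 664, 30),
    "chess_lnirt": (10240, 256, 40),
    "dd_rotation": (1178, 121, 10),
}

for name, (n, n_p, n_i) in counts.items():
    print(name, round(density_simplified(n, n_p, n_i), 7))
```

A density of 1 means every participant responded to every item; dd_rotation falls slightly below 1 because 1178 of the 1210 possible participant-item pairs have a response.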

Reformatting IRW data for use with other packages

Here is a slightly more complex example that uses irwpkg to fetch a dataset and then computes the InterModel Vigorish (IMV) contrasting predictions from the 2PL with predictions from the 1PL (using cross-validation across 4 folds; see also the documentation in the related imv package). Note the irw_long2resp function, which is helpful for reformatting IRW data from long to wide.
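For readers working in Python, the long-to-wide reshaping can be illustrated with pandas before turning to the R example. This is a rough analog and not a port of irw_long2resp: the toy data and the helper name long_to_wide are hypothetical, and the sketch assumes each participant answers each item at most once (pivot raises an error on duplicate id-item pairs).

```python
import pandas as pd

# Hypothetical toy data in IRW long format: one row per (id, item) response
long_df = pd.DataFrame({
    "id":   [1, 1, 2, 2, 3],
    "item": ["q1", "q2", "q1", "q2", "q1"],
    "resp": [1, 0, 0, 1, 1],
})


def long_to_wide(df):
    """Reshape long-format responses to one row per id and one column per item."""
    wide = df.pivot(index="id", columns="item", values="resp")
    wide.columns.name = None  # drop the leftover "item" label on the column index
    return wide.reset_index()


wide_df = long_to_wide(long_df)
print(wide_df)
```

Items that a participant never saw appear as missing values in the wide table (here, participant 3 on q2).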

Code
df<-irwpkg::irw_fetch("gilbert_meta_2")  #https://github.com/hansorlee/irwpkg
resp<-irwpkg::irw_long2resp(df)
resp$id<-NULL
##1pl/Rasch model
m0<-mirt::mirt(resp,1,'Rasch',verbose=FALSE)
##2pl
ni<-ncol(resp)
s<-paste("F=1-",ni,"
             PRIOR = (1-",ni,", a1, lnorm, 0.0, 1.0)",sep="")
model<-mirt::mirt.model(s)
m1<-mirt::mirt(resp,model,itemtype=rep("2PL",ni),method="EM",technical=list(NCYCLES=10000),verbose=FALSE)
##compute IMV comparing predictions from 1pl and 2pl
set.seed(8675309)
omega<-imv::imv.mirt(m0,m1)
mean(omega)
[1] 0.01276902
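To give some intuition for the quantity reported above, here is a minimal sketch of an IMV-style calculation for binary outcomes. It reflects our reading of the IMV definition and is not the imv package's implementation: each model's mean log-likelihood is mapped to the weight w of an "equivalent" biased coin satisfying w log(w) + (1 - w) log(1 - w) = mean log-likelihood, and the IMV is the relative gain (w1 - w0) / w0. The function names and toy numbers are hypothetical, predicted probabilities must lie strictly between 0 and 1, and no cross-validation is performed here.

```python
import math


def mean_log_likelihood(y, p):
    """Average Bernoulli log-likelihood of binary outcomes y under predictions p."""
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
               for yi, pi in zip(y, p)) / len(y)


def equivalent_coin(a, tol=1e-12):
    """Solve w*log(w) + (1-w)*log(1-w) = a for w in [0.5, 1) by bisection."""
    def f(w):
        return w * math.log(w) + (1 - w) * math.log(1 - w)

    lo, hi = 0.5, 1 - 1e-12
    if a <= f(lo):  # no better than a fair coin
        return 0.5
    while hi - lo > tol:  # f is increasing on [0.5, 1)
        mid = (lo + hi) / 2
        if f(mid) < a:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def imv(y, p_baseline, p_enhanced):
    """Relative improvement of the enhanced predictions over the baseline."""
    w0 = equivalent_coin(mean_log_likelihood(y, p_baseline))
    w1 = equivalent_coin(mean_log_likelihood(y, p_enhanced))
    return (w1 - w0) / w0


# Toy illustration: uninformative baseline vs. somewhat informative predictions
y = [1, 0, 1, 1, 0, 1, 0, 1]
p_base = [0.5] * len(y)
p_better = [0.7 if yi == 1 else 0.3 for yi in y]
print(imv(y, p_base, p_better))
```

In the R example above, the predictions being compared come from the 1PL and 2PL fits and are evaluated on held-out folds, so the toy value printed here is not comparable to the reported result.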