Getting Started

This page first describes how to get data from the IRW and then offers some suggestions for analyzing it.

Getting IRW data

There are two main routes to IRW data.

You can use the Data Browser to investigate individual datasets and then download them directly via Redivis.
You can also access IRW data programmatically, in either of the following ways:
- You can use a Redivis notebook. Consider some example workflows here.
- You can use the Redivis API for R or Python (note that you will first need to generate and set an API token). Because we anticipate this being a popular means of using the IRW, we elaborate on how this can be done in the next section.
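
Before calling the API it can help to confirm that a token is actually visible to your session. The sketch below assumes the token is supplied through an environment variable named REDIVIS_API_TOKEN (our understanding of what the Redivis clients read; check the Redivis documentation if in doubt). The helper name token_is_set is hypothetical and not part of any library.

```python
import os

# Assumption: the Redivis clients read the token from this environment variable.
TOKEN_VAR = "REDIVIS_API_TOKEN"

def token_is_set(environ=os.environ):
    """Return True if a non-empty API token is present in the environment."""
    return bool(environ.get(TOKEN_VAR, "").strip())

if not token_is_set():
    print(f"{TOKEN_VAR} is not set; generate a token in Redivis and export it first.")
```

Running this check at the top of a script gives a clear message up front instead of an authentication error partway through a download.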

Programmatic access of IRW data

Below we offer examples in both Python and R for how to access data programmatically from the IRW. In R, we use irwpkg, which was written to facilitate handling of IRW data in R; more details are available here.

R

# individual dataset
library(irwpkg)
df <- irw_fetch("4thgrade_math_sirt")

Python

import redivis

# individual dataset
dataset = redivis.user('datapages').dataset('item_response_warehouse')
df = dataset.table('4thgrade_math_sirt').to_pandas_dataframe()
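
The analysis code later on this page assumes each table is in long format with the columns id, item, and resp. A minimal sketch of a check you could run on a freshly fetched table is shown here; the helper name missing_columns and the extra column rt in the example are hypothetical.

```python
# Column names used by the analysis code on this page.
REQUIRED_COLUMNS = ("id", "item", "resp")

def missing_columns(columns):
    """Return the required long-format columns absent from `columns`, in order."""
    present = set(columns)
    return [name for name in REQUIRED_COLUMNS if name not in present]

# With a pandas data frame you would pass df.columns; plain lists work too.
print(missing_columns(["id", "item", "resp", "rt"]))  # []
print(missing_columns(["id", "response"]))            # ['item', 'resp']
```

An empty list means the table can be passed to the summary functions as is.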

Analysis of IRW data

We next provide a first example of working with IRW data. The code blocks below import multiple datasets from the IRW and compute some simple metadata (e.g., the number of responses). This should be a useful starting point for conducting your own analyses of the data.

R

library(dplyr)
library(purrr)


compute_metadata <- function(df) {
  df <- df |> filter(!is.na(resp)) |> mutate(resp = as.numeric(resp))
  tibble(
    n_responses = nrow(df),
    n_categories = n_distinct(df$resp),
    n_participants = n_distinct(df$id),
    n_items = n_distinct(df$item),
    responses_per_participant = n_responses / n_participants,
    responses_per_item = n_responses / n_items,
    density = (sqrt(n_responses) / n_participants) * (sqrt(n_responses) / n_items)
  )
}

dataset_names <- c("4thgrade_math_sirt", "chess_lnirt", "dd_rotation")
# fetch all tables in one call, then summarize each one
tables <- irwpkg::irw_fetch(dataset_names)
summaries_list <- lapply(tables, compute_metadata)
summaries <- bind_rows(summaries_list)
# label each row with the name of its table
summaries <- cbind(table = dataset_names, summaries)
summaries

table	n_responses	n_categories	n_participants	n_items	responses_per_participant	responses_per_item	density
4thgrade_math_sirt	19920	2	664	30	30.000000	664.0	1.0000000
chess_lnirt	10240	2	256	40	40.000000	256.0	1.0000000
dd_rotation	1178	2	121	10	9.735537	117.8	0.9735537

Python

import pandas as pd
from math import sqrt
import redivis

dataset_names = ["4thgrade_math_sirt", "chess_lnirt", "dd_rotation"]

def compute_metadata(df):
    # drop missing responses, then coerce the remaining ones to numeric
    df = (df
          .dropna(subset=['resp'])
          .assign(resp=lambda d: pd.to_numeric(d['resp']))
         )
    n_responses = len(df)
    n_participants = df['id'].nunique()
    n_items = df['item'].nunique()

    return pd.DataFrame({
        'n_responses': [n_responses],
        'n_categories': [df['resp'].nunique()],
        'n_participants': [n_participants],
        'n_items': [n_items],
        'responses_per_participant': [n_responses / n_participants],
        'responses_per_item': [n_responses / n_items],
        'density': [(sqrt(n_responses) / n_participants) * (sqrt(n_responses) / n_items)]
    })

dataset = redivis.user('datapages').dataset('item_response_warehouse')

def get_data_summary(dataset_name):
    # download one table and summarize it
    df = dataset.table(dataset_name).to_pandas_dataframe()
    summary = compute_metadata(df)
    summary.insert(0, 'dataset_name', dataset_name)
    return summary

summaries_list = [get_data_summary(name) for name in dataset_names]
summaries = pd.concat(summaries_list, ignore_index=True)
print(summaries)
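
The density expression used in both versions, (sqrt(n_responses) / n_participants) * (sqrt(n_responses) / n_items), simplifies to n_responses / (n_participants * n_items): the share of the participant-by-item grid that holds an observed response. The sketch below illustrates this on hypothetical toy data using only the standard library, and then re-derives the dd_rotation value from the counts in the table above.

```python
def density(rows):
    """Observed responses divided by the size of the participant-by-item grid."""
    observed = [r for r in rows if r["resp"] is not None]
    n_participants = len({r["id"] for r in observed})
    n_items = len({r["item"] for r in observed})
    return len(observed) / (n_participants * n_items)

# Hypothetical toy data: 3 participants, 2 items, participant 3 skipped item "b".
toy = [
    {"id": 1, "item": "a", "resp": 1},
    {"id": 1, "item": "b", "resp": 0},
    {"id": 2, "item": "a", "resp": 1},
    {"id": 2, "item": "b", "resp": 1},
    {"id": 3, "item": "a", "resp": 0},
    {"id": 3, "item": "b", "resp": None},
]

print(density(toy))       # 5 observed cells out of a 3 x 2 grid
print(1178 / (121 * 10))  # dd_rotation: responses / (participants * items)
```

A density of 1 therefore means every participant answered every item, as for 4thgrade_math_sirt and chess_lnirt above.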

Reformatting IRW data for use with other packages

Here is a slightly more complex example that uses irwpkg to fetch a dataset and then compute the InterModel Vigorish (IMV), contrasting predictions from the 2PL with predictions from the 1PL (using cross-validation across 4 folds; see also the documentation in the related imv package). Note the irw_long2resp function, which reformats IRW data from long to wide.

R

df <- irwpkg::irw_fetch("gilbert_meta_2")  # https://github.com/hansorlee/irwpkg
resp <- irwpkg::irw_long2resp(df)  # reformat from long to wide
resp$id <- NULL  # drop the id column so only item responses remain
## 1PL/Rasch model
m0 <- mirt::mirt(resp, 1, 'Rasch', verbose = FALSE)
## 2PL with a lognormal(0, 1) prior on the discriminations (a1)
ni <- ncol(resp)
s <- paste("F=1-", ni, "
             PRIOR = (1-", ni, ", a1, lnorm, 0.0, 1.0)", sep = "")
model <- mirt::mirt.model(s)
m1 <- mirt::mirt(resp, model, itemtype = rep("2PL", ni), method = "EM", technical = list(NCYCLES = 10000), verbose = FALSE)
## compute IMV comparing predictions from the 1PL and the 2PL
set.seed(8675309)
omega <- imv::imv.mirt(m0, m1)
mean(omega)

[1] 0.01276902
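
For readers working in Python, the long-to-wide step can be pictured with a small standard-library sketch. This is an illustration of the reshaping idea on hypothetical toy data, not the irwpkg implementation; the function name long_to_wide is made up for this example. With pandas, a call such as df.pivot(index='id', columns='item', values='resp') does the same job.

```python
def long_to_wide(rows):
    """Pivot (id, item, resp) records into one row per id and one column per item."""
    items = sorted({r["item"] for r in rows})
    by_id = {}
    for r in rows:
        by_id.setdefault(r["id"], {})[r["item"]] = r["resp"]
    # Cells with no recorded response are filled with None.
    wide = [[pid] + [cells.get(item) for item in items]
            for pid, cells in sorted(by_id.items())]
    return ["id"] + items, wide

# Hypothetical toy data: participant 3 has no response to item "b".
toy = [
    {"id": 1, "item": "a", "resp": 1},
    {"id": 1, "item": "b", "resp": 0},
    {"id": 2, "item": "a", "resp": 1},
    {"id": 2, "item": "b", "resp": 1},
    {"id": 3, "item": "a", "resp": 0},
]

header, wide = long_to_wide(toy)
print(header)  # ['id', 'a', 'b']
for row in wide:
    print(row)
```

Each row of the wide result corresponds to one participant, which is the layout that model-fitting packages expecting a response matrix typically require once the id column is dropped.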