Code
# individual dataset
library(irwpkg)
<- irw_fetch("4thgrade_math_sirt") df
There are several ways of working with the IRW data. Below we will first describe how to get data from the IRW and then offer some suggestions for how to analyze it.
There are several ways of getting data from the IRW.
You can use the Data Browser to investigate individual datasets and then download them directly via Redivis.
You can also access IRW data programmatically. There are several ways of doing this that we describe below.
You can use a Redivis notebook. Consider some example workflows here.
You can use the Redivis API for R or Python (note that you will first need to generate and set an API token). Given that we anticipate this being a popular means of using the IRW, we elaborate on how this can be done in the next section.
Below we offer examples in both Python and R for how to access data programatically from the IRW. In R, we make use of the irwpkg
which was written to help facilitate handling of IRW data in R; more details available here.
We next provide a first example for working with IRW data. The below code blocks import multiple datasets from the IRW and compute some simple metadata (e.g., the number of responses). This should be a useful starting point for conducting your own analyses of the data.
library(dplyr)
library(purrr)
compute_metadata <- function(df) {
df <- df |> filter(!is.na(resp)) |> mutate(resp = as.numeric(resp))
tibble(
n_responses = nrow(df),
n_categories = n_distinct(df$resp),
n_participants = n_distinct(df$id),
n_items = n_distinct(df$item),
responses_per_participant = n_responses / n_participants,
responses_per_item = n_responses / n_items,
density = (sqrt(n_responses) / n_participants) * (sqrt(n_responses) / n_items)
)
}
dataset_names <- c("4thgrade_math_sirt", "chess_lnirt", "dd_rotation")
tables<-irwpkg::irw_fetch(dataset_names)
summaries_list <- lapply(tables,compute_metadata)
summaries <- bind_rows(summaries_list)
summaries<-cbind(table=dataset_names,summaries)
summaries
table | n_responses | n_categories | n_participants | n_items | responses_per_participant | responses_per_item | density |
---|---|---|---|---|---|---|---|
4thgrade_math_sirt | 19920 | 2 | 664 | 30 | 30.000000 | 664.0 | 1.0000000 |
chess_lnirt | 10240 | 2 | 256 | 40 | 40.000000 | 256.0 | 1.0000000 |
dd_rotation | 1178 | 2 | 121 | 10 | 9.735537 | 117.8 | 0.9735537 |
import pandas as pd
from math import sqrt
import redivis
dataset_names = ["4thgrade_math_sirt", "chess_lnirt", "dd_rotation"]
def compute_metadata(df):
df = (df
.loc[~df['resp'].isna()]
.assign(resp=pd.to_numeric(df['resp']))
)
return pd.DataFrame({
'n_responses': [len(df)],
'n_categories': [df['resp'].nunique()],
'n_participants': [df['id'].nunique()],
'n_items': [df['item'].nunique()],
'responses_per_participant': [len(df) / df['id'].nunique()],
'responses_per_item': [len(df) / df['item'].nunique()],
'density': [(sqrt(len(df)) / df['id'].nunique()) * (sqrt(len(df)) / df['item'].nunique())]
})
dataset = redivis.user('datapages').dataset('item_response_warehouse')
def get_data_summary(dataset_name):
df = pd.DataFrame(dataset.table(dataset_name).to_pandas_dataframe())
summary = compute_metadata(df)
summary.insert(0, 'dataset_name', dataset_name)
return summary
summaries_list = [get_data_summary(name) for name in dataset_names]
summaries = pd.concat(summaries_list, ignore_index=True)
print(summaries)
Here is a slightly more complex example that takes advantage of irwpkg
to easily fetch a dataset and to then compute the InterModel Vigorish contrasting predictings for the 2PL to predictions from the 1PL for an example dataset (using cross-validation across 4 folds; see also the documentation in the related imv
package). Note the irw_long2resp
function which is helpful for reformatting IRW data from long to wide.
df<-irwpkg::irw_fetch("gilbert_meta_2") #https://github.com/hansorlee/irwpkg
resp<-irwpkg::irw_long2resp(df)
resp$id<-NULL
##1pl/Rasch model
m0<-mirt::mirt(resp,1,'Rasch',verbose=FALSE)
##2pl
ni<-ncol(resp)
s<-paste("F=1-",ni,"
PRIOR = (1-",ni,", a1, lnorm, 0.0, 1.0)",sep="")
model<-mirt::mirt.model(s)
m1<-mirt::mirt(resp,model,itemtype=rep("2PL",ni),method="EM",technical=list(NCYCLES=10000),verbose=FALSE)
##compute IMV comparing predictions from 1pl and 2pl
set.seed(8675309)
omega<-imv::imv.mirt(m0,m1)
mean(omega)
[1] 0.01276902