Title: | Correcting Nonparametric Frontier Measurements for Outliers |
---|---|
Description: | Provides a method for checking whether data contain outliers when efficiency is measured on large samples with data envelopment analysis (DEA). Within the jackstrap method, the package offers two criteria for defining outliers: the heaviside step function and the Kolmogorov-Smirnov (K-S) test. The technique was developed by Sousa and Stosic (2005), "Technical Efficiency of the Brazilian Municipalities: Correcting Nonparametric Frontier Measurements for Outliers" <doi:10.1007/s11123-005-4702-4>. |
Authors: | Kleber Morais de Sousa [aut, cre], Maria da Conceicao Sampaio de Sousa [aut], Paulo Aguiar do Monte [aut] |
Maintainer: | Kleber Morais de Sousa <[email protected]> |
License: | GPL-3 |
Version: | 0.1.0 |
Built: | 2025-02-13 03:57:44 UTC |
Source: | https://github.com/klebermsousa/jackstrap |
Histogram with Jackstrap Efficiency Indicators: This function plots the distributions of the efficiency indicators for the complete sample and for the sample without outliers. The outliers are defined by the K-S test.
hist_jack_ks(efficiency, model_hist_ks)
efficiency |
is the jackstrap object created by the jackstrap function. |
model_hist_ks |
is the desired graphic model. There are four kinds: 1 - Density histogram of the efficiency indicators for the complete sample and without outliers by K-S test; 2 - Histogram of efficiency for the complete sample and without outliers by K-S test; 3 - Histogram of efficiency without outliers by K-S test; 4 - Histogram of efficiency for the complete sample. |
Returns the plot of the efficiency indicators for the complete sample and/or for the sample without outliers, where outliers are identified by combining the leverage level and the K-S test.
# Build charts of the efficiency indicators with the jackstrap method and K-S test criterion
hist_jack_ks(efficiency_ks, 1)
hist_jack_ks(efficiency_ks, 2)
hist_jack_ks(efficiency_ks, 3)
hist_jack_ks(efficiency_ks, 4)
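As a rough illustration of what graphic model 1 conveys, the following base R sketch overlays two density histograms. It does not call the package: `eff_complete` and `eff_no_outliers` are hypothetical simulated vectors standing in for efficiency indicators, and the colours, breaks and labels are assumptions made for the example.

```r
# Hypothetical efficiency indicators in (0, 1); simulated, not package output
set.seed(42)
eff_complete    <- rbeta(200, 2, 5)  # stand-in for the complete sample
eff_no_outliers <- rbeta(190, 5, 2)  # stand-in for the sample without outliers

# Use common breaks so that both histograms are directly comparable
breaks  <- seq(0, 1, by = 0.05)
h_all   <- hist(eff_complete, breaks = breaks, plot = FALSE)
h_clean <- hist(eff_no_outliers, breaks = breaks, plot = FALSE)

# Overlay the two density histograms with semi-transparent fills
plot(h_all, freq = FALSE, col = rgb(0, 0, 1, 0.4),
     xlim = c(0, 1), ylim = c(0, max(h_all$density, h_clean$density)),
     main = "Efficiency: complete sample vs. without outliers",
     xlab = "Efficiency indicator")
plot(h_clean, freq = FALSE, col = rgb(1, 0, 0, 0.4), add = TRUE)
legend("topright", legend = c("Complete sample", "Without outliers"),
       fill = c(rgb(0, 0, 1, 0.4), rgb(1, 0, 0, 0.4)))
```

A shift of the second distribution toward 1 is the pattern one would expect when outliers had been pushing the frontier outward.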
Histogram with Jackstrap Efficiency Indicators: This function plots the distributions of the efficiency indicators for the complete sample and for the sample without outliers. The outliers are defined by the heaviside step function method.
hist_jack_step(efficiency, model_hist_step)
efficiency |
is the jackstrap object created by the jackstrap function. |
model_hist_step |
is the desired graphic model. There are four kinds: 1 - Density histogram of the efficiency indicators for the complete sample and without outliers by heaviside step function; 2 - Histogram of efficiency for the complete sample and without outliers by heaviside step function; 3 - Histogram of efficiency without outliers by heaviside step function; 4 - Histogram of efficiency for the complete sample. |
Returns the plot of the efficiency indicators for the complete sample and/or for the sample without outliers, where outliers are identified by the heaviside step function.
# Build charts of the efficiency indicators with the jackstrap method and heaviside criterion
hist_jack_step(efficiency, 1)
hist_jack_step(efficiency, 2)
hist_jack_step(efficiency, 3)
hist_jack_step(efficiency, 4)
Jackstrap Method: Tool that identifies outliers in a nonparametric frontier. This function applies the technique developed by Sousa and Stosic (2005), "Technical Efficiency of the Brazilian Municipalities: Correcting Nonparametric Frontier Measurements for Outliers".
jackstrap( data, ycolumn, xcolumn, bootstrap = 1000, perc_sample_bubble = 0.1, dea_method = "vrs", orientation_dea = "in", n_seed = NULL, repos = FALSE, num_cores = 1 )
data |
is the dataset with the inputs and outputs used to measure efficiency. The dataset needs to have this form: 1st column: name of DMU (string); 2nd column: code of DMU (integer); n columns of output variables; n columns of input variables. |
ycolumn |
is the number of y (output) columns in the dataset. |
xcolumn |
is the number of x (input) columns in the dataset. |
bootstrap |
is the number of resampling rounds applied. |
perc_sample_bubble |
is the percentage of the sample in each bubble, given as a fraction (for example, 0.1 for 10%). |
dea_method |
is the dea method: "crs" is DEA with constant returns to scale (CCR); "vrs" is DEA with variable returns to scale; and "fdh" is Free Disposal Hull (FDH) with variable returns to scale. |
orientation_dea |
is the direction of the DEA: "in" for focus on inputs; and "out" for focus on outputs. |
n_seed |
is the seed used to draw the random samples. |
repos |
indicates whether the resampling is done with replacement (TRUE) or without replacement (FALSE). |
num_cores |
is the number of cores available to process. |
Returns the jackstrap object with the following information: "mean_leverage" is the average leverage of each DMU; "mean_geral_leverage" is the overall average leverage and the step function threshold; "sum_leverage" is the leverage accumulated over all resamplings for each DMU; "count_dmu" is the number of times each DMU was selected by the bootstrap; "count_dmu_zero" is the number of times each DMU was selected by the bootstrap without influencing the others; "ycolumn" is the number of output variables; "xcolumn" is the number of input variables; "perc_sample_bubble" is the percentage of the sample used in each bubble; "dea_method" is the model used in the DEA analysis; "orientation_dea" is the orientation of the DEA; "bootstrap" is the number of bubbles used by the jackstrap method; "type_obj" is the type of object; "size_bubble" is the number of DMUs used in each bubble.
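To make the relation between "mean_leverage" and the step function threshold more concrete, here is a minimal sketch of a heaviside criterion in base R. It assumes, following one reading of Sousa and Stosic (2005), that the threshold is the overall mean leverage multiplied by the logarithm of the sample size; the use of the natural logarithm and the toy leverage values are assumptions, and the package may compute the threshold differently.

```r
# Hypothetical mean leverages for six DMUs (illustrative values only)
mean_leverage <- c(0.0010, 0.0020, 0.0015, 0.0500, 0.0010, 0.0005)
names(mean_leverage) <- paste0("DMU_", seq_along(mean_leverage))

# Assumed threshold: overall mean leverage times log(K), K = sample size
K <- length(mean_leverage)
threshold <- mean(mean_leverage) * log(K)

# Heaviside step: 1 keeps the DMU, 0 flags it as an outlier
keep <- as.integer(mean_leverage < threshold)
outliers <- names(mean_leverage)[keep == 0]

print(threshold)
print(outliers)
```

In this toy vector only the DMU whose leverage is far above the rest falls on the wrong side of the threshold.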
# Examples with the municipalities data.
# Load package jackstrap
library(jackstrap)
# Load example data
municipalities <- jackstrap::municipalities
# Measure efficiency with the jackstrap method and heaviside criterion
efficiency <- jackstrap(data = municipalities, ycolumn = 2, xcolumn = 1, bootstrap = 1000, perc_sample_bubble = 0.20, dea_method = "vrs", orientation_dea = "in", n_seed = 2000, repos = FALSE, num_cores = 4)
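The column layout expected in `data` and the role of `perc_sample_bubble`, `n_seed` and `repos` can be sketched with base R alone. The data frame `toy`, its column names and the rounding of the bubble size are assumptions made for illustration; the package may split columns and size bubbles differently internally.

```r
# Hypothetical dataset in the documented layout:
# name (string), code (integer), output columns, then input columns
set.seed(2000)  # plays the role of n_seed
toy <- data.frame(
  name = paste0("DMU_", 1:10),
  cod  = 1:10,
  y1   = runif(10, 50, 100),   # output 1
  y2   = runif(10, 1, 10),     # output 2
  x1   = runif(10, 100, 200),  # input 1
  stringsAsFactors = FALSE
)

ycolumn <- 2
xcolumn <- 1

# Outputs start at column 3; inputs follow the outputs
outputs <- toy[, 3:(2 + ycolumn), drop = FALSE]
inputs  <- toy[, (3 + ycolumn):(2 + ycolumn + xcolumn), drop = FALSE]

# Bubble size as a fraction of the sample (rounding is an assumption)
perc_sample_bubble <- 0.2
size_bubble <- round(nrow(toy) * perc_sample_bubble)

# One bubble drawn without replacement (repos = FALSE) and one with (repos = TRUE)
bubble_no_repl <- sample(toy$cod, size_bubble, replace = FALSE)
bubble_repl    <- sample(toy$cod, size_bubble, replace = TRUE)
```

Without replacement, a DMU can appear at most once in a bubble; with replacement, the same DMU may be drawn more than once.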
Jackstrap KS Method: Tool that identifies outliers in a nonparametric frontier. This function applies the technique developed by Sousa and Stosic (2005), "Technical Efficiency of the Brazilian Municipalities: Correcting Nonparametric Frontier Measurements for Outliers", and uses the K-S test as the criterion for defining outliers.
jackstrap_ks(data, jackstrap_obj, num_cores = 1, perc = 0.9)
data |
is the dataset with the inputs and outputs used to measure efficiency. The dataset needs to have this form: 1st column: name of DMU (string); 2nd column: code of DMU (integer); n columns of output variables; n columns of input variables. |
jackstrap_obj |
is the object created by the function jackstrap. |
num_cores |
is the number of cores available to process. |
perc |
is the percentage of DMUs analyzed by the K-S test. |
Returns the jackstrap object extended with the following information: "result_kstest_method" contains the p-values of the K-S test obtained by sequentially removing the high-leverage DMUs one by one; "efficiency_ks_method" contains the efficiency indicators obtained with the K-S test criterion.
# Measure efficiency with the jackstrap method and K-S test criterion
efficiency_ks <- jackstrap_ks(data = municipalities, jackstrap_obj = efficiency, num_cores = 4)
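The idea of removing high-leverage DMUs one by one and tracking K-S p-values can be sketched without the package. The example below makes several simplifying assumptions that are not taken from the package: efficiency is a single-input, single-output CRS score (`crs_eff`, a hypothetical helper), the leverage ordering `leverage_order` is imposed by hand, and each reduced sample is compared with the previous one using `stats::ks.test`.

```r
# Simulated DMUs: productivity y/x between 0.4 and 0.6, except one atypical DMU
set.seed(1)
n <- 50
x <- runif(n, 10, 20)
y <- x * runif(n, 0.4, 0.6)
y[1] <- x[1] * 3  # atypical DMU that pushes the frontier outward

# Single-input, single-output CRS efficiency: productivity relative to the best
crs_eff <- function(x, y) (y / x) / max(y / x)

# Hypothetical leverage ordering: the atypical DMU is simply placed first
leverage_order <- c(1, sample(2:n))

# Remove DMUs one at a time and compare each efficiency distribution
# with the one obtained before the latest removal
n_remove <- 5
p_values <- numeric(n_remove)
for (k in seq_len(n_remove)) {
  keep_prev <- setdiff(seq_len(n), leverage_order[seq_len(k - 1)])
  keep_now  <- setdiff(seq_len(n), leverage_order[seq_len(k)])
  eff_prev  <- crs_eff(x[keep_prev], y[keep_prev])
  eff_now   <- crs_eff(x[keep_now], y[keep_now])
  # exact = FALSE avoids warnings about ties between the two samples
  p_values[k] <- ks.test(eff_prev, eff_now, exact = FALSE)$p.value
}

print(round(p_values, 4))
```

A very small p-value at a given step indicates that removing that DMU changed the efficiency distribution markedly, which is the kind of signal the K-S criterion relies on.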
Dataset of Municipalities of Bahia state in Brazil
municipalities
A data frame with 489 rows (DMUs), identified by name and code, and 3 variables (2 outputs and 1 input):
municipio
string variable with the description of each local government
cod
integer variable that identifies each DMU by an integer code
total_atend_amb_hosp_ab
float variable with public health services in local governments (output)
total_diversid
float variable with the diversity of public services provided in local governments (output)
desp_saude
float variable with public service expenditures in local governments (input)
# Load example data
municipalities <- jackstrap::municipalities
Plot Jackstrap KS: This function plots the p-values of the Kolmogorov-Smirnov test in decreasing order of leverage.
plot_jackstrap_ks(data_plot, model_plot)
data_plot |
is the jackstrap object created by the jackstrap function. |
model_plot |
is the desired model. There are two models: 1 - The graphic shows the number of removed DMUs on the x axis and the p-value of the K-S test on the y axis; 2 - The graphic shows the DMU code on the x axis and the p-value of the K-S test on the y axis. |
Returns the plot of the K-S test p-value against the number of removed DMUs or the DMU code.
# Plot the scatter chart of the K-S test p-value against the number of DMUs removed.
plot_jackstrap_ks(efficiency_ks, 1)
Summary Jackstrap: This function shows the main outcomes of the outlier technique developed by Sousa and Stosic (2005).
summary_jackstrap(object_jackstrap, data)
object_jackstrap |
is the jackstrap object created by the jackstrap function. |
data |
is the dataset used in the research. |
Returns a data frame with the following information: "outliers_by_step_func" are the outliers by the heaviside step function criterion; "outliers_by_ks" are the outliers by the K-S test; "dmu_efficiency_by_step_func" are the DMUs evaluated as efficient by the heaviside step function criterion; "dmu_inefficiency_by_step_func" are the DMUs evaluated as the most inefficient by the heaviside step function criterion; "dmu_efficiency_ks" are the DMUs evaluated as efficient by the K-S test criterion; "dmu_inefficiency_by_ks" are the DMUs evaluated as the most inefficient by the K-S test criterion.
# Create an object with the summary of the efficiency measurement.
summary_efficiency <- summary_jackstrap(efficiency_ks, municipalities)