Talk:WikiJournal Preprints/Design effect

WikiJournal Preprints
Open access • Publication charge free • Public peer review

WikiJournal User Group is a publishing group of open-access, free-to-publish, Wikipedia-integrated academic journals.


Article information

Submitting author: Tal Galili[i]  
Additional contributors: Wikipedia community

  1. tal.galili@gmail.com

Plagiarism check

  Pass. The report from the WMF copyvios tool detected only trivially short phrase overlap (e.g. "estimator for the variance of the weighted mean"). T.Shafee (Evo﹠Evo) talk 22:33, 8 February 2023 (UTC)

Editor notes

In the ''Common types of weights'' section, the term "reliability weights" probably needs a definition (was mentioned in the WP version). T.Shafee (Evo﹠Evo) talk 22:30, 8 February 2023 (UTC)

Thanks for the comment, T.Shafee (Evo﹠Evo).
I've added a new reference for the definitions of types of weights (this one), and also decided to remove the term "reliability weights" from the article, because I couldn't find a good reference that defined it. For a discussion of this on Cross Validated (the statistics Stack Exchange site), see here. For the changes I've made, see here. Talgalili (discuss • contribs) 18:00, 2 September 2023 (UTC)

The preprint, as well as the current Wikipedia article, launches into technical language quite quickly, which isn’t ideal according to Wikipedia’s “Provide an accessible overview” guideline for what is called the lead section. Please attempt to provide a more accessible lead section. --Aoholcombe (discuss • contribs) 08:20, 4 June 2023 (UTC)

Thank you Aoholcombe, I agree with your comment. I've written a new abstract and updated the introduction accordingly (you can see the changes here). Talgalili (discuss • contribs) 08:03, 31 August 2023 (UTC)
@Talgalili The second peer reviewer has submitted a second round of comments. Please view the PDF in this section and respond. Thanks. OhanaUnited Talk page 05:44, 5 January 2024 (UTC)

First peer review

 
reviewer-annotated pdf

Review by Anonymous expert solicited by the handling editor
These assessment comments were submitted on , and refer to this previous version of the article

Comments were provided as a PDF and uploaded by the editor here: https://en.wikiversity.org/wiki/File:Comments_on_Design_effect_articleAnonymized.pdf

Item 1

Response

Thanks, I'm happy to expand the article based on your feedback and add sources beyond the quotes from Kish.

Item 2

Response

Thanks, good point. I've added a mention of "measure of interest" in the abstract. Also, the introduction includes a clear mention of how the estimator of interest is intertwined with the definition of the Deff.

Item 3

Response

Thanks, good point. I've standardized the various notations of Deff across the article.

Item 4

Response

Thanks, this is a great point. I've revised the section on Deft to discuss how the without-replacement aspect of the design might be ignored not just in the denominator but also in the numerator. I've added the examples you provided there.

Item 5

Response

This sentence basically meant that if we computed Deff*var_SRS we would get the variance that includes all the complexities of the design. But after giving it some thought, this doesn't seem to add information or clarity beyond what's already written, so I've removed it. I did reference this briefly in the "uses" section, to indicate that the Deff is not likely to be used for building confidence intervals.
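The relationship this response refers to can be written out as below. This is only a sketch: the symbol \hat{\theta} for the estimator and the subscripts "design" and "SRS" are notation assumed here for illustration, not notation taken from the preprint.

```latex
\[
\mathrm{Deff}
  = \frac{\operatorname{Var}_{\text{design}}(\hat{\theta})}
         {\operatorname{Var}_{\text{SRS}}(\hat{\theta})}
\quad\Longrightarrow\quad
\operatorname{Var}_{\text{design}}(\hat{\theta})
  = \mathrm{Deff} \cdot \operatorname{Var}_{\text{SRS}}(\hat{\theta})
\]
```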

Item 6

Response

This is good feedback. I've fixed the sentence so that it makes very clear that the Deft cannot be generalized across statistics and measurements. I also moved it to a note (instead of a paragraph in the article). I kept it as a note because it's (IMHO) a worthwhile comment about the original hope for the Deft (that it would be generalizable) and about how follow-up work has not supported that aspiration.

Item 7

Response

Agreed. I've removed the "put differently" paragraph. Instead, I've added a clearer example with a specific estimator and numbers. I also moved Kish's formula from this section to the section on Kish's design effect.

Item 8a

Response

I've retitled the section to "Design effect depends on sampling design and statistical adjustments" and tried to improve the leading section to make it clearer. I also added a reference to "Introduction to Variance Estimation"

Item 8b

Response

I've improved the section describing the impact of estimating sampling design aspect (e.g.: post stratification etc.).

Item 8c

Response

Relating to the comment about "the Sources for unequal selection probabilities section" - thank you for the correction! I've fixed the text (to move to talk about using either SRS of selection of clusters in the first stage, or the second method which you provided). I also added the notations you've provided as a note in the text.

Item 9

Response

Thanks, this is very helpful. I've fixed the text to make it clearer, and also added the example you've provided as a note in the text.

Item 10

Response

To clarify I've added to the text: "Adjustments for non-coverage can lead to unequal survey weights.". If I understand correctly, non-coverage leads, by definition, to unequal probability of selection - since it means that the sampling frame has some items with some positive probabilities, and other items that have 0 probability of selection.

Item 11

Response

Fair point. I've removed the term ad hoc from this section, and attempted to clarify the sentence further.

Item 12

Response

Thanks. I've added a note with this clarification.

Item 13

Response

This is great input. I've removed most of Kish's comments from the usage section, and moved some of his claims to the "History" section, where I mention some of Kish's original intent for Deff and how its applicability has diminished nowadays, since we have more sophisticated software.

Item 14

Response

Added citations to the software implementations you've mentioned (and also added citations to the ones already present)

Item 15

Response

Thanks, I've added the relevant Deff papers.

Second peer review

 
reviewer-annotated pdf

Review by Charles DiSogra, Freelance Consultant
These assessment comments were submitted on , and refer to this previous version of the article

Accuracy

  • Is anything incorrectly stated?
    No
Response

Great, thanks.

  • Do the references support the statements being made?
    Yes although I could not locate the 2006 reference for “Park and Lee” in the “Alternative definitions” section.
Response

This is citation number 4, it also includes a link to a pdf of the paper.

  • Are any important recent papers missed?
    Not to my knowledge
Response

Great, thanks.

  • Are any references out of date or obsolete?
    Not to my knowledge; however, much of the work has been around for a while. Ask Dr. Raphael Nishimura at Univ. of Michigan (survey sampling statistician) for thoughts overall and most recent papers, if there are any.
Response

Thanks. I sent him an email now, and hope to get his feedback soon. If so, I'll happily include it in the paper. TODO: give an update.

Balance

  • Does it reflect current thinking in the field?
    Yes - Article is as much about weighting as it is about Deff
Response

Great, thanks. In the future, it might make sense to split the article into several articles. But I think its current shape is a good framing.

  • Is anything important missing (or cherry-picked)?
    There is no mention of replicate weighting.
Response

Thanks. After giving this some thought, I've decided to mention them in the uses section, when discussing how the Deff is not likely to be used for building confidence intervals (there I mention that an alternative could be to use replicate weights).

  • No discussion using trimming methods to reduce Deff for analysis purposes.
Response

Thanks. I added a mention of this to the "uses" section.

  • Mention of Neyman’s optimal allocation requires a reference in section Unequal selection probabilities, number 1, first bullet.
Response

Thanks. Done.

  • Are viewpoints given due weight given the existing literature?
    I would think so.
Response

Great, thanks.

  • Are any conclusions/perspectives/outlook/opinions/originalresearch clearly indicated?
    Nothing indicated as original research but a decent tour of existing concepts.
Response

Great, thanks.

Accessibility

  • Is the language clear and unambiguous?
    Mostly. I did some editing for plurals and use of articles in sentence construction but these were minor. Use of “e.g.” seems like a lot but makes sense when there are examples to be mentioned.
Response

Thanks for edits. I reviewed all of them, and agreed with all of them.

  • Are any diagrams misleading or incomplete?
    n/a
Response

NA

  • Is the work written such that a knowledgeable generalist can understand it?
    Absolutely not. Knowledge of statistics, especially sampling statistics is necessary.
Response

Thanks, I agree. But I think it should be clear enough for reasonably statistically oriented readers.

  • Is the abstract/lead understandable to a general audience?
    Somewhat. It doesn’t mention that the effective sample is the size to be used when conducting statistical tests using weighted data.
Response

Thanks, I agree. I've added a new abstract (lead) that should be accessible to the general public. It also mentions the use of the design effect in sample size determination (but without going into details, to keep it "light" enough).

  • 4th paragraph of intro change “quantifying the representative of a sample” to “quantifying the representativeness of a sample”.
Response

Thanks, I see you've already fixed it.

  • Does the lay summary (if included) capture the key points of the work while being understandable to a reader with only secondary school background?
    No. Reading level is 17.9 years, that would be >college.
Response

Good point! In the abstract I added, I simplified the language to suit a reader with a secondary-school background.

Some other points:

  • In text, “trace back” should be two words (as opposed to Python coding language which makes it all one word)
Response

Thanks, I see you've already fixed it.

  • Should “Formula” section be “Formulae”? … or “Formulas” if the Latin is to be ignored.
Response

Thanks. Changed to "Formulas".

  • First proof box under “Formula” needs to be lengthened to encompass the full length of the proof.
Response

Thanks. Done.

Second round of comments

 
reviewer-annotated pdf

Review by Charles DiSogra, Freelance Consultant
These assessment comments were submitted on , and refer to this previous version of the article

In general - thanks a lot for your second round of feedback, it's helpful in making this manuscript better - much appreciated!

Item 1

Response

A Deff smaller than 1 occurs when using stratified sampling with a-priori knowledge (e.g.: stratum sizes, and the variance of the outcome of interest within each stratum). This has the advantage of yielding a sample with reduced variability, as compared with an SRS. I acknowledge that this is generally the rarer case for most practitioners, so I can add it to the text (although I don't have a citation for this statement, so I prefer not to state it in the abstract and to leave the abstract as is). I've added the following sentence to the text (second paragraph in the intro): "Intuitively we can get Deff<1 when we have some a-priori knowledge we can exploit during the sampling process (which is somewhat rare). And, in contrast, we often get Deff>1 when we need to compensate for some limitation in our ability to collect data (which is more common).". I also added the specific example of stratified sampling in which Deff is smaller than 1.
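To illustrate why stratification can push Deff below 1, here is a minimal sketch. It is not the example added to the preprint; it assumes proportional allocation (a simpler scheme than the variance-based allocation mentioned above), ignores finite population corrections, and uses made-up stratum values. Under those assumptions the sample size cancels, and Deff for the sample mean reduces to the within-stratum variance divided by the total (within plus between) variance.

```python
from typing import List, Tuple

# Each stratum is (population share W_h, stratum mean, stratum variance).
# The shares are assumed to sum to 1; the values below are hypothetical.
Stratum = Tuple[float, float, float]


def deff_proportional_stratified(strata: List[Stratum]) -> float:
    """Deff of the sample mean under proportional-allocation stratified
    sampling relative to SRS, ignoring finite population corrections.

    Var_stratified = within / n and Var_SRS = (within + between) / n,
    so the ratio does not depend on n.
    """
    overall_mean = sum(w * m for w, m, _ in strata)
    within = sum(w * v for w, _, v in strata)
    between = sum(w * (m - overall_mean) ** 2 for w, m, _ in strata)
    return within / (within + between)


# Two equally sized strata with very different means (hypothetical numbers).
strata = [(0.5, 10.0, 4.0), (0.5, 20.0, 4.0)]
deff = deff_proportional_stratified(strata)  # 4 / 29, well below 1
```

When the stratum means coincide, the between-stratum term vanishes and the function returns 1, i.e. stratifying on an uninformative variable gains nothing over SRS in this simplified setting.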

Item 2

Response

Linked to Deft definition. Also fixed study->studies, strata->stratum.

Item 3

Response

Non coverage could be corrected theoretically, but obviously not always. I've added more clarification on this (and mentioned that the weights might be “inadequate”).

Item 4

Response

Fixed: pick up -> answer. Also fixed things around section “4. statistical adjustments”.

Item 5

Response

Fixed: each strata -> each stratum. Regarding Ii, I clarified that it MAY be non-independent. It doesn't have to be, but the point I wanted to make is that EPSEM may or may not be independent (that it's about the marginal probability only).

Item 6

Response

stratum -> strata. Thanks for keeping an eye on the correct use of strata/stratum and data/datum. I went through and tried my best to fix all discrepancies.

Item 7

Response

"neff should correctly be Deff" - I think it should be Neff (N effective).

Item 8

Response

Regarding Kish's Deff formula - I added clarification. The point of that section ("Assumptions and proofs") is to show that Kish's Deff can be derived from a model based perspective, since it's a relatively clean proof (and it's indeed added in the paper).

Item 9

Response

Thanks for taking a look at the formulas for the two Deffs (Spencer and Lee).

Peer review from editorial board member with more mathematical expertise

Comment 0

Hi, I am the editorial board member who had a readthrough. I am pro acceptance, but there were a few smaller mistakes/unclarities which should be fixed or clarified, see list below:

Response 0

Thanks a LOT for your review. I appreciate all the mistakes you caught, and fixed all of them. Thank you so much! Talgalili (discuss • contribs) 20:09, 26 March 2024 (UTC)

Comment 1

  • “When Deff>1, then the data collected is not as accurate as it could have been if people were picked randomly. On the other hand, if Deff<1, then the data is even more accurate than a simple random sample.”
I don’t think “accurate” is correct here, since the collected data in itself can’t be more or less accurate. It’s about the data in relation to the population. So I would suggest writing the following (which is more clunky, but also more precise)
  • “As a result, an analyst cannot estimate a with replacement variance for the numerator even if desired. The standard workaround is to compute a variance estimator as if the PSUs were selected with replacement.“
Is this correct, or should one of the “with replacement”s be “without replacement”?

Response 1

I agree that the word "accurate" is wrong here, since indeed it is not the data that is more or less accurate, but rather the inference made with it. I would rather not use your proposed alternative, since one of the requests in the template is that "the lay summary (if included) capture the key points of the work while being understandable to a reader with only secondary school background". So I worked to keep the text of the abstract at a relatively basic level.

Instead, I propose to change the text to be:

“When Deff>1, then inference from the data collected is not as accurate as it could have been if people were picked randomly. On the other hand, if Deff<1, then the inference is even more accurate than it would have been if a simple random sample was used.”

What do you think? Talgalili (discuss • contribs) 17:00, 24 March 2024 (UTC)

Comment 2

  • The two paragraphs starting with “When the sampling design is not known upfront” and “When the sampling design isn’t set in advance” seem to be essentially duplicates. I don’t have a clear favourite, but it seems like the top version might be the more recent one, so maybe that one should stay.

Response 2

Great catch, thanks! I've removed the first version and kept the second one.

Comment 3

  • The author gives a formula for Kish’s design effect as
Deff = \frac{n \sum_{i=1}^n w_i^2}{(\sum_{i=1}^n w_i)^2}
and then another version where both parts of the fraction are divided by n^2
These are of course equivalent, but the two proofs that follow would be shorter and kind of nicer if the first version were used, instead of the second one.

Response 3

Fair point. In the definition, I'll keep both versions (as they are used interchangeably in the literature). However, I've now simplified both proofs so that they'll use the faster-to-get-to version of the formula. Thanks.
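The equivalence of the two forms discussed in this comment can be checked numerically. The sketch below is illustrative only: the function names and the example weights are made up here, and the effective sample size is taken as n divided by Deff.

```python
from typing import Sequence


def kish_deff(weights: Sequence[float]) -> float:
    """First form: n * sum(w_i^2) / (sum(w_i))^2."""
    n = len(weights)
    total = sum(weights)
    return n * sum(w * w for w in weights) / (total * total)


def kish_deff_mean_form(weights: Sequence[float]) -> float:
    """Second form, after dividing numerator and denominator by n^2:
    mean(w_i^2) / (mean(w_i))^2."""
    n = len(weights)
    mean_w = sum(weights) / n
    mean_w_sq = sum(w * w for w in weights) / n
    return mean_w_sq / (mean_w * mean_w)


weights = [1.0, 1.0, 2.0, 4.0]  # hypothetical survey weights
deff = kish_deff(weights)       # 4 * 22 / 64 = 1.375
n_eff = len(weights) / deff     # effective sample size, about 2.91
```

Equal weights give a value of exactly 1 in both forms, which is a quick sanity check on either implementation.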


Comment 4

  • The proof in section “Assumptions and proofs” has little numbers on top of every equal sign; the numbers 6 to 11 are not needed

Response 4

Fair. I've simplified it further.

Comment 5

  • First paragraph of section on Spencer’s Deff says “Each item has a probability of p_k (k from 1 to N) to be drawn in a single draw”. Should that not be “M” (the population size) instead of “N”? If it is indeed N, then it should be defined what N is.

Response 5

Good catch, thanks. I've moved the notation in the whole section to use n and N (instead of mixing it with m and M).

Comment 6

  • Still in the section about Spencer’s Deff, it says that “Only if the variance of y is much larger than the mean then the right-most term is close to 0.” That is true, but the content of the parenthesis that follows is wrong. It’s 1/relvar(y) that will be approximately 0, not relvar(y).

Response 6

Good catch, thanks! Fixed.

Comment 7

  • I did not read for grammar, but noticed a few plural apostrophes.

Response 7

Thanks, fixed a couple of these.

Comment 8

  • In the section on Unequal selection probabilities x Cluster sampling, the brackets in the denominator should go outside the sum.
--Mstefan (discuss • contribs) 17:06, 11 March 2024 (UTC)

Response 8

Good catch! Fixed.

Peer review and editing timeline notes

On 8 Dec, in an email, the anonymous reviewer explicitly granted a public domain license for their review above. They also said they had skimmed the revision and thought it looked OK.

On 7 Dec, in an email, the reviewer Charles DiSogra explicitly granted a public domain license for his review above. He said he hoped to look at the revisions in the next week.

On 6 Dec, I sent some writing comments (below) to the author, who responded by making the appropriate changes.

Minor writing comments

Looks like the revision accidentally introduced some redundancy into the preprint, which now says “The term "Design effect" was coined by Leslie Kish in” in both the first and third paragraphs of the Introduction.

“it also matters if the design (e.g.: selection probabilities) are correlated with the outcome of interest”. “whether” is preferred over “if” in formal writing for this type of use of “if”. Also, there is a verb agreement problem, where “design” is singular but the verb (“are”) is plural. Also, I don’t fully understand the sentence, because “selection probabilities” hasn’t been introduced at this point in the article, and unfortunately it looks like it’s never explained properly even later, so I’m hoping you can fit a brief explanation of it in earlier.

“ a researcher might approximate the Deft with calculating the variance” - I think “by” calculating the variance is better.

“SRS with replacement (srswr)”. It’s highly unusual to not capitalize acronyms and, while I see that online some people don’t capitalize this acronym, I assume Wikipedia style is to capitalize all acronyms, so I think you should do that - similarly for srswor.

“Also, let it be combined with an estimator that rakes to totals for several demographic variables.” Did you really mean “rakes”? Only because I’m not familiar with that use of the word.

“some pairs of PSUs implying” - I think you should have a comma before “implying”.

“This is, in fact, the default choice in the software packages that will handle survey data—e.g., Stata, R survey package, and the SAS survey procedures”. I suggest you shorten this to “This is the default choice in software packages such as Stata, the R survey package, and the SAS survey procedures.”

I think that all of your headings that start with “Design effect” should probably say “The design effect”.

“For example, we might decide”. I think you can delete “for example” because the previous sentence already says that, plus the word “might” further implies it.

Where you wrote “enough of the bias”, can you reword? Unfortunately there is no indication of the criterion for “enough”; maybe you can change it to “sufficient”.

At this point I started making some changes directly to the preprint, because I noticed you were happy when a reviewer did some of that. However, so far I haven’t gotten any further than the “Design effect depends on sampling design and statistical adjustments” section.

(The preceding unsigned comment was added by Aoholcombe (talk • contribs) 22:53, 21 December 2023 (UTC))

After the author responded to the above and we both made some more minor edits to the pre-print, a second round of comments was received from Reviewer 2 the first week of January 2024. On 6 January 2024, I notified the author of them and asked him to respond.

In Feb / March 2024, the author responded to the comments of Reviewer 2, and the reviewer replied to say he was happy with the author's responses.

In late Mar 2024, an editorial board member, Mstefan, with more mathematical expertise, looked through the preprint and made a number of comments (above) that the author responded to.

Aoholcombe

Author further revision

Today I have finished reviewing the paper again. I fixed a typo and made tens of grammatical improvements. I also added four new summary tables for, hopefully, improved readability.

I'm ready for a final go-over by the relevant editors and (hopefully) an acceptance.

Talgalili (discuss • contribs) 18:02, 2 April 2024 (UTC)

Editor further comments

I made a few more wordsmithing edits after checking some of them with the author.

--Aoholcombe (discuss • contribs) 20:07, 10 April 2024 (UTC)
