Talk:WikiJournal of Science/Binary search algorithm
Lin, A; et al..
Suggestions by spontaneous reviewerEdit
- Add a summary or conclusion to the paper. e.g. Comparision of approaches PROs and CONs.
- Draw a conclusion/references to current topics in computer sciences, e.g. requirements and constraints for application of binary search on Big Data.
- Classify the paper - seems not to be a Original Research Paper (encyclopedic paper). Could be in general part of the header of any WikiJournal paper (incorporate in the template).
- What is the knowledge a reader will gain, if she/he is reading the paper. Just one sentence in Abstract and answers that sentence in the summary or conclusions of the paper.
First invited reviewEdit
Anonymous reviewer 1 ,
This review was submitted on , and refers to this previous version of the article
The article is well written and contains all relevant information about Binary Search algorithms, including variants and cases of applications.
An undergraduate or even less experienced student will find what is needed to implement it and decide when to use it.
I am a bit surprised to find the historical section toward the end of the article, instead of the begining, as is usually done.
Much of the history section depends on earlier content (e.g., specialized variations), so I believe it is best to put it near the end after the variations section.
Second invited reviewEdit
Anonymous reviewer 2 ,
This review was submitted on , and refers to this previous version of the article
"The present article is on binary search, an efficient and well known algorithm for searching for an element in an ordered list of such elements. The article states different versions of the algorithm and discusses their performance. Binary search is then compared to other data structures and algorithms for searching, and variants of binary search are listed. Finally, remarks on history, implementation, and library support are made.
I believe that the article reflects most of the important aspects of binary search; below I only have a few suggestions of what to add. In general, I believe the depth is good for an encyclopaedia article. As far as I can tell, the correct references have been put where necessary.
At this point, however, the article still has many issues that need to be resolved. First and foremost, there are some utterly wrong statements! Also, the reading flow isn’t as good as it could be, formulations aren’t perfect at all places. I am listing concrete comments below.
Remark: I am a theoretical computer scientist, therefore I don’t think I can fully judge the sections on implementation issues and library support (although they seem plausible). I also consider myself more an expert in algorithms than in data structures, so I may have missed a problem related to data structures as well.
1) You should separately speak about the running time of binary search and the number of comparisons made by it, because its running time depends on both the number of comparisons made and the time taken per comparison. Only the former is currently treated in this article. I suggest to only speak of comparisons made in the current form of the article. Then you can possibly add a paragraph as subsection of “Performance” on running time. This isn’t just some theoretical issue: Note that, for instance, to compare two arbitrary-decision integers (e.g., as in python) it takes linear time in their encoding lengths! There are many occurrences of this issue, I’m not listing them explicitly.
Added a running time and cache use section.
2) I do not see how the space complexity of binary search is O(1). You need to keep three pointers to elements, and the encoding length of this is O(log n)! (It should be highlighted that this is on top of the space taken up by the array.)
3) I am not happy with the readability of Section “Procedure”:
- 3a) I suggest to establish a one-to-one correspondence between the natural-language descriptions and the pseudocodes. (Now the details are a bit different, which may confuse the reader. So may go-to loops anyway.)
- 3b) For completeness, I would also add a pseudocode for the “Alternative procedure”.
- 3c) Instead of bloating the section with even more (almost identical) algorithm descriptions, I suggest to just say in “Duplicate elements” that the “alternative procedure” (add formal reference?) always returns the rightmost element. Then say which changes (in fact only line 4 and 5!) have to be made to make the procedure return the leftmost element.
I think readers are more likely to want to copy from the pseudocode directly, and since binary search has some tricky details, I believe it's best to have the two separate sections. Also, the reference for the rightmost claim is the history section of the TAOCP binary search chapter.
4) “This is equivalent to a binary search that has reduced to one element and always eliminates the smaller subarray out of the two in each iteration if they are not of equal size.” False. You may well eliminate the larger part but still reach the last level: Imagine an incomplete binary tree whose last level is more than half full and you’re looking for an element in the second half. Maybe just delete, because I cannot come up with a straightforward/useful equivalent statement.
I removed the statement.
1) With respect to later research, I believe the following paper is quite relevant (published in one of the two main conferences of theoretical computer science): Ehsan Emamjomeh-Zadeh, David Kempe, Vikrant Singhal. Deterministic and Probabilistic Binary Search in Graphs. In ACM Symposium on Theory of Computing (STOC), pp. 519–532, 2016. The paper considers the following generalization of binary search to (undirected connected) graphs: There is a target node that needs to be found. Upon querying a vertex, one learns either that the queried node is the target node or about an incident edge on a shortest path to the target node. The setting for standard binary search can be viewed as the special case in which the graph is just a path. Interestingly though, even on arbitrary graphs, there is a generalization of binary search for which O(log n) queries are sufficient to find the target node in the worst case!
I added a section on the graph generalization.
2) Binary search is often used to reduce a (usually discrete) optimization problem to its feasibility version. To start with, one needs to have some upper and lower bounds (L and U, resp.) on the optimal objective-function value. Given an oracle that decides if a certain objective function value is achievable, one can now run binary search on the integer interval [L..U] to find the optimal objective value. Although from my point of view this needs to be part of an article about binary search, textbooks do not seem to address it that explicitly either, and my view may well be biased. So I can hardly ask the author to include this in the article, but I will still put this comment in hope for them to do so; otherwise I will eventually add it to the Wikipedia article myself.
Here are some sources:
- 2a) from practice (or at least competitive programming): https://www.topcoder.com/community/competitive-programming/tutorials/binary-search/
- 2b) from theory of algorithms: http://www-math.mit.edu/~goemans/18433S09/ellipsoid.pdf
I think it is certainly an interesting application, although it is more of a specific application than a variation, so I don't think it would be appropriate for this general overview of the algorithm.
1) To include in the introduction that “implementing binary search correctly requires attention”, I would like to see more evidence that the predominant opinion is that this is the more the case with binary search than with any other implementations. I seriously doubt that this is the case.
I decided to remove it entirely from the lead.
2) “but the array must be sorted first” in introduction: Add something like “to be able to apply binary search”.
3) “binary search applies to a wider range of problems” in introduction: Justification? Otherwise I suggest removing it.
I added an example in the lead.
4) Introduction and Algorithm, “the middle element of the array”: “Middle element” is also ambiguous in natural language. I think it would be okay to say “an element in the middle of the array” in the introduction. In the description of the algorithm you can become more precise.
5) Duplicate elements, “The procedure may return any index whose element is equal to the target value”. This seems to suggest that the the problem description allows for returning any position of the target value. Your problem description (I believe first sentence in introduction) however seems to suggest that there are no duplicate elements (“that finds *the* position of a target value”). Maybe just change “the” for “a” in the problem description?
Changed the lead wording from "the position" to "a position".
6) Duplicate elements: Remove “if an element is duplicated in the array” since the statement is still true without the condition; rather replace it with “if such an element exists”, because without that condition, the statement is not true.
7) Approximate matches: In “whether the array contains keys matching those endpoints”, replace “keys” by “entries” as “keys” hasn’t been used before.
8) Performance, “reducing the procedure to a binary comparison tree”: I don’t think “reduce” is a good word here; what is a “binary comparison tree”? Maybe “viewing the run of the procedure on a binary tree”?
Done Used your proposed wording
9) Delete “This model represents binary search.” (Doesn’t say anything extra.) Probably the same holds for “This represents the successive elimination of elements.”
Done Removed the two superfluous sentences.
10) Say in the beginning of the second paragraph of the Performance section where the rounded-down log + 1 comes from. Similarly, I think it makes sense to explain the computation of the average-case analysis.
Added a short justification to the end of the worst case analysis. Similarly, I added an average case analysis.
11) Note a, “This happens”: Say clearer what “this” is.
I removed the sentence the footnote was attached to (as proposed above).
12) Does it make sense to structure the “Binary search versus other schemes” section from simple to complicated scheme?
I rearranged the order.
13) Note d: Clarify this refers to the access time.
14) Trees, “Although unlikely, the tree may be severely”: Clarify that does not apply to balanced binary search trees.
15) Uniform binary search, “Uniform binary search stores, instead of the lower and upper bounds, the index of the middle element and the change in the middle element from the current iteration to the next iteration”: These are two different types of storing. You should mention the latter one first and clarify that you are talking about a lookup table. The lookup table is computed before the algorithm is run! “Each step reduces the change by about half.” isn’t really necessary to say.
16) Uniform binary search, “Uniform binary search works on the basis that the difference between the index of middle element of the array and the left and right subarrays is the same.”: Formulate clearer.
This is already conveyed through the example, so I removed it.
17) Exponential search, “iterations of the exponential search”: Before it seemed that binary search was part of exponential search. Rather replace by “iterations between binary search is started”.
18) Interpolation search: Clarify that usually a linear interpolation function is used (as shown in the figure!); also the O(log log n) result holds for this case!
19) Interpolation search, “This is only possible if the array elements are numbers.”: I believe you should clarify in the beginning that linearly ordered objects other than numbers are allowed.
20) I believe the result for noisy comparisons (Note i) can also go in the main body as there are no results mentioned there.
21) Implementation issues, second paragraph: I believe the overflow issues you are talking about are the same ones as in the previous paragraph. This should also become evident from the way of writing.
22) History: I believe it should be mentioned that ordering items linearly to be able to search better is an old idea, for instance used for books in libraries. It was shown that this idea goes back at least to 300 BC. See also the discussion in The Art of Computer Programming, Vol. 3 (page 421 in 2nd Ed.)."
Added a paragraph to the history section on sorting items dating back to antiquity.
Comments by Guest editor: Michaël Thomazo (INRIA & ENS Paris)
According to the reviewers, the submission seems to be well adapted for an encyclopaedic article on binary search. I would be in favor of accepting the paper after taking into account the various comments, or justifying why it should not be the case.
@Sylvain Ribault:: I believe that I have addressed the comments in the above peer reviews, with the following notes:
- I have decided to keep both of the procedures under Duplicate elements in full explicit form as these variations may be trickier to implement if I described the changes required to convert one to the other.