How to generate resource IDs

Resource IDs are strings that uniquely identify an item such as a social media post, comment, playlist, tweet or video.

They allow software to distinguish between items, even those that share the same user-specified name or title.

A resource ID is like the number plate of a resource, and it is included in the URL.

Current examples

One can take a look at how existing websites generate resource identifiers:

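| Website name                 | Character count                       | Character range | Combinations per character | Total combinations                      |
|------------------------------|---------------------------------------|-----------------|----------------------------|-----------------------------------------|
| Twitter tweet IDs            | ≤19                                   | [0-9] (numeric) | 10                         | 10¹⁹                                    |
| YouTube video IDs            | 11                                    | [-_0-9A-Za-z]   | 64                         | 73.786.976.294.838.206.464 (≈73.8×10¹⁸) |
| Dailymotion public video IDs | ≤8                                    | [0-9a-z]        | 36                         | 2.821.109.907.456 (≈2.82×10¹²)          |
| Archive.Today                | 4 (early 2012), 5 (late 2012 - today) | [0-9A-Za-z]     | 62                         | 930.909.168 (62⁴+62⁵)                   |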
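
The "Total combinations" column follows directly from the alphabet size and the ID length. A minimal Python sketch that reproduces it is given here. The function name `total_combinations` is illustrative, and "≤n" is treated as exactly n characters, which is what the listed totals appear to assume.

```python
from typing import Iterable


def total_combinations(alphabet_size: int, lengths: Iterable[int]) -> int:
    """Sum alphabet_size ** n over every ID length that is in use."""
    return sum(alphabet_size ** n for n in lengths)


# (alphabet size, ID lengths) taken from the table above.
schemes = {
    "Twitter tweet IDs": (10, [19]),
    "YouTube video IDs": (64, [11]),
    "Dailymotion public video IDs": (36, [8]),
    "Archive.Today": (62, [4, 5]),  # both lengths are valid, so add them
}

for name, (size, lengths) in schemes.items():
    print(f"{name}: {total_combinations(size, lengths):,}")
```

Python integers have arbitrary precision, so even the largest total is computed exactly.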

    How to do it correctly

    Avoid dash characters

    Browsers and text editors treat a dash character (“-”) as a word separator, so double-clicking an ID that contains a dash selects only part of it.

    To make it more convenient for users to highlight IDs, avoid using any dashes in them.

    In addition, dashes impede word-wise cursor navigation using Ctrl + ← and Ctrl + →.

    Test highlighting here (double-click on desktop, hold on mobile):

    • 4gSOMba1UdM (none)
    • 4gSOM-a1UdM (with dash)
    • 4gSOM_a1UdM (with underscore)

    Some browsers (e.g. Mozilla Firefox) and text editors might also treat an underscore as a separation character.

    To avoid these problems, it is recommended to use only digits and letters.
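
The selection behaviour can be approximated in code. The sketch below models word selection with a regular expression. This is only an assumption about how a double-click behaves, since browsers and editors each apply their own word-boundary rules. The names `selectable_chunks` and `is_safe_id` are hypothetical.

```python
import re
from typing import List


def selectable_chunks(identifier: str, underscore_splits: bool = False) -> List[str]:
    """Split an ID where a double-click selection would likely stop.

    Dashes always split. Underscores split only if underscore_splits is True,
    which models software that treats "_" as a separator as well.
    """
    pattern = r"[0-9A-Za-z]+" if underscore_splits else r"[0-9A-Za-z_]+"
    return re.findall(pattern, identifier)


def is_safe_id(identifier: str) -> bool:
    """True if the ID consists only of ASCII letters and digits."""
    return re.fullmatch(r"[0-9A-Za-z]+", identifier) is not None


for sample in ("4gSOMba1UdM", "4gSOM-a1UdM", "4gSOM_a1UdM"):
    print(sample, selectable_chunks(sample), is_safe_id(sample))
```

A check like `is_safe_id` could run wherever IDs are generated, so that a separator character never reaches a URL.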

    Case-insensitive

    Random identifiers that include letters should be case-insensitive, because case sensitivity offers little practical benefit: the combinations it adds can be regained with a slightly longer ID (see “Combinations”).

    In addition, case-insensitive resource identifiers are much easier to read aloud and to write down on paper.
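
One way to make lookups case-insensitive is to normalise every ID to a single canonical form before storing or querying it. The following is a minimal sketch under that assumption. The in-memory dictionary and the names `normalize_id`, `save` and `lookup` are hypothetical stand-ins for a real storage layer.

```python
from typing import Dict, Optional


def normalize_id(raw: str) -> str:
    """Map every spelling of an ID to one canonical lower-case form."""
    return raw.strip().lower()


# Hypothetical in-memory store; a real site would use a database.
videos: Dict[str, str] = {}


def save(resource_id: str, title: str) -> None:
    videos[normalize_id(resource_id)] = title


def lookup(resource_id: str) -> Optional[str]:
    return videos.get(normalize_id(resource_id))


save("4GSOMBA1UDM", "Example video")
print(lookup("4gsomba1udm"))    # same resource, typed in lower case
print(lookup(" 4gSoMbA1uDm "))  # mixed case with stray whitespace
```

Because the canonical form is applied on both write and read, users can type or dictate an ID in any capitalisation.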

    Confusing characters

    The characters “i”, “I”, “l” and “j” look too similar to each other in many fonts.

    If possible, they should be excluded from the identifier: [^jiIl].
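
Combining this exclusion with the recommendations above (letters and digits only, case-insensitive) gives a small alphabet from which IDs can be drawn at random. The sketch below is one possible generator, not a reference implementation. The names `ALPHABET` and `generate_id` and the default length of 13 are illustrative choices, and using the `secrets` module for unpredictability is an assumption about what a site would want. No uniqueness check against already-issued IDs is included.

```python
import secrets
import string

# Digits plus lower-case letters, minus the look-alike characters i, j and l.
# Upper-case "I" never occurs because the alphabet is lower-case only.
CONFUSABLE = set("ijl")
ALPHABET = "".join(
    c for c in string.digits + string.ascii_lowercase if c not in CONFUSABLE
)


def generate_id(length: int = 13) -> str:
    """Return a random ID whose characters are drawn uniformly from ALPHABET."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


print(len(ALPHABET))  # number of possible values per character
print(generate_id())
```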

    Combinations

    With the recommendations above applied, each character has 33 possible values: 10 digits plus 26 case-insensitive letters, minus “i”, “j” and “l”.

    One character of a YouTube video ID currently has 64 possible values (also known as “Base64”, not to be confused with Base64 encoding). Adding just two characters to a Base33 video ID (i.e. 13 instead of 11) already nearly compensates for the smaller alphabet: 33¹³ is about 75% of 64¹¹.
    Adding three characters (i.e. 14 instead of 11) overcompensates by far, while making the ID more convenient overall.

    • 64¹¹= 73.786.976.294.838.206.464 (current)
    • 33¹⁰= 1.531.578.985.264.449 (reference)
    • 33¹¹= 50.542.106.513.726.817 (reference)
    • 33¹²= 1.667.889.514.952.984.961 (reference)
    • 33¹³= 55.040.353.993.448.503.713 (reference)
    • 33¹⁴= 1.816.331.681.783.800.622.529 (reference)

    In addition, even 33¹⁰ would already offer an abundant number of combinations.
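
The figures in the list above can be recomputed with a few lines of Python. This sketch also prints each Base33 size as a fraction of the current 64¹¹ space. The names `capacity` and `YOUTUBE_SPACE` are illustrative.

```python
YOUTUBE_SPACE = 64 ** 11  # 11 characters, 64 possible values each


def capacity(alphabet_size: int, length: int) -> int:
    """Number of distinct IDs of exactly this length."""
    return alphabet_size ** length


for length in range(10, 15):
    size = capacity(33, length)
    ratio = size / YOUTUBE_SPACE
    print(f"33^{length} = {size:>29,}  ({ratio:.4f} x 64^11)")
```

The ratio crosses 1 between lengths 13 and 14, which is why 13 characters nearly match the current space and 14 exceed it.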

    Calculations

    Assuming that YouTube reaches 20.000.000.000 unique videos ever uploaded in the near future, that is still only 1/2.527.105 of 33¹¹.
    Even 33¹⁰ is still abundant, because only about 1/76.578 of the possible combinations would be occupied.

    Also, the 20 billion video count is a future projection. YouTube is not close to it at the moment.
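
These occupancy fractions can be checked with integer arithmetic. The sketch below uses the article's hypothetical projection of 20 billion videos. The helper name `one_in` is illustrative.

```python
def one_in(items: int, alphabet_size: int, length: int) -> int:
    """Return N such that roughly 1 in N possible IDs is in use (rounded down)."""
    return alphabet_size ** length // items


PROJECTED_VIDEOS = 20_000_000_000  # the article's hypothetical projection

print(one_in(PROJECTED_VIDEOS, 33, 11))  # share of 33^11 that would be occupied
print(one_in(PROJECTED_VIDEOS, 33, 10))  # share of 33^10 that would be occupied
```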

    33¹⁴ would exceed the total number of combinations of 64¹¹ by a factor of more than 24.


    YouTube's comment IDs, comment reply IDs (formerly numeric only) and playlist IDs are already more than 30 characters long, which is more than enough for the foreseeable future.

    Should those limits ever be reached, hypothetically, one can always add one more character to the resource ID string.
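
To illustrate how cheaply the ID space grows, the sketch below finds the shortest ID length that covers an expected number of items with some spare room. The function name `length_for` and the `headroom` factor (how many times larger than the item count the space should be) are hypothetical parameters chosen for this example.

```python
def length_for(expected_items: int, alphabet_size: int = 33,
               headroom: int = 1_000_000) -> int:
    """Smallest ID length whose space is at least expected_items * headroom."""
    target = expected_items * headroom
    length = 1
    while alphabet_size ** length < target:
        length += 1
    return length


print(length_for(20_000_000_000))       # length needed for 20 billion items
print(length_for(20_000_000_000 * 33))  # 33 times as many items
```

Each extra character multiplies the space by the alphabet size, so 33 times as many items need only one more character.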
