String manipulation
++++++++++++++++++++++++

Text manipulation and working with strings is one of the most common tasks in programming. Let us start with some simple things.

Invert letter case of string
===============================

Swap the case of each character in a given text.

.. code-block:: python

    "THE quick brown fox, JUMPS ovEr the lazy dog".swapcase()

To use it from the terminal:

.. code-block:: bash

    $ python -c \
        "import sys; print(sys.argv[1].swapcase())" \
        "THE quick brown fox, JUMPS ovEr the lazy dog"
    the QUICK BROWN FOX, jumps OVeR THE LAZY DOG

Rot-13 a String
====================

From Wikipedia:

.. note::

    ROT13 ("rotate by 13 places", sometimes hyphenated ROT-13) is a simple letter substitution cipher that replaces a letter with the 13th letter after it in the alphabet.

We will look at two ways of doing it.

.. code-block:: python

    from string import ascii_uppercase as upr, ascii_lowercase as lwr

    def rot13(txt):
        # Map every ASCII letter to the letter 13 places after it.
        mapping = dict(list(zip(upr, upr[13:]+upr[:13]))+list(zip(lwr, lwr[13:]+lwr[:13])))
        # Spaces, digits and punctuation are not in the mapping and pass through unchanged.
        return "".join([mapping.get(el, el) for el in txt])

There is a lot going on here, so let's break it down.

.. code-block:: python

    from string import ascii_uppercase as upr, ascii_lowercase as lwr

Standard Python imports, not much to see here.

.. code-block:: python

    list(zip(upr, upr[13:]+upr[:13]))

This gives us a list of two-tuples:

.. code-block:: python

    [('A', 'N'), ('B', 'O'), ('C', 'P'), ... ]

We do the same thing for lowercase letters and combine both lists into a dictionary.

.. code-block:: python

    mapping = dict(list(zip(upr, upr[13:]+upr[:13]))+list(zip(lwr, lwr[13:]+lwr[:13])))

Now we look up the substitution for each letter in the mapping and join the results together to get the Rot-13 text. Characters that are not ASCII letters are left unchanged.

.. code-block:: python

    return "".join([mapping.get(el, el) for el in txt])

Alternatively, we can go full "Batteries Included" and use the :code:`codecs` module: :code:`codecs.encode(txt, 'rot_13')`.

.. code-block:: python

    import codecs

    def rot13(txt):
        return codecs.encode(txt, 'rot_13')

Left pad
========

Left pad lets you specify a minimum length for your string and a fill character to pad with, so that the string reaches that minimum length. This is easy to do using the :code:`rjust` (right justify) method available on all strings.

.. code-block:: python

    def left_pad(txt, count, fill=' '):
        return txt.rjust(count, fill)

.. code-block:: bash

    $ python -c "import sys;print(sys.argv[1].rjust(int(sys.argv[2]), sys.argv[3]))" foobar 60 →
    →→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→→foobar

Speaking in ubbi dubbi
================================

Ubbi dubbi is a language game spoken with the English language. It works by adding -ub- before each vowel sound in a syllable. You can read about ubbi dubbi at: https://en.wikipedia.org/wiki/Ubbi_dubbi

This was recently popularised in "The Big Bang Theory": https://www.youtube.com/watch?v=rfR03gibh6Ms

Let's look at how we would do it with Python.

.. code-block:: python

    vowels = "aeiou"
    vowels_dict = {i: f"ub{i}" for i in vowels}

    def ubbi_dubbi(txt):
        return txt.lower().translate(str.maketrans(vowels_dict))

How are we doing it? We first generate a mapping of vowels to their ubbi dubbi form.

.. code-block:: python

    vowels = "aeiou"
    vowels_dict = {i: f"ub{i}" for i in vowels}

We then use :code:`str.maketrans(vowels_dict)` to generate the translation table, and :code:`txt.lower().translate` to generate the ubbi dubbi text. Note that this works on vowel letters rather than vowel sounds, so adjacent vowels each get their own "ub".

Let's see the function in action.

.. code-block:: ipython

    In [4]: ubbi_dubbi("Subaru")
    Out[4]: 'subububarubu'

    In [5]: ubbi_dubbi("Speak")
    Out[5]: 'spubeubak'

    In [6]: ubbi_dubbi("Hubba Bubba bubblegum")
    Out[6]: 'hububbuba bububbuba bububblubegubum'

Pig latin
================

Pig Latin is a language game in which words in English are altered, usually by adding a fabricated suffix. The reference to Latin is a deliberate misnomer.
The rules are simple:

- For words that begin with consonant sounds, all letters before the initial vowel are placed at the end of the word. Then, "ay" is added.
- When words begin with consonant clusters (multiple consonants that form one sound), the whole cluster is moved to the end. Then, "ay" is added.
- For words that begin with vowel sounds, "way" is added to the end.

.. code-block:: python

    vwls=set('aeiou')
    def pig(wd):
        if len(wd)<2 or len(vwls&set(wd))==0:return f"{wd}way"
        elif wd[0] in vwls:return f"{wd}way"
        else: x = min(wd.find(v) for v in vwls if v in wd);return f"{wd[x:]}{wd[:x]}ay"
    def pig_ltn(txt): return " ".join(pig(e) for e in txt.lower().split())

There is a lot going on here, and to fit the code in 280 chars the variable names are very short. Let's break it down:

.. code-block:: python

    vwls=set('aeiou')

We are creating a set of vowels. Then we use it to apply the 3 rules described above.

.. code-block:: python

    def pig(wd):
        if len(wd)<2 or len(vwls&set(wd))==0:return f"{wd}way"
        elif wd[0] in vwls:return f"{wd}way"
        else: x = min(wd.find(v) for v in vwls if v in wd);return f"{wd[x:]}{wd[:x]}ay"

There are a few interesting things we are doing.

- :code:`len(vwls&set(wd))==0` checks whether there are no vowels in the word. Such words, like one-letter words, simply get "way" appended.
- :code:`min(wd.find(v) for v in vwls if v in wd)` finds the index of the first vowel, and :code:`f"{wd[x:]}{wd[:x]}ay"` moves the leading consonants to the end, then adds "ay".

Convert to leetspeak
========================

Leetspeak works by replacing certain letters with similar-looking digits or symbols. So we will use the same technique as ubbi dubbi, and use :code:`.translate`.

.. code-block:: python

    leet_dict = dict(zip("aeilot", "431|07"))

    def leet(txt):
        return txt.lower().translate(str.maketrans(leet_dict))

Convert repeated spaces to one space
====================================

.. code-block:: python

    s = 'this   sentence  has    non-uniform   spaces'
    print(' '.join(s.split()))

The above snippet splits the text on runs of whitespace and joins the pieces with a single space, so every run of repeated spaces is replaced by one space.
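Note that :code:`' '.join(s.split())` also turns tabs and newlines into spaces and drops leading and trailing whitespace. If only runs of the space character should be collapsed, a regular expression is one alternative. The following is a minimal sketch; the helper names :code:`collapse_spaces` and :code:`collapse_whitespace` are illustrative and not part of the snippet above.

```python
import re


def collapse_spaces(text):
    """Replace each run of two or more space characters with one space."""
    return re.sub(r" {2,}", " ", text)


def collapse_whitespace(text):
    """Split on any whitespace and rejoin with single spaces."""
    return " ".join(text.split())


sample = "  tabs\tand   spaces \n stay   apart  "
print(repr(collapse_spaces(sample)))      # ' tabs\tand spaces \n stay apart '
print(repr(collapse_whitespace(sample)))  # 'tabs and spaces stay apart'
```

Which of the two is appropriate depends on whether tabs, newlines and surrounding whitespace carry meaning in the input.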
Check if a string is a valid IP v4 address
========================================================================

To find valid addresses, we can use :code:`ipaddress.IPv4Address`, which raises an exception if the string can't be parsed as an IP address.

.. code-block:: python

    import ipaddress

    def ipv4_check(ip):
        try:
            ipaddress.IPv4Address(ip)
            return True
        except ipaddress.AddressValueError:
            return False

Or, if you want to accept only traditionally formatted IP addresses (:code:`ipaddress.IPv4Address` also accepts integers and packed bytes):

.. code-block:: python

    def ipv4_check(ip):
        try:
            chunks = str(ip).split(".")
            return all(0<=int(chunk)<=255 for chunk in chunks) and len(chunks) == 4
        except ValueError:
            return False

In here,

- we split the IP address on :code:`.`,
- then :code:`all(0<=int(chunk)<=255 for chunk in chunks) and len(chunks) == 4` checks that every chunk is an integer between 0 and 255 and that there are exactly 4 chunks.

Check if a string is a valid IP v6 address
========================================================================

We use the same technique as above, but with :code:`ipaddress.IPv6Address`.

.. code-block:: python

    import ipaddress

    def ipv6_check(ip):
        try:
            ipaddress.IPv6Address(ip)
            return True
        except ipaddress.AddressValueError:
            return False

.. code-block:: ipython

    In [32]: ipv6_check('2001:0db8:85a3:0000:0000:8a2e:0370:7334')
    Out[32]: True

    In [33]: ipv6_check('2001:0db8:85a3:0000:0000:8a2e:0370:733455')
    Out[33]: False

Or, if you want to accept only traditionally formatted IP addresses:

.. code-block:: python

    def ipv6_check(ip):
        try:
            chunks = str(ip).split(":")
            return all(0<=int(chunk, 16)<16**4 for chunk in chunks) and len(chunks) == 8
        except ValueError:
            return False

Again we use the same technique as for IPv4,

- splitting the address on :code:`:`,
- ensuring each chunk is parsable as a hexadecimal number that is non-negative and less than :code:`16**4`, and that there are exactly 8 chunks. This means the compressed :code:`::` notation is rejected.

Check if string is palindrome
==============================

A palindrome is a word, number, or other sequence of characters which reads the same backward as forward.

.. code-block:: python

    def is_palindrome(txt):
        return txt == txt[::-1]

Python's extended slicing syntax :code:`[::-1]` returns the reverse of a given string or other sequence. By comparing the original with its reverse, we find out if a string is a palindrome.

Find all valid anagrams of a word
=======================================

To generate all valid anagrams,

- use :code:`itertools.permutations` to generate the permutations,
- read :code:`/usr/share/dict/words` into a set to get a word list,
- keep each permutation that exists in the word list, and return the matches as a set.

.. code-block:: python

    import itertools

    # Load the system word list once; a set gives fast membership tests.
    with open('/usr/share/dict/words') as f:
        words = set(f.read().split())

    def anagrams(txt):
        return {"".join(perm) for perm in itertools.permutations(txt.lower()) if "".join(perm) in words}

.. code-block:: ipython

    In [5]: anagrams('hello')
    Out[5]: {'hello'}

    In [6]: anagrams('rat')
    Out[6]: {'art', 'rat', 'tar'}

    In [7]: anagrams('post')
    Out[7]: {'opts', 'post', 'pots', 'spot', 'stop', 'tops'}
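Generating every permutation grows factorially with the length of the word, so the approach above becomes slow for longer inputs. One common alternative, sketched here under the assumption that the word list fits in memory, is to index the words by their sorted letters: two words are anagrams exactly when their sorted letters match. The names :code:`build_index`, :code:`anagrams_indexed` and :code:`sample_words` are hypothetical, and the tiny in-memory list stands in for the real word list so that the example runs anywhere.

```python
from collections import defaultdict


def build_index(words):
    """Group words by their letters in sorted order (the anagram signature)."""
    index = defaultdict(set)
    for word in words:
        word = word.lower()
        index["".join(sorted(word))].add(word)
    return index


def anagrams_indexed(txt, index):
    """Return every indexed word that uses exactly the letters of txt."""
    signature = "".join(sorted(txt.lower()))
    # .get() avoids creating an empty entry for unknown signatures.
    return set(index.get(signature, set()))


# Stand-in word list (an assumption for illustration only).
sample_words = ["art", "rat", "tar", "post", "stop", "hello"]
index = build_index(sample_words)

print(sorted(anagrams_indexed("rat", index)))   # ['art', 'rat', 'tar']
print(sorted(anagrams_indexed("Post", index)))  # ['post', 'stop']
```

Building the index costs one pass over the word list, after which each lookup only needs to sort the letters of the query.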