Contact me with any typos.

Individual Homework 2

File to submit to T-Square: HW02.py

This is an INDIVIDUAL assignment! Collaboration at a reasonable level will not result in substantially similar code. Students may only collaborate with fellow students currently taking CS 2316 and the lecturer. Collaboration means talking through problems, assisting with debugging, explaining a concept, etc. You should not exchange code or write code for others.

For Help:
- Piazza (but do not post code!)
- Collaborate with others in the class, but do not exchange code!

Notes:
- Don't forget to include the required comments and collaboration statement (as outlined in the syllabus)
- Don't wait until the last minute. Coding takes time and is usually filled with unexpected issues.

Quality of code: Be sure to give yourself enough time so that, even once you have code that works, you can rethink and refine your code to make it more elegant, more efficient, etc. Rarely is the first design the best design. Also be sure to write small amounts of code and test it before adding more. You'll end up with fewer and less complicated issues to fix if you embrace step-wise refinement.

Good design: ABSTRACTION! Be sure to use good abstraction within your code. Only the most trivial of problems should be solved by writing one function. Both of these HW problems should result in several small functions and a main driver to accomplish the goal.

1. Copy from the course website the saving and loading functions that utilized the pickle module. (Or write your own that use pickle.) Integrate calls to those functions into your foreign dictionary translation program.

Call the main function translate. It has one optional parameter - the filename to load that already has a pickled translation dictionary in it. Use the default parameter value feature of Python to accomplish this. Use the default filename "FrenchDictionary.pickle".

The first thing your program should do is try to open the pickle file.
If that fails, then let the user know that the file does not exist and prompt the user, asking whether they want to create a new dictionary, provide a new filename, or quit. If the new filename also fails, then keep asking, giving those same options. (If you know how to use a fancy file chooser, feel free to use it when prompting the user. Be sure the same logic applies: the user can opt to quit, and if the file still doesn't open for some reason, the prompting continues.)

Ask the user for something to translate, and then use the dictionary to perform a rough translation of the words. Display the complete translation to the user. Keep repeating this until the user wants to quit.

As we did in class, if a word is not in the dictionary, then prompt the user for the word's translation.
a) If the user does not know the translation, then let the English word slide through into the translated text, and do not add the word to the dictionary. (We want only English to French translations in the dictionary.)
b) If the user does know a translation, then trust that the user is right, add that translation to the dictionary, and use it to translate the English to French. (Basically pretend this thing is crowd sourced and later there are ways to down vote a bad translation, so we'll just trust it, use it, add it, for now.)

So what kind of input should we expect (and handle) from the user? First off, use looping so that the user can perform numerous (0 to infinite) translations until they want to quit. They should be able to quit immediately without doing any translations. Make the code fancier a step at a time, adding in the ability to handle these:
a) a word as input
b) a phrase as input
c) a sentence with no punctuation as input
d) a sentence with punctuation - including commas, periods, etc.
e) numbers (like "42") should just slide through to the translation
f) looping using numerous entries of any and all of the above

No matter what the user types in, your translation can be all lowercase. The user can type in any case they want - uppercase, lowercase, mixed case, capitalized, etc. You can also assume there will always be a one word to one word mapping - which is not really guaranteed to be true in real life.

Once the user wants to quit, write out the pickled dictionary using the default name or whatever the user may have specified. (This should work even if the dictionary is empty.)

(In the beginning, just as we did in class, you may want to just hard code a starting dictionary within your function and remove that later. While the dictionary is small, I recommend printing it out constantly so you can see every modification happening while you work on your code. print(dictionary) should suffice for this.)

HINTS:
pickle module
string module
str class
>>> dir(str)
>>> import string
>>> dir(string)
>>> dir(dict)

CODING REMINDERS:
1) Use good abstraction. If well designed, your solution will consist of many functions, not just one huge one. This helps with testing and development since each function will do just a little bit of the logic. Every function should be extremely short.
2) You are required to use the pickle module.
3) Tricky bit - the file named by the filename (default or user provided) may not actually exist. Use try/except to handle this.
4) The default filename is to be provided in your code as the default value of the filename parameter.

BONUS CHALLENGES: (these are completely optional, but worth bonus)
1) CHALLENGE LEVEL SILVER: Make your saving to the file safer by having your code realize when the file already exists before saving. Ask the user if they want to replace it or instead provide a different filename.
(Of course if that alternative filename already exists, repeat until they agree to replace it, or a filename for a non-existing file is given, or they decide to quit without saving.) HINT: os module.
2) CHALLENGE LEVEL GOLD: Have an option to display the English/French word pairings in alphabetical order. (Alphabetized by the English word.) This should display nicely as one English word and one French word per line. (With no dictionary, list, tuple, etc. notation mixed in.)
3) CHALLENGE LEVEL PLATINUM: (yes, higher than gold) Handling upper and lowercase is a mess. Add in features so that words that are capitalized remain capitalized. (Capitalized means the first letter is uppercase.) Words that are in all uppercase remain in all uppercase. Words that are in all lowercase remain all lowercase. HINTS: dir(str) & import string & dir(string) are your friends.

2. Using the Alice in Wonderland text file provided, you are to calculate the frequency of each word used in the book. Call your function wordFrequency. For flexibility, it has two default parameters.
- the first one is the book filename (default to "AlicesAdventuresInWonderland.txt")
- the second one is the csv filename to save the frequencies to (default to "AlicesAdventuresInWonderlandWordFrequency.csv")

Use a dictionary as your data structure to accumulate the frequencies where the key:value pairs are word:count. After processing the frequencies, your function writes out the word and frequency pairs sorted alphabetically by word to a proper csv file named using the csv filename.

The format of the csv file: The csv file will have a header labeling the columns as WORD and COUNT. The format follows the standard csv format (like we used in class). (csv files are standard ascii text files, so you can open one using a simple editor to see what's in there, as well as load the file into excel to be sure things look good.)
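The accumulate-then-write flow described above can be sketched on a tiny in-memory string. This is only an illustration, not the required solution: the helper names count_words and write_frequency_csv are hypothetical, and replacing ASCII punctuation with spaces is just one possible way to drop punctuation (it splits contractions such as "I'll").

```python
import csv
import io
import string


def count_words(text):
    """Return a dict mapping each lowercase word to its count (hypothetical helper)."""
    # Assumption: turning every ASCII punctuation character into a space is
    # an acceptable way to "lose" punctuation before splitting on whitespace.
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    counts = {}
    for word in text.lower().translate(table).split():
        counts[word] = counts.get(word, 0) + 1
    return counts


def write_frequency_csv(counts, out_file):
    """Write a WORD,COUNT header, then rows sorted alphabetically by word."""
    writer = csv.writer(out_file)
    writer.writerow(["WORD", "COUNT"])
    for word in sorted(counts):
        writer.writerow([word, counts[word]])


sample = "Alice was beginning to get very tired. Alice was bored!"
buffer = io.StringIO()  # stands in for a real file object in this sketch
write_frequency_csv(count_words(sample), buffer)
print(buffer.getvalue())
```

When writing to a real file instead of a StringIO buffer, the csv module documentation recommends opening it with newline="" so that no blank rows appear on some platforms.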

SPECIFICS:
1) The book file is a normal book with upper and lowercase letters, lots of punctuation, a preamble, a postamble, etc.
2) For the purposes of this assignment:
   a) make all words lowercase. So even "Alice" will be recorded as "alice".
   b) lose all the punctuation and do not count it.
   c) if any numbers happen to appear in the text, go ahead and count those as words. Certainly II, III, IV, etc. appear at least in the chapter titles and can/should be counted as words. (And yes, all the words (like "chapter") in the chapter headings will count.)
3) Go ahead and include counting of words, numbers, etc. found within the entire file. (This includes the preamble, postamble, etc.)

BONUS CHALLENGES:
1) CHALLENGE LEVEL SILVER: Change your function so it takes only one filename. The second filename is automatically formed by dropping the extension of the first filename and appending "WordFrequency.csv" as the new ending.
2) CHALLENGE LEVEL GOLD: The file actually has both a preamble and a huge postamble discussing the fact that this is from Project Gutenberg. Have your code drop those chunks (smartly) so that those parts do not figure into your word frequencies. By "smartly" I mean that in the end your code should be smart enough to trim off these parts when presented with other like-formatted Gutenberg files. (Searching should be involved, and not just something like skipping the first 10 lines.) This feature does not alter the physical file - that will be the same as it ever was.
3) CHALLENGE LEVEL PLATINUM: Figure out a way to deal with the single quote ' when it is used within a contraction. Things such as "Alice's" and "I'll" will be broken into "alice", "s", "i", "ll" unless this is avoided somehow. It'd be nice if "Alice's", "I'll", "shan't", etc. were dealt with as is, keeping the single quote and without the single quote splitting them. NOTE: Lots of other single quotes appear in the text though, which should act just like normal punctuation.
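
One possible way to think about the contraction problem above is a regular expression that keeps a single quote only when it has a letter or digit on both sides. This is a sketch under assumptions, not the expected solution: the name split_words is hypothetical, the pattern assumes the file uses the plain ASCII apostrophe, and it only recognizes the characters a-z and 0-9.

```python
import re

# Assumption: a "word" is a run of letters/digits, optionally joined by single
# ASCII apostrophes that have a letter or digit on both sides.
WORD_PATTERN = re.compile(r"[a-z0-9]+(?:'[a-z0-9]+)*")


def split_words(text):
    """Return lowercase words, keeping apostrophes only inside contractions."""
    return WORD_PATTERN.findall(text.lower())


line = "'I'll go,' said Alice's cat; 'shan't!'"
print(split_words(line))
# prints: ["i'll", 'go', 'said', "alice's", 'cat', "shan't"]
```

Quotes that open or close a quotation are dropped like any other punctuation because they are not flanked by word characters on both sides. A trailing apostrophe, as in a plural possessive, is also dropped under this pattern.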