Fuzzy Date Parser
Overview
Create a class that lets you configure settings such as the locale and the first day of the week, then pass in a string and get back a dictionary containing the original string, the parsed date (if any), and any issues found during parsing.
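Here is a minimal Python sketch of that interface. The class name `FuzzyDateParser` and the result keys `source`, `date` and `errors` are hypothetical. It only understands a few fixed keywords, but it shows how a configured `week_start` changes the result for "next week". This sketch assumes "next week" means the first day of the following week.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Any, Dict, Optional


@dataclass
class FuzzyDateParser:
    """Hypothetical parser skeleton: configuration lives on the instance."""

    locale: str = "en_US"  # assumption: locale kept as a plain string for now
    week_start: int = 0    # 0 = Monday ... 6 = Sunday, matching date.weekday()

    def _start_of_week(self, day: date) -> date:
        # Step back to the configured first day of the week.
        offset = (day.weekday() - self.week_start) % 7
        return day - timedelta(days=offset)

    def parse(self, text: str, today: Optional[date] = None) -> Dict[str, Any]:
        today = today if today is not None else date.today()
        result: Dict[str, Any] = {"source": text, "date": None, "errors": []}
        key = " ".join(text.lower().split())  # normalize case and whitespace
        if key == "today":
            result["date"] = today
        elif key == "tomorrow":
            result["date"] = today + timedelta(days=1)
        elif key == "yesterday":
            result["date"] = today - timedelta(days=1)
        elif key == "next week":
            result["date"] = self._start_of_week(today) + timedelta(weeks=1)
        elif key == "last week":
            result["date"] = self._start_of_week(today) - timedelta(weeks=1)
        else:
            result["errors"].append(f"unrecognized expression: {text!r}")
        return result
```

For example, `FuzzyDateParser(week_start=6).parse("next week", today=date(2024, 5, 15))` returns Sunday 2024-05-19. With the default Monday start, it returns 2024-05-20.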
Finding which part of a longer piece of text contains a date is the genuinely hard problem, and it is outside the scope of this project. Instead, given a text item that contains any of the following, the library should parse it:
- 5 minutes from now
- 5 min from now
- 5m from now
- in 5 minutes
- 5 min
- 5 min before now
- 5 min before next week
- 5 hours from noon
- 5 hours before noon
- in 2 weeks
- 7 days before now
- next day
- tomorrow
- yesterday
- next week
- last week
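Most of the relative expressions above share one shape: an optional "in", an amount, a unit, and an optional direction plus anchor. The sketch below matches them with a single regular expression. The `UNITS` table and the `parse_offset` name are hypothetical. Treating a bare "5 min" as five minutes in the future is also an assumption. Anchors other than "now" and "noon", such as "next week" in "5 min before next week", are not handled and return `None`.

```python
import re
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical unit table: each spelling maps to a timedelta keyword argument.
# Assumption: "m" means minutes, as in "5m from now".
UNITS = {
    "m": "minutes", "min": "minutes", "mins": "minutes",
    "minute": "minutes", "minutes": "minutes",
    "h": "hours", "hr": "hours", "hrs": "hours", "hour": "hours", "hours": "hours",
    "d": "days", "day": "days", "days": "days",
    "w": "weeks", "week": "weeks", "weeks": "weeks",
}

# Matches "in 5 minutes", "5m from now", "5 hours before noon", or a bare "5 min".
PATTERN = re.compile(
    r"^(?:in\s+)?(?P<n>\d+)\s*(?P<unit>[a-z]+)"
    r"(?:\s+(?P<dir>from|before)\s+(?P<anchor>now|noon))?$"
)


def parse_offset(text: str, now: datetime) -> Optional[datetime]:
    """Return `now` (or today's noon) shifted by the parsed amount, or None."""
    match = PATTERN.match(" ".join(text.lower().split()))
    if match is None or match["unit"] not in UNITS:
        return None
    delta = timedelta(**{UNITS[match["unit"]]: int(match["n"])})
    base = now
    if match["anchor"] == "noon":
        base = now.replace(hour=12, minute=0, second=0, microsecond=0)
    # Only "before" moves backward; "from", "in" and a bare amount move forward.
    return base - delta if match["dir"] == "before" else base + delta
```

Keywords such as "tomorrow" or "next week" do not fit this shape and are better handled by a separate lookup table.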
Heikki suggested:
- every two weeks
- every other day
- on even days
- third Friday of every other month except December
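Recurrences like the last suggestion can be expanded into concrete dates once they have been parsed into parameters. The sketch below assumes the text has already been reduced to: weekday = Friday, occurrence = 3, interval = 2 months, skipped months = {12}. The function names are hypothetical.

```python
import calendar
from datetime import date
from typing import Iterator, Set


def nth_weekday(year: int, month: int, weekday: int, n: int) -> date:
    """Return the n-th (1-based) given weekday of a month; weekday 0 = Monday."""
    first = date(year, month, 1)
    offset = (weekday - first.weekday()) % 7
    day = 1 + offset + 7 * (n - 1)
    if day > calendar.monthrange(year, month)[1]:
        raise ValueError("month has no such weekday occurrence")
    return date(year, month, day)


def monthly_nth_weekday(start: date, weekday: int, n: int, every: int,
                        skip_months: Set[int], count: int) -> Iterator[date]:
    """Yield `count` dates: the n-th weekday of every `every`-th month, starting
    at `start`'s month and skipping calendar months listed in `skip_months`.
    Assumes `skip_months` does not cover every month the interval reaches."""
    year, month = start.year, start.month
    produced = 0
    while produced < count:
        if month not in skip_months:
            candidate = nth_weekday(year, month, weekday, n)
            if candidate >= start:
                yield candidate
                produced += 1
        # Advance `every` months, carrying overflow into the year.
        month += every
        year += (month - 1) // 12
        month = (month - 1) % 12 + 1
```

For "third Friday of every other month except December" starting on 2024-08-01, this yields 2024-08-16, 2024-10-18, 2025-02-21 and 2025-04-18. Simpler rules such as "every other day" or "every two weeks" reduce to repeatedly adding a fixed `timedelta`.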
Number parsing: the code should parse "five minutes" the same way as "5 minutes".
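One way to do this is to normalize number words to digits before any other parsing step runs. The following is a minimal, English-only sketch that handles values below 100. The `replace_number_words` name and its word tables are hypothetical. It also assumes that tokens are separated by whitespace, so hyphenated forms like "twenty-five" and ordinals like "third" are left alone.

```python
# Hypothetical English-only word tables; other locales would need their own.
ONES = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10, "eleven": 11,
    "twelve": 12, "thirteen": 13, "fourteen": 14, "fifteen": 15,
    "sixteen": 16, "seventeen": 17, "eighteen": 18, "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}


def replace_number_words(text: str) -> str:
    """Replace number words below 100 with digits: 'twenty five' -> '25'.
    Runs of whitespace in the input are collapsed to single spaces."""
    out = []
    pending = None  # value of a tens word waiting for a possible ones word
    for token in text.split():
        word = token.lower()
        if pending is not None and word in ONES and 0 < ONES[word] < 10:
            out.append(str(pending + ONES[word]))  # "twenty" + "five" -> 25
            pending = None
            continue
        if pending is not None:
            out.append(str(pending))  # the tens word stood on its own
            pending = None
        if word in TENS:
            pending = TENS[word]
        elif word in ONES:
            out.append(str(ONES[word]))
        else:
            out.append(token)
    if pending is not None:
        out.append(str(pending))
    return " ".join(out)
```

Because it runs first, the later stages only ever need to recognize digits.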
The code should also parse the most common fixed date formats when they appear in the input.
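A simple fallback is to try a list of `datetime.strptime` formats in order. The sketch below uses a hypothetical list of formats. The order matters for ambiguous inputs such as 03/04/2024, so a real implementation would presumably derive the list from the configured locale. Note that `%B` matches month names in the current C-library locale.

```python
from datetime import datetime
from typing import Optional, Sequence

# Hypothetical format list, tried in order; the first successful parse wins.
DEFAULT_FORMATS = (
    "%Y-%m-%d",        # 2024-05-15 (ISO 8601 date)
    "%Y-%m-%dT%H:%M",  # 2024-05-15T09:30
    "%m/%d/%Y",        # 05/15/2024 (US month/day order)
    "%d.%m.%Y",        # 15.05.2024
    "%B %d, %Y",       # May 15, 2024
    "%d %B %Y",        # 15 May 2024
)


def parse_fixed(text: str, formats: Sequence[str] = DEFAULT_FORMATS) -> Optional[datetime]:
    """Try each strptime format in turn; return None if none of them match."""
    cleaned = text.strip()
    for fmt in formats:
        try:
            return datetime.strptime(cleaned, fmt)
        except ValueError:
            continue  # wrong shape or leftover text; try the next format
    return None
```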
The library would use a combination of regular expressions, keywords and state tables to split the given text into its date-related pieces. From those pieces it extracts either absolute day/month/year information or day/month/year delta information. Once the delta is known, it is applied to the source date (for example, the current date and time) to produce the desired date.
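Applying the delta is the one step where calendar rules matter. Years and months have no fixed length, but weeks, days, hours and minutes do. The sketch below shifts the calendar fields first and clamps the day to the end of the target month, then adds the fixed-length part with `timedelta`. The `DateDelta` container and `apply_delta` function are hypothetical names. Clamping (so that January 31 plus one month becomes the last day of February) is one reasonable policy, not the only one.

```python
import calendar
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DateDelta:
    """Hypothetical container for delta information extracted from text."""
    years: int = 0
    months: int = 0
    weeks: int = 0
    days: int = 0
    hours: int = 0
    minutes: int = 0


def apply_delta(source: datetime, delta: DateDelta) -> datetime:
    # Shift years and months as a single month count, then clamp the day
    # so that e.g. Jan 31 + 1 month lands on Feb 28 or Feb 29.
    total = source.year * 12 + (source.month - 1) + delta.years * 12 + delta.months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    day = min(source.day, calendar.monthrange(year, month)[1])
    shifted = source.replace(year=year, month=month, day=day)
    # The fixed-length units go through timedelta.
    return shifted + timedelta(weeks=delta.weeks, days=delta.days,
                               hours=delta.hours, minutes=delta.minutes)
```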
Task List
- Convert the current tests, which live in a single .py file, to unit tests
- Run the code through pylint
- Add error handling code
- allow soft errors to return partial dates and a flag
- ensure hard errors are flagged and return no date information
- Add fixed date handling
- Add code to detect number words and replace them with digits: five -> 5, twenty five -> 25
- Move code to new repository and create distutils/setuptools packaging
- Documentation
- Add locale handling
- make sure timezones are honored
- ensure default locale info can be provided to fill in the unknown values during parsing
- Add recurrence parsing
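The error-handling and default-value tasks above could combine as follows. In this sketch, a missing year is filled in from a default and reported as a soft error: a date is returned, but the result is flagged as partial. Input that cannot be parsed, or that yields an impossible date, is a hard error and returns no date. The `parse_month_day` name, the `default_year` parameter (standing in for default locale information) and the `partial` key are all hypothetical.

```python
import re
from datetime import date
from typing import Any, Dict

MONTHS = {name: i for i, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}

# "March 5", "March 5 2024" or "March 5, 2024"; the year is optional.
MONTH_DAY_RE = re.compile(r"^([a-z]+)\s+(\d{1,2})(?:,?\s+(\d{4}))?$")


def parse_month_day(text: str, default_year: int) -> Dict[str, Any]:
    result: Dict[str, Any] = {"source": text, "date": None, "partial": False, "errors": []}
    match = MONTH_DAY_RE.match(text.strip().lower())
    if match is None or match[1] not in MONTHS:
        # Hard error: nothing trustworthy was recovered, so no date is returned.
        result["errors"].append("unparseable date")
        return result
    year = int(match[3]) if match[3] else default_year
    try:
        result["date"] = date(year, MONTHS[match[1]], int(match[2]))
    except ValueError as exc:  # e.g. "February 30": also a hard error
        result["errors"].append(str(exc))
        return result
    if match[3] is None:
        # Soft error: the year came from the defaults, so flag the result.
        result["partial"] = True
        result["errors"].append("year not given; used default")
    return result
```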