eMail address, IP address (v4/v6), FQDN (domain, hostname) validation utilities
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ eMail address parser/validator ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛

The purpose of this library is primarily to enable users to verify
eMail addresses, which cannot be done with a regular expression.
More eMail-related parsers, validators, emitters, etc. later.

Validation of IP addresses (IPv6, Legacy IPv4) and FQDNs (hostnames,
domains) is provided as well.

Release note
────────────

This early release of this library had a tight deadline. As such,
it only parses address/mailbox lists and their dependents and can
validate, using eMail rules, localparts and domains (IP address or
FQDN, also usable separately).

Installation
────────────

Add a suitable dependency to your project, for example with Maven:

    <dependency>
        <groupId>org.evolvis.tartools</groupId>
        <artifactId>rfc822</artifactId>
        <version>0.8</version>
    </dependency>

Or download releases manually from Maven Central:
https://repo1.maven.org/maven2/org/evolvis/tartools/rfc822/

Building the library yourself is also possible, of course. The
source code may be retrieved (via anonymous read-only git access)
or inspected (per gitweb) at the Evolvis repository; a mirror on
another platform is also available:
https://github.com/qvest-digital/rfc822

Make sure your project can handle Java 8 bytecode.

Usage
─────

To validate eMail addresses, first get a parser instance:

    final String input = "user <localpart@domain>";
    final Path p = Path.of(input);

If the input string was null or too long (we’re generous here),
this will return null. Otherwise, call the parser object:

    import lombok.val;

    val mailbox = p.forSender(false);  // Path.Address
    val address = p.forSender(true);   // Path.Address
    val mbxList = p.asMailboxList();   // Path.AddressList
    val adrList = p.asAddressList();   // Path.AddressList
    val adrspec = p.asAddrSpec();      // Path.AddrSpec

(Using ‘val’ means the protected type of the result can be stored.
This is not necessary: the return values also implement the public
Path.ParserResult interface, which works as well.)
The return value is null if the address cannot be parsed.

The first thing to do now, with To: for example, is to weed out
parsable but invalid input (e.g. bad domain or IP, or too long):

    if (!adrList.isValid()) {
        LOG.error("invalid recipients: {}", adrList.invalidsToString());
        return null;
    }

Afterwards, decide what you wish to do with the result. Mostly,
you’ll intend to send out eMails, which needs just addresses:

    return adrList.flattenAddrSpecs();  // List<String>

It is also possible to validate hostnames or domains…

    final String hostname = "foo.example.com";
    final boolean ok = FQDN.isDomain(hostname);

… and IP addresses, both IP and Legacy IP (a.k.a. IPv4):

    final String ip = "2001:db8::1";
    final InetAddress ia = IPAddress.v6(ip);  // or .v4(ip)

… checking both kinds of IPAddress in one go:

    final InetAddress ia = IPAddress.from(ip);

In all cases ia will not be null for valid addresses.

There are other useful methods on the result objects; toString()
especially, and introspection of the various parts of a mail path.

CLI (command-line interface) utility
────────────────────────────────────

This project ships an executable JAR, offering access to most
validation functionality from the command line. (The run.sh
launcher can be extracted with jar or unzip, then placed in the
same directory as the JAR. It can be used from PATH.) On the
other hand, the downloads available from the Maven Central repo at
https://repo1.maven.org/maven2/org/evolvis/tartools/rfc822/
only offer the .jar file anyway, so maybe use that…

Usage:

    $ java -jar rfc822-0.8.jar [-lax] [--] [input …]
or
    $ ./run.sh [-lax] [--] [input …]

This will show (in a colourful format intended for human
consumption only!) for each argument which productions match it.
This command exits with errorlevel 40.
A double-dash argument separator before the first input argument
is mandatory if the first input begins with a hyphen-minus. If no
‘input’ is present, an interactive REPL is started, prompting if
stdin isn’t redirected (d̲o̲n̲’̲t̲ redirect stdout/stderr!), exiting on
end of input (^D on Unix, ^Z on DOS).

The -lax option enables the “UX” (more user-friendly) parser,
which allows some more constructs on input but converts those to
their respective standard forms on output, for all parsing
regarding eMail addresses (Path). In lax mode, all inputs are
additionally whitespace-trimmed on both sides (no matter what
parse mode is selected).

    $ java -jar rfc822-0.8.jar -h
or
    $ ./run.sh -h

Show short usage info on stderr and exit nonzero.

    $ java -jar rfc822-0.8.jar [-lax] -TYPE [--] input
or
    $ ./run.sh [-lax] -TYPE [--] input

Validates the one given ‘input’ for the specified TYPE.
Possible TYPEs are:

    -addrspec      (e.g. foo@example.com)
    -mailbox       (e.g. Foo Bar <foo@example.com>)
    -address       (mailbox or group, e.g.
                   Label:foo@example.com,bar@example.com;)
    -mailboxlist   mailbox ["," mailbox]*
    -addresslist   address ["," address]*
    -domain        FQDN (hostname or domain name)
    -ipv4          Legacy IP address, dotted-quad, e.g. 192.0.2.1
    -ipv6          IP address (without scope), e.g. 2001:db8::1

If input was valid, exits with errorlevel 0; no other codepath
(other than -extract if all inputs were valid) exits 0 so this
is a reliable validator. Additionally, a (somewhat, subject to
JRE details; for example, a v4-mapped IPv6 address is rendered
as IPv4 address instead) canonical representation of the input
followed by a newline will be written to stdout on success.

If input was invalid this exits with errorlevel 43 for domain,
ipv4, ipv6; with errorlevel 41 (cannot be parsed) or 42 (fails
post-parsing validation) for the others (eMail-related types);
nothing is printed on standard output in this case.

    $ java -jar rfc822-0.8.jar [-lax] -extract [--] input …

Validates each given ‘input’ item as address-list.
Diagnostics will be shown on stderr for invalid inputs. The exit
status is 45 if no valid inputs were provided; 0 if all inputs
given were valid; 44 otherwise (both invalid and valid inputs
presented). Additionally, one corresponding line will be written
to stdout for every input: an empty line if invalid; the addr-spec
items present in the input, separated by comma and space,
otherwise. This facilitates matching between input and output
and sending eMails to recipients based on header values.

This allows users to check either stdout, stderr or the errorlevel
independently from each other, handling the result in various
automated ways or by manual stderr inspection (characters that are
not printable ASCII are backslash-escaped).

Limitations
───────────

This API checks for both RFC5322 and RFC5321 compliance, with
several others (DNS) influencing. It’s intended to be used by
eMail senders towards the public internet, which limits, besides
the allowed lengths and characters, the following features:

• IPv6 Zone identifiers (ff02::1%vr0) aren’t valid
  as they are strictly host-specific
• General-address-literal isn’t permitted, as (other than
  IPv6 addresses) there currently isn’t any usable tag
• display-name must be valid ASCII dot-atom, or
  quoted-string (but no tabs), syntax

Future directions
─────────────────

Pass more of the structure up (comments, for example).

Improve UXAddress (the “DWIM mode” Path parser) to handle more
input varieties users will enter into webforms such as:

    Es “ß” Zett <sz@example.com>; Shaun D. Mäh <sheep@example.com>
    James T. Kirk, III. <jim@example.com>

Currently, the following deviations are accepted:

• <hal@ai.> (trailing dot)
• use of semicolon (“;”) as mailbox-list separator

The following further ones are planned:

• accept and encode extra punctuation for display names (jim)
• accept, but drop at first, non-ASCII display names
• later MIME-encode them
• new “local uses allowed” mode that supports link-local IPs

For more though… Sorry, no time…

More patches, improvements, tests, etc. are of course welcome!
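
Coming back to the CLI's -extract mode: a program that launches the
JAR and reads its output could consume the one-line-per-input stdout
format and the exit statuses 0, 44 and 45 roughly as sketched here.
This is only an illustration. The class ExtractOutput and its methods
are hypothetical helpers, not part of this library. Splitting on
comma-plus-space outside double quotes is an assumption about how
quoted localparts appear in the output, made so that a quoted
localpart containing a comma is not torn apart.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for consumers of "-extract" output; not part of rfc822. */
final class ExtractOutput {
    private ExtractOutput() {
    }

    /** Maps the -extract exit statuses named in this README to a description. */
    static String describeExit(final int status) {
        switch (status) {
        case 0:
            return "all inputs valid";
        case 44:
            return "both valid and invalid inputs";
        case 45:
            return "no valid inputs";
        default:
            return "unexpected exit status " + status;
        }
    }

    /**
     * Splits one stdout line into its addr-spec items. An empty line (the
     * marker for an invalid input) yields an empty list. Separators inside
     * double-quoted strings are ignored (assumption, see lead-in).
     */
    static List<String> splitLine(final String line) {
        final List<String> items = new ArrayList<>();
        if (line.isEmpty()) {
            return items;
        }
        final StringBuilder cur = new StringBuilder();
        boolean quoted = false;
        for (int i = 0; i < line.length(); ++i) {
            final char c = line.charAt(i);
            if (quoted && c == '\\' && i + 1 < line.length()) {
                // quoted-pair: copy the backslash and the escaped character
                cur.append(c).append(line.charAt(++i));
            } else if (c == '"') {
                quoted = !quoted;
                cur.append(c);
            } else if (!quoted && c == ',' && i + 1 < line.length() &&
                line.charAt(i + 1) == ' ') {
                items.add(cur.toString());
                cur.setLength(0);
                ++i; // skip the space following the comma
            } else {
                cur.append(c);
            }
        }
        items.add(cur.toString());
        return items;
    }

    public static void main(final String[] args) {
        // Sample lines as -extract might write them for two inputs,
        // the second input being invalid (hence the empty line).
        final String[] lines = { "foo@example.com, bar@example.com", "" };
        for (final String line : lines) {
            System.out.println(splitLine(line));
        }
        System.out.println(describeExit(44));
    }
}
```

Because the result list for line i belongs to input i, callers can
match recipients to the header value they came from by index alone.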