github.com/dwrz/url-shortener

URL shortening service.


Install
go get github.com/dwrz/url-shortener

Documentation

This repository contains a simple URL shortening service.

Instructions

These instructions assume you have access to the following:

  • [ ] A UNIX-like operating system.
  • [ ] Docker
  • [ ] Go
  • [ ] Make

Depending on your setup, your docker commands will likely require root privileges.

If you know what you’re doing, you should be able to run this service with Go alone.

Setup

Clone the repository with git:

git clone https://github.com/dwrz/url-shortener

Set up a local MongoDB instance. The easiest way to spin this up is to run:

docker run -d -v /tmp/url-shortener/data:/data/db -p 27017-27019:27017-27019 --name mongodb mongo:4.2

This will start MongoDB 4.2 in a Docker container, with data persisted to /tmp/url-shortener.

  • You may specify another directory if you don’t wish to use /tmp.
  • You may specify another Mongo URI by setting the MONGO_URI environment variable before calling the service.
  • The service will use the ENV environment variable to specify which MongoDB database to use. By default, the service will create and use a development database.

Build

Calling make or make build from the root directory should build the service.

make
make build

Run

Ensure the mongodb container is running on Docker.

make run should start the service.

make run

By default the service will run using port 8080. This may be configured with the PORT environment variable.

Test

Run make test to run package tests and the test program.

make test

You can also test the service with the race detector enabled:

make test-race

Cleanup

Ctrl-C or issuing a SIGINT or SIGTERM to the service’s process should cause a graceful shutdown.

docker stop mongodb should stop the MongoDB container. docker rm mongodb should remove the stopped container. docker rmi mongo:4.2 should remove the MongoDB image.

You may want to manually remove the data in /tmp/url-shortener.

API

Status

The status endpoint is a simple health check located at the / path. It should respond with a 200 OK status and no body. There are no error responses on this endpoint.

GET http://localhost:8080/
curl -i -XGET http://localhost:8080/

Create a Short URL

Make a POST request to the root path, with a Content-Type of application/x-www-form-urlencoded. The body of the request should have a url field, whose value should be a valid URL to be shortened.

The service should respond with a 201 Created status code. The body should contain the plain text short URL string.

POST http://localhost:8080/
Content-Type: application/x-www-form-urlencoded

url=http://trillionthtonne.org/
curl -i -XPOST -H 'Content-Type: application/x-www-form-urlencoded' -d 'url=http://trillionthtonne.org/' http://localhost:8080/

The service may respond with a 400 Bad Request status, and a body of invalid url, if a malformed URL is submitted.

It may return a 500 Internal Server Error status, and a body of server error, if the server encounters an error while generating the short URL, or persisting data to MongoDB.

Redirect

To be redirected to the long URL behind a short URL, make a GET request with the short URL as a path parameter:

GET http://localhost:8080/Hz2Et7JO
curl -i -XGET http://localhost:8080/Hz2Et7JO

The service may respond with a 404 Not Found status, and a body of not found, if no document for the short URL is found.

It may return a 500 Internal Server Error status, and a body of server error, if an error is encountered while persisting a short URL visit to the DB.

Stats

To retrieve statistics on visits to a short URL, make a GET request with the short URL as a path parameter, followed by /stats. A JSON object is returned in the response body.

GET http://localhost:8080/r5eDKFBg/stats
curl -i -XGET http://localhost:8080/r5eDKFBg/stats

The service may respond with a 404 Not Found status, and a body of not found, if no document for the short URL is found.

It may return a 500 Internal Server Error status, and a body of server error, if an error is encountered while aggregating statistics for the short URL.

Background

Requirements

  • Build an HTTP-based RESTful API for managing short URLs and redirecting clients. The API must offer the following features:
    • Generate a short URL from a long URL.
    • Redirect a short URL to a long URL within 10ms.
    • List the number of times a short URL has been accessed:
      • In the last 24 hours.
      • In the last week.
      • All time.
  • No authentication is required.
  • No HTML or web UI is required.
  • Free choice of transport and serialization.
  • Anything unspecified is left to discretion.

Constraints

  • Short URLs
    • Are unique to one long URL. If an identical long URL is added twice, two short URLs should be generated.
    • Are permanent.
    • Are not easily discoverable; e.g., incrementing an existing short URL should have a low probability of yielding another working short URL.
  • The service:
    • Must support millions of URLs.
    • Must persist data.
    • Must be testable with curl.

Analysis and Assumptions

We need a sufficiently long short URL to allow for the creation of millions of URLs.

With a 62 character charset and a short URL of length 6, we get 56,800,235,584 possible short URLs. With a length of 8, we get 218,340,105,584,896.
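
One way to draw short URLs from such a charset is sketched below. This is an assumption about how generation could work, not a description of the service's code: randomID is a hypothetical name, and the choice of crypto/rand is mine, motivated by the constraint that short URLs should not be easily discoverable.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"math/big"
)

// charset is the 62 character alphabet: digits, upper case, lower case.
const charset = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

// randomID returns n characters drawn uniformly at random from charset.
func randomID(n int) (string, error) {
	limit := big.NewInt(int64(len(charset)))
	id := make([]byte, n)
	for i := range id {
		// rand.Int returns a uniform value in [0, limit), avoiding modulo bias.
		k, err := rand.Int(rand.Reader, limit)
		if err != nil {
			return "", err
		}
		id[i] = charset[k.Int64()]
	}
	return string(id), nil
}

func main() {
	id, err := randomID(8)
	if err != nil {
		panic(err)
	}
	fmt.Println(id) // a different 8 character string on each run
}
```

Random generation does not rule out duplicates on its own, so a real service would still need to handle the case where a generated short URL already exists.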

However, because of the birthday problem, the probability of a collision is greater than 1 in 56,800,235,584. At 1000 requests per hour, it takes on the order of a day to reach a 1% probability of at least one collision. See: https://alex7kom.github.io/nano-nanoid-cc/?alphabet=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz&size=6&speed=1000&speedUnit=hour.
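
The arithmetic can be checked with a short program. It uses the standard birthday-problem approximation, 1 - exp(-n(n-1)/(2N)) for n random IDs drawn from N possibilities, which may differ slightly from the linked calculator's method. The function names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// keyspace returns charsetSize raised to the power length.
func keyspace(charsetSize, length int) uint64 {
	n := uint64(1)
	for i := 0; i < length; i++ {
		n *= uint64(charsetSize)
	}
	return n
}

// collisionProbability approximates the chance of at least one collision
// among n random IDs drawn from space possibilities.
func collisionProbability(n, space float64) float64 {
	return 1 - math.Exp(-n*(n-1)/(2*space))
}

func main() {
	const perHour = 1000.0 // the assumed load of 1000 requests per hour
	for _, length := range []int{6, 8} {
		space := keyspace(62, length)
		p := collisionProbability(24*perHour, float64(space))
		fmt.Printf("length %d: %d IDs, P(collision after one day) = %.6f\n", length, space, p)
	}
}
```

Under this approximation, one day of traffic at length 6 gives a collision probability of about 0.5%, and 1% is reached after roughly 34,000 short URLs, which is the same order of magnitude as the figure quoted above. At length 8 the one-day probability drops to around one in a million.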

There is also tension between generating user-friendly short URLs and limiting potential collisions. Reducing the length and character set of short URLs might be preferable for the user, but would result in more collisions. Conversely, using UUIDs would result in fewer collisions, but would present a poor user experience.

Assumptions:

  • A typical load of 1000 requests per hour.
  • A properly configured MongoDB database that responds within the context timeouts set in the service.
  • Emphasis on usability – i.e., don’t return UUIDs.

Un ouvrage n’est jamais achevé … mais abandonné.

A work is never completed, only abandoned.

– Paul Valéry, La Nouvelle Revue Française

These are some of the things I would improve if I had more time, or knowledge of the production context of this service:

  • Configuration
  • Deployment
  • Environment
  • Database
    • Index the short URL field.
  • Documentation
    • Both high level, and within the source.
  • Error Handling
  • Logging and Observability
  • Performance
    • Caching URLs, e.g. with Redis, especially to speed up retrieving long URLs for redirection.
    • Generating an adequate length short URL.
    • Merging the Find and Aggregation in the statistics endpoint.
  • Persisting Visits
    • This can be done asynchronously to the redirect.
  • Security
    • Authentication and private URLs.
    • Preventing recursive URLs.
    • Rate limiting.
  • Testing
    • Use an interface for DB operations, to use a mock DB.
    • Separate functionality in the test program into package level tests.
    • Test more edge cases.
  • Validation
    • Implement stricter URL validation.
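
As an example of what stricter URL validation might look like, the sketch below accepts only absolute http or https URLs with a host. The rules and the validateURL name are assumptions for illustration; the error text reuses the invalid url body documented for the create endpoint.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// errInvalidURL mirrors the "invalid url" body returned with 400 Bad Request.
var errInvalidURL = errors.New("invalid url")

// validateURL accepts only absolute http or https URLs with a host.
// (Hypothetical rules; the service's current validation may differ.)
func validateURL(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return errInvalidURL
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return errInvalidURL
	}
	if u.Hostname() == "" {
		return errInvalidURL
	}
	return nil
}

func main() {
	inputs := []string{
		"http://trillionthtonne.org/",
		"ftp://example.com",
		"not a url",
		"http://",
	}
	for _, raw := range inputs {
		fmt.Printf("%q -> %v\n", raw, validateURL(raw))
	}
}
```

Only the first input passes; the others fail on the scheme or host checks. A check that rejects URLs pointing back at the service itself could be added in the same function to address the recursive URL item above.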