Build 'For you' recommendations using AI on Fastly!

Forget the hype; where is AI delivering real value? Let's use edge computing to harness the power of AI and make smarter user experiences that are also fast, safe and reliable.

Recommendations are everywhere, and everyone knows that making web experiences more personalized makes them more engaging and successful. My Amazon homepage knows that I like home furnishings, kitchenware and right now, summer clothing:

Today, most platforms make you choose between being fast and being personalized. At Fastly, we think you, and your users, deserve to have both. If every page your web server generates is suitable for only one end user, you can't benefit from caching it, and caching is what edge networks like Fastly do well.

So how can you benefit from edge caching, and yet make content personalized? We've written a lot before about how to break up complex client requests into multiple smaller, cacheable backend requests, and you'll find tutorials, code examples and demos in the personalization topic on our developer hub.

But what if you want to go further and generate the personalization data at the edge? The "edge" (the Fastly servers handling your website's traffic) is the closest point to the end user that's still within your control, which makes it a great place to produce content that's specific to one user.

The "For you" use case

Product recommendations are inherently transient, specific to an individual user and likely to change frequently. But they also don't need to persist: we don't typically need to know what we've recommended to each person, only whether a particular algorithm achieves better conversion than another. Some recommendation algorithms need access to a large amount of state data, such as which users are most similar to you and their purchase or rating history, but often that data is easy to pregenerate in bulk.

Basically, generating recommendations usually doesn't create a transaction, doesn't need any locks in your data store, and makes use of input data that's either immediately available from the current user's session, or created in an offline build process.

Sounds like we can generate recommendations at the edge!

A real world example

Let's take a look at the website of the New York Metropolitan Museum of Art:

Each of the 500,000 or so objects in the Met's collection has a page with a picture and information about it. It also has this list of related objects:

This seems to use a fairly straightforward system of faceting to generate these relationships, showing me other artworks by the same artist, or other objects in the same wing of the museum, or which are also made of paper or originate in the same time period.

The nice thing about this system (from a developer perspective!) is that since it's only based on the one input object, it can be pre-generated into the page.

What if we want to augment this with a selection of recommendations that are based on the end user's personal browsing history as they navigate around the Met's website, not just based on this one object?

Adding personalized recommendations

There are lots of ways we can do this, but I wanted to try using a language model, since AI is happening right now, and it's really different from the way the Met's existing related artworks mechanism seems to work. Here's the plan:

Download the Met's open access collection dataset.
Run it through a language model to create vector embeddings – lists of numbers suitable for machine learning tasks.
Build a performant similarity search engine for the resulting half a million vectors (representing the Met’s artworks) and load it into KV store so we can use it from Fastly Compute.

Once we've done all that, as you browse the Met's website we should be able to:

Track the artworks you visit in a cookie.
Look up the vectors corresponding to those artworks.
Calculate an average vector representing your browsing interests.
Plug that into our similarity search engine to find the most similar artworks.
Load details about those artworks from the Met's Object API and augment the page with personalized recommendations.

Et voilà, personalized recommendations:

OK, so let's break that down.

Creating the dataset

The Met's raw dataset is a CSV with lots of columns and looks like this:

Object Number,Is Highlight,Is Timeline Work,Is Public Domain,Object ID,Gallery Number,Department,AccessionYear,Object Name,Title,Culture,Period,Dynasty,Reign,Portfolio,Constituent ID,Artist Role,Artist Prefix,Artist Display Name,Artist Display Bio,Artist Suffix,Artist Alpha Sort,Artist Nationality,Artist Begin Date,Artist End Date,Artist Gender,Artist ULAN URL,Artist Wikidata URL,Object Date,Object Begin Date,Object End Date,Medium,Dimensions,Credit Line,Geography Type,City,State,County,Country,Region,Subregion,Locale,Locus,Excavation,River,Classification,Rights and Reproduction,Link Resource,Object Wikidata URL,Metadata Date,Repository,Tags,Tags AAT URL,Tags Wikidata URL
1979.486.1,False,False,False,1,,The American Wing,1979,Coin,One-dollar Liberty Head Coin,,,,,,16429,Maker," ",James Barton Longacre,"American, Delaware County, Pennsylvania 1794–1869 Philadelphia, Pennsylvania"," ","Longacre, James Barton",American,1794      ,1869      ,,http://vocab.getty.edu/page/ulan/500011409,https://www.wikidata.org/wiki/Q3806459,1853,1853,1853,Gold,Dimensions unavailable,"Gift of Heinz L. Stoppelmann, 1979",,,,,,,,,,,,,,http://www.metmuseum.org/art/collection/search/1,,,"Metropolitan Museum of Art, New York, NY",,,
1980.264.5,False,False,False,2,,The American Wing,1980,Coin,Ten-dollar Liberty Head Coin,,,,,,107,Maker," ",Christian Gobrecht,1785–1844," ","Gobrecht, Christian",American,1785      ,1844      ,,http://vocab.getty.edu/page/ulan/500077295,https://www.wikidata.org/wiki/Q5109648,1901,1901,1901,Gold,Dimensions unavailable,"Gift of Heinz L. Stoppelmann, 1980",,,,,,,,,,,,,,http://www.metmuseum.org/art/collection/search/2,,,"Metropolitan Museum of Art, New York, NY",,,

Simple enough to transform that into two columns, an ID and a string:

id,description
1,"One-dollar Liberty Head Coin; Type: Coin; Artist: James Barton Longacre; Medium: Gold; Date: 1853; Credit: Gift of Heinz L. Stoppelmann, 1979"
2,"Ten-dollar Liberty Head Coin; Type: Coin; Artist: Christian Gobrecht; Medium: Gold; Date: 1901; Credit: Gift of Heinz L. Stoppelmann, 1980"
3,"Two-and-a-Half Dollar Coin; Type: Coin; Medium: Gold; Date: 1927; Credit: Gift of C. Ruxton Love Jr., 1967"
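
A transformation like this can be sketched in a few lines of Python. This is only an illustration: the mapping from CSV columns to labels (Type, Artist, Medium, Date, Credit) is inferred from the sample output above, the function names are made up for this example, and the inline sample keeps only the columns the sketch needs.

```python
import csv
import io

# Column -> label mapping, inferred from the sample output (an assumption).
FIELDS = [
    ("Object Name", "Type"),
    ("Artist Display Name", "Artist"),
    ("Medium", "Medium"),
    ("Object Date", "Date"),
    ("Credit Line", "Credit"),
]


def describe(row):
    """Build a one-line description, skipping columns that are empty."""
    parts = [row["Title"].strip()]
    for column, label in FIELDS:
        value = (row.get(column) or "").strip()
        if value:
            parts.append(f"{label}: {value}")
    return "; ".join(parts)


def transform(source, dest):
    """Read the wide CSV from `source`, write id,description rows to `dest`."""
    writer = csv.writer(dest)
    writer.writerow(["id", "description"])
    for row in csv.DictReader(source):
        writer.writerow([row["Object ID"], describe(row)])


# Trimmed-down sample: only the columns used above, values taken from the article.
SAMPLE = '''Object ID,Object Name,Title,Artist Display Name,Object Date,Medium,Credit Line
1,Coin,One-dollar Liberty Head Coin,James Barton Longacre,1853,Gold,"Gift of Heinz L. Stoppelmann, 1979"
3,Coin,Two-and-a-Half Dollar Coin,,1927,Gold,"Gift of C. Ruxton Love Jr., 1967"
'''

out = io.StringIO()
transform(io.StringIO(SAMPLE), out)
print(out.getvalue())
```

Skipping empty columns keeps the descriptions short, which is why object 3 above has no "Artist" entry.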

Now we can use the transformers package from the Hugging Face AI toolset to generate embeddings of each of these descriptions. We used the sentence-transformers/all-MiniLM-L12-v2 model, and used principal component analysis (PCA) to reduce the resulting vectors to 5 dimensions. That gives you something like:

[
  {
    "id": 1,
    "vector": [ -0.005544120445847511, -0.030924081802368164, 0.008597176522016525, 0.20186401903629303, 0.0578165128827095 ]
  },
  {
    "id": 2,
    "vector": [ -0.005544120445847511, -0.030924081802368164, 0.008597176522016525, 0.20186401903629303, 0.0578165128827095 ]
  },
  …
]
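
As a rough sketch of that step, the snippet below assumes the third-party sentence-transformers and scikit-learn packages are installed. Using the sentence-transformers wrapper is a choice made for this example and may not match how the original pipeline called the model. The function names are hypothetical, and the heavy imports are done lazily so the small formatting helper can be used on its own.

```python
import json


def embed_descriptions(descriptions, dims=5):
    """Embed each description, then reduce the vectors to `dims` dimensions.

    Assumes sentence-transformers and scikit-learn are installed. PCA needs
    at least `dims` input descriptions to produce `dims` components.
    """
    from sentence_transformers import SentenceTransformer
    from sklearn.decomposition import PCA

    model = SentenceTransformer("sentence-transformers/all-MiniLM-L12-v2")
    embeddings = model.encode(descriptions)  # one 384-dimensional row per description
    return PCA(n_components=dims).fit_transform(embeddings).tolist()


def to_records(ids, vectors):
    """Pair each object ID with its vector, in the JSON shape shown above."""
    return [{"id": i, "vector": list(v)} for i, v in zip(ids, vectors)]


# Formatting only, with made-up two-dimensional vectors for brevity.
print(json.dumps(to_records([1, 2], [[0.1, 0.2], [0.3, 0.4]])))
```

In a real run, you would pass the description column to `embed_descriptions` and feed the result, together with the matching IDs, to `to_records`.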

We have half a million of these, so it's not possible to store this entire dataset within the edge app's memory. And we want to do a custom type of similarity search over this data, which is something a traditional key-value store doesn't offer. Since we’re building a real-time experience, we also really want to avoid having to search half a million vectors at a time.

So, let's partition the data. We can use k-means clustering to group vectors that are similar to each other. We sliced the data into 500 clusters of varying sizes, and calculated a center point called a “centroid vector” for each of those clusters. If you plotted this vector space in two dimensions and zoomed in, it might look a bit like this:

The red crosses are the mathematical center points of each cluster of vectors, called centroids. They can work like wayfinders for our half-million-vector space. For instance, if we want to find the 10 most similar vectors to a given vector A, we can first look for the nearest centroid (out of 500), then conduct our search only within its corresponding cluster, which is a much more manageable area!
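
To make the idea concrete, here is a minimal pure-Python sketch of k-means partitioning followed by a centroid-routed search. It is an illustration under simplifying assumptions, not the pipeline used for the Met data: it uses two-dimensional toy vectors, picks evenly spaced inputs as the initial centroids (real libraries use smarter seeding), and ranks candidates by brute force within the chosen cluster. All function names are made up for this example.

```python
import math


def nearest_centroid(vector, centroids):
    """Index of the centroid closest to `vector` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(vector, centroids[i]))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def kmeans(items, k, max_iterations=50):
    """Partition (id, vector) pairs into k clusters; returns (centroids, clusters).

    Initial centroids are evenly spaced picks from the input, a simplification.
    If the loop stops before converging, the returned clusters reflect the
    assignment made just before the last centroid update.
    """
    centroids = [list(items[i * len(items) // k][1]) for i in range(k)]
    for _ in range(max_iterations):
        clusters = [[] for _ in range(k)]
        for item in items:
            clusters[nearest_centroid(item[1], centroids)].append(item)
        updated = [
            mean_vector([vec for _, vec in members]) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:  # assignments are stable, so we are done
            break
        centroids = updated
    return centroids, clusters


def search(query, centroids, clusters, top_k):
    """Route the query to its nearest centroid, then rank only that cluster."""
    members = clusters[nearest_centroid(query, centroids)]
    ranked = sorted(members, key=lambda item: math.dist(query, item[1]))
    return [item_id for item_id, _ in ranked[:top_k]]


# Toy data: two obvious groups of points.
items = [
    (1, [0, 0]), (2, [0, 1]), (3, [1, 0]), (4, [1, 1]),
    (5, [10, 10]), (6, [10, 11]), (7, [11, 10]), (8, [11, 11]),
]
centroids, clusters = kmeans(items, k=2)
print(centroids)                                     # [[0.5, 0.5], [10.5, 10.5]]
print(search([9, 9], centroids, clusters, top_k=1))  # [5]
```

One trade-off worth knowing: searching only the nearest cluster is approximate, because a query near a cluster boundary may have closer neighbors in the adjacent cluster.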

Now we have 500 small datasets and an index that maps centroid points to the relevant dataset. Next, to enable real-time performance, we want to precompile search graphs so that we don't need to initialize and construct them at runtime, and can use as little CPU time as possible. A really fast nearest-neighbor algorithm is Hierarchical Navigable Small World (HNSW), and it has a pure Rust implementation, which suits us because Rust is the language we're using to write our edge app. So we wrote a small standalone Rust app to construct the HNSW graph structs for each dataset, and then used bincode to serialize each instantiated struct into a binary blob.

Now, those binary blobs can be loaded into KV store, keyed against the cluster index, and the cluster index can be included in our edge app.

This architecture lets us load parts of the search index into memory on demand. And since we’ll never have to search more than a few thousand vectors at a time, our searches will always be cheap and fast.
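
The request-time flow can be sketched as follows. Everything here is a stand-in chosen for illustration: a Python dict plays the role of the KV store, JSON blobs of raw vectors replace the bincode-encoded HNSW graphs, the search within a cluster is brute force, and the names `KV_STORE`, `CENTROIDS`, `VECTOR_BY_ID` and `recommend` are hypothetical. Excluding already-visited artworks from the results is also an assumption of this sketch.

```python
import json
import math

# Stand-in for the KV store: cluster index -> serialized cluster contents.
KV_STORE = {
    "0": json.dumps([[1, [0.0, 0.0]], [2, [0.0, 1.0]], [3, [1.0, 0.0]]]),
    "1": json.dumps([[4, [10.0, 10.0]], [5, [10.0, 11.0]], [6, [11.0, 10.0]]]),
}

# Cluster index shipped with the app: list position = cluster key (toy values).
CENTROIDS = [[0.5, 0.5], [10.5, 10.5]]

# Hypothetical lookup from artwork ID to its vector, built from the toy blobs.
VECTOR_BY_ID = {
    oid: vec for blob in KV_STORE.values() for oid, vec in json.loads(blob)
}


def parse_history(cookie_value):
    """Parse a comma-separated list of artwork IDs, ignoring anything else."""
    return [int(p) for p in cookie_value.split(",") if p.strip().isdigit()]


def recommend(cookie_value, top_k=2):
    """Average the visited vectors, pick one cluster, rank its members."""
    visited = parse_history(cookie_value)
    vectors = [VECTOR_BY_ID[i] for i in visited if i in VECTOR_BY_ID]
    if not vectors:
        return []
    interest = [sum(column) / len(vectors) for column in zip(*vectors)]
    cluster = min(
        range(len(CENTROIDS)), key=lambda i: math.dist(interest, CENTROIDS[i])
    )
    members = json.loads(KV_STORE[str(cluster)])  # only this cluster is loaded
    candidates = [(oid, vec) for oid, vec in members if oid not in visited]
    candidates.sort(key=lambda member: math.dist(interest, member[1]))
    return [oid for oid, _ in candidates[:top_k]]


print(recommend("4,5"))  # [6]
```

The point of the design is visible in `recommend`: only one cluster's blob is fetched and deserialized per request, so memory use and CPU time depend on the cluster size rather than on the size of the whole collection.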