universeodon.com is part of the decentralized social network powered by Mastodon.
Be one with the #fediverse. Join millions of humans building, creating, and collaborating on Mastodon Social Network. Supports 1000 character posts.

#similarity

I’m excited to share my newest blog post, "Don't use cosine similarity carelessly"

p.migdal.pl/blog/2025/01/dont-

We often rely on cosine similarity to compare embeddings; it's like “duct tape” for vector comparisons. But just like duct tape, it can quietly mask deeper problems. Sometimes embeddings pick up the “wrong kind” of similarity: matching questions to questions instead of questions to answers, or getting thrown off by formatting quirks and typos rather than the text's real meaning.

In my post, I discuss what can go wrong with off-the-shelf cosine similarity and share practical alternatives. If you’ve ever wondered why your retrieval system returns oddly matched items or how to refine your embeddings for more meaningful results, this is for you!
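A minimal sketch of what cosine similarity actually computes, in plain Python with made-up 3-d vectors standing in for real embeddings. The helper names `cosine_similarity` and `rank_by_cosine` are illustrative, not taken from the blog post. Only direction counts: a vector and a scaled copy of it score a perfect 1.0, so whatever the embedding happens to encode (formatting and typos included) is exactly what gets compared.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product divided by the product of the two L2 norms."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def rank_by_cosine(query: Sequence[float], candidates: List[Sequence[float]]) -> List[int]:
    """Return candidate indices sorted from most to least similar to the query."""
    scores = [cosine_similarity(query, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


# Toy "embeddings", invented for illustration only.
query = [1.0, 2.0, 0.0]
candidates = [
    [2.0, 4.0, 0.0],  # same direction, twice as long: cosine 1.0
    [0.0, 0.0, 1.0],  # orthogonal: cosine 0.0
    [1.0, 1.0, 1.0],  # in between: 3 / sqrt(15), roughly 0.77
]
print(rank_by_cosine(query, candidates))  # prints [0, 2, 1]
```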
I want to thank Max Salamonowicz and Grzegorz Kossakowski for their feedback after my flash talk at the Warsaw AI Breakfast, Rafał Małanij for inviting me to give a talk at the Python Summit, and everyone who asked curious questions at the conference and on LinkedIn.

p.migdal.pl: Don't use cosine similarity carelessly
Cosine similarity - the duct tape of AI. Convenient but often misused. Let's find out how to use it better.

Evanston post office

The image shows a room with rows of uniform mailboxes lining the walls. In the center, there is a large rectangular structure, also made of metal boxes with locks. The room is well lit by fluorescent lights, and the shiny floor reflects the light above.

In tonight’s tutorial, we talked about how psychology makes sense of emotions like love. Is it all subjective? Maybe not as much as you think. Love can be studied scientifically—not just through brain scans showing neurotransmitters like dopamine and oxytocin but also through observable behaviours.

Take reciprocity: when someone likes us, we’re more likely to like them back. Or similarity: shared values, interests, and beliefs often strengthen attraction. Add proximity and familiarity (the magic of repeated exposure), and you start seeing how relationships form. Then there’s complementarity, where differences between people don’t divide but instead balance and enhance the connection.

What fascinates me is how psychology bridges the personal and the universal. Yes, love feels deeply subjective, but studies show patterns we can test and measure—observing behaviours, tracking emotional responses, and using physiological data to explore what we feel.

Have you noticed these dynamics in your relationships? #Psychology #Love #Relationships #Emotions #Reciprocity #Similarity #Proximity #Familiarity #Complementarity

Replied to grimmiges

PS If only they had had an even slightly invested #phylogeneticist at hand, they could easily have learned a lot about the strengths and weaknesses of their data and their preferred tree (by the way, a Bayesian MRC, i.e. a majority-rule consensus, is a summary tree of the various competing topologies sampled in the MCMC chain, not a phylogenetic tree).

Here's a quick #NeighbourNet based on their "toutes" matrix (inferred in less than a minute), annotated.
Overall #similarity makes #clades, surprise, surprise.
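To make the MRC remark concrete, here is a minimal sketch of majority-rule consensus, assuming each sampled tree has already been reduced to its set of rooted clades. Real tools read tree files and usually summarize bipartitions; none of that is modeled here, and the function names are hypothetical. With these three toy samples every clade has 2/3 support, and the resulting consensus matches none of the sampled topologies.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Set

Clade = FrozenSet[str]


def clade(*taxa: str) -> Clade:
    """Hypothetical helper: a clade is just the set of taxa it contains."""
    return frozenset(taxa)


def majority_rule_consensus(trees: List[Set[Clade]]) -> Dict[Clade, float]:
    """Keep every clade found in more than half of the sampled trees,
    mapped to its support (fraction of trees containing it).

    Any two clades with >50% support must co-occur in at least one sampled
    tree, so they are compatible and together describe a valid (possibly
    multifurcating) tree.
    """
    counts = Counter(c for tree in trees for c in set(tree))
    n = len(trees)
    return {c: k / n for c, k in counts.items() if k / n > 0.5}


# Three toy "MCMC samples" on taxa A..E (invented for illustration).
samples = [
    {clade("A", "B"), clade("A", "B", "C")},
    {clade("A", "B"), clade("D", "E")},
    {clade("A", "B", "C"), clade("D", "E")},
]
consensus = majority_rule_consensus(samples)
for c, support in sorted(consensus.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print("".join(sorted(c)), round(support, 2))
# Is the consensus identical to any single sampled tree?
print(set(consensus) in [set(t) for t in samples])  # prints False
```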

@johncarlosbaez @gregeganSF

Curse of Dimensionality
en.wikipedia.org/wiki/Curse_of

"Dimensionally cursed phenomena occur in domains such as numerical analysis, sampling, combinatorics, machine learning, data mining and databases. The common theme of these problems is that when the dimensionality increases, the volume of the space increases so fast that the available data become sparse. In order to obtain a reliable result, the amount of data needed often grows exponentially with the dimensionality. Also, organizing and searching data often relies on detecting areas where objects form groups with similar properties; in high dimensional data, however, all objects appear to be sparse and dissimilar in many ways, which prevents common data organization strategies from being efficient. "
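A quick numerical sketch of the "all objects appear dissimilar" effect, assuming points drawn uniformly from the unit hypercube and Euclidean distance (the `relative_contrast` helper is hypothetical). It measures how much farther the farthest point is from a query than the nearest one; as the dimension grows, that ratio collapses, so "nearest neighbour" stops meaning much.

```python
import math
import random


def relative_contrast(dim: int, n_points: int = 200, seed: int = 0) -> float:
    """(max_dist - min_dist) / min_dist from one random query point to
    n_points random points, all drawn uniformly from [0, 1]^dim."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance (Python 3.8+)
    return (max(dists) - min(dists)) / min(dists)


for d in (2, 10, 100, 1000):
    print(d, round(relative_contrast(d), 3))
```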

#Similarity, a #gestaltPrinciple that can make or break your layout: chrbutler.com/gestalt-principl

Similarity:
- creates a text hierarchy, e.g. headlines that resemble one another while the body text differs from them
- suggests equal importance, e.g. items that are the same size
- improves #scanability, e.g. article previews that look alike are easy to scan

My take: break similarity on purpose to highlight something, e.g. your latest project.

#HowToThing #023 — Responsive & reactive image gallery with tag-based Jaccard similarity ranking/filtering using thi.ng/bitfield, thi.ng/rstream & thi.ng/rdom

A quite common comment about #ThingUmbrella is that people often have little idea what some of the ~185 packages are even good/intended for and/or how to synthesize solutions from these small, individual building blocks. IMHO this is less about the packages themselves and more down to gaps in knowledge about the underlying concepts, algorithms and their potential role/utility in a larger problem domain... So I very much hope this new example is also useful in this respect!

Alas, the full code for this got pretty long and contains a lot more UI stuff. I'm intending to develop this further for the new homepage to browse all ~135 #ThingUmbrella examples (and maybe even for parts of the thi.ng website itself)... For those of you interested in more "advanced" thi.ng/rdom examples, do check it out!

Background info:
en.wikipedia.org/wiki/Jaccard_

Demo:
demo.thi.ng/umbrella/related-i

Full source code:
github.com/thi-ng/umbrella/tre

The important parts re: using compact binary encoding, bitfields & Jaccard similarity to find related items are here:

github.com/thi-ng/umbrella/blo
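A minimal Python sketch of the same idea, not the thi.ng/bitfield API: each distinct tag gets a bit position, every item's tag set becomes an integer bitfield, and the Jaccard similarity |A ∩ B| / |A ∪ B| reduces to popcounts of a bitwise AND and OR. The item names, tags, and helper functions are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def build_tag_index(items: Dict[str, Sequence[str]]) -> Dict[str, int]:
    """Assign each distinct tag a bit position (sorted for determinism)."""
    tags = sorted({t for ts in items.values() for t in ts})
    return {t: i for i, t in enumerate(tags)}


def encode(tags: Sequence[str], index: Dict[str, int]) -> int:
    """Pack a tag set into an int bitfield: bit i is set iff tag i is present."""
    bits = 0
    for t in tags:
        bits |= 1 << index[t]
    return bits


def popcount(x: int) -> int:
    return bin(x).count("1")  # int.bit_count() also works on Python 3.10+


def jaccard(a: int, b: int) -> float:
    """|A & B| / |A | B| on bitfields; two empty sets count as unrelated (a convention)."""
    union = popcount(a | b)
    return popcount(a & b) / union if union else 0.0


def related(query: str, items: Dict[str, Sequence[str]], k: int = 3) -> List[Tuple[str, float]]:
    """Rank all other items by Jaccard similarity to `query`, dropping zero scores."""
    index = build_tag_index(items)
    codes = {name: encode(tags, index) for name, tags in items.items()}
    q = codes[query]
    scored = [(name, jaccard(q, c)) for name, c in codes.items() if name != query]
    scored = [s for s in scored if s[1] > 0]
    scored.sort(key=lambda s: (-s[1], s[0]))  # best first, ties broken by name
    return scored[:k]


items = {
    "img1": ["beach", "sunset", "sea"],
    "img2": ["beach", "sea", "surf"],
    "img3": ["city", "night"],
    "img4": ["sunset", "city"],
}
print(related("img1", items))  # prints [('img2', 0.5), ('img4', 0.25)]
```

Packing tags into bits keeps each comparison down to a couple of word-level operations, which is presumably part of why a compact binary encoding suits filtering and ranking in an interactive gallery.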