24 add cross attention labels text #60
base: main
Conversation
- Module and config created to add label cross-attention; mainly attached to the TextEmbedder (it aggregates the token embeddings to produce a sentence embedding, instead of naive averaging).
- The rest of the code has been adapted, especially categorical variable handling in TextClassificationModel.
- It is used as a namespace afterwards, so not converting it throws a bug.
- Given a parameter, retrieve the attention matrix (compatible with captum attributions).
- Update tests accordingly.
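Since the description mentions compatibility with captum attributions, here is a generic captum sketch for context. It uses a toy model, not this repository's API; every name and shape below is an illustrative assumption.

```python
import torch
import torch.nn as nn
from captum.attr import IntegratedGradients


# Toy classifier over pre-computed token embeddings; purely illustrative,
# not the TextClassificationModel from this repository.
class ToyClassifier(nn.Module):
    def __init__(self, embed_dim: int = 32, n_classes: int = 4):
        super().__init__()
        self.head = nn.Linear(embed_dim, n_classes)

    def forward(self, token_embeddings):
        # token_embeddings: (B, T, D) -> mean-pool over tokens -> logits (B, n_classes)
        return self.head(token_embeddings.mean(dim=1))


model = ToyClassifier()
inputs = torch.randn(2, 16, 32)  # (batch, tokens, embedding dim)
ig = IntegratedGradients(model)
# Attributions have the same shape as the inputs: one score per embedding dim per token.
attributions = ig.attribute(inputs, target=0)
```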
Pull request overview
Adds optional “label attention” cross-attention to the text classification pipeline so the model can produce label-specific sentence embeddings and (optionally) return label×token attention matrices for explainability.
Changes:
- Introduces LabelAttentionConfig/LabelAttentionClassifier and integrates label attention into TextEmbedder.
- Updates model forward/predict paths to support returning label-attention matrices and adjusts classifier head output shape when label attention is enabled.
- Extends pipeline tests to cover label-attention-enabled training/prediction.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
| torchTextClassifiers/torchTextClassifiers.py | Wires label-attention config through initialization, updates predict() explainability options, and deserializes label-attention config on load. |
| torchTextClassifiers/model/model.py | Propagates label-attention enablement, adjusts forward pass to optionally return attention matrices, and normalizes embeddings before the head. |
| torchTextClassifiers/model/lightning.py | Minor formatting-only change in validation_step. |
| torchTextClassifiers/model/components/text_embedder.py | Adds label-attention config/module and changes embedder outputs to include sentence embeddings + optional attention matrices. |
| torchTextClassifiers/model/components/__init__.py | Exports LabelAttentionConfig. |
| tests/test_pipeline.py | Adds a label-attention-enabled pipeline test and updates explainability assertions for new return keys. |
Comments suppressed due to low confidence (1)
torchTextClassifiers/model/components/text_embedder.py:209
TextEmbedder._get_sentence_embedding now sometimes returns a raw tensor (for aggregation_method 'first'/'last'), but TextEmbedder.forward unconditionally treats the result as a dict and calls .values(). This will crash when aggregation_method != 'mean'. Make _get_sentence_embedding return a consistent structure (e.g., always a dict with sentence_embedding and label_attention_matrix).
if self.attention_config is not None:
    if self.attention_config.aggregation_method is not None:  # default is "mean"
        if self.attention_config.aggregation_method == "first":
            return token_embeddings[:, 0, :]
        elif self.attention_config.aggregation_method == "last":
            lengths = attention_mask.sum(dim=1).clamp(min=1)  # last non-pad token index + 1
            return token_embeddings[
                torch.arange(token_embeddings.size(0)),
                lengths - 1,
                :,
            ]
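A minimal sketch of the consistent return structure suggested in that comment, assuming the same surrounding attributes (attention_config, token_embeddings, attention_mask) as in the snippet above:

```python
# Sketch only: make the 'first'/'last' aggregation paths return the same dict
# structure as the 'mean' path, so TextEmbedder.forward can always treat the
# result as a dict.
if self.attention_config.aggregation_method == "first":
    return {
        "sentence_embedding": token_embeddings[:, 0, :],
        "label_attention_matrix": None,
    }
elif self.attention_config.aggregation_method == "last":
    lengths = attention_mask.sum(dim=1).clamp(min=1)  # last non-pad token index + 1
    return {
        "sentence_embedding": token_embeddings[
            torch.arange(token_embeddings.size(0)), lengths - 1, :
        ],
        "label_attention_matrix": None,
    }
```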
@meilame-tayebjee I've opened a new pull request, #61, to work on those changes. Once the pull request is ready, I'll request review from you.
@meilame-tayebjee I've opened a new pull request, #62, to work on those changes. Once the pull request is ready, I'll request review from you.
@meilame-tayebjee I've opened a new pull request, #64, to work on those changes. Once the pull request is ready, I'll request review from you.
Pull request overview
Copilot reviewed 6 out of 7 changed files in this pull request and generated 9 comments.
Comments suppressed due to low confidence (1)
torchTextClassifiers/model/components/text_embedder.py:226
- TextEmbedder._get_sentence_embedding returns a raw Tensor for aggregation_method == 'last', but callers now expect a dict. This path will break whenever aggregation_method is set to 'last'. Return {'sentence_embedding': ..., 'label_attention_matrix': None} instead.
elif self.attention_config.aggregation_method == "last":
    lengths = attention_mask.sum(dim=1).clamp(min=1)  # last non-pad token index + 1
    return token_embeddings[
        torch.arange(token_embeddings.size(0)),
        lengths - 1,
        :,
    ]
# size (B, n_head, n_labels, seq_len)
attention_scores = torch.matmul(q, k.transpose(-2, -1)) / (self.head_dim**0.5)

# Apply mask to attention scores before softmax
if attention_mask is not None:
    # attn_mask is already in the right shape: (B, 1, 1, T)
    # We need to apply it to scores of shape (B, n_head, n_labels, T)
Copilot AI commented on Jan 27, 2026
When compute_attention_matrix=True and GQA is enabled (n_head != n_kv_head), torch.matmul(q, k.transpose(-2, -1)) will fail because q is (B, n_head, ...) while k is (B, n_kv_head, ...). Expand/repeat k (and the mask) to n_head (e.g., repeat_interleave by n_head//n_kv_head) or compute per-group weights so attention_matrix works under GQA.
Suggested change:

# q: (B, n_head, n_labels, head_dim)
# k: (B, n_kv_head, seq_len, head_dim)
# When using GQA (n_head != n_kv_head), expand k across heads so shapes match.
k_for_scores = k
if self.enable_gqa:
    # Each key/value head is shared by a group of query heads.
    # Repeat k along the head dimension to get shape (B, n_head, seq_len, head_dim).
    expand_factor = self.n_head // self.n_kv_head
    k_for_scores = k_for_scores.repeat_interleave(expand_factor, dim=1)
# size (B, n_head, n_labels, seq_len)
attention_scores = torch.matmul(q, k_for_scores.transpose(-2, -1)) / (self.head_dim**0.5)
# Apply mask to attention scores before softmax
if attention_mask is not None:
    # attn_mask is already in the right shape: (B, 1, 1, T)
    # It will broadcast over (B, n_head, n_labels, T)
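For intuition, here is a small standalone check of the shape fix suggested above; the dimensions are toy values chosen for illustration, not taken from the PR:

```python
import torch

# Toy dimensions, illustration only.
B, n_head, n_kv_head, n_labels, seq_len, head_dim = 2, 8, 2, 5, 16, 32
q = torch.randn(B, n_head, n_labels, head_dim)
k = torch.randn(B, n_kv_head, seq_len, head_dim)

# Without expansion, the matmul fails: head dims 8 and 2 cannot broadcast.
# Repeating each key/value head for its group of query heads lines the shapes up.
k_expanded = k.repeat_interleave(n_head // n_kv_head, dim=1)  # (B, n_head, seq_len, head_dim)
scores = torch.matmul(q, k_expanded.transpose(-2, -1)) / head_dim**0.5
assert scores.shape == (B, n_head, n_labels, seq_len)
```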
would rather remove GQA for label attention to simplify for now @copilot
@meilame-tayebjee I've opened a new pull request, #65, to work on those changes. Once the pull request is ready, I'll request review from you.

@meilame-tayebjee I've opened a new pull request, #66, to work on those changes. Once the pull request is ready, I'll request review from you.

@meilame-tayebjee I've opened a new pull request, #67, to work on those changes. Once the pull request is ready, I'll request review from you.

@meilame-tayebjee I've opened a new pull request, #68, to work on those changes. Once the pull request is ready, I'll request review from you.
@meilame-tayebjee I've opened a new pull request, #69, to work on those changes. Once the pull request is ready, I'll request review from you.
This pull request introduces label attention as an optional feature in the text classification pipeline, allowing the model to generate label-specific sentence embeddings using a cross-attention mechanism. The changes include new configuration classes, updates to the TextEmbedder and model logic, and new tests to ensure label attention works as intended.

Label Attention Mechanism:
- New LabelAttentionConfig and LabelAttentionClassifier to enable label-specific sentence embeddings using cross-attention, where labels act as queries over token embeddings (a minimal sketch of this mechanism follows below). [1] [2]
- Updated TextEmbedderConfig and TextEmbedder to support label attention, including a new output structure and logic to handle label attention matrices. [1] [2] [3] [4] [5] [6] [7]
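The sketch below illustrates the idea of labels acting as queries over token embeddings. It is not the PR's actual LabelAttentionClassifier: the class name, the single-head design, and all dimensions are assumptions made for illustration. It also reflects the note elsewhere in this PR that the classification head outputs a single value per label when label attention is enabled.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LabelCrossAttentionSketch(nn.Module):
    """Single-head label cross-attention: one learned query per label (illustrative only)."""

    def __init__(self, n_labels: int, embed_dim: int):
        super().__init__()
        self.label_queries = nn.Parameter(torch.randn(n_labels, embed_dim))
        self.key_proj = nn.Linear(embed_dim, embed_dim)
        self.value_proj = nn.Linear(embed_dim, embed_dim)
        # With label attention, the head maps each label-specific embedding
        # to a single score, giving one logit per label.
        self.head = nn.Linear(embed_dim, 1)

    def forward(self, token_embeddings, attention_mask=None):
        # token_embeddings: (B, T, D); attention_mask: (B, T), 1 = real token
        k = self.key_proj(token_embeddings)                                 # (B, T, D)
        v = self.value_proj(token_embeddings)                               # (B, T, D)
        q = self.label_queries.unsqueeze(0)                                 # (1, n_labels, D)
        scores = torch.matmul(q, k.transpose(-2, -1)) / k.size(-1) ** 0.5   # (B, n_labels, T)
        if attention_mask is not None:
            scores = scores.masked_fill(attention_mask.unsqueeze(1) == 0, float("-inf"))
        attn = F.softmax(scores, dim=-1)                                    # label-to-token attention
        label_embeddings = torch.matmul(attn, v)                            # (B, n_labels, D)
        logits = self.head(label_embeddings).squeeze(-1)                    # (B, n_labels)
        return {
            "logits": logits,
            "sentence_embedding": label_embeddings,
            "label_attention_matrix": attn,
        }


# Quick shape check with random inputs (shapes are illustrative):
module = LabelCrossAttentionSketch(n_labels=5, embed_dim=64)
out = module(torch.randn(2, 16, 64), torch.ones(2, 16, dtype=torch.long))
assert out["logits"].shape == (2, 5)
assert out["label_attention_matrix"].shape == (2, 5, 16)
```

The returned dict mirrors the sentence_embedding / label_attention_matrix keys discussed in the review comments above.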
Model and Pipeline Integration:
- Updated the model (model.py) to validate and propagate label attention configuration, including enforcing that the classification head outputs a single value when label attention is enabled and updating the number of classes accordingly.

Testing Enhancements:
- Added test_label_attention_enabled and corresponding updates to the helper functions. [1] [2] [3] [4] [5] [6] [7]

Miscellaneous: