
Conversation

EdisonSu768 (Member) commented Jan 13, 2026

Summary by CodeRabbit

  • Documentation
    • Updated navigation label to "Training" in installation docs
    • Added LLM Compressor docs: intro, overview, and how‑to with Workbench integration and deployment guidance
    • Added example notebooks demonstrating calibration-based and data-free compression workflows
  • Chores
    • Expanded spell-check compound list to recognize notebook file extension

✏️ Tip: You can customize this high-level summary in your review settings.

coderabbitai bot commented Jan 13, 2026

Walkthrough

Adds an LLM Compressor documentation section (intro, index, how‑to), two example compression notebooks (data‑free and calibration), updates a navigation label in installation docs, and adds "ipynb" to the spell-check compound list. No code or public API changes.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Documentation Navigation Update<br />docs/en/installation/tools.mdx | Changed nav_pre_train label from "预训练" / "PreTraining" to "训练" / "Training". |
| LLM Compressor Core Docs<br />docs/en/llm-compressor/index.mdx, docs/en/llm-compressor/intro.mdx, docs/en/llm-compressor/how_to/index.mdx | Added new MDX pages introducing the LLM Compressor section and overview. |
| Workbench How‑To<br />docs/en/llm-compressor/how_to/compressor_by_workbench.mdx | New how‑to describing Workbench-based compression workflows (data‑free and calibration), model/repo upload, dataset handling, notebooks, and deployment notes. |
| Example Notebooks<br />docs/public/data-free-compressor.ipynb, docs/public/calibration-compressor.ipynb | Added two notebooks demonstrating data‑free quantization and GPTQ calibration workflows, including GPU detection, compression runs, and model/tokenizer save steps. |
| Spell-Checker Dictionary<br />.cspell/compound.txt | Added "ipynb" to the compound word list for the spell-checker. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • zhaomingkun1030

Poem

🐇 I hopped through docs with nimble feet,
Notebooks, notes, and steps to meet.
Tiny models tucked away,
Compressed and cozy for the day. ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 inconclusive)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Title check | ❓ Inconclusive | The title 'feat: llm compressor AI-23582' is vague and lacks specificity about what was added or changed regarding LLM Compressor documentation. | Consider using a more descriptive title such as 'docs: add LLM Compressor documentation and example notebooks' to clearly convey the main changes. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

cloudflare-workers-and-pages bot commented Jan 13, 2026

Deploying alauda-ai with Cloudflare Pages

Latest commit: fd6de86
Status: ✅  Deploy successful!
Preview URL: https://f764a6b5.alauda-ai.pages.dev
Branch Preview URL: https://feat-llm-compressor.alauda-ai.pages.dev

View logs

EdisonSu768 changed the title from "chore: change to training AI-23523" to "feat: llm compressor AI-23582" on Jan 14, 2026
EdisonSu768 marked this pull request as ready for review on January 14, 2026 09:21
EdisonSu768 self-assigned this on Jan 14, 2026
coderabbitai bot left a comment

Actionable comments posted: 8

🤖 Fix all issues with AI agents
In `@docs/en/llm-compressor/how_to/compressor_by_workbench.mdx`:
- Line 50: In docs/en/llm-compressor/how_to/compressor_by_workbench.mdx update
the link target that currently points to
../../model_inference/inference_service/functions/inference_service.html#create-inference-service
so it uses the correct .mdx extension
(../../model_inference/inference_service/functions/inference_service.mdx#create-inference-service);
this fixes the broken link when creating a new inference service after uploading
the compressed model.

In `@docs/en/llm-compressor/how_to/evaluate_model.mdx`:
- Around line 31-46: The docs currently hardcode a Python version in the
site-packages path; update the instructions to avoid a specific Python version
by telling users to edit the lm_eval/tasks/__init__.py file inside their active
virtualenv's site-packages directory (do not hardcode
~/.venv/lib/python3.11/...), and instruct them to locate the block that computes
relative_yaml_path (the try/except around
yaml_path.relative_to(lm_eval_tasks_path)) and apply the suggested
ValueError-handling change there; also add a short note directing users to use
their Python/virtualenv tools to discover their site-packages location rather
than assuming a path.
- Around line 54-73: The YAML example uses an incorrect function reference
syntax: replace the hyphenated `!function preprocess_wikitext-process_results`
with the proper import-style dotted path `!function
preprocess_wikitext.process_results` (matching the other `!function
preprocess_wikitext.wikitext_detokenizer` usage) so the `!function` directive
points to the module attribute correctly.

In `@docs/public/calibration-compressor.ipynb`:
- Around line 122-132: The notebook calls tokenizer.save_pretrained(model_dir)
but tokenizer is undefined; import or instantiate the tokenizer before this save
step (e.g., load/create the tokenizer tied to model_id) so that tokenizer exists
when saving; ensure the tokenizer variable matches the model used (referencing
tokenizer, model, model_id, and model_dir) and place the tokenizer
initialization before the save_pretrained call.
- Around line 148-170: The notebook has syntax errors from stray spaces in
identifiers: fix the tokenization so uses os.environ (not "os. environ"),
lm_eval.tasks (not "lm_eval. tasks"), and TaskManager( (not "TaskManager (")
when constructing task_manager; update those occurrences in the cell to remove
the extraneous spaces so the imports and the TaskManager(...) call parse
correctly.
- Around line 102-120: The notebook cell is missing imports for
AutoModelForCausalLM and oneshot; add import statements for these symbols (e.g.,
from transformers import AutoModelForCausalLM and from the library that provides
oneshot) in an earlier cell or at the top of this cell so
AutoModelForCausalLM.from_pretrained(...) and oneshot(...) resolve correctly;
ensure the import for oneshot matches the package used elsewhere in the project
(the same module that defines oneshot).
- Around line 36-100: The notebook fails because tokenizer is used inside
preprocess but never defined; add importing and loading of the model tokenizer
(e.g., from transformers import AutoTokenizer) and instantiate tokenizer before
the dataset cell (matching the model used for compression, e.g.,
AutoTokenizer.from_pretrained(model_id, use_fast=... or appropriate kwargs));
ensure the tokenizer exposes apply_chat_template (or wrap/assign a function if
your tokenizer wrapper provides that) so preprocess,
tokenizer.apply_chat_template and tokenizer(...) calls succeed.

In `@docs/public/data-free-compressor.ipynb`:
- Around line 109-131: There are syntax errors from stray spaces in identifiers
and a potential missed usage of the TaskManager: remove the spaces so use
os.environ (not os. environ), lm_eval.tasks (not lm_eval. tasks), and call
TaskManager(...) (not TaskManager ( ... )), then either pass the created
task_manager into lm_eval.simple_evaluate (or add a clear comment that
instantiating task_manager is relied upon for side-effect registration) to make
intent explicit; update any related variable names (task_manager, my-wikitext)
accordingly.
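
For reference, a minimal sketch of what the corrected notebook cells could look like after applying the fixes above. The model id, include_path, and output path are placeholders (not the notebooks' actual values), and `oneshot` is assumed to come from the llmcompressor package used elsewhere in the PR:

```python
# Hedged sketch of the fixes flagged for calibration-compressor.ipynb.
# Assumptions: a TinyLlama chat model stands in for the notebook's model_id;
# older llmcompressor releases expose oneshot under llmcompressor.transformers instead.
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor import oneshot

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # placeholder model id
tokenizer = AutoTokenizer.from_pretrained(model_id)  # defines `tokenizer` before the dataset cell
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

def preprocess(example):
    # simplified stand-in for the notebook's chat-template preprocessing
    return {"text": tokenizer.apply_chat_template(example["messages"], tokenize=False)}

# Evaluation cell with the stray spaces removed ("os.environ", "lm_eval.tasks",
# "TaskManager(") and task_manager passed explicitly to simple_evaluate.
import os
import lm_eval
from lm_eval.tasks import TaskManager

task_manager = TaskManager(include_path="./tasks")  # include_path is an assumed location
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=./compressed-model",  # placeholder path to the saved model
    tasks=["my-wikitext"],
    task_manager=task_manager,
)
```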
🧹 Nitpick comments (2)
docs/en/installation/tools.mdx (1)

138-140: Consider updating the i18nKey for consistency.

The i18nKey remains "nav_pre_train" while the display text has been updated to "Training". While i18n keys don't technically need to match the display text, maintaining consistency between the key name and its meaning improves maintainability.

Since this is example documentation showing what the merged ConfigMap looks like, verify whether the actual system configuration also needs this i18nKey updated to something like "nav_training".

docs/en/llm-compressor/intro.mdx (1)

14-14: Minor: Use hyphenated "floating-point" as compound adjective.

When used as a compound adjective before a noun, "floating-point" should be hyphenated.

📝 Suggested fix
-- Weight and activation quantization (W8A8) compresses both weights and activations to 8-bit precision, targeting general server scenarios for integer and floating point formats.
+- Weight and activation quantization (W8A8) compresses both weights and activations to 8-bit precision, targeting general server scenarios for integer and floating-point formats.
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6090ae2 and a26a899.

📒 Files selected for processing (8)
  • docs/en/installation/tools.mdx
  • docs/en/llm-compressor/how_to/compressor_by_workbench.mdx
  • docs/en/llm-compressor/how_to/evaluate_model.mdx
  • docs/en/llm-compressor/how_to/index.mdx
  • docs/en/llm-compressor/index.mdx
  • docs/en/llm-compressor/intro.mdx
  • docs/public/calibration-compressor.ipynb
  • docs/public/data-free-compressor.ipynb
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-12-31T02:30:16.360Z
Learnt from: EdisonSu768
Repo: alauda/aml-docs PR: 73
File: docs/en/monitoring_ops/resource_monitoring/how_to/add_monitor_dashboard.mdx:28-45
Timestamp: 2025-12-31T02:30:16.360Z
Learning: In MDX documentation files (e.g., docs/.../*.mdx), when including PromQL code blocks, use bash as the syntax highlighter fallback because the rspress system does not support PromQL highlighting. Ensure the code blocks specify the language as bash (e.g., ```bash) where PromQL would appear, to maintain readability and avoid broken highlighting.

Applied to files:

  • docs/en/llm-compressor/how_to/index.mdx
  • docs/en/installation/tools.mdx
  • docs/en/llm-compressor/how_to/evaluate_model.mdx
  • docs/en/llm-compressor/index.mdx
  • docs/en/llm-compressor/intro.mdx
  • docs/en/llm-compressor/how_to/compressor_by_workbench.mdx
🪛 LanguageTool
docs/en/llm-compressor/how_to/evaluate_model.mdx

[style] ~50-~50: Consider using “inaccessible” to avoid wordiness.
Context: ...m Hugging Face. Because Hugging Face is not accessible from mainland China, you must define a ...

(NOT_ABLE_PREMIUM)

docs/en/llm-compressor/intro.mdx

[grammar] ~14-~14: Use a hyphen to join words.
Context: ...erver scenarios for integer and floating point formats. - Weight pruning, also kn...

(QB_NEW_EN_HYPHEN)


[style] ~15-~15: ‘in conjunction with’ might be wordy. Consider a shorter alternative.
Context: ...is requires fine-tuning, it can be used in conjunction with quantization for further inference acce...

(EN_WORDINESS_PREMIUM_IN_CONJUNCTION_WITH)

🪛 Ruff (0.14.11)
docs/public/calibration-compressor.ipynb

34-34: Undefined name tokenizer

(F821)


38-38: Undefined name tokenizer

(F821)


51-51: Undefined name AutoModelForCausalLM

(F821)


55-55: Undefined name oneshot

(F821)


67-67: Undefined name tokenizer

(F821)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pipelines as Code CI / doc-pr-build-ai
🔇 Additional comments (10)
docs/en/llm-compressor/index.mdx (1)

1-7: LGTM!

Standard index page structure with appropriate frontmatter weight and the <Overview /> component placeholder.

docs/en/llm-compressor/how_to/index.mdx (1)

1-7: LGTM!

Standard how-to section index page following the established documentation pattern.

docs/en/llm-compressor/intro.mdx (1)

7-15: LGTM! Clear introduction to LLM Compressor.

The content provides a good overview of the framework, its integration with Hugging Face and vLLM, and the supported compression techniques. The external links to the GitHub repository and vLLM documentation are helpful references.

docs/en/llm-compressor/how_to/compressor_by_workbench.mdx (2)

1-50: LGTM! Comprehensive workflow documentation.

The guide provides clear, well-structured instructions for using LLM Compressor with the Alauda AI platform. The workflow steps are logically organized, and the conditional sections (optional dataset preparation for data-free vs calibration workflows) are appropriately marked.


9-10: The notebook paths are correctly configured. Both referenced files exist at docs/public/calibration-compressor.ipynb and docs/public/data-free-compressor.ipynb. The absolute paths (/data-free-compressor.ipynb, /calibration-compressor.ipynb) are the correct format for referencing files in the public directory and will resolve properly when the documentation is built.

docs/public/data-free-compressor.ipynb (3)

1-20: LGTM!

Clear introduction with appropriate notes about GPU requirements and lm_eval version dependency.


28-54: LGTM!

GPU detection and quantization recipe setup are well-implemented. The W4A16 scheme with lm_head exclusion is a sensible default configuration.


56-93: LGTM!

Model loading, compression, and saving workflow is correct. Using device_map="auto" and torch_dtype="auto" provides good flexibility.
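
To make the data-free flow concrete, here is a minimal sketch of a W4A16 run with lm_head excluded, roughly matching the steps above; the model id is a placeholder, and the module paths and save_compressed flag follow current llmcompressor examples rather than the notebook's exact code:

```python
# Sketch of a data-free W4A16 compression run (assumed llmcompressor APIs).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

print("CUDA available:", torch.cuda.is_available())  # notebook's GPU detection step, simplified

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # placeholder; use the model under test
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# W4A16 weight-only quantization, skipping the lm_head as in the notebook's recipe
recipe = QuantizationModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"])
oneshot(model=model, recipe=recipe)  # data-free: no calibration dataset required

save_dir = model_id.split("/")[-1] + "-W4A16"
model.save_pretrained(save_dir, save_compressed=True)
tokenizer.save_pretrained(save_dir)
```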

docs/public/calibration-compressor.ipynb (1)

1-35: LGTM!

Clear introduction explaining the calibration-based workflow and GPTQ algorithm usage.

docs/en/llm-compressor/how_to/evaluate_model.mdx (1)

79-79: The notebook links are correct. The notebooks exist at docs/public/calibration-compressor.ipynb and docs/public/data-free-compressor.ipynb, and the absolute paths /data-free-compressor.ipynb and /calibration-compressor.ipynb follow the standard documentation site pattern where the public assets directory is served from the root. These links are used consistently across multiple documentation files.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

EdisonSu768 requested a review from fyuan1316 on January 15, 2026 03:31
coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@docs/en/llm-compressor/intro.mdx`:
- Around line 24-30: Hyphenate compound adjectives: change "floating point
formats" to "floating-point formats" in the sentence describing W8A8, and change
both occurrences of "Weight only INT4" to "Weight-only INT4" in the LLM
Compressor algorithm list (references: the W8A8 description and the list items
"AWQ: Weight only INT4" and "GPTQ: Weight only INT4").
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a26a899 and aaa4433.

📒 Files selected for processing (5)
  • .cspell/compound.txt
  • docs/en/llm-compressor/how_to/compressor_by_workbench.mdx
  • docs/en/llm-compressor/intro.mdx
  • docs/public/calibration-compressor.ipynb
  • docs/public/data-free-compressor.ipynb
🚧 Files skipped from review as they are similar to previous changes (2)
  • docs/public/calibration-compressor.ipynb
  • docs/public/data-free-compressor.ipynb
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-12-31T02:30:16.360Z
Learnt from: EdisonSu768
Repo: alauda/aml-docs PR: 73
File: docs/en/monitoring_ops/resource_monitoring/how_to/add_monitor_dashboard.mdx:28-45
Timestamp: 2025-12-31T02:30:16.360Z
Learning: In MDX documentation files (e.g., docs/.../*.mdx), when including PromQL code blocks, use bash as the syntax highlighter fallback because the rspress system does not support PromQL highlighting. Ensure the code blocks specify the language as bash (e.g., ```bash) where PromQL would appear, to maintain readability and avoid broken highlighting.

Applied to files:

  • docs/en/llm-compressor/how_to/compressor_by_workbench.mdx
  • docs/en/llm-compressor/intro.mdx
🪛 LanguageTool
docs/en/llm-compressor/intro.mdx

[grammar] ~24-~24: Use a hyphen to join words.
Context: ...erver scenarios for integer and floating point formats. - Weight pruning, also kn...

(QB_NEW_EN_HYPHEN)


[style] ~25-~25: ‘in conjunction with’ might be wordy. Consider a shorter alternative.
Context: ...is requires fine-tuning, it can be used in conjunction with quantization for further inference acce...

(EN_WORDINESS_PREMIUM_IN_CONJUNCTION_WITH)


[grammar] ~29-~29: Use a hyphen to join words.
Context: ...l compression algorithms: - AWQ: Weight only INT4 quantization - GPTQ: Weight on...

(QB_NEW_EN_HYPHEN)


[grammar] ~30-~30: Use a hyphen to join words.
Context: ...ht only INT4 quantization - GPTQ: Weight only INT4 quantization - FP8: Dynamic pe...

(QB_NEW_EN_HYPHEN)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pipelines as Code CI / doc-pr-build-ai
🔇 Additional comments (5)
.cspell/compound.txt (1)

7-7: LGTM!

Adding "ipynb" to the spell-checker's compound words list is appropriate given the Jupyter notebooks introduced in this PR.

docs/en/llm-compressor/intro.mdx (1)

1-20: LGTM!

The introduction provides a clear and comprehensive overview of LLM Compressor, its integration with the vLLM project, and the supported compression methodologies.

docs/en/llm-compressor/how_to/compressor_by_workbench.mdx (3)

1-11: LGTM!

The introduction clearly describes the two available compression workflows with appropriate links to the example notebooks.


12-43: LGTM!

The workflow steps are well-structured with clear instructions, appropriate cross-references, and helpful notes for optional steps.


44-48: LGTM!

The deployment section correctly explains the output format and links to the inference service documentation with the proper .mdx extension.


EdisonSu768 force-pushed the feat/llm-compressor branch 2 times, most recently from e602da3 to 0a62d62, on January 16, 2026 04:57
coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@docs/public/calibration-compressor.ipynb`:
- Around line 63-86: Adjust max_sequence_length by consulting both
tokenizer.model_max_length and model.config.max_position_embeddings (after
loading tokenizer and model) and take the effective limit as the minimum of the
two, but treat tokenizers that report unrealistically large values as unset
(e.g., ignore values >= 1e9 and fall back to
model.config.max_position_embeddings); finally apply the use_gpu branch (keep
the computed min when use_gpu is true, otherwise set to 16). Update the logic
that currently sets max_sequence_length to reference tokenizer.model_max_length
and model.config.max_position_embeddings and handle the large/unset tokenizer
value case before applying the GPU/CPU override.
🧹 Nitpick comments (1)
docs/en/llm-compressor/intro.mdx (1)

25-25: Trim “in conjunction with” for brevity.

“with” or “alongside” keeps the sentence tighter.

📝 Suggested edit
-Weight pruning, also known as sparsification, removes certain weights from the model entirely. While this requires fine-tuning, it can be used in conjunction with quantization for further inference acceleration.
+Weight pruning, also known as sparsification, removes certain weights from the model entirely. While this requires fine-tuning, it can be used with quantization for further inference acceleration.
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e602da3 and 0a62d62.

📒 Files selected for processing (5)
  • .cspell/compound.txt
  • docs/en/llm-compressor/how_to/compressor_by_workbench.mdx
  • docs/en/llm-compressor/intro.mdx
  • docs/public/calibration-compressor.ipynb
  • docs/public/data-free-compressor.ipynb
✅ Files skipped from review due to trivial changes (1)
  • .cspell/compound.txt
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-12-31T02:30:16.360Z
Learnt from: EdisonSu768
Repo: alauda/aml-docs PR: 73
File: docs/en/monitoring_ops/resource_monitoring/how_to/add_monitor_dashboard.mdx:28-45
Timestamp: 2025-12-31T02:30:16.360Z
Learning: In MDX documentation files (e.g., docs/.../*.mdx), when including PromQL code blocks, use bash as the syntax highlighter fallback because the rspress system does not support PromQL highlighting. Ensure the code blocks specify the language as bash (e.g., ```bash) where PromQL would appear, to maintain readability and avoid broken highlighting.

Applied to files:

  • docs/en/llm-compressor/how_to/compressor_by_workbench.mdx
  • docs/en/llm-compressor/intro.mdx
🪛 LanguageTool
docs/en/llm-compressor/intro.mdx

[style] ~25-~25: ‘in conjunction with’ might be wordy. Consider a shorter alternative.
Context: ...is requires fine-tuning, it can be used in conjunction with quantization for further inference acce...

(EN_WORDINESS_PREMIUM_IN_CONJUNCTION_WITH)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pipelines as Code CI / doc-pr-build-ai
🔇 Additional comments (2)
docs/public/data-free-compressor.ipynb (1)

27-75: LGTM — clear, minimal data-free workflow.

docs/en/llm-compressor/how_to/compressor_by_workbench.mdx (1)

7-48: LGTM — steps are clear and links look correct.


Comment on lines +63 to +86
"# Load up model using huggingface API\n",
"from transformers import AutoModelForCausalLM, AutoTokenizer\n",
"\n",
"model = AutoModelForCausalLM.from_pretrained(\n",
" model_id, device_map=\"auto\", torch_dtype=\"auto\"\n",
")\n",
"tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from datasets import load_dataset\n",
"\n",
"# Create the calibration dataset, using Huggingface datasets API\n",
"dataset_id = \"./ultrachat_200k\"\n",
"\n",
"# Select number of samples. 512 samples is a good place to start.\n",
"# Increasing the number of samples can improve accuracy.\n",
"num_calibration_samples = 512 if use_gpu else 4\n",
"max_sequence_length = 2048 if use_gpu else 16\n",

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

Hugging Face Transformers tokenizer.model_max_length vs model.config.max_position_embeddings best practice for determining sequence length

💡 Result:

Short answer

  • Use both: treat tokenizer.model_max_length as the tokenizer / preprocessing limit and model.config.max_position_embeddings as the model’s positional-embedding hard limit. Choose the effective maximum as the smaller of the two (with model-specific adjustments like RoBERTa’s +2). If tokenizer.model_max_length is unset/huge, fall back to the model config. [1][2][3][4]

Why

  • model.config.max_position_embeddings is how many position embeddings the model actually has (a hard upper bound). Use it to avoid index-out-of-range errors. [3]
  • tokenizer.model_max_length is what the tokenizer will use for truncation/padding and is the sensible default for preprocessing; it may be missing or set to a very large integer for some local tokenizers, so don’t assume it’s correct. [1][4]
  • Some models (RoBERTa variants) reserve extra position ids (e.g. config set to 514 for a 512 usable sequence length); read model docs or check embedding shapes when in doubt. [2]

Practical rule + code

  • Effective_max = min(valid_tokenizer_max, model_max_pos)
    • If tokenizer.model_max_length is absurdly large (>= 1e9), ignore it and use model.config.max_position_embeddings (or model.config.max_position_embeddings - model_padding_offset for models like RoBERTa).
  • If you need longer context, you must (a) increase config.max_position_embeddings, (b) resize/replace the model’s position embeddings, and (c) set tokenizer.model_max_length accordingly. [2][1]

Example (PyTorch / Transformers)

  • Basic safe choice:
    effective_max = min(
        tokenizer.model_max_length if tokenizer.model_max_length < 1e9 else float("inf"),
        model.config.max_position_embeddings
    )
  • Note: for RoBERTa-style models you may need to subtract/add the padding offset after inspecting model.embeddings.position_embeddings.weight.shape. [2]

References

  • GitHub issue: tokenizer.model_max_length inconsistencies. [1]
  • Hugging Face forum discussion (RoBERTa position embedding offset and recommendation to use tokenizer.model_max_length for seq length). [2]
  • Transformers model config doc: max_position_embeddings meaning. [3]
  • StackOverflow discussion summarizing difference and practical check. [4]

🏁 Script executed:

# Examine the notebook file to understand the cell structure and context
file docs/public/calibration-compressor.ipynb
wc -l docs/public/calibration-compressor.ipynb

Repository: alauda/aml-docs

Length of output: 163


🏁 Script executed:

# Read the notebook to understand context and structure
cat -n docs/public/calibration-compressor.ipynb | head -100

Repository: alauda/aml-docs

Length of output: 4274


🏁 Script executed:

# Continue reading to see how max_sequence_length is used
cat -n docs/public/calibration-compressor.ipynb | sed -n '100,120p'

Repository: alauda/aml-docs

Length of output: 722


🏁 Script executed:

# Check if there's any usage of model.config elsewhere in the notebook
grep -n "model.config" docs/public/calibration-compressor.ipynb || echo "No model.config usage found"

Repository: alauda/aml-docs

Length of output: 86


🏁 Script executed:

# Check the model being used and if there's any documentation about its max sequence length
grep -n "TinyLlama\|model_id" docs/public/calibration-compressor.ipynb | head -20

Repository: alauda/aml-docs

Length of output: 343


Derive max_sequence_length from both tokenizer and model config to avoid mismatches when model_id changes.

The suggested refactor should be expanded to follow Hugging Face best practices. Instead of only checking tokenizer.model_max_length, also check model.config.max_position_embeddings and use the minimum of the two (handling cases where the tokenizer value is unset or unreasonably large). A safer approach:

# After loading tokenizer and model
tokenizer_max = tokenizer.model_max_length
# Ignore unreasonably large tokenizer max lengths (some tokenizers default to 1e9+)
tokenizer_max = tokenizer_max if tokenizer_max < 1e9 else float('inf')
model_max = model.config.max_position_embeddings
max_sequence_length = min(tokenizer_max, model_max) if tokenizer_max < 1e9 else model_max
# Adjust for GPU/CPU as needed
max_sequence_length = max_sequence_length if use_gpu else 16

This ensures alignment with both the tokenizer's preprocessing limits and the model's positional embedding constraints.

🤖 Prompt for AI Agents
In `@docs/public/calibration-compressor.ipynb` around lines 63 - 86, Adjust
max_sequence_length by consulting both tokenizer.model_max_length and
model.config.max_position_embeddings (after loading tokenizer and model) and
take the effective limit as the minimum of the two, but treat tokenizers that
report unrealistically large values as unset (e.g., ignore values >= 1e9 and
fall back to model.config.max_position_embeddings); finally apply the use_gpu
branch (keep the computed min when use_gpu is true, otherwise set to 16). Update
the logic that currently sets max_sequence_length to reference
tokenizer.model_max_length and model.config.max_position_embeddings and handle
the large/unset tokenizer value case before applying the GPU/CPU override.

coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@docs/en/llm-compressor/how_to/compressor_by_workbench.mdx`:
- Around line 105-111: The snippet uses a broken reference `os path.join`; fix
by replacing it with the correct attribute access `os.path.join` where the code
builds local_path (inside the loop that calls os.walk); verify other os
references (e.g., os.walk, os.sep) are correct and keep the s3.upload_file call
(s3.upload_file(local_path, bucket_name, s3_key, Config=config)) unchanged so
uploads use the fixed local_path.
- Around line 131-140: Replace the confusing example credential strings with
explicit, consistent placeholders so users know to replace them: update
os.environ assignments for "AWS_ACCESS_KEY_ID" and "AWS_SECRET_ACCESS_KEY" and
the storage_options values "key", "secret", and "client_kwargs" ->
"endpoint_url" to use clear placeholders (e.g., "<AWS_ACCESS_KEY_ID>",
"<AWS_SECRET_ACCESS_KEY>", "<S3_KEY>", "<S3_SECRET>", "<S3_ENDPOINT_URL>") and
ensure the placeholder format is consistent across os.environ and
storage_options entries such as os.environ["AWS_ACCESS_KEY_ID"],
os.environ["AWS_SECRET_ACCESS_KEY"], and the storage_options dict.

Comment on lines +105 to +111
for root, dirs, files in os.walk(local_folder):
    for filename in files:
        local_path = os path.join(root, filename)
        relative_path = os.path.relpath(local_path, local_folder)
        s3_key = f"ultrachat_200k/{relative_path.replace(os.sep, '/')}"
        s3.upload_file(local_path, bucket_name, s3_key, Config=config)
        print(f"Uploaded {local_path} -> {s3_key}")

⚠️ Potential issue | 🔴 Critical

Syntax error: missing dot in os.path.join.

Line 107 has os path.join, which should be os.path.join. This will cause a SyntaxError when users run this code.

🐛 Proposed fix
 for root, dirs, files in os.walk(local_folder):
     for filename in files:
-        local_path = os path.join(root, filename)
+        local_path = os.path.join(root, filename)
         relative_path = os.path.relpath(local_path, local_folder)
         s3_key = f"ultrachat_200k/{relative_path.replace(os.sep, '/')}"
         s3.upload_file(local_path, bucket_name, s3_key, Config=config)
         print(f"Uploaded {local_path} -> {s3_key}")
🤖 Prompt for AI Agents
In `@docs/en/llm-compressor/how_to/compressor_by_workbench.mdx` around lines 105 -
111, The snippet uses a broken reference `os path.join`; fix by replacing it
with the correct attribute access `os.path.join` where the code builds
local_path (inside the loop that calls os.walk); verify other os references
(e.g., os.walk, os.sep) are correct and keep the s3.upload_file call
(s3.upload_file(local_path, bucket_name, s3_key, Config=config)) unchanged so
uploads use the fixed local_path.
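
For context, a small sketch of the client setup the snippet assumes (a boto3 S3 client plus a TransferConfig passed as Config=config); the endpoint, bucket, and credentials below are placeholders:

```python
# Assumed setup for the upload loop above; angle-bracket values are placeholders.
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client(
    "s3",
    endpoint_url="<S3_ENDPOINT_URL>",
    aws_access_key_id="<YOUR_ACCESS_KEY_ID>",
    aws_secret_access_key="<YOUR_SECRET_ACCESS_KEY>",
)
config = TransferConfig(multipart_threshold=64 * 1024 * 1024)  # tune for large dataset shards
bucket_name = "<BUCKET_NAME>"
local_folder = "./ultrachat_200k"
# With this in place, the corrected os.path.join loop uploads every file under local_folder.
```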

Comment on lines +131 to +140
os.environ["AWS_ACCESS_KEY_ID"] = "@7Apples@" #[!code callout]
os.environ["AWS_SECRET_ACCESS_KEY"] = "07Apples@"

storage_options = {
  "key": "07Apples@",
  "secret": "O7Apples@",
  "client_kwargs": {
    "endpoint_url": "http://minio.minio-system.svc.cluster.local:80" #[!code callout]
  }
}

⚠️ Potential issue | 🟡 Minor

Use clearer placeholder values for credentials.

The current values (@7Apples@, 07Apples@, O7Apples@) are inconsistent and could confuse users. Consider using explicit placeholder format to make it clear these need to be replaced.

📝 Suggested fix
-os.environ["AWS_ACCESS_KEY_ID"] = "@7Apples@" #[!code callout]
-os.environ["AWS_SECRET_ACCESS_KEY"] = "07Apples@"
+os.environ["AWS_ACCESS_KEY_ID"] = "<YOUR_ACCESS_KEY_ID>" #[!code callout]
+os.environ["AWS_SECRET_ACCESS_KEY"] = "<YOUR_SECRET_ACCESS_KEY>"

 storage_options = {
-  "key": "07Apples@",
-  "secret": "O7Apples@",
+  "key": "<YOUR_ACCESS_KEY_ID>",
+  "secret": "<YOUR_SECRET_ACCESS_KEY>",
   "client_kwargs": {
     "endpoint_url": "http://minio.minio-system.svc.cluster.local:80" #[!code callout]
   }
 }
🤖 Prompt for AI Agents
In `@docs/en/llm-compressor/how_to/compressor_by_workbench.mdx` around lines 131 -
140, Replace the confusing example credential strings with explicit, consistent
placeholders so users know to replace them: update os.environ assignments for
"AWS_ACCESS_KEY_ID" and "AWS_SECRET_ACCESS_KEY" and the storage_options values
"key", "secret", and "client_kwargs" -> "endpoint_url" to use clear placeholders
(e.g., "<AWS_ACCESS_KEY_ID>", "<AWS_SECRET_ACCESS_KEY>", "<S3_KEY>",
"<S3_SECRET>", "<S3_ENDPOINT_URL>") and ensure the placeholder format is
consistent across os.environ and storage_options entries such as
os.environ["AWS_ACCESS_KEY_ID"], os.environ["AWS_SECRET_ACCESS_KEY"], and the
storage_options dict.
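
As a usage note, a storage_options dictionary like the one above is typically handed to an fsspec-aware loader. A hedged example follows; the object key is a placeholder, and the how-to may use a different loader (for example, datasets.load_dataset):

```python
# Illustrative only: reading a dataset shard from the MinIO bucket via s3fs,
# reusing the storage_options dict defined above. The object key is a placeholder.
import pandas as pd

df = pd.read_parquet(
    "s3://<BUCKET_NAME>/ultrachat_200k/train-00000-of-00001.parquet",
    storage_options=storage_options,
)
print(df.head())
```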
