
dataset: add session-aware datasets: Diginetica, RetailRocket, and Cosmetics #690

Open
hieuddo wants to merge 2 commits into PreferredAI:master from hieuddo:seq-dataset


hieuddo (Member) commented May 21, 2026

Description

Add three datasets for session-aware recommendation (USIT format), listed in order of increasing size:

  • Diginetica: small
  • RetailRocket: medium
  • Cosmetics: big
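
For readers unfamiliar with the format, USIT is assumed here to mean one (user, session, item, timestamp) tuple per interaction. The column order, the tab separator, and the sample values below are illustrative assumptions rather than details taken from this PR. A minimal standard-library sketch of how such records group into per-user sessions:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed USIT record layout: (user, session, item, timestamp).
USIT = Tuple[str, str, str, int]


def parse_usit(lines: List[str], sep: str = "\t") -> List[USIT]:
    """Parse text lines into (user, session, item, timestamp) tuples."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user, session, item, ts = line.split(sep)
        records.append((user, session, item, int(ts)))
    return records


def group_sessions(records: List[USIT]) -> Dict[str, Dict[str, List[str]]]:
    """Map user -> session -> items ordered by timestamp."""
    grouped = defaultdict(lambda: defaultdict(list))
    for user, session, item, _ts in sorted(records, key=lambda r: r[3]):
        grouped[user][session].append(item)
    return {user: dict(by_session) for user, by_session in grouped.items()}


# Made-up sample data, not taken from any of the three datasets.
sample = [
    "u1\ts1\ti10\t100",
    "u1\ts1\ti11\t105",
    "u1\ts2\ti12\t900",
    "u2\ts3\ti10\t50",
]
sessions = group_sessions(parse_usit(sample))
```

Keeping the session as its own key, rather than folding it into the user ID, is what lets a session-aware model distinguish a user's separate visits.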

Related Issues

Checklist:

  • I have added tests.
  • I have updated the documentation accordingly.
  • I have updated datasets/README.md (if you are adding a new dataset).

Copilot AI review requested due to automatic review settings May 21, 2026 09:05

Copilot AI left a comment


Pull request overview

Adds three new built-in session-aware (USIT-format) datasets to Cornac, each with train/val/test split loaders and corresponding dataset download tests.

Changes:

  • Added dataset loader modules for Diginetica, RetailRocket, and Cosmetics (train/val/test splits via cache(..., unzip=True) + Reader.read(..., fmt="USIT")).
  • Added basic download/shape tests asserting expected record counts for each split.
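
The loader pattern summarised above (fetch and unzip an archive, then read each split as USIT) can be approximated without Cornac or network access. The sketch below uses only the standard library: `extract_archive`, `read_usit`, and `load_splits` are hypothetical stand-ins for `cache(..., unzip=True)` and `Reader.read(..., fmt="USIT")`, and the file names `train.txt`, `val.txt`, `test.txt` and the tab separator are assumptions, not the layout used by this PR.

```python
import os
import tempfile
import zipfile
from typing import Dict, List, Tuple

USIT = Tuple[str, str, str, int]  # assumed order: user, session, item, timestamp


def extract_archive(zip_path: str, dest_dir: str) -> str:
    """Stand-in for cache(..., unzip=True): unpack a local archive."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)
    return dest_dir


def read_usit(path: str, sep: str = "\t") -> List[USIT]:
    """Stand-in for Reader.read(..., fmt="USIT"): one tuple per line."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                user, session, item, ts = line.rstrip("\n").split(sep)
                records.append((user, session, item, int(ts)))
    return records


def load_splits(zip_path: str, dest_dir: str) -> Dict[str, List[USIT]]:
    """Return the train/val/test splits found in the archive."""
    root = extract_archive(zip_path, dest_dir)
    return {
        split: read_usit(os.path.join(root, f"{split}.txt"))
        for split in ("train", "val", "test")
    }


# Build a tiny made-up archive so the sketch runs offline.
with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "toy.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("train.txt", "u1\ts1\ti1\t1\nu1\ts1\ti2\t2\n")
        zf.writestr("val.txt", "u2\ts2\ti1\t3\n")
        zf.writestr("test.txt", "u3\ts3\ti2\t4\n")
    splits = load_splits(archive, os.path.join(tmp, "data"))

# Record counts per split, the quantity the shape tests assert on.
split_sizes = {name: len(rows) for name, rows in splits.items()}
```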

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| cornac/datasets/diginetica.py | New dataset loader for Diginetica with load_train/load_val/load_test. |
| cornac/datasets/retailrocket.py | New dataset loader for RetailRocket with load_train/load_val/load_test. |
| cornac/datasets/cosmetics.py | New dataset loader for Cosmetics with load_train/load_val/load_test. |
| tests/cornac/datasets/test_diginetica.py | Download test verifying split sizes (probabilistically executed). |
| tests/cornac/datasets/test_retailrocket.py | Download test verifying split sizes (probabilistically executed). |
| tests/cornac/datasets/test_cosmetics.py | Download test verifying split sizes (probabilistically executed). |
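
The "probabilistically executed" tests listed for the three test files appear to follow a pattern in which the download and assertions run only on a random fraction of invocations. A minimal sketch of that pattern follows; `load_train`, the 0.8 threshold, and the expected count of 3 are hypothetical placeholders rather than values from this PR.

```python
import random
import time
import unittest


def load_train():
    """Hypothetical loader stub; a real one would download the dataset."""
    return [("u1", "s1", "i1", 1), ("u1", "s1", "i2", 2), ("u2", "s2", "i1", 3)]


class TestToyDataset(unittest.TestCase):

    def test_load_train(self):
        # Seed from the clock so each run draws a different number.
        random.seed(time.time())
        # Assumed threshold: the body runs in roughly 20% of invocations.
        if random.random() > 0.8:
            self.assertEqual(len(load_train()), 3)


def run_suite() -> unittest.TestResult:
    """Run the test case once and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestToyDataset)
    return unittest.TextTestRunner(verbosity=0).run(suite)


result = run_suite()
```

Under this pattern a run that skips the body is still reported as a pass, so a green result does not by itself show that the download was exercised.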


Comment on lines +15 to +23
"""
RetailRocket dataset: e-commerce web events (clicks, add to carts, transactions) data for 4.5 months.
"""

from typing import List

from ..data import Reader
from ..utils import cache

# limitations under the License.
# ============================================================================
"""
RetailRocket dataset: e-commerce web events (clicks, add to carts, transactions) data for 4.5 months.
class TestRetailRocket(unittest.TestCase):

    def test_load_train_val_test(self):
        random.seed(time.time())

class TestDiginetica(unittest.TestCase):

    def test_load_train_val_test(self):
        random.seed(time.time())

class TestCosmetics(unittest.TestCase):

    def test_load_train_val_test(self):
        random.seed(time.time())