Publications

Jun 16, 2021

Pre-print

Hiromu Takahashi and Shotaro Ishihara (2025). Fast-MIA: Efficient and Scalable Membership Inference for LLMs. [ arXiv ]

Journal (Refereed)

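石原祥太郎, 村田栄樹, 中間康文, 高橋寛武 (2024). Construction and Use of Domain-Specific Pre-trained Models to Support Summarization of Japanese News Articles. Journal of Natural Language Processing, Vol. 31, No. 4. [ paper ] [ code ]
石原祥太郎, 高橋寛武, 白井穂乃 (2024). Semantic Shift Stability: Auditing Temporal Performance Degradation of Pre-trained Models Using Semantic Change of Words in the Training Corpus. Journal of Natural Language Processing, Vol. 31, No. 4. [ paper ] [ code ]
澤田悠冶, 安井雄一郎, 大内啓樹, 渡辺太郎, 石井昌之, 石原祥太郎, 山田剛, 進藤裕之 (2024). Construction and Analysis of a Nikkei Company ID Linking System Based on Company Name Similarity. Journal of Natural Language Processing, Vol. 31, No. 3. [ paper ]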

International Conference (Refereed)

Kota Tanabe, Shotaro Ishihara, Kenta Yamada, Masaki Aota, and Yasutsuna Matayoshi (2025). Making News Familiar: News Recommendation from Daily Scenery. Proceedings of the 29th International Conference on Knowledge-Based and Intelligent Information and Engineering Systems (KES-2025). [ paper ]
Hiromu Takahashi† and Shotaro Ishihara† (2025). Quantifying Memorization in Continual Pre-training with Japanese General or Industry-Specific Corpora. Proceedings of the First Workshop on Large Language Model Memorization (L2M2). (†equal contribution) [ paper ] [ slide ] (acceptance rate: 0.52=17/33)
Takumi Tamura, Yoichiro Ito, Masaki Aota, Kenta Yamada, and Shotaro Ishihara (2025). Should Embedding-Based News Recommendation Be Revisited? A Focus on the Differences Between News Publishers and Aggregators. Proceedings of the 30th International Conference on Natural Language & Information Systems (NLDB 2025) (Industry Track). [ paper ]
Shotaro Ishihara and Hiromu Takahashi (2024). Quantifying Memorization and Detecting Training Data of Pre-trained Language Models using Japanese Newspaper. Proceedings of the 17th International Natural Language Generation Conference (INLG 2024). [ arXiv ] [ paper ] [ poster ] (acceptance rate: 0.58=57/98)
Shotaro Ishihara (2024). Quantifying Memorization of Domain-Specific Pre-trained Language Models using Japanese Newspaper and Paywalls. Fourth Workshop on Trustworthy Natural Language Processing (Non-archival track). [ arXiv ] [ poster ] (acceptance rate: 0.91=40/44)
Kaito Majima† and Shotaro Ishihara† (2023). Generating News-Centric Crossword Puzzles As A Constraint Satisfaction and Optimization Problem. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management (CIKM 2023). (†equal contribution) [ arXiv ] [ paper ] (acceptance rate: 0.27=152/554)
Shotaro Ishihara, Hiromu Takahashi, and Hono Shirai (2023). Quantifying Diachronic Language Change via Word Embeddings: Analysis of Social Events using 11 Years News Articles in Japanese and English. 9th International Conference on Computational Social Science (IC2S2 2023). [ abstract ] [ poster ] (acceptance rate: 0.77=711/918)
Shotaro Ishihara (2023). Training Data Extraction From Pre-trained Language Models: A Survey. Proceedings of the Third Workshop on Trustworthy Natural Language Processing. [ arXiv ] [ paper ] [ poster ] (acceptance rate: 0.72=41/57)
Shotaro Ishihara and Yasufumi Nakama (2022). Analysis and Estimation of News Article Reading Time with Multimodal Machine Learning. Proceedings of 2022 IEEE International Conference on Big Data (Industrial & Government Track). [ paper ] [ slide ]
Shotaro Ishihara†, Hiromu Takahashi†, and Hono Shirai (2022). Semantic Shift Stability: Efficient Way to Detect Performance Degradation of Word Embeddings and Pre-trained Language Models. Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (AACL-IJCNLP 2022). (†equal contribution) [ paper ] [ poster, slide ] [ code ] (acceptance rate: 0.27=147/554)
Shotaro Ishihara and Hono Shirai (2022). Nikkei at SemEval-2022 Task 8: Exploring BERT-based Bi-Encoder Approach for Pairwise Multilingual News Article Similarity. Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval 2022). [ paper ] [ slide ] [ poster ]
Shotaro Ishihara and Yasufumi Nakama (2022). Generating a Pairwise Dataset for Click-through Rate Prediction of News Articles Considering Positions and Contents. Proceedings of Computation + Journalism Conference 2022. [ paper ] [ slide ]
Shotaro Ishihara, Yuta Matsuda, and Norihiko Sawa (2021). Editors-in-the-loop News Article Summarization Framework with Sentence Selection and Compression. Proceedings of the 5th IEEE Workshop on Human-in-the-Loop Methods and Future of Work in BigData. [ paper ]
Shotaro Ishihara, Shuhei Goda, and Hidehisa Arai (2021). Adversarial Validation to Select Validation Data for Evaluating Performance in E-commerce Purchase Intent Prediction. Proceedings of the ACM SIGIR Workshop on eCommerce (SIGIR eCom’21). [ paper ] [ slide ]
Shotaro Ishihara, Shuhei Goda, and Yuya Matsumura (2021). Weighted Averaging of Various LSTM Models for Next Destination Recommendation. Proceedings of the Workshop on Web Tourism co-located with the 14th ACM International WSDM Conference (WSDM 2021). [ paper ]
Shotaro Ishihara and Norihiko Sawa (2020). Age Prediction of News Subscribers Using Machine Learning: Case Study of Hosting Worldwide Data Analysis Competition “Kaggle”. Computation + Journalism Symposium 2020. [ paper ]
Kazuo Hiekata, Taiga Mitsuyuki, and Shotaro Ishihara† (2017). Design Method of Remote Monitoring Service for Elderly Considering Community Characteristics. Proceedings of the 24th ISPE Inc. International Conference on Transdisciplinary Engineering (TE 2017). (†corresponding author) [ paper ]

Misc

This section presents domestic conferences and refereed talks. Other talks and media coverage can be found here.

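石原祥太郎, 原田慧 (2025). KS-23 "Artificial Intelligence and Competitions". Special Issue "The 39th Annual Conference of the Japanese Society for Artificial Intelligence (2025)" [KS-1 to KS-25], Journal of the Japanese Society for Artificial Intelligence, Vol. 40, No. 6, 2025. [ website ]
石原祥太郎 (2025). Letters from the Planning Committee [No. 4]: Energizing the Field of Artificial Intelligence through Competitions. Journal of the Japanese Society for Artificial Intelligence, Vol. 40, No. 4, 2025. [ website ]
山口悠地, 石原祥太郎 (2025). Automated Quantitative Evaluation of Research Reproducibility Using AI Agents. Science of Science Workshop 2025. [ slide ]
鈴木香帆, 鳥海不二夫, 石原祥太郎, 並木亮 (2025). Analysis of the Effect of News Article Delivery Formats on Readers' Browsing Behavior. Proceedings of the 39th Annual Conference of the Japanese Society for Artificial Intelligence (2025). [ paper ]
石原祥太郎 (2025). Analyzing Popularity Bias in Generative Recommendation: A Memorization Perspective. Proceedings of the 39th Annual Conference of the Japanese Society for Artificial Intelligence (2025). [ paper ] [ slide ]
馬嶋海斗, 石原祥太郎 (2025). Automatic Generation of News-Centric Crossword Puzzles: A Formulation as a Constraint Satisfaction and Optimization Problem. The 2nd Meeting of the SIG on Modeling and Utilization of Knowledge and Skills. [ slide ]
高橋寛武, 石原祥太郎 (2025). Quantifying Memorization in Japanese Continually Pre-trained Models. Proceedings of the 31st Annual Meeting of the Association for Natural Language Processing. [ paper ]
大村和正, 石原祥太郎 (2025). Toward Mining Natural Questions without Search Query Logs. Proceedings of the 31st Annual Meeting of the Association for Natural Language Processing. [ paper ]
岩本和真, 大村和正, 石原祥太郎 (2025). Construction and Evaluation of a Hallucination Detection Benchmark for Human-Written Text. Proceedings of the 31st Annual Meeting of the Association for Natural Language Processing. [ paper ]
石原祥太郎, 村田栄樹, 中間康文, 高橋寛武 (2025). Construction and Use of Domain-Specific Pre-trained Models to Support Summarization of Japanese News Articles. Proceedings of the 31st Annual Meeting of the Association for Natural Language Processing. [ paper ] [ poster ]
石原祥太郎, 高橋寛武, 白井穂乃 (2025). Semantic Shift Stability: Auditing Temporal Performance Degradation of Pre-trained Models Using Semantic Change of Words in the Training Corpus. Proceedings of the 31st Annual Meeting of the Association for Natural Language Processing. [ paper ] [ poster ]
石原祥太郎 (2025). Analyzing Popularity Bias in Generative Recommendation: A Memorization Perspective. The 4th Annual Conference of the Computational Social Science Society of Japan.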
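田邉耕太, 石原祥太郎, 山田健太, 青田雅輝, 又吉康綱 (2024). Making News Familiar: News Recommendation from Daily Scenery. Joint Meeting of the 210th SIG on Human-Computer Interaction and the 84th SIG on Ubiquitous Computing. [ paper ]
石原祥太郎 (2024). Techniques for Building Your Own Library "On the Shoulders of Giants". PyCon JP 2024. [ slide ] (acceptance rate: 0.23=45/193)
阿波智彦, 石原祥太郎 (2024). The Nikkei "Hoshi Shinichi Award" and Generative AI. IPSJ Magazine "Joho Shori" (Information Processing Society of Japan), September 2024 issue. [ website ] [ pdf ]
白井穂乃, 石原祥太郎 (2024). Building a Japanese Benchmark for Making Headline Meanings Concrete. Proceedings of the 30th Annual Meeting of the Association for Natural Language Processing. [ paper ]
澤田悠冶, 安井雄一郎, 大内啓樹, 渡辺太郎, 石井昌之, 石原祥太郎, 山田剛, 進藤裕之 (2024). Construction and Analysis of a Similarity-Based Entity Linking System for Nikkei Company ID Linking. Proceedings of the 30th Annual Meeting of the Association for Natural Language Processing. [ paper ]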
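石原祥太郎, 高橋寛武 (2023). Can the Inverted Pyramid Structure of News Articles Be Used for Readability Evaluation? The 18th Symposium of the Young Researcher Association for NLP Studies (YANS).
村田栄樹, 石原祥太郎 (2023). Analysis of Intrinsic and Extrinsic Factors of Hallucination in Summarization Models Trained on Different Domains. The 18th Symposium of the Young Researcher Association for NLP Studies (YANS).
増田太郎, 櫻井亮佑, 桐井智弘, 渡邊英介, 石原祥太郎 (2023). Defining Economic Information Labels and Building a Tagged Corpus for Extracting Company and Industry Trends. The 18th Symposium of the Young Researcher Association for NLP Studies (YANS). [ poster ]
石原祥太郎, 中間康文 (2023). Predicting the Reading Time of News Articles with Multimodal Machine Learning. Proceedings of the 37th Annual Conference of the Japanese Society for Artificial Intelligence (2023). [ paper ]
石原祥太郎 (2023). Training Data Extraction from Pre-trained Language Models: Building and Analyzing an Evaluation Set Based on the Characteristics of Newspaper Articles. Proceedings of the 29th Annual Meeting of the Association for Natural Language Processing. [ paper ]
大村和正 (Kyoto University), 白井穂乃, 石原祥太郎, 澤紀彦 (2023). Extracting Performance-Factor Sentences from Summaries of Financial Statements Considering Polarity and Importance. Proceedings of the 29th Annual Meeting of the Association for Natural Language Processing. [ paper ]
石原祥太郎, 高橋寛武, 白井穂乃 (2023). Quantifying Diachronic Language Change via Word Embeddings: Analysis of Social Events Using 11 Years of Japanese and English News Articles. The 2nd Annual Conference of the Computational Social Science Society of Japan (CSSJ2023). (Excellence Award [ website ])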
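石原祥太郎 (2022). International Conference Report: AACL-IJCNLP 2022. The 24th Spoken Language Symposium and the 9th Natural Language Processing Symposium. [ slide ]
石原祥太郎 (2022). Practical Japanese Text Generation: Shu-Ha-Ri of Implementation with the Transformers Library. PyCon JP 2022. [ website ] (acceptance rate: 0.38=45/120) (Memorable Talks: 5/45)
梶川怜恩, 鈴木刀磨, 二宮大空, 石原祥太郎 (2022). Product Review Evaluation via Learning to Rank with LightGBM. Final Results Report, Hackathon at the 17th Symposium of the Young Researcher Association for NLP Studies (YANS). [ code ] (1st place in the final evaluation score, Applied Scientist Award [ website ])
馬嶋海斗, 石原祥太郎 (2022). Automatic Generation of Crossword Puzzles with Hints Containing News Terms. The 17th Symposium of the Young Researcher Association for NLP Studies (YANS). (Encouragement Award [ website ])
石原祥太郎, 中間康文 (2022). Toward Constructing Pairwise Learning Datasets for Click-Through Rate Prediction of Newspaper Articles. Proceedings of the 36th Annual Conference of the Japanese Society for Artificial Intelligence (2022). [ paper ] [ slide ]
高橋寛武, 石原祥太郎, 白井穂乃 (2022). Detecting Semantic Change Caused by COVID-19 Using Word Embeddings. Proceedings of the 28th Annual Meeting of the Association for Natural Language Processing. [ paper ]
大村和正, 白井穂乃, 石原祥太郎, 澤紀彦 (2022). Generating Training Data from Earnings Announcement Articles for Extracting Performance-Factor Sentences from Summaries of Financial Statements. Proceedings of the 28th Annual Meeting of the Association for Natural Language Processing. [ paper ]
増田太郎, 石原祥太郎, 吉田勇太 (2022). Mitigating Covariate Shift in Predicting the Industry Classification of Companies. The 14th Forum on Data Engineering and Information Management. [ paper ] [ slide ]
山田健太, 山本真吾, 石原祥太郎, 澤紀彦 (2022). F√V: Development and Application of a Churn Prediction Metric for Online News Media. The 14th Forum on Data Engineering and Information Management. [ paper ]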
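石原祥太郎 (2021). Introduction to Access Log Analysis with Python. PyCon JP 2021. [ website ] (acceptance rate: 0.38=45/120) (Memorable Talks: 3/45)
植田暢大, 高本大輝, 石原祥太郎 (2021). Leaderboard Exploration Using SHINRA2019 System Results and Efforts toward Improving the BERT Baseline. Final Results Report, Hackathon at the 16th Symposium of the Young Researcher Association for NLP Studies (YANS). [ website ] [ code ]
石原慧人, 石原祥太郎, 白井穂乃 (2021). Toward Abstractive Summarization of Japanese News Articles Using BertSum. Proceedings of the 35th Annual Conference of the Japanese Society for Artificial Intelligence (2021). [ paper ]
石原祥太郎, 澤紀彦 (2021). News Article Summarization with Sentence Selection by MMR and Sentence Compression by TF-IDF. Proceedings of the 35th Annual Conference of the Japanese Society for Artificial Intelligence (2021). [ paper ] (Excellence Award, 2021 Annual Conference of the Japanese Society for Artificial Intelligence [ website ])
Shotaro Ishihara and Norihiko Sawa (2021). Proposal for Extractive Summarization Method of News Articles and Collaboration with Editors in Newsroom. Computation + Journalism Symposium 2021.
石原祥太郎 (2020). Getting Started with the Machine Learning Competition "Kaggle" in Python. SciPy Japan 2020. [ slide ] [ code ]
石原祥太郎 (2020). Extracting Related Companies Using Co-occurrence Counts in Newspaper Articles. The 15th Symposium of the Young Researcher Association for NLP Studies (YANS).
Shotaro Ishihara and Asami Matsumoto (2020). Stacked Generalization for More Accurate Prediction of Patient Survival. WiDS Datathon 2020 Excellence in Research Award. [ paper ]
稗方和夫, 満行泰河, 石原祥太郎 (2017). Development of a Design Method for Elderly Monitoring Services Considering Community Characteristics. The 8th Research Meeting on Design for an Aging Society. (2018 Yamashita Memorial Research Award [ website ])
和中真之介, 後藤拓矢, 馬目信人, 伊藤航大, 岡田航太, 石原祥太郎 (2016). Student On-Site Report: Port and Airport Research Institute. KANRIN: Bulletin of the Japan Society of Naval Architects and Ocean Engineers, Vol. 68, pp. 42-45. [ paper ]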

Book

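高野海斗, 齋藤慎一朗, 石原祥太郎 (2026). Introduction to Large Language Models with Kaggle: Practical Natural Language Processing Programming. Kodansha. [ website ]
小嵜耕平, 秋葉拓哉, 林孝紀, 石原祥太郎 (2023). The Secrets of Deep Learning Programming for Taking on Kaggle. Kodansha. [ website ]
Abhishek Thakur (author), 石原祥太郎 (translator) (2021). Practical Machine Learning Approaches Learned from a Kaggle Grandmaster. Mynavi Publishing. [ website ]
石原祥太郎, 村田秀樹 (2020). Kaggle Start Book: Getting Started with Python. Kodansha. [ website ] [ Chinese ] [ Korean ]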

Book Section

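杉山阿聖, 太田満久, 久井裕貴 (eds.) (2024). From Research and Development of Large Language Models to Production Operation, Chapter 11 of MLOps by Case Study: Prescriptions for Scaling the Results of Machine Learning. Kodansha. [ website ]
石原祥太郎 (2023). Quantifying Semantic Change of Words and Its Application to Large Language Models, Chapter 1 of Nikkei Development Book VOL.4. Techbookfest 14. [ website ]
石原祥太郎 (2019). Estimating Machine Learning Performance "Correctly", Chapter 1 of Nikkei Development Book VOL.3. Techbookfest 7. [ website ]
石原祥太郎 (2019). What a Data Analyst Learned from Competitive Programming: The "Nikkei Contest" as a Case Study, Chapter 1 of Nikkei Development Book VOL.2. Techbookfest 6. [ website ]
石原祥太郎 (2018). User Analysis of Nikkei Online Edition Pro with Machine Learning, Chapter 1 of Nikkei Development Book. Techbookfest 5. [ website ]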

Thesis

Development of a Design Method for Elderly Monitoring Services Considering Community Characteristics, Graduation thesis in Department of Systems Innovation, Faculty of Engineering, The University of Tokyo, Mar 24th, 2017.

Awards

Google Developer Experts (Kaggle), May 2025.
Google Developer Experts (Artificial Intelligence), April 2025.
Google Cloud Champion Innovator (Cloud AI/ML), Dec 2023. [ website ]
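Excellence Award, the 2nd Annual Conference of the Computational Social Science Society of Japan. [ website ]
Applied Scientist Award, Hackathon at the 17th Symposium of the Young Researcher Association for NLP Studies (YANS). [ website ]
Excellence Award, the 2021 Annual Conference of the Japanese Society for Artificial Intelligence. [ website ]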
30 Under 30 Awards, Grand Prize for Asia/Pacific, The International News Media Association (INMA), Sep 17th, 2020. [ website ]
Best Graduation Research Award, Department of Systems Innovation, Faculty of Engineering, The University of Tokyo, 2017.

Awards (Co-author)

Encouragement Award, the 17th Symposium of the Young Researcher Association for NLP Studies (YANS). [ website ]
2018 Yamashita Memorial Research Award. [ website ]

Committee

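Planning Committee, The Japanese Society for Artificial Intelligence, April 2024 -. [ website ]
Member, Japan Committee of the International Olympiad in Artificial Intelligence (IOAI), Dec 2024 -. [ website ]
Expert Committee Member, SIG on AI in Earth and Planetary Science, April 2024 - March 2026. [ website ]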

Shotaro Ishihara

Data Scientist

Data Scientist at a Japanese Media Company

