mamot.fr is one of the many independent Mastodon servers you can use to participate in the fediverse.
Mamot.fr is a French-speaking Mastodon server, run by La Quadrature du Net.

#Cheminformatics


#openscience #cheminformatics dates back to the late nineties with the emerging collaborative development of JChemPaint, Jmol, and the Chemical Markup Language. Sketch of the history by Chris Steinbeck: "The evolution of open science in cheminformatics: a journey from closed systems to collaborative innovation" jcheminf.biomedcentral.com/art

BioMed Central: The evolution of open science in cheminformatics: a journey from closed systems to collaborative innovation - Journal of Cheminformatics

Cheminformatics has significantly transformed over the past four decades, evolving from a field dominated by proprietary systems to one increasingly embracing open science principles. In its early years, cheminformatics was characterised by commercial software and restricted data access, limiting collaboration and reproducibility. The advent of open-source software in the late 1990s and early 2000s, including tools such as the Chemistry Development Kit (CDK) and RDKit, played a crucial role in democratising computational chemistry. Open data initiatives, such as PubChem and NMRShiftDB, further enhanced accessibility by providing freely available chemical information, fostering transparency and interoperability and introducing key standards, such as the International Chemical Identifier (InChI), revolutionised data integration and retrieval across diverse platforms. Community-driven efforts, including the Blue Obelisk movement and Open Notebook Science, have promoted open methodologies and collaborative research. More recently, national data infrastructure projects like NFDI4Chem have aimed to standardise research data management in cheminformatics, ensuring the long-term sustainability of open science practices. The increasing adoption of the FAIR (Findable, Accessible, Interoperable, Reusable) principles has further reinforced data sharing and reuse in computational chemistry. Challenges remain, particularly in overcoming resistance to data sharing and ensuring sustainable funding for open projects. However, the trajectory of cheminformatics demonstrates that embracing openness enhances scientific integrity and accelerates discovery and innovation.

CMLXOM 4.11 has been released: doi.org/10.5281/zenodo.1510877

"Minor release, reverting to (the newer) xml-apis 1.4.01, updating to Joda time 2.14, and removing unused imports, updating deprecated code, and minimal added JavaDoc."

CMLXOM is a Java library for reading and writing Chemical Markup Language files

Zenodo: CMLXOM. Full Changelog: https://github.com/BlueObelisk/cmlxom/compare/cmlxom-4.10...cmlxom-4.11
Replied to Egon Willighagen

@egonw @wdscholia

#cheminformatics advertisement - chemfp has a pretty fast Butina clustering implementation, and implements several variations for handling singletons and pruning the number of clusters.

chemfp.com/docs/chemfp_butina_

With last year's release you can compute and save the NxN matrix (for a given threshold), and quickly re-cluster using the matrix as a starting point.
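For readers unfamiliar with the method, here is a minimal pure-Python sketch of Butina-style clustering. It is not chemfp's API or implementation: the function names (`neighbor_lists`, `butina`) are made up for illustration, fingerprints are assumed to be plain Python integers used as bitsets, and the variant shown simply ranks candidate centers by their initial neighbor count without re-ranking. Keeping the thresholded neighbor lists separate from the clustering step mirrors the idea of computing the NxN matrix once and re-clustering from it.

```python
def popcount(x):
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints stored as integer bitsets."""
    union = popcount(a | b)
    return popcount(a & b) / union if union else 0.0


def neighbor_lists(fps, threshold):
    """Thresholded NxN similarity matrix, stored as one neighbor set per item."""
    n = len(fps)
    neighbors = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if tanimoto(fps[i], fps[j]) >= threshold:
                neighbors[i].add(j)
                neighbors[j].add(i)
    return neighbors


def butina(neighbors):
    """Cluster from precomputed neighbor sets.

    Candidates are visited from most to fewest neighbors (ties broken by
    index). Each unassigned candidate becomes a cluster center and claims
    all of its still-unassigned neighbors. Items left with nothing to
    claim end up as singleton clusters.
    """
    order = sorted(range(len(neighbors)), key=lambda i: (-len(neighbors[i]), i))
    assigned = set()
    clusters = []
    for center in order:
        if center in assigned:
            continue
        members = [center] + sorted(m for m in neighbors[center] if m not in assigned)
        assigned.update(members)
        clusters.append(members)
    return clusters


# Toy fingerprints (hypothetical data, for illustration only).
fps = [0b1111, 0b1110, 0b0111, 0b110000, 0b100000]
print(butina(neighbor_lists(fps, 0.7)))
print(butina(neighbor_lists(fps, 0.5)))
```

In this sketch the quadratic `neighbor_lists` step is the expensive part, so saving its result and calling `butina` again (for example with a different singleton-handling rule) is cheap by comparison.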

chemfp.com: chemfp butina (chemfp 4.2 documentation)
Replied to Egon Willigh☮gen 🟥

I just added some 10 more. Here is a helpful SPARQL query to list all functional groups in @wikidata and their CxSMILES, if they have one: w.wiki/DWgp

If you want to add a few too, this list should give you a nice set of examples. Actually, the next SPARQL query gives a list that you can copy/paste into CDK Depict: w.wiki/DWvR

(The list has a few functional groups with links to the Japanese Wikipedia; help welcome there)

Tadashi Taffee Tanimoto - a famous name in #cheminformatics - was a Japanese internee at the Poston War Relocation Center, one of "the 10 American concentration camps operated by the War Relocation Authority during World War II".

ireizo.org/

en.wikipedia.org/wiki/Poston_W

I used to live on the site of the former Justice Department detention camp in Santa Fe.

I talked with someone who was a kid there in the 1970s. Kids would sometimes find Japanese artifacts from that era.

Ireizō: Ireizō | National Names Monument Honoring Persons of Japanese Ancestry Incarcerated in the U.S. During WWII

this has been fun so far!

"One Million IUPAC names" doi.org/10.59350/tjkf2-k1608 chem-bla-ics.linkedchemistry.i

"Thus, the idea came up, can we create a set of 1 million unique IUPAC names found in literature? I asked on the ELIXIR Europe slack channel if Europe PMC had such a dataset. I knew they had been adding chemical named-entity recognition (NER) results in their annotation API. [] Magnus Palmblad also replied and provided Python code to use the Europe PMC API"

#statistics question for y'all.

How many samples do I need to generate a histogram which is close to the full comparison? (Full based on >1 trillion possible values.)

For a single value this needs about 10,000 samples. But I have a nagging feeling that the number of bins is also important.

I'm also not sure how to define "close to" a histogram.

FWIW, it's for a pairwise comparison of two #cheminformatics fingerprint datasets, each with >1M elements, and Jaccard/Tanimoto similarity 0 ≤ S ≤ 1.
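One possible way to make "close to" concrete, offered as a sketch rather than a definitive answer: compare the cumulative distributions and use the Dvoretzky-Kiefer-Wolfowitz (DKW) inequality, which says that n independent samples keep the empirical CDF within eps of the true CDF everywhere with probability at least 1 - delta when n ≥ ln(2/delta) / (2 eps²). That bound does not depend on the number of bins. Since a bin fraction is the difference of two CDF values, each bin is then off by at most 2 eps. The helper names below are hypothetical, fingerprints are assumed to be Python integers used as bitsets, and pairs are drawn uniformly with replacement.

```python
import math
import random


def popcount(x):
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints stored as integer bitsets."""
    union = popcount(a | b)
    return popcount(a & b) / union if union else 0.0


def dkw_sample_size(eps, delta):
    """Samples needed so the empirical CDF is within eps of the true CDF
    everywhere, with probability at least 1 - delta (DKW inequality)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps * eps))


def sampled_histogram(fps_a, fps_b, n_samples, n_bins, rng):
    """Histogram (as bin fractions) of Tanimoto scores for random pairs.

    Each pair takes one fingerprint from each dataset, chosen uniformly
    and independently, so the draws are i.i.d. over all possible pairs.
    """
    counts = [0] * n_bins
    for _ in range(n_samples):
        score = tanimoto(rng.choice(fps_a), rng.choice(fps_b))
        # A score of exactly 1.0 goes into the last bin.
        counts[min(int(score * n_bins), n_bins - 1)] += 1
    return [c / n_samples for c in counts]


def max_cdf_gap(hist_1, hist_2):
    """Largest absolute gap between the cumulative sums of two histograms."""
    gap = 0.0
    cum_1 = cum_2 = 0.0
    for p, q in zip(hist_1, hist_2):
        cum_1 += p
        cum_2 += q
        gap = max(gap, abs(cum_1 - cum_2))
    return gap


# eps = 0.01 with 95% confidence:
print(dkw_sample_size(0.01, 0.05))  # 18445
```

Under these assumptions the bin count affects only how the histogram looks, not the sample size needed for a given CDF error. `max_cdf_gap` can be used to compare two independent sampled histograms as a rough sanity check.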

new paper: "Extending Chemoinformatics Techniques With JMolecular Energy: A Robust CDK-Based Force Field Library" onlinelibrary.wiley.com/doi/fu or doi.org/10.1002/jcc.70071

" This paper introduces JMolecular Energy (JME), a novel, open-source Java library designed to implement MMFF94 with a robust and extendable API (Application Programming Interface) that allows for access to individual energy components."

On March 10-11 the @tgx_um group at Maastricht University (The Netherlands) is organizing a Chemistry Development Kit 2025 User Group Meeting.

Info and registration here: cdk.github.io/nwo-openscience-

With six speakers on the first day, we have room for discussions on using the CDK. What is missing? How to build a vibrant user group, etc.

We can also prepare for the hackathon on the second day. Participants are welcome to email hack ideas right now!

nwo-openscience-2024: Chemistry Development Kit 2025 User Group Meeting. Repository to track the progress of the NWO Open Science Grant accepted for the CDK in 2023.